linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] Support managed interrupts for platform devices
@ 2020-10-28 12:33 John Garry
  2020-10-28 12:33 ` [PATCH v2 1/3] genirq/affinity: Add irq_update_affinity_desc() John Garry
                   ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: John Garry @ 2020-10-28 12:33 UTC (permalink / raw)
  To: gregkh, rafael, martin.petersen, jejb, tglx
  Cc: linuxarm, linux-scsi, linux-kernel, maz, John Garry

So far, managed interrupts are only used for PCI MSIs. This series add
platform device support for managed interrupts. Initially this topic was
discussed at [0].

The method to enable managed interrupts is to allocate all the IRQs for
the device, and then switch the interrupts to managed - this is done
through new function irq_update_affinity_desc().

API platform_get_irqs_affinity() is added as a helper to manage this work,
such that we don't need to export irq_update_affinity_desc() or
irq_create_affinity_masks().

For now, the HiSilicon SAS v2 hw driver is switched over. This is used
in the D05 dev board.

Performance gain observed for 6x SAS SSDs is ~357K -> 420K IOPs for fio read.

[0] https://lore.kernel.org/lkml/84a9411b-4ae3-1928-3d35-1666f2687ec8@huawei.com/

Changes since v1:
- Update authorship on genirq change

John Garry (3):
  genirq/affinity: Add irq_update_affinity_desc()
  Driver core: platform: Add platform_get_irqs_affinity()
  scsi: hisi_sas: Expose HW queues for v2 hw

 drivers/base/platform.c                | 58 +++++++++++++++++++++
 drivers/scsi/hisi_sas/hisi_sas.h       |  4 ++
 drivers/scsi/hisi_sas/hisi_sas_main.c  | 11 ++++
 drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 71 ++++++++++++++++++++++----
 include/linux/interrupt.h              |  8 +++
 include/linux/platform_device.h        |  5 ++
 kernel/irq/manage.c                    | 19 +++++++
 7 files changed, 165 insertions(+), 11 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v2 1/3] genirq/affinity: Add irq_update_affinity_desc()
  2020-10-28 12:33 [PATCH v2 0/3] Support managed interrupts for platform devices John Garry
@ 2020-10-28 12:33 ` John Garry
  2020-10-28 18:22   ` Thomas Gleixner
  2020-10-28 12:33 ` [PATCH v2 2/3] Driver core: platform: Add platform_get_irqs_affinity() John Garry
  2020-10-28 12:33 ` [PATCH v2 3/3] scsi: hisi_sas: Expose HW queues for v2 hw John Garry
  2 siblings, 1 reply; 29+ messages in thread
From: John Garry @ 2020-10-28 12:33 UTC (permalink / raw)
  To: gregkh, rafael, martin.petersen, jejb, tglx
  Cc: linuxarm, linux-scsi, linux-kernel, maz, John Garry

Add a function to allow the affinity of an interrupt be switched to
managed, such that interrupts allocated for platform devices may be
managed.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Garry <john.garry@huawei.com>
---
 include/linux/interrupt.h |  8 ++++++++
 kernel/irq/manage.c       | 19 +++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index ee8299eb1f52..870b3251e174 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -352,6 +352,8 @@ extern int irq_can_set_affinity(unsigned int irq);
 extern int irq_select_affinity(unsigned int irq);
 
 extern int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m);
+extern int irq_update_affinity_desc(unsigned int irq,
+				    struct irq_affinity_desc *affinity);
 
 extern int
 irq_set_affinity_notifier(unsigned int irq, struct irq_affinity_notify *notify);
@@ -387,6 +389,12 @@ static inline int irq_set_affinity_hint(unsigned int irq,
 	return -EINVAL;
 }
 
+static inline int irq_update_affinity_desc(unsigned int irq,
+					   struct irq_affinity_desc *affinity)
+{
+	return -EINVAL;
+}
+
 static inline int
 irq_set_affinity_notifier(unsigned int irq, struct irq_affinity_notify *notify)
 {
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index c460e0496006..b96af4cde4bc 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -371,6 +371,25 @@ int irq_set_affinity_locked(struct irq_data *data, const struct cpumask *mask,
 	return ret;
 }
 
+int irq_update_affinity_desc(unsigned int irq,
+			     struct irq_affinity_desc *affinity)
+{
+	unsigned long flags;
+	struct irq_desc *desc = irq_get_desc_lock(irq, &flags, 0);
+
+	if (!desc)
+		return -EINVAL;
+
+	if (affinity->is_managed) {
+		irqd_set(&desc->irq_data, IRQD_AFFINITY_MANAGED);
+		irqd_set(&desc->irq_data, IRQD_MANAGED_SHUTDOWN);
+	}
+
+	cpumask_copy(desc->irq_common_data.affinity, &affinity->mask);
+	irq_put_desc_unlock(desc, flags);
+	return 0;
+}
+
 int __irq_set_affinity(unsigned int irq, const struct cpumask *mask, bool force)
 {
 	struct irq_desc *desc = irq_to_desc(irq);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 2/3] Driver core: platform: Add platform_get_irqs_affinity()
  2020-10-28 12:33 [PATCH v2 0/3] Support managed interrupts for platform devices John Garry
  2020-10-28 12:33 ` [PATCH v2 1/3] genirq/affinity: Add irq_update_affinity_desc() John Garry
@ 2020-10-28 12:33 ` John Garry
  2020-10-28 12:33 ` [PATCH v2 3/3] scsi: hisi_sas: Expose HW queues for v2 hw John Garry
  2 siblings, 0 replies; 29+ messages in thread
From: John Garry @ 2020-10-28 12:33 UTC (permalink / raw)
  To: gregkh, rafael, martin.petersen, jejb, tglx
  Cc: linuxarm, linux-scsi, linux-kernel, maz, John Garry

Drivers for multi-queue platform devices may also want managed interrupts
for handling HW queue completion interrupts, so add support.

The function accepts an affinity descriptor pointer, which covers all IRQs
expected for the device.

The platform device driver is expected to hold all the IRQ numbers, as
there is no point in holding these in the common platform_device structure.

Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/base/platform.c         | 58 +++++++++++++++++++++++++++++++++
 include/linux/platform_device.h |  5 +++
 2 files changed, 63 insertions(+)

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 88aef93eb4dd..c110b35469d6 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -269,6 +269,64 @@ int platform_get_irq(struct platform_device *dev, unsigned int num)
 }
 EXPORT_SYMBOL_GPL(platform_get_irq);
 
+/**
+ * platform_get_irqs_affinity - get all IRQs for a device using an affinity
+ *				descriptor
+ * @dev: platform device pointer
+ * @affd: affinity descriptor, must be set
+ * @count: pointer to count of IRQS
+ * @irqs: pointer holder for IRQ numbers
+ *
+ * Gets a full set of IRQs for a platform device, and updates IRQ afffinty
+ * according to the passed affinity descriptor
+ *
+ * Return: 0 on success, negative error number on failure.
+ */
+int platform_get_irqs_affinity(struct platform_device *dev,
+			       struct irq_affinity *affd,
+			       unsigned int *count,
+			       int **irqs)
+{
+	struct irq_affinity_desc *desc;
+	int i, *pirqs;
+
+	if (!affd)
+		return -EPERM;
+
+	*count = platform_irq_count(dev);
+
+	if (*count <= affd->pre_vectors + affd->post_vectors)
+		return -EIO;
+
+	pirqs = kcalloc(*count, sizeof(int), GFP_KERNEL);
+	if (!pirqs)
+		return -ENOMEM;
+
+	for (i = 0; i < *count; i++) {
+		int irq = platform_get_irq(dev, i);
+		if (irq < 0) {
+			kfree(pirqs);
+			return irq;
+		}
+		pirqs[i] = irq;
+	}
+
+	desc = irq_create_affinity_masks(*count, affd);
+	if (!desc) {
+		kfree(pirqs);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < *count; i++)
+		irq_update_affinity_desc(pirqs[i], &desc[i]);
+
+	kfree(desc);
+	*irqs = pirqs;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(platform_get_irqs_affinity);
+
 /**
  * platform_irq_count - Count the number of IRQs a platform device uses
  * @dev: platform device
diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h
index 77a2aada106d..c3f4fc5a76b9 100644
--- a/include/linux/platform_device.h
+++ b/include/linux/platform_device.h
@@ -11,6 +11,7 @@
 #define _PLATFORM_DEVICE_H_
 
 #include <linux/device.h>
+#include <linux/interrupt.h>
 
 #define PLATFORM_DEVID_NONE	(-1)
 #define PLATFORM_DEVID_AUTO	(-2)
@@ -70,6 +71,10 @@ devm_platform_ioremap_resource_byname(struct platform_device *pdev,
 extern int platform_get_irq(struct platform_device *, unsigned int);
 extern int platform_get_irq_optional(struct platform_device *, unsigned int);
 extern int platform_irq_count(struct platform_device *);
+extern int platform_get_irqs_affinity(struct platform_device *dev,
+				      struct irq_affinity *affd,
+				      unsigned int *count,
+				      int **irqs);
 extern struct resource *platform_get_resource_byname(struct platform_device *,
 						     unsigned int,
 						     const char *);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 3/3] scsi: hisi_sas: Expose HW queues for v2 hw
  2020-10-28 12:33 [PATCH v2 0/3] Support managed interrupts for platform devices John Garry
  2020-10-28 12:33 ` [PATCH v2 1/3] genirq/affinity: Add irq_update_affinity_desc() John Garry
  2020-10-28 12:33 ` [PATCH v2 2/3] Driver core: platform: Add platform_get_irqs_affinity() John Garry
@ 2020-10-28 12:33 ` John Garry
  2 siblings, 0 replies; 29+ messages in thread
From: John Garry @ 2020-10-28 12:33 UTC (permalink / raw)
  To: gregkh, rafael, martin.petersen, jejb, tglx
  Cc: linuxarm, linux-scsi, linux-kernel, maz, John Garry

As a performance enhancement, make the completion queue interrupts
managed.

In addition, in commit bf0beec0607d ("blk-mq: drain I/O when all CPUs in a
hctx are offline"), CPU hotplug for MQ devices using managed interrupts
is made safe. So expose HW queues to blk-mq to take advantage of this.

Flag Scsi_host.host_tagset is also set to ensure that the HBA is not sent
more commands than it can handle. However the driver still does not use
request tag for IPTT as there are many HW bugs which means that special
rules apply for IPTT allocation.

Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/hisi_sas/hisi_sas.h       |  4 ++
 drivers/scsi/hisi_sas/hisi_sas_main.c  | 11 ++++
 drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 71 ++++++++++++++++++++++----
 3 files changed, 75 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h
index a25cfc11c96d..33c4fb45dd99 100644
--- a/drivers/scsi/hisi_sas/hisi_sas.h
+++ b/drivers/scsi/hisi_sas/hisi_sas.h
@@ -14,6 +14,7 @@
 #include <linux/debugfs.h>
 #include <linux/dmapool.h>
 #include <linux/iopoll.h>
+#include <linux/irq.h>
 #include <linux/lcm.h>
 #include <linux/libata.h>
 #include <linux/mfd/syscon.h>
@@ -312,6 +313,7 @@ enum {
 
 struct hisi_sas_hw {
 	int (*hw_init)(struct hisi_hba *hisi_hba);
+	int (*interrupt_preinit)(struct hisi_hba *hisi_hba);
 	void (*setup_itct)(struct hisi_hba *hisi_hba,
 			   struct hisi_sas_device *device);
 	int (*slot_index_alloc)(struct hisi_hba *hisi_hba,
@@ -418,6 +420,8 @@ struct hisi_hba {
 	u32 refclk_frequency_mhz;
 	u8 sas_addr[SAS_ADDR_SIZE];
 
+	int irq_map[128]; /* v2 hw */
+
 	int n_phy;
 	spinlock_t lock;
 	struct semaphore sem;
diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 128583dfccf2..56f914203679 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -2614,6 +2614,13 @@ static struct Scsi_Host *hisi_sas_shost_alloc(struct platform_device *pdev,
 	return NULL;
 }
 
+static int hisi_sas_interrupt_preinit(struct hisi_hba *hisi_hba)
+{
+	if (hisi_hba->hw->interrupt_preinit)
+		return hisi_hba->hw->interrupt_preinit(hisi_hba);
+	return 0;
+}
+
 int hisi_sas_probe(struct platform_device *pdev,
 		   const struct hisi_sas_hw *hw)
 {
@@ -2671,6 +2678,10 @@ int hisi_sas_probe(struct platform_device *pdev,
 		sha->sas_port[i] = &hisi_hba->port[i].sas_port;
 	}
 
+	rc = hisi_sas_interrupt_preinit(hisi_hba);
+	if (rc)
+		goto err_out_ha;
+
 	rc = scsi_add_host(shost, &pdev->dev);
 	if (rc)
 		goto err_out_ha;
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
index b57177b52fac..d6b933c3d0a2 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
@@ -3302,6 +3302,37 @@ static irq_handler_t fatal_interrupts[HISI_SAS_FATAL_INT_NR] = {
 	fatal_axi_int_v2_hw
 };
 
+static int hisi_sas_v2_interrupt_preinit(struct hisi_hba *hisi_hba)
+{
+	struct platform_device *pdev = hisi_hba->platform_dev;
+	struct Scsi_Host *shost = hisi_hba->shost;
+	int rc, i, *irqs, count;
+	struct irq_affinity desc = {
+		.pre_vectors = 96, /* base of completion queue interrupts */
+		.post_vectors = 16,
+	};
+
+	rc = platform_get_irqs_affinity(pdev, &desc, &count, &irqs);
+	if (rc < 0)
+		return rc;
+
+	/* 128 interrupts are always expected */
+	if (count != 128) {
+		kfree(irqs);
+		return -EIO;
+	}
+
+	/* Store the IRQ numbers in the driver */
+	for (i = 0; i < 128; i++)
+		hisi_hba->irq_map[i] = irqs[i];
+
+	shost->nr_hw_queues = hisi_hba->cq_nvecs = hisi_hba->queue_count;
+
+	kfree(irqs);
+
+	return 0;
+}
+
 /*
  * There is a limitation in the hip06 chipset that we need
  * to map in all mbigen interrupts, even if they are not used.
@@ -3310,14 +3341,11 @@ static int interrupt_init_v2_hw(struct hisi_hba *hisi_hba)
 {
 	struct platform_device *pdev = hisi_hba->platform_dev;
 	struct device *dev = &pdev->dev;
-	int irq, rc = 0, irq_map[128];
+	int irq, rc = 0;
 	int i, phy_no, fatal_no, queue_no;
 
-	for (i = 0; i < 128; i++)
-		irq_map[i] = platform_get_irq(pdev, i);
-
 	for (i = 0; i < HISI_SAS_PHY_INT_NR; i++) {
-		irq = irq_map[i + 1]; /* Phy up/down is irq1 */
+		irq = hisi_hba->irq_map[i + 1]; /* Phy up/down is irq1 */
 		rc = devm_request_irq(dev, irq, phy_interrupts[i], 0,
 				      DRV_NAME " phy", hisi_hba);
 		if (rc) {
@@ -3331,7 +3359,7 @@ static int interrupt_init_v2_hw(struct hisi_hba *hisi_hba)
 	for (phy_no = 0; phy_no < hisi_hba->n_phy; phy_no++) {
 		struct hisi_sas_phy *phy = &hisi_hba->phy[phy_no];
 
-		irq = irq_map[phy_no + 72];
+		irq = hisi_hba->irq_map[phy_no + 72];
 		rc = devm_request_irq(dev, irq, sata_int_v2_hw, 0,
 				      DRV_NAME " sata", phy);
 		if (rc) {
@@ -3343,7 +3371,7 @@ static int interrupt_init_v2_hw(struct hisi_hba *hisi_hba)
 	}
 
 	for (fatal_no = 0; fatal_no < HISI_SAS_FATAL_INT_NR; fatal_no++) {
-		irq = irq_map[fatal_no + 81];
+		irq = hisi_hba->irq_map[fatal_no + 81];
 		rc = devm_request_irq(dev, irq, fatal_interrupts[fatal_no], 0,
 				      DRV_NAME " fatal", hisi_hba);
 		if (rc) {
@@ -3357,7 +3385,7 @@ static int interrupt_init_v2_hw(struct hisi_hba *hisi_hba)
 	for (queue_no = 0; queue_no < hisi_hba->queue_count; queue_no++) {
 		struct hisi_sas_cq *cq = &hisi_hba->cq[queue_no];
 
-		cq->irq_no = irq_map[queue_no + 96];
+		cq->irq_no = hisi_hba->irq_map[queue_no + 96];
 		rc = devm_request_threaded_irq(dev, cq->irq_no,
 					       cq_interrupt_v2_hw,
 					       cq_thread_v2_hw, IRQF_ONESHOT,
@@ -3368,10 +3396,8 @@ static int interrupt_init_v2_hw(struct hisi_hba *hisi_hba)
 			rc = -ENOENT;
 			goto err_out;
 		}
+		cq->irq_mask = irq_get_affinity_mask(cq->irq_no);
 	}
-
-	hisi_hba->cq_nvecs = hisi_hba->queue_count;
-
 err_out:
 	return rc;
 }
@@ -3529,6 +3555,26 @@ static struct device_attribute *host_attrs_v2_hw[] = {
 	NULL
 };
 
+static int map_queues_v2_hw(struct Scsi_Host *shost)
+{
+	struct hisi_hba *hisi_hba = shost_priv(shost);
+	struct blk_mq_queue_map *qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
+	const struct cpumask *mask;
+	unsigned int queue, cpu;
+
+	for (queue = 0; queue < qmap->nr_queues; queue++) {
+		mask = irq_get_affinity_mask(hisi_hba->irq_map[96 + queue]);
+		if (!mask)
+			continue;
+
+		for_each_cpu(cpu, mask)
+			qmap->mq_map[cpu] = qmap->queue_offset + queue;
+	}
+
+	return 0;
+
+}
+
 static struct scsi_host_template sht_v2_hw = {
 	.name			= DRV_NAME,
 	.proc_name		= DRV_NAME,
@@ -3553,10 +3599,13 @@ static struct scsi_host_template sht_v2_hw = {
 #endif
 	.shost_attrs		= host_attrs_v2_hw,
 	.host_reset		= hisi_sas_host_reset,
+	.map_queues		= map_queues_v2_hw,
+	.host_tagset		= 1,
 };
 
 static const struct hisi_sas_hw hisi_sas_v2_hw = {
 	.hw_init = hisi_sas_v2_init,
+	.interrupt_preinit = hisi_sas_v2_interrupt_preinit,
 	.setup_itct = setup_itct_v2_hw,
 	.slot_index_alloc = slot_index_alloc_quirk_v2_hw,
 	.alloc_dev = alloc_dev_quirk_v2_hw,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 1/3] genirq/affinity: Add irq_update_affinity_desc()
  2020-10-28 12:33 ` [PATCH v2 1/3] genirq/affinity: Add irq_update_affinity_desc() John Garry
@ 2020-10-28 18:22   ` Thomas Gleixner
  2020-11-02 17:32     ` John Garry
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2020-10-28 18:22 UTC (permalink / raw)
  To: John Garry, gregkh, rafael, martin.petersen, jejb
  Cc: linuxarm, linux-scsi, linux-kernel, maz, John Garry

On Wed, Oct 28 2020 at 20:33, John Garry wrote:
>  
> +int irq_update_affinity_desc(unsigned int irq,
> +			     struct irq_affinity_desc *affinity)
> +{
> +	unsigned long flags;
> +	struct irq_desc *desc = irq_get_desc_lock(irq, &flags, 0);
> +
> +	if (!desc)
> +		return -EINVAL;

Just looking at it some more. This needs a check whether the interrupt
is actually shut down. Otherwise the update will corrupt
state. Something like this:

        if (irqd_is_started(&desc->irq_data))
        	return -EBUSY;

But all of this can't work on x86 due to the way how vector allocation
works. Let me think about that.

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 1/3] genirq/affinity: Add irq_update_affinity_desc()
  2020-10-28 18:22   ` Thomas Gleixner
@ 2020-11-02 17:32     ` John Garry
  2020-11-02 20:35       ` Thomas Gleixner
  0 siblings, 1 reply; 29+ messages in thread
From: John Garry @ 2020-11-02 17:32 UTC (permalink / raw)
  To: Thomas Gleixner, gregkh, rafael, martin.petersen, jejb
  Cc: linuxarm, linux-scsi, linux-kernel, maz

On 28/10/2020 18:22, Thomas Gleixner wrote:
> On Wed, Oct 28 2020 at 20:33, John Garry wrote:

Hi Thomas,

>>   
>> +int irq_update_affinity_desc(unsigned int irq,
>> +			     struct irq_affinity_desc *affinity)
>> +{
>> +	unsigned long flags;
>> +	struct irq_desc *desc = irq_get_desc_lock(irq, &flags, 0);
>> +
>> +	if (!desc)
>> +		return -EINVAL;
> Just looking at it some more. This needs a check whether the interrupt
> is actually shut down. Otherwise the update will corrupt
> state. Something like this:
> 
>          if (irqd_is_started(&desc->irq_data))
>          	return -EBUSY;
> 
> But all of this can't work on x86 due to the way how vector allocation
> works. Let me think about that.
> 

Is the problem that we reserve per-cpu managed interrupt space when 
allocated irq vectors on x86, and so later changing managed vs 
non-managed setting for irqs messes up this accounting somehow?

Cheers,
John

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 1/3] genirq/affinity: Add irq_update_affinity_desc()
  2020-11-02 17:32     ` John Garry
@ 2020-11-02 20:35       ` Thomas Gleixner
  2020-11-17 21:28         ` Thomas Gleixner
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2020-11-02 20:35 UTC (permalink / raw)
  To: John Garry, gregkh, rafael, martin.petersen, jejb
  Cc: linuxarm, linux-scsi, linux-kernel, maz

On Mon, Nov 02 2020 at 17:32, John Garry wrote:
> On 28/10/2020 18:22, Thomas Gleixner wrote:
>> But all of this can't work on x86 due to the way how vector allocation
>> works. Let me think about that.
>
> Is the problem that we reserve per-cpu managed interrupt space when 
> allocated irq vectors on x86, and so later changing managed vs 
> non-managed setting for irqs messes up this accounting somehow?

Correct. I have a halfways working solution for that, but I need to fix
some other thing first.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 1/3] genirq/affinity: Add irq_update_affinity_desc()
  2020-11-02 20:35       ` Thomas Gleixner
@ 2020-11-17 21:28         ` Thomas Gleixner
  2020-11-18 11:34           ` John Garry
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2020-11-17 21:28 UTC (permalink / raw)
  To: John Garry, gregkh, rafael, martin.petersen, jejb
  Cc: linuxarm, linux-scsi, linux-kernel, maz

On Mon, Nov 02 2020 at 21:35, Thomas Gleixner wrote:
> On Mon, Nov 02 2020 at 17:32, John Garry wrote:
> Correct. I have a halfways working solution for that, but I need to fix
> some other thing first.

Sorry for the delay. Supporting this truly on x86 needs some more
thought and surgery, but for ARM it should not matter. I made a few
tweaks to your original code. See below.

Thanks,

        tglx
---
From: John Garry <john.garry@huawei.com>
Subject: genirq/affinity: Add irq_update_affinity_desc()
Date: Wed, 28 Oct 2020 20:33:05 +0800

From: John Garry <john.garry@huawei.com>

Add a function to allow the affinity of an interrupt be switched to
managed, such that interrupts allocated for platform devices may be
managed.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Garry <john.garry@huawei.com>
---
 include/linux/interrupt.h |    8 ++++++
 kernel/irq/manage.c       |   56 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 64 insertions(+)

--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -352,6 +352,8 @@ extern int irq_can_set_affinity(unsigned
 extern int irq_select_affinity(unsigned int irq);
 
 extern int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m);
+extern int irq_update_affinity_desc(unsigned int irq,
+				    struct irq_affinity_desc *affinity);
 
 extern int
 irq_set_affinity_notifier(unsigned int irq, struct irq_affinity_notify *notify);
@@ -386,6 +388,12 @@ static inline int irq_set_affinity_hint(
 {
 	return -EINVAL;
 }
+
+static inline int irq_update_affinity_desc(unsigned int irq,
+					   struct irq_affinity_desc *affinity)
+{
+	return -EINVAL;
+}
 
 static inline int
 irq_set_affinity_notifier(unsigned int irq, struct irq_affinity_notify *notify)
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -371,6 +371,62 @@ int irq_set_affinity_locked(struct irq_d
 	return ret;
 }
 
+/**
+ * irq_update_affinity_desc - Update affinity management for an interrupt
+ * @irq:	The interrupt number to update
+ * @affinity:	Pointer to the affinity descriptor
+ *
+ * This interface can be used to configure the affinity management of
+ * interrupts which have been allocated already.
+ */
+int irq_update_affinity_desc(unsigned int irq,
+			     struct irq_affinity_desc *affinity)
+{
+	struct irq_desc *desc;
+	unsigned long flags;
+	bool activated;
+
+	/*
+	 * Supporting this with the reservation scheme used by x86 needs
+	 * some more thought. Fail it for now.
+	 */
+	if (IS_ENABLED(CONFIG_GENERIC_IRQ_RESERVATION_MODE))
+		return -EOPNOTSUPP;
+
+	desc = irq_get_desc_buslock(irq, &flags, 0);
+	if (!desc)
+		return -EINVAL;
+
+	/* Requires the interrupt to be shut down */
+	if (irqd_is_started(&desc->irq_data))
+		return -EBUSY;
+
+	/* Interrupts which are already managed cannot be modified */
+	if (irqd_is_managed(&desc->irq_data))
+		return -EBUSY;
+
+	/*
+	 * Deactivate the interrupt. That's required to undo
+	 * anything an earlier activation has established.
+	 */
+	activated = irqd_is_activated(&desc->irq_data);
+	if (activated)
+		irq_domain_deactivate_irq(&desc->irq_data);
+
+	if (affinity->is_managed) {
+		irqd_set(&desc->irq_data, IRQD_AFFINITY_MANAGED);
+		irqd_set(&desc->irq_data, IRQD_MANAGED_SHUTDOWN);
+	}
+
+	cpumask_copy(desc->irq_common_data.affinity, &affinity->mask);
+
+	/* Restore the activation state */
+	if (activated)
+		irq_domain_deactivate_irq(&desc->irq_data);
+	irq_put_desc_busunlock(desc, flags);
+	return 0;
+}
+
 int __irq_set_affinity(unsigned int irq, const struct cpumask *mask, bool force)
 {
 	struct irq_desc *desc = irq_to_desc(irq);

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 1/3] genirq/affinity: Add irq_update_affinity_desc()
  2020-11-17 21:28         ` Thomas Gleixner
@ 2020-11-18 11:34           ` John Garry
  2020-11-18 20:38             ` Thomas Gleixner
  0 siblings, 1 reply; 29+ messages in thread
From: John Garry @ 2020-11-18 11:34 UTC (permalink / raw)
  To: Thomas Gleixner, gregkh, rafael, martin.petersen, jejb
  Cc: linuxarm, linux-scsi, linux-kernel, maz

Hi Thomas,

> 
> Sorry for the delay. 

No worries, Thanks for the effort.

> Supporting this truly on x86 needs some more
> thought and surgery, but for ARM it should not matter. 

ok, as long as you are content not to support x86 atm.

> I made a few
> tweaks to your original code. See below.

I think maybe a few more tweaks, below. Apart from that, it looks to 
work ok.

> 
> Thanks,
> 
>          tglx
> ---
> From: John Garry <john.garry@huawei.com>
> Subject: genirq/affinity: Add irq_update_affinity_desc()
> Date: Wed, 28 Oct 2020 20:33:05 +0800
> 
> From: John Garry <john.garry@huawei.com>
> 
> Add a function to allow the affinity of an interrupt be switched to
> managed, such that interrupts allocated for platform devices may be
> managed.
> 
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: John Garry <john.garry@huawei.com>
> ---
>   include/linux/interrupt.h |    8 ++++++
>   kernel/irq/manage.c       |   56 ++++++++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 64 insertions(+)
> 

...

> +/**
> + * irq_update_affinity_desc - Update affinity management for an interrupt
> + * @irq:	The interrupt number to update
> + * @affinity:	Pointer to the affinity descriptor
> + *
> + * This interface can be used to configure the affinity management of
> + * interrupts which have been allocated already.
> + */
> +int irq_update_affinity_desc(unsigned int irq,
> +			     struct irq_affinity_desc *affinity)

Just a note on the return value, in the only current callsite - 
platform_get_irqs_affinity() - we don't check the return value and 
propagate the error. This is because we don't want to fail the interrupt 
init just because of problems updating the affinity mask. So I could 
print a message to inform the user of error (at the callsite).

> +{
> +	struct irq_desc *desc;
> +	unsigned long flags;
> +	bool activated;
> +
> +	/*
> +	 * Supporting this with the reservation scheme used by x86 needs
> +	 * some more thought. Fail it for now.
> +	 */
> +	if (IS_ENABLED(CONFIG_GENERIC_IRQ_RESERVATION_MODE))
> +		return -EOPNOTSUPP;
> +
> +	desc = irq_get_desc_buslock(irq, &flags, 0);
> +	if (!desc)
> +		return -EINVAL;
> +
> +	/* Requires the interrupt to be shut down */
> +	if (irqd_is_started(&desc->irq_data))

We're missing the unlock here, right?

> +		return -EBUSY;
> +
> +	/* Interrupts which are already managed cannot be modified */
> +	if (irqd_is_managed(&desc->irq_data))

And here, and I figure that this should be irqd_affinity_is_managed()

> +		return -EBUSY;
> +
> +	/*
> +	 * Deactivate the interrupt. That's required to undo
> +	 * anything an earlier activation has established.
> +	 */
> +	activated = irqd_is_activated(&desc->irq_data);
> +	if (activated)
> +		irq_domain_deactivate_irq(&desc->irq_data);
> +
> +	if (affinity->is_managed) {
> +		irqd_set(&desc->irq_data, IRQD_AFFINITY_MANAGED);
> +		irqd_set(&desc->irq_data, IRQD_MANAGED_SHUTDOWN);
> +	}
> +
> +	cpumask_copy(desc->irq_common_data.affinity, &affinity->mask);
> +
> +	/* Restore the activation state */
> +	if (activated)
> +		irq_domain_deactivate_irq(&desc->irq_data);
> +	irq_put_desc_busunlock(desc, flags);
> +	return 0;
> +}
> +
>   int __irq_set_affinity(unsigned int irq, const struct cpumask *mask, bool force)
>   {
>   	struct irq_desc *desc = irq_to_desc(irq);
> .
> 

Thanks,
John


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 1/3] genirq/affinity: Add irq_update_affinity_desc()
  2020-11-18 11:34           ` John Garry
@ 2020-11-18 20:38             ` Thomas Gleixner
  2020-11-19  9:31               ` John Garry
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2020-11-18 20:38 UTC (permalink / raw)
  To: John Garry, gregkh, rafael, martin.petersen, jejb
  Cc: linuxarm, linux-scsi, linux-kernel, maz

John,

On Wed, Nov 18 2020 at 11:34, John Garry wrote:
>> +/**
>> + * irq_update_affinity_desc - Update affinity management for an interrupt
>> + * @irq:	The interrupt number to update
>> + * @affinity:	Pointer to the affinity descriptor
>> + *
>> + * This interface can be used to configure the affinity management of
>> + * interrupts which have been allocated already.
>> + */
>> +int irq_update_affinity_desc(unsigned int irq,
>> +			     struct irq_affinity_desc *affinity)
>
> Just a note on the return value, in the only current callsite - 
> platform_get_irqs_affinity() - we don't check the return value and 
> propagate the error. This is because we don't want to fail the interrupt 
> init just because of problems updating the affinity mask. So I could 
> print a message to inform the user of error (at the callsite).

Well, not sure about that. During init on a platform which does not have
the issues with reservation mode there failure cases are:

 1) Interrupt does not exist. Definitely a full fail

 2) Interrupt is already started up. Not a good idea on init() and
    a clear fail.

 3) Interrupt has already been switched to managed. Double init is not
    really a good sign either.

>> +	/* Requires the interrupt to be shut down */
>> +	if (irqd_is_started(&desc->irq_data))
>
> We're missing the unlock here, right?

Duh yes.

>> +		return -EBUSY;
>> +
>> +	/* Interrupts which are already managed cannot be modified */
>> +	if (irqd_is_managed(&desc->irq_data))
>
> And here, and I figure that this should be irqd_affinity_is_managed()

More duh :)

I assume you send a fixed variant of this.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 1/3] genirq/affinity: Add irq_update_affinity_desc()
  2020-11-18 20:38             ` Thomas Gleixner
@ 2020-11-19  9:31               ` John Garry
  2020-11-19 18:09                 ` Thomas Gleixner
  0 siblings, 1 reply; 29+ messages in thread
From: John Garry @ 2020-11-19  9:31 UTC (permalink / raw)
  To: Thomas Gleixner, gregkh, rafael, martin.petersen, jejb
  Cc: linuxarm, linux-scsi, linux-kernel, maz

Hi Thomas,

>>> +int irq_update_affinity_desc(unsigned int irq,
>>> +			     struct irq_affinity_desc *affinity)
>> Just a note on the return value, in the only current callsite -
>> platform_get_irqs_affinity() - we don't check the return value and
>> propagate the error. This is because we don't want to fail the interrupt
>> init just because of problems updating the affinity mask. So I could
>> print a message to inform the user of error (at the callsite).
> Well, not sure about that. During init on a platform which does not have
> the issues with reservation mode there failure cases are:
> 
>   1) Interrupt does not exist. Definitely a full fail
> 
>   2) Interrupt is already started up. Not a good idea on init() and
>      a clear fail.
> 
>   3) Interrupt has already been switched to managed. Double init is not
>      really a good sign either.

I just tested that and case 3) would be a problem. I don't see us 
clearing the managed flag when free'ing the interrupt. So with 
CONFIG_DEBUG_TEST_DRIVER_REMOVE=y, we attempt this affinity update 
twice, and error from the irqd_affinity_is_managed() check.

> 
>>> +	/* Requires the interrupt to be shut down */
>>> +	if (irqd_is_started(&desc->irq_data))
>> We're missing the unlock here, right?
> Duh yes.
> 
>>> +		return -EBUSY;
>>> +
>>> +	/* Interrupts which are already managed cannot be modified */
>>> +	if (irqd_is_managed(&desc->irq_data))
>> And here, and I figure that this should be irqd_affinity_is_managed()
> More duh:)
> 
> I assume you send a fixed variant of this.

Can do.

Thanks,
John

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 1/3] genirq/affinity: Add irq_update_affinity_desc()
  2020-11-19  9:31               ` John Garry
@ 2020-11-19 18:09                 ` Thomas Gleixner
  2020-11-19 19:56                   ` John Garry
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2020-11-19 18:09 UTC (permalink / raw)
  To: John Garry, gregkh, rafael, martin.petersen, jejb
  Cc: linuxarm, linux-scsi, linux-kernel, maz

On Thu, Nov 19 2020 at 09:31, John Garry wrote:
>>   1) Interrupt does not exist. Definitely a full fail
>> 
>>   2) Interrupt is already started up. Not a good idea on init() and
>>      a clear fail.
>> 
>>   3) Interrupt has already been switched to managed. Double init is not
>>      really a good sign either.
>
> I just tested that and case 3) would be a problem. I don't see us 
> clearing the managed flag when free'ing the interrupt. So with 
> CONFIG_DEBUG_TEST_DRIVER_REMOVE=y, we attempt this affinity update 
> twice, and error from the irqd_affinity_is_managed() check.

That means the interrupt is not deallocated and reallocated, which does
not make sense to me.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 1/3] genirq/affinity: Add irq_update_affinity_desc()
  2020-11-19 18:09                 ` Thomas Gleixner
@ 2020-11-19 19:56                   ` John Garry
  2020-11-19 21:03                     ` Thomas Gleixner
  0 siblings, 1 reply; 29+ messages in thread
From: John Garry @ 2020-11-19 19:56 UTC (permalink / raw)
  To: Thomas Gleixner, gregkh, rafael, martin.petersen, jejb
  Cc: linuxarm, linux-scsi, linux-kernel, maz

Hi Thomas,

>>>    3) Interrupt has already been switched to managed. Double init is not
>>>       really a good sign either.
>> I just tested that and case 3) would be a problem. I don't see us
>> clearing the managed flag when free'ing the interrupt. So with
>> CONFIG_DEBUG_TEST_DRIVER_REMOVE=y, we attempt this affinity update
>> twice, and error from the irqd_affinity_is_managed() check.
> That means the interrupt is not deallocated and reallocated, which does
> not make sense to me.
> 

Just mentioning a couple of things here, which could be a clue to what 
is going on:
- the device is behind mbigen secondary irq controller
- the flow in the LLDD is to allocate all 128 interrupts during probe, 
but we only register handlers for a subset with device managed API

Thanks,
John


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 1/3] genirq/affinity: Add irq_update_affinity_desc()
  2020-11-19 19:56                   ` John Garry
@ 2020-11-19 21:03                     ` Thomas Gleixner
  2020-11-20 11:52                       ` John Garry
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2020-11-19 21:03 UTC (permalink / raw)
  To: John Garry, gregkh, rafael, martin.petersen, jejb
  Cc: linuxarm, linux-scsi, linux-kernel, maz

On Thu, Nov 19 2020 at 19:56, John Garry wrote:
>>>>    3) Interrupt has already been switched to managed. Double init is not
>>>>       really a good sign either.
>>> I just tested that and case 3) would be a problem. I don't see us
>>> clearing the managed flag when free'ing the interrupt. So with
>>> CONFIG_DEBUG_TEST_DRIVER_REMOVE=y, we attempt this affinity update
>>> twice, and error from the irqd_affinity_is_managed() check.
>> That means the interrupt is not deallocated and reallocated, which does
>> not make sense to me.
>> 
>
> Just mentioning a couple of things here, which could be a clue to what 
> is going on:
> - the device is behind mbigen secondary irq controller
> - the flow in the LLDD is to allocate all 128 interrupts during probe, 
> but we only register handlers for a subset with device managed API

Right, but if the driver is removed then the interrupts should be
deallocated, right?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 1/3] genirq/affinity: Add irq_update_affinity_desc()
  2020-11-19 21:03                     ` Thomas Gleixner
@ 2020-11-20 11:52                       ` John Garry
  2020-11-22 13:38                         ` Marc Zyngier
  0 siblings, 1 reply; 29+ messages in thread
From: John Garry @ 2020-11-20 11:52 UTC (permalink / raw)
  To: Thomas Gleixner, gregkh, rafael, martin.petersen, jejb
  Cc: linuxarm, linux-scsi, linux-kernel, maz

Hi Thomas,

>> Just mentioning a couple of things here, which could be a clue to what
>> is going on:
>> - the device is behind mbigen secondary irq controller
>> - the flow in the LLDD is to allocate all 128 interrupts during probe,
>> but we only register handlers for a subset with device managed API
> Right, but if the driver is removed then the interrupts should be
> deallocated, right?
> 

When removing the driver we just call free_irq(), which removes the 
handler and disables the interrupt.

But about the irq_desc, this is created when the mapping is created in 
irq_create_fwspec_mapping(), and I don't see this being torn down in the 
driver removal, so persistent in that regard.

So for pci msi I can see that we free the irq_desc in pci_disable_msi() 
-> free_msi_irqs() -> msi_domain_free_irqs() ...

So what I am missing here?

Thanks,
John

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 1/3] genirq/affinity: Add irq_update_affinity_desc()
  2020-11-20 11:52                       ` John Garry
@ 2020-11-22 13:38                         ` Marc Zyngier
  2020-11-23 12:54                           ` John Garry
  0 siblings, 1 reply; 29+ messages in thread
From: Marc Zyngier @ 2020-11-22 13:38 UTC (permalink / raw)
  To: John Garry
  Cc: Thomas Gleixner, gregkh, rafael, martin.petersen, jejb, linuxarm,
	linux-scsi, linux-kernel

On Fri, 20 Nov 2020 11:52:09 +0000,
John Garry <john.garry@huawei.com> wrote:
> 
> Hi Thomas,
> 
> >> Just mentioning a couple of things here, which could be a clue to what
> >> is going on:
> >> - the device is behind mbigen secondary irq controller
> >> - the flow in the LLDD is to allocate all 128 interrupts during probe,
> >> but we only register handlers for a subset with device managed API
> > Right, but if the driver is removed then the interrupts should be
> > deallocated, right?
> > 
> 
> When removing the driver we just call free_irq(), which removes the
> handler and disables the interrupt.
> 
> But about the irq_desc, this is created when the mapping is created in
> irq_create_fwspec_mapping(), and I don't see this being torn down in
> the driver removal, so persistent in that regard.

If the irq_descs are created via the platform_get_irq() calls in
platform_get_irqs_affinity(), I'd expect some equivalent helper to
tear things down as a result, calling irq_dispose_mapping() behind the
scenes.

> So for pci msi I can see that we free the irq_desc in
> pci_disable_msi() -> free_msi_irqs() -> msi_domain_free_irqs() ...
> 
> So what I am missing here?

I'm not sure the paths are strictly equivalent. On the PCI side, we
can have something that completely driver agnostic, as it is all
architectural. In your case, only the endpoint driver knows about what
happens, and needs to free things accordingly.

Finally, there is the issue in your driver that everything is
requested using devm_request_irq, which cannot play nicely with an
explicit irq_desc teardown. You'll probably need to provide the
equivalent devm helpers for your driver to safely be taken down.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 1/3] genirq/affinity: Add irq_update_affinity_desc()
  2020-11-22 13:38                         ` Marc Zyngier
@ 2020-11-23 12:54                           ` John Garry
  2020-11-23 13:26                             ` Marc Zyngier
  0 siblings, 1 reply; 29+ messages in thread
From: John Garry @ 2020-11-23 12:54 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Thomas Gleixner, gregkh, rafael, martin.petersen, jejb, linuxarm,
	linux-scsi, linux-kernel

Hi Marc,

>>> Right, but if the driver is removed then the interrupts should be
>>> deallocated, right?
>>>
>>
>> When removing the driver we just call free_irq(), which removes the
>> handler and disables the interrupt.
>>
>> But about the irq_desc, this is created when the mapping is created in
>> irq_create_fwspec_mapping(), and I don't see this being torn down in
>> the driver removal, so persistent in that regard.
> 
> If the irq_descs are created via the platform_get_irq() calls in
> platform_get_irqs_affinity(), I'd expect some equivalent helper to
> tear things down as a result, calling irq_dispose_mapping() behind the
> scenes.
> 

So this looks lacking in the kernel AFAICS...

So is there a reason for which irq dispose mapping is not a requirement 
for drivers when finished with the irq? because of shared interrupts?

>> So for pci msi I can see that we free the irq_desc in
>> pci_disable_msi() -> free_msi_irqs() -> msi_domain_free_irqs() ...
>>
>> So what I am missing here?
> 
> I'm not sure the paths are strictly equivalent. On the PCI side, we
> can have something that completely driver agnostic, as it is all
> architectural. In your case, only the endpoint driver knows about what
> happens, and needs to free things accordingly.
> 
> Finally, there is the issue in your driver that everything is
> requested using devm_request_irq, which cannot play nicely with an
> explicit irq_desc teardown. You'll probably need to provide the
> equivalent devm helpers for your driver to safely be taken down.
> 

Yeah, so since we use the devm irq request method, we also need a devm 
dispose release method as we can't dispose the irq mapping in the 
remove() method, prior to the irq_free() in the later devm release method.

But it looks like there is more to it than that, which I'm worried is 
far from non-trivial. For example, just calling irq_dispose_mapping() 
for removal and then plaform_get_irq()->acpi_get_irq() second time fails 
as it looks like more tidy-up is needed for removal...

Cheers,
John

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 1/3] genirq/affinity: Add irq_update_affinity_desc()
  2020-11-23 12:54                           ` John Garry
@ 2020-11-23 13:26                             ` Marc Zyngier
  2020-11-23 15:45                               ` John Garry
  0 siblings, 1 reply; 29+ messages in thread
From: Marc Zyngier @ 2020-11-23 13:26 UTC (permalink / raw)
  To: John Garry
  Cc: Thomas Gleixner, gregkh, rafael, martin.petersen, jejb, linuxarm,
	linux-scsi, linux-kernel

Hi John,

On 2020-11-23 12:54, John Garry wrote:
> Hi Marc,
> 
>>>> Right, but if the driver is removed then the interrupts should be
>>>> deallocated, right?
>>>> 
>>> 
>>> When removing the driver we just call free_irq(), which removes the
>>> handler and disables the interrupt.
>>> 
>>> But about the irq_desc, this is created when the mapping is created 
>>> in
>>> irq_create_fwspec_mapping(), and I don't see this being torn down in
>>> the driver removal, so persistent in that regard.
>> 
>> If the irq_descs are created via the platform_get_irq() calls in
>> platform_get_irqs_affinity(), I'd expect some equivalent helper to
>> tear things down as a result, calling irq_dispose_mapping() behind the
>> scenes.
>> 
> 
> So this looks lacking in the kernel AFAICS...
> 
> So is there a reason for which irq dispose mapping is not a
> requirement for drivers when finished with the irq? because of shared
> interrupts?

For a bunch of reasons: IRQ number used to be created 1:1 with their
physical counterpart, so there wasn't a need to "get rid" of the
associated data structures. Also, people expected their drivers
to be there for the lifetime of the system (believe it or not,
hotplug devices are pretty new!).

Shared interrupts are just another part of the problem (although we
should be able to work out whether there is any action attached to
the descriptor before blowing it away.

> 
>>> So for pci msi I can see that we free the irq_desc in
>>> pci_disable_msi() -> free_msi_irqs() -> msi_domain_free_irqs() ...
>>> 
>>> So what I am missing here?
>> 
>> I'm not sure the paths are strictly equivalent. On the PCI side, we
>> can have something that completely driver agnostic, as it is all
>> architectural. In your case, only the endpoint driver knows about what
>> happens, and needs to free things accordingly.
>> 
>> Finally, there is the issue in your driver that everything is
>> requested using devm_request_irq, which cannot play nicely with an
>> explicit irq_desc teardown. You'll probably need to provide the
>> equivalent devm helpers for your driver to safely be taken down.
>> 
> 
> Yeah, so since we use the devm irq request method, we also need a devm
> dispose release method as we can't dispose the irq mapping in the
> remove() method, prior to the irq_free() in the later devm release
> method.
> 
> But it looks like there is more to it than that, which I'm worried is
> far from non-trivial. For example, just calling irq_dispose_mapping()
> for removal and then plaform_get_irq()->acpi_get_irq() second time
> fails as it looks like more tidy-up is needed for removal...

Most probably. I could imagine things failing if there is any trace
of an existing translation in the ITS or in the platform-MSI layer,
for example, or if the interrupt is still active...

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 1/3] genirq/affinity: Add irq_update_affinity_desc()
  2020-11-23 13:26                             ` Marc Zyngier
@ 2020-11-23 15:45                               ` John Garry
  2020-11-24 16:51                                 ` Marc Zyngier
  0 siblings, 1 reply; 29+ messages in thread
From: John Garry @ 2020-11-23 15:45 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Thomas Gleixner, gregkh, rafael, martin.petersen, jejb, linuxarm,
	linux-scsi, linux-kernel

Hi Marc,

>> So is there a reason for which irq dispose mapping is not a
>> requirement for drivers when finished with the irq? because of shared
>> interrupts?
> 
> For a bunch of reasons: IRQ number used to be created 1:1 with their
> physical counterpart, so there wasn't a need to "get rid" of the
> associated data structures. Also, people expected their drivers
> to be there for the lifetime of the system (believe it or not,
> hotplug devices are pretty new!).
> 
> Shared interrupts are just another part of the problem (although we
> should be able to work out whether there is any action attached to
> the descriptor before blowing it away.
> 

OK, understood.

>> But it looks like there is more to it than that, which I'm worried is
>> far from non-trivial. For example, just calling irq_dispose_mapping()
>> for removal and then plaform_get_irq()->acpi_get_irq() second time
>> fails as it looks like more tidy-up is needed for removal...
> 
> Most probably. I could imagine things failing if there is any trace
> of an existing translation in the ITS or in the platform-MSI layer,
> for example, or if the interrupt is still active...

So this looks to be a problem I have. So if I hack the code to skip the 
check in acpi_get_irq() for the irq already being init'ed, I run into a 
use-after-free in the gic-v3-its driver. I may be skipping something 
with this hack, but I'll ask anyway.

So initially in the msi_prepare method we setup the its dev - this is 
from the mbigen probe. Then when all the irqs are unmapped later for end 
device driver removal, we release this its device in 
its_irq_domain_free(). But I don't see anything to set it up again. Is 
it improper to have released the its device in this scenario? Commenting 
out the release makes things "good" again.

Thanks,
john

Complete splat:

[   35.751627] 
==================================================================
[   35.758892] BUG: KASAN: use-after-free in its_irq_domain_alloc+0x44/0x1a8
[   35.765714] Read of size 8 at addr ffff04101bc93210 by task swapper/0/1
[   35.772357]
[   35.773854] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 
5.10.0-rc4-18055-g3247a1b07719 #955
[   35.782072] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon 
D05 IT21 Nemo 2.0 RC0 04/18/2018
[   35.791250] Call trace:
[   35.793706]  dump_backtrace+0x0/0x2d0
[   35.797384]  show_stack+0x18/0x68
[   35.800713]  dump_stack+0x100/0x16c
[   35.804216]  print_address_description.constprop.12+0x6c/0x4e8
[   35.810078]  kasan_report+0x130/0x200
[   35.813755]  __asan_load8+0x9c/0xd8
[   35.817257]  its_irq_domain_alloc+0x44/0x1a8
[   35.821548]  irq_domain_alloc_irqs_parent+0x68/0x88
[   35.826447]  msi_domain_alloc+0x98/0x180
[   35.830387]  irq_domain_alloc_irqs_hierarchy+0x58/0x78
[   35.835549]  msi_domain_populate_irqs+0x16c/0x1d8
[   35.840275]  platform_msi_domain_alloc+0x9c/0xd0
[   35.844914]  mbigen_irq_domain_alloc+0x100/0x180
[   35.849553]  __irq_domain_alloc_irqs+0x188/0x498
[   35.854191]  irq_create_fwspec_mapping+0x21c/0x4e0
[   35.859005]  acpi_irq_get+0x1f0/0x218
[   35.862683]  platform_get_irq_optional+0x1c4/0x4a0
[   35.867495]  platform_get_irq+0x20/0x68
[   35.871348]  hisi_sas_v2_probe+0x2c/0x88
[   35.875287]  platform_drv_probe+0x70/0xd0
[   35.879313]  really_probe+0x414/0x640
[   35.882990]  driver_probe_device+0x78/0xe8
[   35.887103]  device_driver_attach+0x9c/0xa8
[   35.891304]  __driver_attach+0x74/0x120
[   35.895156]  bus_for_each_dev+0xec/0x160
[   35.899094]  driver_attach+0x34/0x48
[   35.902684]  bus_add_driver+0x1b8/0x2c0
[   35.906535]  driver_register+0xc0/0x1e0
[   35.910386]  __platform_driver_register+0x80/0x90
[   35.915115]  hisi_sas_v2_driver_init+0x1c/0x28
[   35.919578]  do_one_initcall+0xd4/0x268
[   35.923431]  kernel_init_freeable+0x270/0x2d8
[   35.927807]  kernel_init+0x14/0x124
[   35.931309]  ret_from_fork+0x10/0x34
[   35.934895]
[   35.936386] Allocated by task 1:
[   35.939628]  stack_trace_save+0x94/0xc8
[   35.943480]  kasan_save_stack+0x28/0x58
[   35.947331]  __kasan_kmalloc.isra.6+0xcc/0xf0
[   35.951706]  kasan_kmalloc+0x10/0x20
[   35.955295]  its_create_device+0x1b8/0x448
[   35.959408]  its_msi_prepare+0x120/0x1d8
[   35.963347]  its_pmsi_prepare+0x20c/0x280
[   35.967373]  msi_domain_prepare_irqs+0x80/0x98
[   35.971836]  __platform_msi_create_device_domain+0xa8/0x130
[   35.977435]  mbigen_device_probe+0x298/0x340
[   35.981723]  platform_drv_probe+0x70/0xd0
[   35.985748]  really_probe+0x414/0x640
[   35.989424]  driver_probe_device+0x78/0xe8
[   35.993537]  device_driver_attach+0x9c/0xa8
[   35.997737]  __driver_attach+0x74/0x120
[   36.001589]  bus_for_each_dev+0xec/0x160
[   36.005527]  driver_attach+0x34/0x48
[   36.009118]  bus_add_driver+0x1b8/0x2c0
[   36.012968]  driver_register+0xc0/0x1e0
[   36.016819]  __platform_driver_register+0x80/0x90
[   36.021545]  mbigen_platform_driver_init+0x1c/0x28
[   36.026356]  do_one_initcall+0xd4/0x268
[   36.030207]  kernel_init_freeable+0x270/0x2d8
[   36.034583]  kernel_init+0x14/0x124
[   36.038084]  ret_from_fork+0x10/0x34
[   36.041671]
[   36.043162] Freed by task 1:
[   36.046053]  stack_trace_save+0x94/0xc8
[   36.049904]  kasan_save_stack+0x28/0x58
[   36.053754]  kasan_set_track+0x28/0x40
[   36.057518]  kasan_set_free_info+0x24/0x48
[   36.061631]  __kasan_slab_free+0x104/0x188
[   36.065744]  kasan_slab_free+0x14/0x20
[   36.069510]  slab_free_freelist_hook+0x8c/0x190
[   36.074060]  kfree+0x308/0x448
[   36.077125]  its_irq_domain_free+0x31c/0x338
[   36.081414]  irq_domain_free_irqs_common+0xd8/0x128
[   36.086314]  irq_domain_free_irqs_top+0x70/0x88
[   36.090864]  msi_domain_free+0xa8/0xc8
[   36.094629]  irq_domain_free_irqs_common+0xd8/0x128
[   36.099528]  platform_msi_domain_free+0xdc/0x1b0
[   36.104167]  mbigen_irq_domain_free+0x10/0x20
[   36.108543]  irq_domain_free_irqs+0x184/0x208
[   36.112918]  irq_dispose_mapping+0x54/0x98
[   36.117033]  devm_platform_get_irqs_affinity_release+0xbc/0xdc
[   36.122893]  release_nodes+0x350/0x3e8
[   36.126657]  devres_release_all+0x54/0x78
[   36.130683]  really_probe+0x234/0x640
[   36.134359]  driver_probe_device+0x78/0xe8
[   36.138473]  device_driver_attach+0x9c/0xa8
[   36.142673]  __driver_attach+0x74/0x120
[   36.146524]  bus_for_each_dev+0xec/0x160
[   36.150462]  driver_attach+0x34/0x48
[   36.154052]  bus_add_driver+0x1b8/0x2c0
[   36.157903]  driver_register+0xc0/0x1e0
[   36.161754]  __platform_driver_register+0x80/0x90
[   36.166479]  hisi_sas_v2_driver_init+0x1c/0x28
[   36.170942]  do_one_initcall+0xd4/0x268
[   36.174793]  kernel_init_freeable+0x270/0x2d8
[   36.179168]  kernel_init+0x14/0x124
[   36.182670]  ret_from_fork+0x10/0x34
[   36.186256]
[   36.187749] The buggy address belongs to the object at ffff04101bc93200
[   36.187749]  which belongs to the cache kmalloc-128 of size 128
[   36.200335] The buggy address is located 16 bytes inside of
[   36.200335]  128-byte region [ffff04101bc93200, ffff04101bc93280)
[   36.212046] The buggy address belongs to the page:
[   36.216864] page:(____ptrval____) refcount:1 mapcount:0 
mapping:0000000000000000 index:0x0 pfn:0x4101bc92
[   36.226479] head:(____ptrval____) order:1 compound_mapcount:0
[   36.232253] flags: 0x2bfffc0000010200(slab|head)
[   36.236895] raw: 2bfffc0000010200 dead000000000100 dead000000000122 
ffff00104000fc00
[   36.244679] raw: 0000000000000000 0000000000200020 00000001ffffffff 
0000000000000000
[   36.252459] page dumped because: kasan: bad access detected
[   36.258055]
[   36.259545] Memory state around the buggy address:
[   36.264358]  ffff04101bc93100: fa fb fb fb fb fb fb fb fb fb fb fb fb 
fb fb fb
[   36.271616]  ffff04101bc93180: fc fc fc fc fc fc fc fc fc fc fc fc fc 
fc fc fc
[   36.278874] >ffff04101bc93200: fa fb fb fb fb fb fb fb fb fb fb fb fb 
fb fb fb
[   36.286130]                          ^
[   36.289893]  ffff04101bc93280: fc fc fc fc fc fc fc fc fc fc fc fc fc 
fc fc fc
[   36.297151]  ffff04101bc93300: fa fb fb fb fb fb fb fb fb fb fb fb fb 
fb fb fb
[   36.304406] 
==================================================================
[   36.311663] Disabling lock debugging due to kernel taint

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 1/3] genirq/affinity: Add irq_update_affinity_desc()
  2020-11-23 15:45                               ` John Garry
@ 2020-11-24 16:51                                 ` Marc Zyngier
  2020-11-24 17:38                                   ` John Garry
  0 siblings, 1 reply; 29+ messages in thread
From: Marc Zyngier @ 2020-11-24 16:51 UTC (permalink / raw)
  To: John Garry
  Cc: Thomas Gleixner, gregkh, rafael, martin.petersen, jejb, linuxarm,
	linux-scsi, linux-kernel

On 2020-11-23 15:45, John Garry wrote:

Hi John,

>>> But it looks like there is more to it than that, which I'm worried is
>>> far from non-trivial. For example, just calling irq_dispose_mapping()
>>> for removal and then plaform_get_irq()->acpi_get_irq() second time
>>> fails as it looks like more tidy-up is needed for removal...
>> 
>> Most probably. I could imagine things failing if there is any trace
>> of an existing translation in the ITS or in the platform-MSI layer,
>> for example, or if the interrupt is still active...
> 
> So this looks to be a problem I have. So if I hack the code to skip
> the check in acpi_get_irq() for the irq already being init'ed, I run
> into a use-after-free in the gic-v3-its driver. I may be skipping
> something with this hack, but I'll ask anyway.
> 
> So initially in the msi_prepare method we setup the its dev - this is
> from the mbigen probe. Then when all the irqs are unmapped later for
> end device driver removal, we release this its device in
> its_irq_domain_free(). But I don't see anything to set it up again. Is
> it improper to have released the its device in this scenario?
> Commenting out the release makes things "good" again.

Huh, that's ugly. The issue is that the device that deals with the
interrupts isn't the device that the ITS knows about (there isn't a
1:1 mapping between mbigen and the endpoint).

The mbigen is responsible for the creation of the corresponding
irqdomain, and and crucially for the "prepare" phase, which results
in storing the its_dev pointer in info->scratchpad[0].

As we free all the interrupts associated with the endpoint, we
free the its_dev (nothing else needs it at this point). On the
next allocation, we reuse the damn its_dev pointer, and we're SOL.
This is wrong, because we haven't removed the mbigen, only the
device *connected* to the mbigen. And since the mbigen can be shared
across endpoints, we can't reliably tear it down at all. Boo.

The only thing to do is to convey that by marking the its_dev as
shared so that it isn't deleted when no LPIs are being used. After
all, it isn't like the mbigen is going anywhere.

It is just that passing that information down isn't a simple affair,
as msi_alloc_info_t isn't a generic type... Let me have a think.

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 1/3] genirq/affinity: Add irq_update_affinity_desc()
  2020-11-24 16:51                                 ` Marc Zyngier
@ 2020-11-24 17:38                                   ` John Garry
  2020-11-25 18:35                                     ` Marc Zyngier
  0 siblings, 1 reply; 29+ messages in thread
From: John Garry @ 2020-11-24 17:38 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Thomas Gleixner, gregkh, rafael, martin.petersen, jejb, linuxarm,
	linux-scsi, linux-kernel

Hi Marc,

>> So initially in the msi_prepare method we setup the its dev - this is
>> from the mbigen probe. Then when all the irqs are unmapped later for
>> end device driver removal, we release this its device in
>> its_irq_domain_free(). But I don't see anything to set it up again. Is
>> it improper to have released the its device in this scenario?
>> Commenting out the release makes things "good" again.
> 
> Huh, that's ugly. The issue is that the device that deals with the
> interrupts isn't the device that the ITS knows about (there isn't a
> 1:1 mapping between mbigen and the endpoint).
> 
> The mbigen is responsible for the creation of the corresponding
> irqdomain, and and crucially for the "prepare" phase, which results
> in storing the its_dev pointer in info->scratchpad[0].
> 
> As we free all the interrupts associated with the endpoint, we
> free the its_dev (nothing else needs it at this point). On the
> next allocation, we reuse the damn its_dev pointer, and we're SOL.
> This is wrong, because we haven't removed the mbigen, only the
> device *connected* to the mbigen. And since the mbigen can be shared
> across endpoints, we can't reliably tear it down at all. Boo.
> 
> The only thing to do is to convey that by marking the its_dev as
> shared so that it isn't deleted when no LPIs are being used. After
> all, it isn't like the mbigen is going anywhere.

Right, I did consider this.

> 
> It is just that passing that information down isn't a simple affair,
> as msi_alloc_info_t isn't a generic type... Let me have a think.

I think that there is a way to circumvent the problem, which you might 
call hacky, but OTOH, not sure if there's much point changing mbigen or 
related infrastructure at this stage.

Anyway, so we have 128 irqs in total for the mbigen domain, but the 
driver only is interesting in something like irq indexes 1,2,72-81, and 
96-112. So we can just dispose the mappings for irq index 0-112 at 
removal stage, thereby keeping the its device around. We do still call 
platform_irq_count(), which sets up all 128 mappings, so maybe we should 
be unmapping all of these - this would be the contentious part. But 
maybe not, as the device driver is only interested in that subset, and 
has no business unmapping the rest.

With that change, the platform.c API would work a bit more like the pci 
msi code equivalent, where we request a min and max number of vectors. 
In fact, that platform.c change needs to be made anyway as 
platform_get_irqs_affinity() is broken currently for when nr_cpus < #hw 
queues.

Thoughts?

Thanks,
John


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 1/3] genirq/affinity: Add irq_update_affinity_desc()
  2020-11-24 17:38                                   ` John Garry
@ 2020-11-25 18:35                                     ` Marc Zyngier
  2020-11-26 10:47                                       ` John Garry
  0 siblings, 1 reply; 29+ messages in thread
From: Marc Zyngier @ 2020-11-25 18:35 UTC (permalink / raw)
  To: John Garry
  Cc: Thomas Gleixner, gregkh, rafael, martin.petersen, jejb, linuxarm,
	linux-scsi, linux-kernel

On 2020-11-24 17:38, John Garry wrote:
> Hi Marc,
> 
>>> So initially in the msi_prepare method we setup the its dev - this is
>>> from the mbigen probe. Then when all the irqs are unmapped later for
>>> end device driver removal, we release this its device in
>>> its_irq_domain_free(). But I don't see anything to set it up again. 
>>> Is
>>> it improper to have released the its device in this scenario?
>>> Commenting out the release makes things "good" again.
>> 
>> Huh, that's ugly. The issue is that the device that deals with the
>> interrupts isn't the device that the ITS knows about (there isn't a
>> 1:1 mapping between mbigen and the endpoint).
>> 
>> The mbigen is responsible for the creation of the corresponding
>> irqdomain, and and crucially for the "prepare" phase, which results
>> in storing the its_dev pointer in info->scratchpad[0].
>> 
>> As we free all the interrupts associated with the endpoint, we
>> free the its_dev (nothing else needs it at this point). On the
>> next allocation, we reuse the damn its_dev pointer, and we're SOL.
>> This is wrong, because we haven't removed the mbigen, only the
>> device *connected* to the mbigen. And since the mbigen can be shared
>> across endpoints, we can't reliably tear it down at all. Boo.
>> 
>> The only thing to do is to convey that by marking the its_dev as
>> shared so that it isn't deleted when no LPIs are being used. After
>> all, it isn't like the mbigen is going anywhere.
> 
> Right, I did consider this.

FWIW, I've pushed my hack branch[1] out with a couple of patches
for you to try (the top 3 patches). They allow platform-MSI domains
created by devices (mbigen, ICU) to be advertised as shared between
devices, so that the low-level driver can handle that in an appropriate
way.

I gave it a go on my D05 and nothing blew up, but I can't really remove
the kernel module, as that's where my disks are... :-/
Please let me know if that helps.

>> It is just that passing that information down isn't a simple affair,
>> as msi_alloc_info_t isn't a generic type... Let me have a think.
> 
> I think that there is a way to circumvent the problem, which you might
> call hacky, but OTOH, not sure if there's much point changing mbigen
> or related infrastructure at this stage.

Bah, it's a simple change, and there is now more than the mbigen using
the same API...

> 
> Anyway, so we have 128 irqs in total for the mbigen domain, but the
> driver only is interesting in something like irq indexes 1,2,72-81,
> and 96-112. So we can just dispose the mappings for irq index 0-112 at
> removal stage, thereby keeping the its device around. We do still call
> platform_irq_count(), which sets up all 128 mappings, so maybe we
> should be unmapping all of these - this would be the contentious part.
> But maybe not, as the device driver is only interested in that subset,
> and has no business unmapping the rest.

I don't think the driver should mess with interrupts it doesn't own.
And while the mbigen port that is connected to the SAS controller
doesn't seem to be shared between endpoints, some other ports definitely
are:

# cat /sys/kernel/debug/irq/domains/\\_SB.MBI1
name:   \_SB.MBI1
  size:   409
  mapped: 192
  flags:  0x00000003

[...]

I guess that the other 217 lines are connected somewhere.

> With that change, the platform.c API would work a bit more like the
> pci msi code equivalent, where we request a min and max number of
> vectors. In fact, that platform.c change needs to be made anyway as
> platform_get_irqs_affinity() is broken currently for when nr_cpus <
> #hw queues.
> 
> Thoughts?

I'm happy to look at some code! ;-)

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 1/3] genirq/affinity: Add irq_update_affinity_desc()
  2020-11-25 18:35                                     ` Marc Zyngier
@ 2020-11-26 10:47                                       ` John Garry
  2020-11-26 11:08                                         ` Marc Zyngier
  0 siblings, 1 reply; 29+ messages in thread
From: John Garry @ 2020-11-26 10:47 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Thomas Gleixner, gregkh, rafael, martin.petersen, jejb, linuxarm,
	linux-scsi, linux-kernel

Hi Marc,

>> Right, I did consider this.
> 
> FWIW, I've pushed my hack branch[1]

Did you miss that reference?

> out with a couple of patches
> for you to try (the top 3 patches). They allow platform-MSI domains
> created by devices (mbigen, ICU) to be advertised as shared between
> devices, so that the low-level driver can handle that in an appropriate
> way.
> 
> I gave it a go on my D05 and nothing blew up, but I can't really remove
> the kernel module, as that's where my disks are... :-/

You still should be able to enable my favorite 
CONFIG_DEBUG_TEST_DRIVER_REMOVE=y, while the distro still boot. But I'll 
just test if you want.

> Please let me know if that helps.
> 
>>> It is just that passing that information down isn't a simple affair,
>>> as msi_alloc_info_t isn't a generic type... Let me have a think.
>>
>> I think that there is a way to circumvent the problem, which you might
>> call hacky, but OTOH, not sure if there's much point changing mbigen
>> or related infrastructure at this stage.
> 
> Bah, it's a simple change, and there is now more than the mbigen using
> the same API...
> 
>>
>> Anyway, so we have 128 irqs in total for the mbigen domain, but the
>> driver only is interesting in something like irq indexes 1,2,72-81,
>> and 96-112. So we can just dispose the mappings for irq index 0-112 at
>> removal stage, thereby keeping the its device around. We do still call
>> platform_irq_count(), which sets up all 128 mappings, so maybe we
>> should be unmapping all of these - this would be the contentious part.
>> But maybe not, as the device driver is only interested in that subset,
>> and has no business unmapping the rest.
> 
> I don't think the driver should mess with interrupts it doesn't own.

I would tend to agree. But all 128 lines here are for the SAS 
controller. It's quite strange, as only about ~20 are useful.

> And while the mbigen port that is connected to the SAS controller
> doesn't seem to be shared between endpoints, some other ports definitely
> are:
> 
> # cat /sys/kernel/debug/irq/domains/\\_SB.MBI1
> name:   \_SB.MBI1
>   size:   409
>   mapped: 192
>   flags:  0x00000003
> 
> [...]
> 
> I guess that the other 217 lines are connected somewhere.
> 

I think that is not the right one. See 
https://github.com/tianocore/edk2-platforms/blob/master/Silicon/Hisilicon/Hi1616/D05AcpiTables/Dsdt/D05Sas.asl#L101


Thanks,
John

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 1/3] genirq/affinity: Add irq_update_affinity_desc()
  2020-11-26 10:47                                       ` John Garry
@ 2020-11-26 11:08                                         ` Marc Zyngier
  2020-11-26 11:29                                           ` John Garry
  0 siblings, 1 reply; 29+ messages in thread
From: Marc Zyngier @ 2020-11-26 11:08 UTC (permalink / raw)
  To: John Garry
  Cc: Thomas Gleixner, gregkh, rafael, martin.petersen, jejb, linuxarm,
	linux-scsi, linux-kernel

Hi John,

On 2020-11-26 10:47, John Garry wrote:
> Hi Marc,
> 
>>> Right, I did consider this.
>> 
>> FWIW, I've pushed my hack branch[1]
> 
> Did you miss that reference?

I did:

https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=irq/hacks

> 
>> out with a couple of patches
>> for you to try (the top 3 patches). They allow platform-MSI domains
>> created by devices (mbigen, ICU) to be advertised as shared between
>> devices, so that the low-level driver can handle that in an 
>> appropriate
>> way.
>> 
>> I gave it a go on my D05 and nothing blew up, but I can't really 
>> remove
>> the kernel module, as that's where my disks are... :-/
> 
> You still should be able to enable my favorite
> CONFIG_DEBUG_TEST_DRIVER_REMOVE=y, while the distro still boot. But
> I'll just test if you want.

Ah! Let me try that then. Having been debugging some ugly driver removal
lately, I wish I had known that config option and inflicted it on their
authors...

[...]

>> And while the mbigen port that is connected to the SAS controller
>> doesn't seem to be shared between endpoints, some other ports 
>> definitely
>> are:
>> 
>> # cat /sys/kernel/debug/irq/domains/\\_SB.MBI1
>> name:   \_SB.MBI1
>>   size:   409
>>   mapped: 192
>>   flags:  0x00000003
>> 
>> [...]
>> 
>> I guess that the other 217 lines are connected somewhere.
>> 
> 
> I think that is not the right one. See
> https://github.com/tianocore/edk2-platforms/blob/master/Silicon/Hisilicon/Hi1616/D05AcpiTables/Dsdt/D05Sas.asl#L101
> 

I know, I was just outlining the fact that some of the mbigen
ports are shared between devices, and MBI1 seems to be an example
of that (though my reading of ASL is... primitive).

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 1/3] genirq/affinity: Add irq_update_affinity_desc()
  2020-11-26 11:08                                         ` Marc Zyngier
@ 2020-11-26 11:29                                           ` John Garry
  2020-11-26 16:52                                             ` John Garry
  0 siblings, 1 reply; 29+ messages in thread
From: John Garry @ 2020-11-26 11:29 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Thomas Gleixner, gregkh, rafael, martin.petersen, jejb, linuxarm,
	linux-scsi, linux-kernel

Hi Marc,

> 
> I did:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=irq/hacks 

ok, I'll have a look

>>
>> You still should be able to enable my favorite
>> CONFIG_DEBUG_TEST_DRIVER_REMOVE=y, while the distro still boot. But
>> I'll just test if you want.
> 
> Ah! Let me try that then. Having been debugging some ugly driver removal
> lately, I wish I had known that config option and inflicted it on their
> authors...

Just in case - be careful with this one! It kills the boot of many 
systems, but D05 should be fine.

> 
> [...]
> 
>>> And while the mbigen port that is connected to the SAS controller
>>> doesn't seem to be shared between endpoints, some other ports definitely
>>> are:
>>>
>>> # cat /sys/kernel/debug/irq/domains/\\_SB.MBI1
>>> name:   \_SB.MBI1
>>>   size:   409
>>>   mapped: 192
>>>   flags:  0x00000003
>>>
>>> [...]
>>>
>>> I guess that the other 217 lines are connected somewhere.
>>>
>>
>> I think that is not the right one. See
>> https://github.com/tianocore/edk2-platforms/blob/master/Silicon/Hisilicon/Hi1616/D05AcpiTables/Dsdt/D05Sas.asl#L101 
>>
>>
> 
> I know, I was just outlining the fact that some of the mbigen
> ports are shared between devices, and MBI1 seems to be an example
> of that (though my reading of ASL is... primitive).

Ah, ok. So this one is for the networking subsystem, and it does have 
many devices in the topology model, but I can't claim to known much more 
than that.

Thanks,
John

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 1/3] genirq/affinity: Add irq_update_affinity_desc()
  2020-11-26 11:29                                           ` John Garry
@ 2020-11-26 16:52                                             ` John Garry
  2020-11-27  9:57                                               ` Marc Zyngier
  0 siblings, 1 reply; 29+ messages in thread
From: John Garry @ 2020-11-26 16:52 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Thomas Gleixner, gregkh, rafael, martin.petersen, jejb, linuxarm,
	linux-scsi, linux-kernel

Hi Marc,

>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=irq/hacks 
> 
> 
> ok, I'll have a look

I tried that and it doesn't look to work.

I find that for the its_msi_prepare() call, its_dev->shared does not get 
set, as MSI_ALLOC_FLAGS_SHARED_DEVICE is not set in info->flags.

If I understand the code correctly, MSI_ALLOC_FLAGS_SHARED_DEVICE is 
supposed to be set in info->flags in platform_msi_set_desc(), but this 
is called per-msi after its_msi_prepare(), so we don't the flags set at 
the right time. That's how it looks to me...

Cheers,
John

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 1/3] genirq/affinity: Add irq_update_affinity_desc()
  2020-11-26 16:52                                             ` John Garry
@ 2020-11-27  9:57                                               ` Marc Zyngier
  2020-11-27 12:45                                                 ` John Garry
  0 siblings, 1 reply; 29+ messages in thread
From: Marc Zyngier @ 2020-11-27  9:57 UTC (permalink / raw)
  To: John Garry
  Cc: Thomas Gleixner, gregkh, rafael, martin.petersen, jejb, linuxarm,
	linux-scsi, linux-kernel

On 2020-11-26 16:52, John Garry wrote:
> Hi Marc,
> 
>>> 
>>> https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=irq/hacks
>> 
>> 
>> ok, I'll have a look
> 
> I tried that and it doesn't look to work.
> 
> I find that for the its_msi_prepare() call, its_dev->shared does not
> get set, as MSI_ALLOC_FLAGS_SHARED_DEVICE is not set in info->flags.
> 
> If I understand the code correctly, MSI_ALLOC_FLAGS_SHARED_DEVICE is
> supposed to be set in info->flags in platform_msi_set_desc(), but this
> is called per-msi after its_msi_prepare(), so we don't the flags set
> at the right time. That's how it looks to me...

Meh. I was trying multiple things, and of course commited the worse
possible approach.

I've updated the branch, having verified that we do get the flag in
the ITS now.

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 1/3] genirq/affinity: Add irq_update_affinity_desc()
  2020-11-27  9:57                                               ` Marc Zyngier
@ 2020-11-27 12:45                                                 ` John Garry
  2020-11-27 12:49                                                   ` Marc Zyngier
  0 siblings, 1 reply; 29+ messages in thread
From: John Garry @ 2020-11-27 12:45 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Thomas Gleixner, gregkh, rafael, martin.petersen, jejb, linuxarm,
	linux-scsi, linux-kernel

On 27/11/2020 09:57, Marc Zyngier wrote:
>> If I understand the code correctly, MSI_ALLOC_FLAGS_SHARED_DEVICE is
>> supposed to be set in info->flags in platform_msi_set_desc(), but this
>> is called per-msi after its_msi_prepare(), so we don't the flags set
>> at the right time. That's how it looks to me...
> 
> Meh. I was trying multiple things, and of course commited the worse
> possible approach.
> 
> I've updated the branch, having verified that we do get the flag in
> the ITS now.

Hi Marc,

That looks to work.

So do you have an upstream plan for this? I ask, as if you go with this, 
then I may change my series to map and unmap all the irqs again - but 
not sure about that.

Thanks,
John

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 1/3] genirq/affinity: Add irq_update_affinity_desc()
  2020-11-27 12:45                                                 ` John Garry
@ 2020-11-27 12:49                                                   ` Marc Zyngier
  0 siblings, 0 replies; 29+ messages in thread
From: Marc Zyngier @ 2020-11-27 12:49 UTC (permalink / raw)
  To: John Garry
  Cc: Thomas Gleixner, gregkh, rafael, martin.petersen, jejb, linuxarm,
	linux-scsi, linux-kernel

On 2020-11-27 12:45, John Garry wrote:
> On 27/11/2020 09:57, Marc Zyngier wrote:
>>> If I understand the code correctly, MSI_ALLOC_FLAGS_SHARED_DEVICE is
>>> supposed to be set in info->flags in platform_msi_set_desc(), but 
>>> this
>>> is called per-msi after its_msi_prepare(), so we don't the flags set
>>> at the right time. That's how it looks to me...
>> 
>> Meh. I was trying multiple things, and of course commited the worse
>> possible approach.
>> 
>> I've updated the branch, having verified that we do get the flag in
>> the ITS now.
> 
> Hi Marc,
> 
> That looks to work.

Thanks for having given it a go.

> So do you have an upstream plan for this? I ask, as if you go with
> this, then I may change my series to map and unmap all the irqs again
> - but not sure about that.

I'll write some commit messages, and post that. Either this weekend,
or on Monday at the latest.

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2020-11-27 12:49 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-28 12:33 [PATCH v2 0/3] Support managed interrupts for platform devices John Garry
2020-10-28 12:33 ` [PATCH v2 1/3] genirq/affinity: Add irq_update_affinity_desc() John Garry
2020-10-28 18:22   ` Thomas Gleixner
2020-11-02 17:32     ` John Garry
2020-11-02 20:35       ` Thomas Gleixner
2020-11-17 21:28         ` Thomas Gleixner
2020-11-18 11:34           ` John Garry
2020-11-18 20:38             ` Thomas Gleixner
2020-11-19  9:31               ` John Garry
2020-11-19 18:09                 ` Thomas Gleixner
2020-11-19 19:56                   ` John Garry
2020-11-19 21:03                     ` Thomas Gleixner
2020-11-20 11:52                       ` John Garry
2020-11-22 13:38                         ` Marc Zyngier
2020-11-23 12:54                           ` John Garry
2020-11-23 13:26                             ` Marc Zyngier
2020-11-23 15:45                               ` John Garry
2020-11-24 16:51                                 ` Marc Zyngier
2020-11-24 17:38                                   ` John Garry
2020-11-25 18:35                                     ` Marc Zyngier
2020-11-26 10:47                                       ` John Garry
2020-11-26 11:08                                         ` Marc Zyngier
2020-11-26 11:29                                           ` John Garry
2020-11-26 16:52                                             ` John Garry
2020-11-27  9:57                                               ` Marc Zyngier
2020-11-27 12:45                                                 ` John Garry
2020-11-27 12:49                                                   ` Marc Zyngier
2020-10-28 12:33 ` [PATCH v2 2/3] Driver core: platform: Add platform_get_irqs_affinity() John Garry
2020-10-28 12:33 ` [PATCH v2 3/3] scsi: hisi_sas: Expose HW queues for v2 hw John Garry

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).