* About irq_create_affinity_masks() for a platform device driver
@ 2020-01-22 10:09 John Garry
  2020-01-22 10:59 ` Thomas Gleixner
  0 siblings, 1 reply; 8+ messages in thread
From: John Garry @ 2020-01-22 10:09 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Marc Zyngier, linux-kernel, chenxiang

Hi Thomas,

A question about the function in the $subject:

Would there be any issue with a SCSI platform device driver referencing 
this function?

So I have a multi-queue platform device, and I want to spread interrupts 
over all possible CPUs, just like we can do for PCI MSI vectors. This 
topic was touched on in [0].
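
For illustration, the PCI equivalent that I'd like to mirror looks 
roughly like this from a driver (just a sketch; pdev and max_vecs are 
placeholders):

struct irq_affinity affd = {
	.pre_vectors = 1,	/* e.g. one vector not for the queues */
};
int nvec;

/* the core spreads the vectors over all possible CPUs and marks
 * them managed */
nvec = pci_alloc_irq_vectors_affinity(pdev, 1, max_vecs,
				      PCI_IRQ_MSI | PCI_IRQ_AFFINITY,
				      &affd);
if (nvec < 0)
	return nvec;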

And, if it's ok, could we export that same symbol?

Cheers,
John

[0] 
https://lore.kernel.org/lkml/88d64d51-4344-e908-b55b-0583b0137ddf@huawei.com/


* Re: About irq_create_affinity_masks() for a platform device driver
  2020-01-22 10:09 About irq_create_affinity_masks() for a platform device driver John Garry
@ 2020-01-22 10:59 ` Thomas Gleixner
  2020-01-22 11:27   ` John Garry
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Gleixner @ 2020-01-22 10:59 UTC (permalink / raw)
  To: John Garry; +Cc: Marc Zyngier, linux-kernel, chenxiang

John,

John Garry <john.garry@huawei.com> writes:
> Would there be any issue with a SCSI platform device driver referencing 
> this function?
>
> So I have a multi-queue platform device, and I want to spread interrupts 
> over all possible CPUs, just like we can do for PCI MSI vectors. This 
> topic was touched on in [0].
>
> And, if it's ok, could we export that same symbol?

I think you will need something similar to what we have in the pci/msi
code, but that shouldn't be in your device driver. So I'd rather create
platform infrastructure for this and export that.

Thanks,

        tglx


* Re: About irq_create_affinity_masks() for a platform device driver
  2020-01-22 10:59 ` Thomas Gleixner
@ 2020-01-22 11:27   ` John Garry
  2020-01-31 14:25     ` John Garry
  0 siblings, 1 reply; 8+ messages in thread
From: John Garry @ 2020-01-22 11:27 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Marc Zyngier, linux-kernel, chenxiang

On 22/01/2020 10:59, Thomas Gleixner wrote:

Hi Thomas,

> John Garry <john.garry@huawei.com> writes:
>> Would there be any issue with a SCSI platform device driver referencing
>> this function?
>>
>> So I have a multi-queue platform device, and I want to spread interrupts
>> over all possible CPUs, just like we can do for PCI MSI vectors. This
>> topic was touched on in [0].
>>
>> And, if it's ok, could we export that same symbol?
> 
> I think you will need something similar to what we have in the pci/msi
> code, but that shouldn't be in your device driver. So I'd rather create
> platform infrastructure for this and export that.
> 

That would seem the proper thing to do.

So I was doing this for legacy hw as a cheap and quick performance 
boost, but I doubt there would be many other users of any new API in 
future. Also, the effort could be more than the reward, so I may 
consider dropping the whole idea.

But I'll have a play with how the code could look now.

Cheers,
john


* Re: About irq_create_affinity_masks() for a platform device driver
  2020-01-22 11:27   ` John Garry
@ 2020-01-31 14:25     ` John Garry
  2020-01-31 21:41       ` Thomas Gleixner
  0 siblings, 1 reply; 8+ messages in thread
From: John Garry @ 2020-01-31 14:25 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Marc Zyngier, linux-kernel, chenxiang


>> John Garry <john.garry@huawei.com> writes:
>>> Would there be any issue with a SCSI platform device driver referencing
>>> this function?
>>>
>>> So I have a multi-queue platform device, and I want to spread interrupts
>>> over all possible CPUs, just like we can do for PCI MSI vectors. This
>>> topic was touched on in [0].
>>>
>>> And, if it's ok, could we export that same symbol?
>>

Hi Thomas,

>> I think you will need something similar to what we have in the pci/msi
>> code, but that shouldn't be in your device driver. So I'd rather create
>> platform infrastructure for this and export that.
>>
> 
> That would seem the proper thing to do.
> 
> So I was doing this for legacy hw as a cheap and quick performance 
> boost, but I doubt there would be many other users of any new API in 
> future. Also, the effort could be more than the reward, so I may 
> consider dropping the whole idea.
> 
> But I'll have a play with how the code could look now.

So I'd figure that an API like this would be required:

--- a/include/linux/platform_device.h
+++ b/include/linux/platform_device.h
@@ -11,6 +11,7 @@
  #define _PLATFORM_DEVICE_H_

  #include <linux/device.h>
+#include <linux/interrupt.h>

  #define PLATFORM_DEVID_NONE	(-1)
  #define PLATFORM_DEVID_AUTO	(-2)
@@ -27,6 +28,7 @@ struct platform_device {
  	u64		dma_mask;
  	u32		num_resources;
  	struct resource	*resource;
+	struct irq_affinity_desc *desc;

and in platform.c, adding:

/**
 * platform_get_irqs_affinity - get all IRQs for a device with affinity
 * @dev: platform device
 * @affd: affinity descriptor
 * @count: pointer to the count of IRQs
 * @irqs: pointer holder for the irq numbers
 *
 * Gets a full set of IRQs for a platform device.
 *
 * Return: 0 on success, negative error number on failure.
 */
int platform_get_irqs_affinity(struct platform_device *dev,
			       struct irq_affinity *affd,
			       unsigned int *count, int **irqs)
{
	int i;
	int *pirqs;

	if (ACPI_COMPANION(&dev->dev)) {
		*count = acpi_irq_get_count(ACPI_HANDLE(&dev->dev));
	} else {
		// TODO
	}

	pirqs = kzalloc(*count * sizeof(int), GFP_KERNEL);
	if (!pirqs)
		return -ENOMEM;

	dev->desc = irq_create_affinity_masks(*count, affd);
	if (!dev->desc) {
		kfree(irqs);
		return -ENOMEM;
	}

	for (i = 0; i < *count; i++) {
		pirqs[i] = platform_get_irq(dev, i);
		if (irqs[i] < 0) {
			kfree(dev->desc);
			kfree(irqs);
			return -ENOMEM;
		}
	}

	*irqs = pirqs;

	return 0;
}
EXPORT_SYMBOL_GPL(platform_get_irqs_affinity);

Here we pass the affinity descriptor and allocate all IRQs for a device.

So this is less than a half-baked solution. We only create the affinity 
masks but do nothing with them, and the actual irq_descs generated 
would not have their affinity mask set and would not be managed. 
Only the platform device driver itself would access the masks, to set 
the irq affinity hint, etc.
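
For example, about the most the driver could then do with them is 
something like (a sketch, using the desc array stashed in the platform 
device above):

	/* hint only - nothing here is managed by the irq core */
	for (i = 0; i < count; i++)
		irq_set_affinity_hint(irqs[i], &pdev->desc[i].mask);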

To achieve the proper result, we would somehow need to pass the per-IRQ 
affinity descriptor all the way down through 
platform_get_irq()->acpi_irq_get()->irq_create_fwspec_mapping()->irq_domain_alloc_irqs(), 
which could involve disruptive changes in different subsystems - not 
welcome, I'd say.
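
To illustrate, each signature on that path would have to grow an 
affinity argument, something like these (hypothetical) prototypes:

int platform_get_irq_affinity(struct platform_device *dev, unsigned int num,
			      struct irq_affinity_desc *affd);
int acpi_irq_get(acpi_handle handle, unsigned int index,
		 struct resource *res, struct irq_affinity_desc *affd);
unsigned int irq_create_fwspec_mapping(struct irq_fwspec *fwspec,
				       struct irq_affinity_desc *affd);

At least __irq_domain_alloc_irqs() already takes a struct 
irq_affinity_desc pointer at the bottom of the stack, so the end of 
the chain is there already.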

I could take the alternative approach of generating the interrupt 
affinity masks in my LLDD instead. Considering I know some of the CPU 
and numa node properties of the device host, I could generate the masks 
in the LLDD itself simply enough, but I would still rather avoid this 
if possible and use standard APIs.
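
For reference, the sort of thing I mean in the LLDD would be no more 
than this (rough sketch; assumes dense cpu numbering, and masks[] must 
be caller-allocated and live as long as the irqs, since the hint keeps 
a pointer to it):

/* chop cpu_possible_mask into nr_queues contiguous chunks */
static void lldd_spread_irqs(unsigned int nr_queues, const int *irqs,
			     struct cpumask *masks)
{
	unsigned int per_q = DIV_ROUND_UP(num_possible_cpus(), nr_queues);
	unsigned int cpu, q;

	for_each_possible_cpu(cpu) {
		q = min(cpu / per_q, nr_queues - 1);
		cpumask_set_cpu(cpu, &masks[q]);
	}

	for (q = 0; q < nr_queues; q++)
		irq_set_affinity_hint(irqs[q], &masks[q]);
}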

So if there are any better ideas on this, then it would be good to hear 
them.

Thanks,
john



* Re: About irq_create_affinity_masks() for a platform device driver
  2020-01-31 14:25     ` John Garry
@ 2020-01-31 21:41       ` Thomas Gleixner
  2020-02-03 15:00         ` John Garry
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Gleixner @ 2020-01-31 21:41 UTC (permalink / raw)
  To: John Garry; +Cc: Marc Zyngier, linux-kernel, chenxiang

John!

John Garry <john.garry@huawei.com> writes:
> So I'd figure that an API like this would be required:
>
> --- a/include/linux/platform_device.h
> +++ b/include/linux/platform_device.h
> @@ -11,6 +11,7 @@
>   #define _PLATFORM_DEVICE_H_
>
>   #include <linux/device.h>
> +#include <linux/interrupt.h>
>
>   #define PLATFORM_DEVID_NONE	(-1)
>   #define PLATFORM_DEVID_AUTO	(-2)
> @@ -27,6 +28,7 @@ struct platform_device {
>   	u64		dma_mask;
>   	u32		num_resources;
>   	struct resource	*resource;
> +	struct irq_affinity_desc *desc;
>
> and in platform.c, adding:
>
> /**
>   * platform_get_irqs_affinity - get all IRQs for a device with affinity
>   * @dev: platform device
>   * @affd: Affinity descriptor
>   * @count: pointer to count of IRQS
>   * @irqs: pointer holder for irqs numbers
>   *
>   * Gets a full set of IRQs for a platform device
>   *
>   * Return: 0 on success, negative error number on failure.
>   */
> int platform_get_irqs_affinity(struct platform_device *dev, struct 
> irq_affinity *affd, unsigned int *count, int **irqs)
> {
> 	int i;
> 	int *pirqs;
>
> 	if (ACPI_COMPANION(&dev->dev)) {
> 		*count = acpi_irq_get_count(ACPI_HANDLE(&dev->dev));
> 	} else {
> 		// TODO
> 	}
>
> 	pirqs = kzalloc(*count * sizeof(int), GFP_KERNEL);
> 	if (!pirqs)
> 		return -ENOMEM;
>
> 	dev->desc = irq_create_affinity_masks(*count, affd);
> 	if (!dev->desc) {
> 		kfree(irqs);

pirqs, I assume, and this also leaks the affinity masks and the
pointer in dev.

> 		return -ENOMEM;
> 	}
>
> 	for (i = 0; i < *count; i++) {
> 		pirqs[i] = platform_get_irq(dev, i);
> 		if (irqs[i] < 0) {
> 			kfree(dev->desc);
> 			kfree(irqs);
> 			return -ENOMEM;

That's obviously broken as well :)

> 		}
> 	}
>
> 	*irqs = pirqs;
>
> 	return 0;
> }
> EXPORT_SYMBOL_GPL(platform_get_irqs_affinity);
>
> Here we pass the affinity descriptor and allocate all IRQs for a device.
>
> So this is less than a half-baked solution. We only create the affinity 
> masks but do nothing with them, and the actual irq_descs generated 
> would not have their affinity mask set and would not be managed. 
> Only the platform device driver itself would access the masks, to set 
> the irq affinity hint, etc.
>
> To achieve the proper result, we would somehow need to pass the
> per-IRQ affinity descriptor all the way down through
> platform_get_irq()->acpi_irq_get()->irq_create_fwspec_mapping()->irq_domain_alloc_irqs(),
> which could involve disruptive changes in different subsystems - not
> welcome, I'd say.
>
> I could take the alternative approach of generating the interrupt 
> affinity masks in my LLDD instead. Considering I know some of the CPU 
> and numa node properties of the device host, I could generate the masks 
> in the LLDD itself simply enough, but I would still rather avoid this 
> if possible and use standard APIs.
>
> So if there are any better ideas on this, then it would be good to hear 
> them.

I wouldn't mind exposing a function which allows you to switch the
allocated interrupts to managed. The reason we do it in one go in
the PCI code is that the irq descriptors are automatically allocated
on the correct node. So if node-aware allocation is not a
showstopper for this, then your function would do:

	...
	for (i = 0; i < count; i++) {
		pirqs[i] = platform_get_irq(dev, i);

		irq_update_affinity_desc(pirqs[i], affdescs + i);
	}

int irq_update_affinity_desc(unsigned int irq, irq_affinity_desc *affinity)
{
	unsigned long flags;
	struct irq_desc *desc = irq_get_desc_lock(irq, &flags, 0);

	if (!desc)
		return -EINVAL;

	if (affinity->is_managed) {
		irqd_set(&desc->irq_data, IRQD_IRQ_DISABLED);
		irqd_set(&desc->irq_data, IRQD_IRQ_MASKED);
	}
	cpumask_copy(desc->irq_common_data.affinity, affinity);
	return 0;
}

That should just work.

Thanks,

        tglx


* Re: About irq_create_affinity_masks() for a platform device driver
  2020-01-31 21:41       ` Thomas Gleixner
@ 2020-02-03 15:00         ` John Garry
  2020-02-04  9:20           ` Thomas Gleixner
  0 siblings, 1 reply; 8+ messages in thread
From: John Garry @ 2020-02-03 15:00 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Marc Zyngier, linux-kernel, chenxiang

Hi Thomas,

>>
>> 	pirqs = kzalloc(*count * sizeof(int), GFP_KERNEL);
>> 	if (!pirqs)
>> 		return -ENOMEM;
>>
>> 	dev->desc = irq_create_affinity_masks(*count, affd);
>> 	if (!dev->desc) {
>> 		kfree(irqs);
> 
> pirqs I assume and this also leaks the affinity masks and the pointer in
> dev.

Right

> 
>> 		return -ENOMEM;
>> 	}
>>
>> 	for (i = 0; i < *count; i++) {
>> 		pirqs[i] = platform_get_irq(dev, i);
>> 		if (irqs[i] < 0) {
>> 			kfree(dev->desc);
>> 			kfree(irqs);
>> 			return -ENOMEM;
> 
> That's obviously broken as well :)

Right, again

> 
>> 		}
>> 	}
>>
>> 	*irqs = pirqs;
>>
>> 	return 0;
>> }
>> EXPORT_SYMBOL_GPL(platform_get_irqs_affinity);

[...]

> 
> I wouldn't mind exposing a function which allows you to switch the
> allocated interrupts to managed. The reason we do it in one go in
> the PCI code is that the irq descriptors are automatically allocated
> on the correct node. So if node-aware allocation is not a
> showstopper

I wouldn't say so for now.

> for this, then your function would do:
> 
> 	...
> 	for (i = 0; i < count; i++) {
> 		pirqs[i] = platform_get_irq(dev, i);
> 
>                  irq_update_affinity_desc(pirqs[i], affdescs + i);
> 
>          }
> 
> int irq_update_affinity_desc(unsigned int irq, irq_affinity_desc *affinity)
> {
> 	unsigned long flags;
> 	struct irq_desc *desc = irq_get_desc_lock(irq, &flags, 0);
> 
>          if (!desc)
>          	return -EINVAL;
> 
>          if (affinity->is_managed) {
>          	irqd_set(&desc->irq_data, IRQD_IRQ_DISABLED);
> 	        irqd_set(&desc->irq_data, IRQD_IRQ_MASKED);

Are these correct? I assume we want to follow alloc_descs() here.

>          }
>          cpumask_copy(desc->irq_common_data.affinity, affinity);
>          return 0;
> }

I see. So I made a couple of changes and it did work:

int irq_update_affinity_desc(unsigned int irq,
			     struct irq_affinity_desc *affinity)
{
	unsigned long flags;
	struct irq_desc *desc = irq_get_desc_lock(irq, &flags, 0);

	if (!desc)
		return -EINVAL;

	if (affinity->is_managed) {
		irqd_set(&desc->irq_data, IRQD_AFFINITY_MANAGED);
		irqd_set(&desc->irq_data, IRQD_MANAGED_SHUTDOWN);
	}

	cpumask_copy(desc->irq_common_data.affinity, &affinity->mask);
	irq_put_desc_unlock(desc, flags);
	return 0;
}

And if we were to go this way, then we don't need to add the pointer in 
struct platform_device to hold affinity mask descriptors as we're using 
them immediately. Or even have a single function to do it all in the irq 
code (create the masks and update the affinity desc).
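
For illustration, that single function might look something like this 
(sketch only - irq_set_affinity_masks is a made-up name, and there's 
no error unwinding):

int irq_set_affinity_masks(struct irq_affinity *affd, unsigned int count,
			   const int *irqs)
{
	struct irq_affinity_desc *descs;
	unsigned int i;
	int ret = 0;

	/* spread the masks, then apply each to its mapped linux irq */
	descs = irq_create_affinity_masks(count, affd);
	if (!descs)
		return -ENOMEM;

	for (i = 0; i < count && !ret; i++)
		ret = irq_update_affinity_desc(irqs[i], &descs[i]);

	kfree(descs);
	return ret;
}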

And since we're just updating the masks, I figure we shouldn't need to 
add acpi_irq_get_count(), which I invented to get the irq count (without 
creating the IRQ mapping).

Thanks,
John


* Re: About irq_create_affinity_masks() for a platform device driver
  2020-02-03 15:00         ` John Garry
@ 2020-02-04  9:20           ` Thomas Gleixner
  2020-02-04  9:55             ` John Garry
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Gleixner @ 2020-02-04  9:20 UTC (permalink / raw)
  To: John Garry; +Cc: Marc Zyngier, linux-kernel, chenxiang

John,

John Garry <john.garry@huawei.com> writes:
>> I wouldn't mind exposing a function which allows you to switch the
>> allocated interrupts to managed. The reason we do it in one go in
>> the PCI code is that the irq descriptors are automatically allocated
>> on the correct node. So if node-aware allocation is not a
>> showstopper
>
> I wouldn't say so for now.

Good.

>> for this, then your function would do:
>> 
>> 	...
>> 	for (i = 0; i < count; i++) {
>> 		pirqs[i] = platform_get_irq(dev, i);
>> 
>>                  irq_update_affinity_desc(pirqs[i], affdescs + i);
>> 
>>          }
>> 
>> int irq_update_affinity_desc(unsigned int irq, irq_affinity_desc *affinity)
>> {
>> 	unsigned long flags;
>> 	struct irq_desc *desc = irq_get_desc_lock(irq, &flags, 0);
>> 
>>          if (!desc)
>>          	return -EINVAL;
>> 
>>          if (affinity->is_managed) {
>>          	irqd_set(&desc->irq_data, IRQD_IRQ_DISABLED);
>> 	        irqd_set(&desc->irq_data, IRQD_IRQ_MASKED);
>
> Are these correct? I assume we want to follow alloc_descs() here.

Yeah, copied the wrong chunk :)

>>          }
>>          cpumask_copy(desc->irq_common_data.affinity, affinity);
>>          return 0;
>> }
>
> I see. So I made a couple of changes and it did work:
>
> int irq_update_affinity_desc(unsigned int irq, struct irq_affinity_desc 
> *affinity)
> {
> 	unsigned long flags;
> 	struct irq_desc *desc = irq_get_desc_lock(irq, &flags, 0);
>
> 	if (!desc)
> 		return -EINVAL;
>
> 	if (affinity->is_managed) {
> 		irqd_set(&desc->irq_data, IRQD_AFFINITY_MANAGED);
> 		irqd_set(&desc->irq_data, IRQD_MANAGED_SHUTDOWN);
> 	}
>
> 	cpumask_copy(desc->irq_common_data.affinity, &affinity->mask);
> 	irq_put_desc_unlock(desc, flags);
> 	return 0;
> }

Looks correct.

> And if we were to go this way, then we don't need to add the pointer in 
> struct platform_device to hold affinity mask descriptors as we're using 
> them immediately. Or even have a single function to do it all in the irq 
> code (create the masks and update the affinity desc).
>
> And since we're just updating the masks, I figure we shouldn't need to 
> add acpi_irq_get_count(), which I invented to get the irq count (without 
> creating the IRQ mapping).

Yes, you can create and apply the masks after setting up the interrupts.

Thanks,

        tglx


* Re: About irq_create_affinity_masks() for a platform device driver
  2020-02-04  9:20           ` Thomas Gleixner
@ 2020-02-04  9:55             ` John Garry
  0 siblings, 0 replies; 8+ messages in thread
From: John Garry @ 2020-02-04  9:55 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Marc Zyngier, linux-kernel, chenxiang

Hi Thomas,

> 
>> And if we were to go this way, then we don't need to add the pointer in
>> struct platform_device to hold affinity mask descriptors as we're using
>> them immediately. Or even have a single function to do it all in the irq
>> code (create the masks and update the affinity desc).
>>
>> And since we're just updating the masks, I figure we shouldn't need to
>> add acpi_irq_get_count(), which I invented to get the irq count (without
>> creating the IRQ mapping).
> Yes, you can create and apply the masks after setting up the interrupts.
> 

So my original concern was that the changes here would be quite 
disruptive, but that does not look to be the case.

I need a couple more things to go into the kernel before I can safely 
switch to using managed interrupts in the LLDD, like the "blk-mq: 
improvement CPU hotplug" series, so I will need to wait on that for 
the moment.

Thanks,
John

