x86/apic: Update virtual irq base for DT/OF based system as well
diff mbox series

Message ID 20190821081330.1187-1-rahul.tanwar@linux.intel.com
State New
Headers show
Series
  • x86/apic: Update virtual irq base for DT/OF based system as well
Related show

Commit Message

Rahul Tanwar Aug. 21, 2019, 8:13 a.m. UTC
'ioapic_dynirq_base' contains the virtual IRQ base number. Presently, it is
updated to the end of hardware IRQ numbers but this is done only when IOAPIC
configuration type is IOAPIC_DOMAIN_LEGACY or IOAPIC_DOMAIN_STRICT. There is
a third type IOAPIC_DOMAIN_DYNAMIC which applies when IOAPIC configuration
comes from devicetree.
Please see dtb_add_ioapic() in arch/x86/kernel/devicetree.c

In case of IOAPIC_DOMAIN_DYNAMIC (DT/OF based system), 'ioapic_dynirq_base'
remains to zero initialized value. This means that for OF based systems,
virtual IRQ base will get set to zero. Zero value for a virtual IRQ is a
invalid value.
Please see https://yarchive.net/comp/linux/zero.html for more details.

Update 'ioapic_dynirq_base' for IOAPIC_DOMAIN_DYNAMIC case as well just like
it is presently updated for IOAPIC_DOMAIN_LEGACY & IOAPIC_DOMAIN_STRICT i.e.
set the virtual IRQ base to the end of hardware IRQ numbers when IOAPIC
configuration comes from devicetree.

Signed-off-by: Rahul Tanwar <rahul.tanwar@linux.intel.com>
---
 arch/x86/kernel/apic/io_apic.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Thomas Gleixner Aug. 21, 2019, 8:34 a.m. UTC | #1
On Wed, 21 Aug 2019, Rahul Tanwar wrote:

> 'ioapic_dynirq_base' contains the virtual IRQ base number. Presently, it is
> updated to the end of hardware IRQ numbers but this is done only when IOAPIC
> configuration type is IOAPIC_DOMAIN_LEGACY or IOAPIC_DOMAIN_STRICT. There is
> a third type IOAPIC_DOMAIN_DYNAMIC which applies when IOAPIC configuration
> comes from devicetree.
> Please see dtb_add_ioapic() in arch/x86/kernel/devicetree.c

We know how DT based ioapics are added. No need to point to it.

> In case of IOAPIC_DOMAIN_DYNAMIC (DT/OF based system), 'ioapic_dynirq_base'
> remains to zero initialized value. This means that for OF based systems,
> virtual IRQ base will get set to zero. Zero value for a virtual IRQ is a
> invalid value.
> Please see https://yarchive.net/comp/linux/zero.html for more details.

First of all, please do not use random archive links. See
Documentation/process/ how links to LKML archives should look like

Secondly, this link is irrelevant. ioapic_dynirq_base has nothing to do
with virtual IRQ number 0. It's a boundary for the dynamic allocation of
virtual interrupt numbers so that the core allocator does not pick
interrupts out of the IOAPIC's fixed interrupt number space.

This can be legitimately 0 when IOAPIC is not enabled at all.

Can you please explain what kind of problem you were seing and what this
really fixes?
 
Thanks,

	tglx
Rahul Tanwar Aug. 21, 2019, 9:31 a.m. UTC | #2
On 21/8/2019 4:34 PM, Thomas Gleixner wrote:

> Secondly, this link is irrelevant. ioapic_dynirq_base has nothing to do
> with virtual IRQ number 0. It's a boundary for the dynamic allocation of
> virtual interrupt numbers so that the core allocator does not pick
> interrupts out of the IOAPIC's fixed interrupt number space.
>
> This can be legitimately 0 when IOAPIC is not enabled at all.
>
> Can you please explain what kind of problem you were seing and what this
> really fixes?

The problem is that device tree infrastructure considers 0 IRQ value as 
invalid/error

value whereas for ACPI, 0 is a valid value. Without this change, the 
problem that we

see is that the first driver using of_irq_get_xx() or its variants fails 
because of 0 IRQ

number. With this change, allocated IRQ number is never 0 so it works ok.
Thomas Gleixner Aug. 21, 2019, 11:20 a.m. UTC | #3
On Wed, 21 Aug 2019, Tanwar, Rahul wrote:
> On 21/8/2019 4:34 PM, Thomas Gleixner wrote:
> 
> > Secondly, this link is irrelevant. ioapic_dynirq_base has nothing to do
> > with virtual IRQ number 0. It's a boundary for the dynamic allocation of
> > virtual interrupt numbers so that the core allocator does not pick
> > interrupts out of the IOAPIC's fixed interrupt number space.
> > 
> > This can be legitimately 0 when IOAPIC is not enabled at all.
> > 
> > Can you please explain what kind of problem you were seing and what this
> > really fixes?
>
> The problem is that device tree infrastructure considers 0 IRQ value as
> invalid/error value whereas for ACPI, 0 is a valid value.

Sure.

> Without this change, the problem that we see is that the first driver
> using of_irq_get_xx() or its variants fails because of 0 IRQ number. With
> this change, allocated IRQ number is never 0 so it works ok.

Well, this still is not a proper explanation. Just because it works does
not make it correct in the first place.

ioapic_dynirq_base is pretty much irrelevant for a DT machine. The reason
why it exists is that for regular BIOS the interrupt numbers are hard
mapped to the IOAPIC pins. ioapic_dynirq_base is used to protect this hard
mapped interrupt number space. The core allocator does not allocate from
that space unless it is explicitely told to do so, which is the case for
IOAPIC_DOMAIN_STRICT where the allocation tells the core to allocate the
associated GSI number.

On DT the interrupt number is irrelevant as DT describes the irq controller
and the pin to which a device is connected and does not make assumptions
about the interrupt number. So the core can freely allocate any available
interrupt number except 0. That's already prevented in the core code.

But x86 implements arch_dynirq_lower_bound() which overrides the core limit
and because ioapic_dynirq_base is zero in the DT case it allows VIRQ 0 to
be allocated which then causes of_irq*() to fail.

So your change prevents that by excluding the 'GSI' range from allocation,
which means that the first irq number which is handed out is 24, assumed
you have one IOAPIC with 24 pins.

That's fine as the interrupt number space is big enough, but it needs

    - a coherent explanation in the changelog

    - proper comments to that effect in the code

Also this is presumably a stable candidate and needs a Fixes: ... tag.

Thanks,

	tglx
Andy Shevchenko Aug. 21, 2019, 12:34 p.m. UTC | #4
On Wed, Aug 21, 2019 at 01:20:53PM +0200, Thomas Gleixner wrote:
> On Wed, 21 Aug 2019, Tanwar, Rahul wrote:
> > On 21/8/2019 4:34 PM, Thomas Gleixner wrote:
> > 
> > > Secondly, this link is irrelevant. ioapic_dynirq_base has nothing to do
> > > with virtual IRQ number 0. It's a boundary for the dynamic allocation of
> > > virtual interrupt numbers so that the core allocator does not pick
> > > interrupts out of the IOAPIC's fixed interrupt number space.
> > > 
> > > This can be legitimately 0 when IOAPIC is not enabled at all.
> > > 
> > > Can you please explain what kind of problem you were seing and what this
> > > really fixes?
> >
> > The problem is that device tree infrastructure considers 0 IRQ value as
> > invalid/error value whereas for ACPI, 0 is a valid value.
> 
> Sure.
> 
> > Without this change, the problem that we see is that the first driver
> > using of_irq_get_xx() or its variants fails because of 0 IRQ number. With
> > this change, allocated IRQ number is never 0 so it works ok.
> 
> Well, this still is not a proper explanation. Just because it works does
> not make it correct in the first place.
> 
> ioapic_dynirq_base is pretty much irrelevant for a DT machine. The reason
> why it exists is that for regular BIOS the interrupt numbers are hard
> mapped to the IOAPIC pins. ioapic_dynirq_base is used to protect this hard
> mapped interrupt number space. The core allocator does not allocate from
> that space unless it is explicitely told to do so, which is the case for
> IOAPIC_DOMAIN_STRICT where the allocation tells the core to allocate the
> associated GSI number.
> 
> On DT the interrupt number is irrelevant as DT describes the irq controller
> and the pin to which a device is connected and does not make assumptions
> about the interrupt number. So the core can freely allocate any available
> interrupt number except 0. That's already prevented in the core code.
> 
> But x86 implements arch_dynirq_lower_bound() which overrides the core limit
> and because ioapic_dynirq_base is zero in the DT case it allows VIRQ 0 to
> be allocated which then causes of_irq*() to fail.
> 
> So your change prevents that by excluding the 'GSI' range from allocation,
> which means that the first irq number which is handed out is 24, assumed
> you have one IOAPIC with 24 pins.

I have tested this on the ACPI-based system where we have 55 lines of IOAPIC,
no PIC, and some GPIO lines. Overall I see that nr_irqs is 512 and shifting
by 55 freezes 10% of the space for nothing. Luckily we have SPARSE_IRQS
selected for any X86, so, it wouldn't waste memory.

I think we may do slightly better if we just limit the change to the certain
cases.
Thomas Gleixner Aug. 21, 2019, 1:16 p.m. UTC | #5
On Wed, 21 Aug 2019, Andy Shevchenko wrote:
> On Wed, Aug 21, 2019 at 01:20:53PM +0200, Thomas Gleixner wrote:
> > But x86 implements arch_dynirq_lower_bound() which overrides the core limit
> > and because ioapic_dynirq_base is zero in the DT case it allows VIRQ 0 to
> > be allocated which then causes of_irq*() to fail.
> > 
> > So your change prevents that by excluding the 'GSI' range from allocation,
> > which means that the first irq number which is handed out is 24, assumed
> > you have one IOAPIC with 24 pins.
> 
> I have tested this on the ACPI-based system where we have 55 lines of IOAPIC,
> no PIC, and some GPIO lines. Overall I see that nr_irqs is 512 and shifting
> by 55 freezes 10% of the space for nothing. Luckily we have SPARSE_IRQS
> selected for any X86, so, it wouldn't waste memory.
>
> I think we may do slightly better if we just limit the change to the certain
> cases.

For DT we can actually avoid that completely. See below.

For ACPI not unfortunately as the stupid GSI mapping is hard coded.

Thanks,

	tglx

8<-------------
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2438,7 +2438,13 @@ unsigned int arch_dynirq_lower_bound(uns
 	 * dmar_alloc_hwirq() may be called before setup_IO_APIC(), so use
 	 * gsi_top if ioapic_dynirq_base hasn't been initialized yet.
 	 */
-	return ioapic_initialized ? ioapic_dynirq_base : gsi_top;
+	if (!ioapic_initialized)
+		return gsi_top;
+	/*
+	 * For DT enabled machines ioapic_dynirq_base is irrelevant and not
+	 * updated. So simply return @from if ioapic_dynirq_base == 0.
+	 */
+	return ioapic_dynirq_base ? : from;
 }
 
 #ifdef CONFIG_X86_32
Andy Shevchenko Aug. 21, 2019, 4:47 p.m. UTC | #6
On Wed, Aug 21, 2019 at 03:16:31PM +0200, Thomas Gleixner wrote:
> On Wed, 21 Aug 2019, Andy Shevchenko wrote:
> > On Wed, Aug 21, 2019 at 01:20:53PM +0200, Thomas Gleixner wrote:
> > > But x86 implements arch_dynirq_lower_bound() which overrides the core limit
> > > and because ioapic_dynirq_base is zero in the DT case it allows VIRQ 0 to
> > > be allocated which then causes of_irq*() to fail.
> > > 
> > > So your change prevents that by excluding the 'GSI' range from allocation,
> > > which means that the first irq number which is handed out is 24, assumed
> > > you have one IOAPIC with 24 pins.
> > 
> > I have tested this on the ACPI-based system where we have 55 lines of IOAPIC,
> > no PIC, and some GPIO lines. Overall I see that nr_irqs is 512 and shifting
> > by 55 freezes 10% of the space for nothing. Luckily we have SPARSE_IRQS
> > selected for any X86, so, it wouldn't waste memory.
> >
> > I think we may do slightly better if we just limit the change to the certain
> > cases.
> 
> For DT we can actually avoid that completely. See below.
> 
> For ACPI not unfortunately as the stupid GSI mapping is hard coded.

The below works better for my case, so, if you are going with that
Tested-by: Andy Shevchenko <andriy.shevchenko@intel.com>

> 
> Thanks,
> 
> 	tglx
> 
> 8<-------------
> --- a/arch/x86/kernel/apic/io_apic.c
> +++ b/arch/x86/kernel/apic/io_apic.c
> @@ -2438,7 +2438,13 @@ unsigned int arch_dynirq_lower_bound(uns
>  	 * dmar_alloc_hwirq() may be called before setup_IO_APIC(), so use
>  	 * gsi_top if ioapic_dynirq_base hasn't been initialized yet.
>  	 */
> -	return ioapic_initialized ? ioapic_dynirq_base : gsi_top;
> +	if (!ioapic_initialized)
> +		return gsi_top;
> +	/*
> +	 * For DT enabled machines ioapic_dynirq_base is irrelevant and not
> +	 * updated. So simply return @from if ioapic_dynirq_base == 0.
> +	 */
> +	return ioapic_dynirq_base ? : from;
>  }
>  
>  #ifdef CONFIG_X86_32
Rahul Tanwar Aug. 22, 2019, 3:48 a.m. UTC | #7
Hi Thomas,

On 22/8/2019 12:47 AM, Andy Shevchenko wrote:
>> For DT we can actually avoid that completely. See below.
>>
>> For ACPI not unfortunately as the stupid GSI mapping is hard coded.
> The below works better for my case, so, if you are going with that
> Tested-by: Andy Shevchenko <andriy.shevchenko@intel.com>
>
>
> 8<-------------
> --- a/arch/x86/kernel/apic/io_apic.c
> +++ b/arch/x86/kernel/apic/io_apic.c
> @@ -2438,7 +2438,13 @@ unsigned int arch_dynirq_lower_bound(uns
>   	 * dmar_alloc_hwirq() may be called before setup_IO_APIC(), so use
>   	 * gsi_top if ioapic_dynirq_base hasn't been initialized yet.
>   	 */
> -	return ioapic_initialized ? ioapic_dynirq_base : gsi_top;
> +	if (!ioapic_initialized)
> +		return gsi_top;
> +	/*
> +	 * For DT enabled machines ioapic_dynirq_base is irrelevant and not
> +	 * updated. So simply return @from if ioapic_dynirq_base == 0.
> +	 */
> +	return ioapic_dynirq_base ? : from;
>   }
>   
>   #ifdef CONFIG_X86_32


I have also tested above and it works fine. In fact, my first patch to

resolve it during internal review was exactly on similar lines. So if

you are going to add above then i will stop following up on this

topic further. Thanks.


Regards,

Rahul

Patch
diff mbox series

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index c7bb6c69f21c..fe50cd91122b 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2336,7 +2336,8 @@  static int mp_irqdomain_create(int ioapic)
 	ip->irqdomain->parent = parent;
 
 	if (cfg->type == IOAPIC_DOMAIN_LEGACY ||
-	    cfg->type == IOAPIC_DOMAIN_STRICT)
+	    cfg->type == IOAPIC_DOMAIN_STRICT ||
+	    cfg->type == IOAPIC_DOMAIN_DYNAMIC)
 		ioapic_dynirq_base = max(ioapic_dynirq_base,
 					 gsi_cfg->gsi_end + 1);