linux-kernel.vger.kernel.org archive mirror
* [PATCH] genirq/msi: Shutdown managed interrupts with unsatifiable affinities
@ 2022-03-07 19:06 Marc Zyngier
  2022-03-08  1:34 ` David Decotigny
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Marc Zyngier @ 2022-03-07 19:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: Thomas Gleixner, John Garry, David Decotigny

When booting with maxcpus=<small number>, interrupt controllers
such as the GICv3 ITS may not be able to satisfy the affinity of
some managed interrupts, as some of the HW resources are simply
not available.

In order to deal with this, do not try to activate such interrupt
if there is no online CPU capable of handling it. Instead, place
it in shutdown state. Once a capable CPU shows up, it will be
activated.

Reported-by: John Garry <john.garry@huawei.com>
Reported-by: David Decotigny <ddecotig@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 kernel/irq/msi.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 2bdfce5edafd..aa84ce84c2ec 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -818,6 +818,18 @@ static int msi_init_virq(struct irq_domain *domain, int virq, unsigned int vflag
 		irqd_clr_can_reserve(irqd);
 		if (vflags & VIRQ_NOMASK_QUIRK)
 			irqd_set_msi_nomask_quirk(irqd);
+
+		/*
+		 * If the interrupt is managed but no CPU is available
+		 * to service it, shut it down until better times.
+		 */
+		if ((vflags & VIRQ_ACTIVATE) &&
+		    irqd_affinity_is_managed(irqd) &&
+		    !cpumask_intersects(irq_data_get_affinity_mask(irqd),
+					cpu_online_mask)) {
+			irqd_set_managed_shutdown(irqd);
+			return 0;
+		}
 	}
 
 	if (!(vflags & VIRQ_ACTIVATE))
-- 
2.30.2
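
The check added by the patch can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `cpumask_model_t`, `struct irq_model`, `mask_intersects()` and `init_irq_model()` are hypothetical stand-ins invented for illustration, not kernel APIs, and a 64-bit word replaces the kernel's `struct cpumask`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpumask: one bit per CPU, up to 64 CPUs. */
typedef uint64_t cpumask_model_t;

/* Hypothetical stand-in for the per-interrupt state the patch inspects. */
struct irq_model {
	cpumask_model_t affinity;	/* CPUs this interrupt may target */
	bool managed;			/* affinity is managed by the kernel */
	bool managed_shutdown;		/* parked until a target CPU is online */
	bool activated;			/* activation was attempted */
};

/* Same meaning as cpumask_intersects(): do the masks share any CPU? */
static bool mask_intersects(cpumask_model_t a, cpumask_model_t b)
{
	return (a & b) != 0;
}

/*
 * Model of the decision the patch adds: a managed interrupt whose
 * affinity contains no online CPU is not activated, it is marked as
 * managed-shutdown instead. Everything else is activated as before.
 */
static void init_irq_model(struct irq_model *irq, cpumask_model_t online,
			   bool want_activate)
{
	irq->managed_shutdown = false;
	irq->activated = false;

	if (want_activate && irq->managed &&
	    !mask_intersects(irq->affinity, online)) {
		irq->managed_shutdown = true;
		return;
	}

	if (want_activate)
		irq->activated = true;
}
```

With `maxcpus=1` only CPU 0 is online (mask `0x1`), so in this model a managed interrupt whose affinity covers CPUs 4-7 (mask `0xF0`) ends up in the shutdown state, while one whose affinity includes CPU 0 is activated as usual.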


* Re: [PATCH] genirq/msi: Shutdown managed interrupts with unsatifiable affinities
  2022-03-07 19:06 [PATCH] genirq/msi: Shutdown managed interrupts with unsatifiable affinities Marc Zyngier
@ 2022-03-08  1:34 ` David Decotigny
  2022-03-09 10:20 ` John Garry
  2022-03-14 15:27 ` Thomas Gleixner
  2 siblings, 0 replies; 9+ messages in thread
From: David Decotigny @ 2022-03-08  1:34 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: linux-kernel, Thomas Gleixner, John Garry

On Mon, Mar 7, 2022 at 11:06 AM Marc Zyngier <maz@kernel.org> wrote:
>
> When booting with maxcpus=<small number>, interrupt controllers
> such as the GICv3 ITS may not be able to satisfy the affinity of
> some managed interrupts, as some of the HW resources are simply
> not available.
>
> In order to deal with this, do not try to activate such interrupt
> if there is no online CPU capable of handling it. Instead, place
> it in shutdown state. Once a capable CPU shows up, it will be
> activated.
>
> Reported-by: John Garry <john.garry@huawei.com>
> Reported-by: David Decotigny <ddecotig@google.com>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  kernel/irq/msi.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
> index 2bdfce5edafd..aa84ce84c2ec 100644
> --- a/kernel/irq/msi.c
> +++ b/kernel/irq/msi.c
> @@ -818,6 +818,18 @@ static int msi_init_virq(struct irq_domain *domain, int virq, unsigned int vflag
>                 irqd_clr_can_reserve(irqd);
>                 if (vflags & VIRQ_NOMASK_QUIRK)
>                         irqd_set_msi_nomask_quirk(irqd);
> +
> +               /*
> +                * If the interrupt is managed but no CPU is available
> +                * to service it, shut it down until better times.
> +                */
> +               if ((vflags & VIRQ_ACTIVATE) &&
> +                   irqd_affinity_is_managed(irqd) &&
> +                   !cpumask_intersects(irq_data_get_affinity_mask(irqd),
> +                                       cpu_online_mask)) {
> +                           irqd_set_managed_shutdown(irqd);
> +                           return 0;
> +                   }
>         }
>
>         if (!(vflags & VIRQ_ACTIVATE))
> --
> 2.30.2
>

Thanks! I tried that patch, and it worked for us.

* Re: [PATCH] genirq/msi: Shutdown managed interrupts with unsatifiable affinities
  2022-03-07 19:06 [PATCH] genirq/msi: Shutdown managed interrupts with unsatifiable affinities Marc Zyngier
  2022-03-08  1:34 ` David Decotigny
@ 2022-03-09 10:20 ` John Garry
  2022-03-10  3:24   ` Xiongfeng Wang
  2022-03-14 15:27 ` Thomas Gleixner
  2 siblings, 1 reply; 9+ messages in thread
From: John Garry @ 2022-03-09 10:20 UTC (permalink / raw)
  To: Marc Zyngier, linux-kernel
  Cc: Thomas Gleixner, David Decotigny, wangxiongfeng2

+

On 07/03/2022 19:06, Marc Zyngier wrote:
> When booting with maxcpus=<small number>, interrupt controllers
> such as the GICv3 ITS may not be able to satisfy the affinity of
> some managed interrupts, as some of the HW resources are simply
> not available.
> 
> In order to deal with this, do not try to activate such interrupt
> if there is no online CPU capable of handling it. Instead, place
> it in shutdown state. Once a capable CPU shows up, it will be
> activated.
> 
> Reported-by: John Garry <john.garry@huawei.com>
> Reported-by: David Decotigny <ddecotig@google.com>
> Signed-off-by: Marc Zyngier <maz@kernel.org>

Tested-by: John Garry <john.garry@huawei.com>

> ---

JFYI, I could not recreate the crash reported in the original
thread for "nohz_full=5-127 isolcpus=nohz,domain,managed_irq,5-127
maxcpus=1". For reference, this is what I set via the command line:

estuary:/$ dmesg | grep -i hz
[    0.000000] Kernel command line: BOOT_IMAGE=/john/Image rdinit=/init console=ttyS0,115200 no_console_suspend nvme.use_threaded_interrupts=0 iommu.strict=0 acpi=force earlycon=pl011,mmio32,0x602b0000 nohz_full=5-127 isolcpus=nohz,domain,managed_irq,5-127 maxcpus=1
[    0.000000] NO_HZ: Full dynticks CPUs: 5-127.
[    0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (phys).
[    0.000000] sched_clock: 57 bits at 100MHz, resolution 10ns, wraps every 4398046511100ns
[   15.314258] sbsa-gwdt sbsa-gwdt.0: Initialized with 10s timeout @ 100000000 Hz, action=0

And for the kernel build:
$ more .config | grep NO_HZ
CONFIG_NO_HZ_COMMON=y
# CONFIG_NO_HZ_IDLE is not set
CONFIG_NO_HZ_FULL=y
# CONFIG_NO_HZ is not set
$

Thanks,
John
>   kernel/irq/msi.c | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
> 
> diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
> index 2bdfce5edafd..aa84ce84c2ec 100644
> --- a/kernel/irq/msi.c
> +++ b/kernel/irq/msi.c
> @@ -818,6 +818,18 @@ static int msi_init_virq(struct irq_domain *domain, int virq, unsigned int vflag
>   		irqd_clr_can_reserve(irqd);
>   		if (vflags & VIRQ_NOMASK_QUIRK)
>   			irqd_set_msi_nomask_quirk(irqd);
> +
> +		/*
> +		 * If the interrupt is managed but no CPU is available
> +		 * to service it, shut it down until better times.
> +		 */
> +		if ((vflags & VIRQ_ACTIVATE) &&
> +		    irqd_affinity_is_managed(irqd) &&
> +		    !cpumask_intersects(irq_data_get_affinity_mask(irqd),
> +					cpu_online_mask)) {
> +			    irqd_set_managed_shutdown(irqd);
> +			    return 0;
> +		    }
>   	}
>   
>   	if (!(vflags & VIRQ_ACTIVATE))


* Re: [PATCH] genirq/msi: Shutdown managed interrupts with unsatifiable affinities
  2022-03-09 10:20 ` John Garry
@ 2022-03-10  3:24   ` Xiongfeng Wang
  2022-03-10  6:11     ` Xiongfeng Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Xiongfeng Wang @ 2022-03-10  3:24 UTC (permalink / raw)
  To: John Garry, Marc Zyngier, linux-kernel; +Cc: Thomas Gleixner, David Decotigny



On 2022/3/9 18:20, John Garry wrote:
> +
> 
> On 07/03/2022 19:06, Marc Zyngier wrote:
>> When booting with maxcpus=<small number>, interrupt controllers
>> such as the GICv3 ITS may not be able to satisfy the affinity of
>> some managed interrupts, as some of the HW resources are simply
>> not available.
>>
>> In order to deal with this, do not try to activate such interrupt
>> if there is no online CPU capable of handling it. Instead, place
>> it in shutdown state. Once a capable CPU shows up, it will be
>> activated.
>>
>> Reported-by: John Garry <john.garry@huawei.com>
>> Reported-by: David Decotigny <ddecotig@google.com>
>> Signed-off-by: Marc Zyngier <maz@kernel.org>
> 
> Tested-by: John Garry <john.garry@huawei.com>
> 
>> ---
> 
> JFYI, I could not recreate the same crash reported in the original thread for
> "nohz_full=5-127 isolcpus=nohz,domain,managed_irq,5-127 maxcpus=1". Here's just
> showing what I set via cmdline:

I think it is userspace onlining all the CPUs that causes the crash. Could you
please try onlining all the CPUs after boot?

Thanks,
Xiongfeng

> 
> estuary:/$ dmesg | grep -i hz
> [    0.000000] Kernel command line: BOOT_IMAGE=/john/Image rdinit=/init
> console=ttyS0,115200 no_console_suspend nvme.use_threaded_interrupts=0
> iommu.strict=0 acpi=force earlycon=pl011,mmio32,0x602b0000 nohz_full=5-127
> isolcpus=nohz,domain,managed_irq,5-127 maxcpus=1
> [    0.000000] NO_HZ: Full dynticks CPUs: 5-127.
> [    0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (phys).
> [    0.000000] sched_clock: 57 bits at 100MHz, resolution 10ns, wraps every
> 4398046511100ns
> [   15.314258] sbsa-gwdt sbsa-gwdt.0: Initialized with 10s timeout @ 100000000
> Hz, action=0
> 
> And for the kernel build:
> $ more .config | grep NO_HZ
> CONFIG_NO_HZ_COMMON=y
> # CONFIG_NO_HZ_IDLE is not set
> CONFIG_NO_HZ_FULL=y
> # CONFIG_NO_HZ is not set
> $
> 
> Thanks,
> John
>>   kernel/irq/msi.c | 12 ++++++++++++
>>   1 file changed, 12 insertions(+)
>>
>> diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
>> index 2bdfce5edafd..aa84ce84c2ec 100644
>> --- a/kernel/irq/msi.c
>> +++ b/kernel/irq/msi.c
>> @@ -818,6 +818,18 @@ static int msi_init_virq(struct irq_domain *domain, int virq, unsigned int vflag
>>           irqd_clr_can_reserve(irqd);
>>           if (vflags & VIRQ_NOMASK_QUIRK)
>>               irqd_set_msi_nomask_quirk(irqd);
>> +
>> +        /*
>> +         * If the interrupt is managed but no CPU is available
>> +         * to service it, shut it down until better times.
>> +         */
>> +        if ((vflags & VIRQ_ACTIVATE) &&
>> +            irqd_affinity_is_managed(irqd) &&
>> +            !cpumask_intersects(irq_data_get_affinity_mask(irqd),
>> +                    cpu_online_mask)) {
>> +                irqd_set_managed_shutdown(irqd);
>> +                return 0;
>> +            }
>>       }
>>         if (!(vflags & VIRQ_ACTIVATE))
> 
> .

* Re: [PATCH] genirq/msi: Shutdown managed interrupts with unsatifiable affinities
  2022-03-10  3:24   ` Xiongfeng Wang
@ 2022-03-10  6:11     ` Xiongfeng Wang
  0 siblings, 0 replies; 9+ messages in thread
From: Xiongfeng Wang @ 2022-03-10  6:11 UTC (permalink / raw)
  To: John Garry, Marc Zyngier, linux-kernel; +Cc: Thomas Gleixner, David Decotigny



On 2022/3/10 11:24, Xiongfeng Wang wrote:
> 
> 
> On 2022/3/9 18:20, John Garry wrote:
>> +
>>
>> On 07/03/2022 19:06, Marc Zyngier wrote:
>>> When booting with maxcpus=<small number>, interrupt controllers
>>> such as the GICv3 ITS may not be able to satisfy the affinity of
>>> some managed interrupts, as some of the HW resources are simply
>>> not available.
>>>
>>> In order to deal with this, do not try to activate such interrupt
>>> if there is no online CPU capable of handling it. Instead, place
>>> it in shutdown state. Once a capable CPU shows up, it will be
>>> activated.
>>>
>>> Reported-by: John Garry <john.garry@huawei.com>
>>> Reported-by: David Decotigny <ddecotig@google.com>
>>> Signed-off-by: Marc Zyngier <maz@kernel.org>
>>
>> Tested-by: John Garry <john.garry@huawei.com>
>>
>>> ---
>>
>> JFYI, I could not recreate the same crash reported in the original thread for
>> "nohz_full=5-127 isolcpus=nohz,domain,managed_irq,5-127 maxcpus=1". Here's just
>> showing what I set via cmdline:
> 
> I think it is userspace onlining all the CPUs that causes the crash. Could you
> please try onlining all the CPUs after boot?

Sorry, please ignore what I said above. It's wrong.

This patch has no issues. When I tested the managed irq, I applied this patch
together with the following modification. It is that modification, combined
with the kernel parameters, that causes the crash. Sorry for the unclear
description before.

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index eb0882d15366..0cea46bdaf99 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1620,7 +1620,7 @@ static int its_select_cpu(struct irq_data *d,

 		cpu = cpumask_pick_least_loaded(d, tmpmask);
 	} else {
-		cpumask_and(tmpmask, irq_data_get_affinity_mask(d), cpu_online_mask);
+		cpumask_copy(tmpmask, aff_mask);

 		/* If we cannot cross sockets, limit the search to that node */
 		if ((its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144) &&

Thanks,
Xiongfeng
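
The difference between the two lines in the diff above can be sketched in userspace. The original line restricts the candidate mask to online CPUs before a CPU is picked, the modified line does not. This is only an illustrative model: `cpumask_model_t`, `first_cpu()`, `select_cpu_filtered()` and `select_cpu_unfiltered()` are hypothetical names, the real its_select_cpu() does considerably more, and it is an assumption here that `aff_mask` may contain CPUs that are not online.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct cpumask: one bit per CPU, up to 64 CPUs. */
typedef uint64_t cpumask_model_t;

/* Return the lowest-numbered CPU set in the mask, or -1 if it is empty. */
static int first_cpu(cpumask_model_t mask)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return -1;
}

/* Models the removed line: candidates = affinity AND online. */
static int select_cpu_filtered(cpumask_model_t affinity,
			       cpumask_model_t online)
{
	return first_cpu(affinity & online);
}

/* Models the added line: candidates = affinity, online state ignored. */
static int select_cpu_unfiltered(cpumask_model_t affinity)
{
	return first_cpu(affinity);
}
```

With an affinity of CPUs 5-7 and only CPU 0 online, the filtered variant finds no candidate, whereas the unfiltered variant picks CPU 5 even though it is offline. That may be one way the modification and the kernel parameters interact, although the thread does not spell out the exact mechanism of the crash.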

> 
> Thanks,
> Xiongfeng
> 
>>
>> estuary:/$ dmesg | grep -i hz
>> [    0.000000] Kernel command line: BOOT_IMAGE=/john/Image rdinit=/init
>> console=ttyS0,115200 no_console_suspend nvme.use_threaded_interrupts=0
>> iommu.strict=0 acpi=force earlycon=pl011,mmio32,0x602b0000 nohz_full=5-127
>> isolcpus=nohz,domain,managed_irq,5-127 maxcpus=1
>> [    0.000000] NO_HZ: Full dynticks CPUs: 5-127.
>> [    0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (phys).
>> [    0.000000] sched_clock: 57 bits at 100MHz, resolution 10ns, wraps every
>> 4398046511100ns
>> [   15.314258] sbsa-gwdt sbsa-gwdt.0: Initialized with 10s timeout @ 100000000
>> Hz, action=0
>>
>> And for the kernel build:
>> $ more .config | grep NO_HZ
>> CONFIG_NO_HZ_COMMON=y
>> # CONFIG_NO_HZ_IDLE is not set
>> CONFIG_NO_HZ_FULL=y
>> # CONFIG_NO_HZ is not set
>> $
>>
>> Thanks,
>> John
>>>   kernel/irq/msi.c | 12 ++++++++++++
>>>   1 file changed, 12 insertions(+)
>>>
>>> diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
>>> index 2bdfce5edafd..aa84ce84c2ec 100644
>>> --- a/kernel/irq/msi.c
>>> +++ b/kernel/irq/msi.c
>>> @@ -818,6 +818,18 @@ static int msi_init_virq(struct irq_domain *domain, int virq, unsigned int vflag
>>>           irqd_clr_can_reserve(irqd);
>>>           if (vflags & VIRQ_NOMASK_QUIRK)
>>>               irqd_set_msi_nomask_quirk(irqd);
>>> +
>>> +        /*
>>> +         * If the interrupt is managed but no CPU is available
>>> +         * to service it, shut it down until better times.
>>> +         */
>>> +        if ((vflags & VIRQ_ACTIVATE) &&
>>> +            irqd_affinity_is_managed(irqd) &&
>>> +            !cpumask_intersects(irq_data_get_affinity_mask(irqd),
>>> +                    cpu_online_mask)) {
>>> +                irqd_set_managed_shutdown(irqd);
>>> +                return 0;
>>> +            }
>>>       }
>>>         if (!(vflags & VIRQ_ACTIVATE))
>>
>> .
> .
> 

* Re: [PATCH] genirq/msi: Shutdown managed interrupts with unsatifiable affinities
  2022-03-07 19:06 [PATCH] genirq/msi: Shutdown managed interrupts with unsatifiable affinities Marc Zyngier
  2022-03-08  1:34 ` David Decotigny
  2022-03-09 10:20 ` John Garry
@ 2022-03-14 15:27 ` Thomas Gleixner
  2022-03-14 16:00   ` Marc Zyngier
  2 siblings, 1 reply; 9+ messages in thread
From: Thomas Gleixner @ 2022-03-14 15:27 UTC (permalink / raw)
  To: Marc Zyngier, linux-kernel; +Cc: John Garry, David Decotigny

On Mon, Mar 07 2022 at 19:06, Marc Zyngier wrote:
> When booting with maxcpus=<small number>, interrupt controllers
> such as the GICv3 ITS may not be able to satisfy the affinity of
> some managed interrupts, as some of the HW resources are simply
> not available.

This is also true if you have offlined lots of CPUs, right?

> In order to deal with this, do not try to activate such interrupt
> if there is no online CPU capable of handling it. Instead, place
> it in shutdown state. Once a capable CPU shows up, it will be
> activated.
>
> Reported-by: John Garry <john.garry@huawei.com>
> Reported-by: David Decotigny <ddecotig@google.com>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  kernel/irq/msi.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
> index 2bdfce5edafd..aa84ce84c2ec 100644
> --- a/kernel/irq/msi.c
> +++ b/kernel/irq/msi.c
> @@ -818,6 +818,18 @@ static int msi_init_virq(struct irq_domain *domain, int virq, unsigned int vflag
>  		irqd_clr_can_reserve(irqd);
>  		if (vflags & VIRQ_NOMASK_QUIRK)
>  			irqd_set_msi_nomask_quirk(irqd);
> +
> +		/*
> +		 * If the interrupt is managed but no CPU is available
> +		 * to service it, shut it down until better times.
> +		 */
> +		if ((vflags & VIRQ_ACTIVATE) &&
> +		    irqd_affinity_is_managed(irqd) &&
> +		    !cpumask_intersects(irq_data_get_affinity_mask(irqd),
> +					cpu_online_mask)) {
> +			    irqd_set_managed_shutdown(irqd);

Hrm. Why is this in the !CAN_RESERVE path and not before the actual
activation call?

Thanks,

        tglx

* Re: [PATCH] genirq/msi: Shutdown managed interrupts with unsatifiable affinities
  2022-03-14 15:27 ` Thomas Gleixner
@ 2022-03-14 16:00   ` Marc Zyngier
  2022-03-14 19:03     ` Thomas Gleixner
  0 siblings, 1 reply; 9+ messages in thread
From: Marc Zyngier @ 2022-03-14 16:00 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, John Garry, David Decotigny

On Mon, 14 Mar 2022 15:27:10 +0000,
Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> On Mon, Mar 07 2022 at 19:06, Marc Zyngier wrote:
> > When booting with maxcpus=<small number>, interrupt controllers
> > such as the GICv3 ITS may not be able to satisfy the affinity of
> > some managed interrupts, as some of the HW resources are simply
> > not available.
> 
> This is also true if you have offlined lots of CPUs, right?

Not quite. If you offline the CPUs, the interrupts will be placed in
the shutdown state as expected, having initially transitioned via an
activation state with an online CPU. The issue here is with the
initial activation of the interrupt, which currently happens even if
no matching CPU is present.

> 
> > In order to deal with this, do not try to activate such interrupt
> > if there is no online CPU capable of handling it. Instead, place
> > it in shutdown state. Once a capable CPU shows up, it will be
> > activated.
> >
> > Reported-by: John Garry <john.garry@huawei.com>
> > Reported-by: David Decotigny <ddecotig@google.com>
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > ---
> >  kernel/irq/msi.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> >
> > diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
> > index 2bdfce5edafd..aa84ce84c2ec 100644
> > --- a/kernel/irq/msi.c
> > +++ b/kernel/irq/msi.c
> > @@ -818,6 +818,18 @@ static int msi_init_virq(struct irq_domain *domain, int virq, unsigned int vflag
> >  		irqd_clr_can_reserve(irqd);
> >  		if (vflags & VIRQ_NOMASK_QUIRK)
> >  			irqd_set_msi_nomask_quirk(irqd);
> > +
> > +		/*
> > +		 * If the interrupt is managed but no CPU is available
> > +		 * to service it, shut it down until better times.
> > +		 */
> > +		if ((vflags & VIRQ_ACTIVATE) &&
> > +		    irqd_affinity_is_managed(irqd) &&
> > +		    !cpumask_intersects(irq_data_get_affinity_mask(irqd),
> > +					cpu_online_mask)) {
> > +			    irqd_set_managed_shutdown(irqd);
> 
> Hrm. Why is this in the !CAN_RESERVE path and not before the actual
> activation call?

VIRQ_CAN_RESERVE can only happen as a consequence of
GENERIC_IRQ_RESERVATION_MODE, which only exists on x86. Given that x86
is already super careful not to activate an interrupt that is not
immediately required, I thought we could avoid putting this check on
that path.

But if I got the above wrong (which is, let's face it, extremely
likely), I'm happy to kick it down the road next to the activation
call.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
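
The lifecycle described in the reply above (a managed interrupt is shut down when its last target CPU goes offline, and is started again once a capable CPU shows up) can be sketched as a small state model. This is a rough userspace illustration, not kernel code: `struct managed_irq_model`, `model_cpu_offline()` and `model_cpu_online()` are hypothetical names, and the CPU hotplug machinery is reduced to two function calls.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpumask: one bit per CPU, up to 64 CPUs. */
typedef uint64_t cpumask_model_t;

/* Hypothetical model of a managed interrupt's hotplug-relevant state. */
struct managed_irq_model {
	cpumask_model_t affinity;	/* CPUs allowed to handle the interrupt */
	bool started;			/* currently delivering interrupts */
	bool managed_shutdown;		/* parked, waiting for a target CPU */
};

static cpumask_model_t cpu_bit(int cpu)
{
	return UINT64_C(1) << cpu;
}

/* A CPU goes offline: park the interrupt if no target CPU remains online. */
static void model_cpu_offline(struct managed_irq_model *irq,
			      cpumask_model_t *online, int cpu)
{
	*online &= ~cpu_bit(cpu);
	if (irq->started && !(irq->affinity & *online)) {
		irq->started = false;
		irq->managed_shutdown = true;
	}
}

/* A CPU comes online: restart a parked interrupt that may target it. */
static void model_cpu_online(struct managed_irq_model *irq,
			     cpumask_model_t *online, int cpu)
{
	*online |= cpu_bit(cpu);
	if (irq->managed_shutdown && (irq->affinity & cpu_bit(cpu))) {
		irq->managed_shutdown = false;
		irq->started = true;
	}
}
```

In this model the patch corresponds to creating the interrupt directly in the `managed_shutdown` state when none of its target CPUs is online, so that the existing online path can start it later.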

* Re: [PATCH] genirq/msi: Shutdown managed interrupts with unsatifiable affinities
  2022-03-14 16:00   ` Marc Zyngier
@ 2022-03-14 19:03     ` Thomas Gleixner
  2022-03-15  9:46       ` Marc Zyngier
  0 siblings, 1 reply; 9+ messages in thread
From: Thomas Gleixner @ 2022-03-14 19:03 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: linux-kernel, John Garry, David Decotigny

On Mon, Mar 14 2022 at 16:00, Marc Zyngier wrote:
> On Mon, 14 Mar 2022 15:27:10 +0000,
> Thomas Gleixner <tglx@linutronix.de> wrote:
>> 
>> On Mon, Mar 07 2022 at 19:06, Marc Zyngier wrote:
>> > When booting with maxcpus=<small number>, interrupt controllers
>> > such as the GICv3 ITS may not be able to satisfy the affinity of
>> > some managed interrupts, as some of the HW resources are simply
>> > not available.
>> 
>> This is also true if you have offlined lots of CPUs, right?
>
> Not quite. If you offline the CPUs, the interrupts will be placed in
> the shutdown state as expected, having initially transitioned via an
> activation state with an online CPU. The issue here is with the
> initial activation of the interrupt, which currently happens even if
> no matching CPU is present.

Yes. But if you load the driver _after_ offlining lots of CPUs first
then the same thing should happen, right?

>> > +		/*
>> > +		 * If the interrupt is managed but no CPU is available
>> > +		 * to service it, shut it down until better times.
>> > +		 */
>> > +		if ((vflags & VIRQ_ACTIVATE) &&
>> > +		    irqd_affinity_is_managed(irqd) &&
>> > +		    !cpumask_intersects(irq_data_get_affinity_mask(irqd),
>> > +					cpu_online_mask)) {
>> > +			    irqd_set_managed_shutdown(irqd);
>> 
>> Hrm. Why is this in the !CAN_RESERVE path and not before the actual
>> activation call?
>
> VIRQ_CAN_RESERVE can only happen as a consequence of
> GENERIC_IRQ_RESERVATION_MODE, which only exists on x86. Given that x86
> is already super careful not to activate an interrupt that is not
> immediately required, I thought we could avoid putting this check on
> that path.
>
> But if I got the above wrong (which is, let's face it, extremely
> likely), I'm happy to kick it down the road next to the activation
> call.

I just rechecked. Yes, we could push it there, but actually on x86 the
reservation mode activation sets the entry to a spurious catch all on an
online CPU, which is intentional.

So yes, we can keep it where it is now, but that needs a comment.

Thanks,

        tglx

* Re: [PATCH] genirq/msi: Shutdown managed interrupts with unsatifiable affinities
  2022-03-14 19:03     ` Thomas Gleixner
@ 2022-03-15  9:46       ` Marc Zyngier
  0 siblings, 0 replies; 9+ messages in thread
From: Marc Zyngier @ 2022-03-15  9:46 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, John Garry, David Decotigny

On Mon, 14 Mar 2022 19:03:49 +0000,
Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> On Mon, Mar 14 2022 at 16:00, Marc Zyngier wrote:
> > On Mon, 14 Mar 2022 15:27:10 +0000,
> > Thomas Gleixner <tglx@linutronix.de> wrote:
> >> 
> >> On Mon, Mar 07 2022 at 19:06, Marc Zyngier wrote:
> >> > When booting with maxcpus=<small number>, interrupt controllers
> >> > such as the GICv3 ITS may not be able to satisfy the affinity of
> >> > some managed interrupts, as some of the HW resources are simply
> >> > not available.
> >> 
> >> This is also true if you have offlined lots of CPUs, right?
> >
> > Not quite. If you offline the CPUs, the interrupts will be placed in
> > the shutdown state as expected, having initially transitioned via an
> > activation state with an online CPU. The issue here is with the
> > initial activation of the interrupt, which currently happens even if
> > no matching CPU is present.
> 
> Yes. But if you load the driver _after_ offlining lots of CPUs first
> then the same thing should happen, right?

Ah! yes, that's the exact same problem (modular drivers? that's an
idea that will never catch on...).

> 
> >> > +		/*
> >> > +		 * If the interrupt is managed but no CPU is available
> >> > +		 * to service it, shut it down until better times.
> >> > +		 */
> >> > +		if ((vflags & VIRQ_ACTIVATE) &&
> >> > +		    irqd_affinity_is_managed(irqd) &&
> >> > +		    !cpumask_intersects(irq_data_get_affinity_mask(irqd),
> >> > +					cpu_online_mask)) {
> >> > +			    irqd_set_managed_shutdown(irqd);
> >> 
> >> Hrm. Why is this in the !CAN_RESERVE path and not before the actual
> >> activation call?
> >
> > VIRQ_CAN_RESERVE can only happen as a consequence of
> > GENERIC_IRQ_RESERVATION_MODE, which only exists on x86. Given that x86
> > is already super careful not to activate an interrupt that is not
> > immediately required, I thought we could avoid putting this check on
> > that path.
> >
> > But if I got the above wrong (which is, let's face it, extremely
> > likely), I'm happy to kick it down the road next to the activation
> > call.
> 
> I just rechecked. Yes, we could push it there, but actually on x86 the
> reservation mode activation sets the entry to a spurious catch all on an
> online CPU, which is intentional.
> 
> So yes, we can keep it where it is now, but that needs a comment.

Yup, I'll add that.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

Thread overview: 9+ messages
2022-03-07 19:06 [PATCH] genirq/msi: Shutdown managed interrupts with unsatifiable affinities Marc Zyngier
2022-03-08  1:34 ` David Decotigny
2022-03-09 10:20 ` John Garry
2022-03-10  3:24   ` Xiongfeng Wang
2022-03-10  6:11     ` Xiongfeng Wang
2022-03-14 15:27 ` Thomas Gleixner
2022-03-14 16:00   ` Marc Zyngier
2022-03-14 19:03     ` Thomas Gleixner
2022-03-15  9:46       ` Marc Zyngier
