linux-kernel.vger.kernel.org archive mirror
* PCI MSI issue for maxcpus=1
@ 2022-01-05 11:23 John Garry
  2022-01-06 15:49 ` Marc Zyngier
  0 siblings, 1 reply; 16+ messages in thread
From: John Garry @ 2022-01-05 11:23 UTC (permalink / raw)
  To: Marc Zyngier, Thomas Gleixner
  Cc: chenxiang, Shameer Kolothum, linux-kernel, liuqi (BA)

Hi Marc,

Just a heads up, I noticed that commit 4c457e8cb75e ("genirq/msi: 
Activate Multi-MSI early when MSI_FLAG_ACTIVATE_EARLY is set") is 
causing an issue on our arm64 D06 board where the SAS driver probe fails 
for maxcpus=1.

This seems different to issue [0].

So it's the driver call to pci_alloc_irq_vectors_affinity() which errors 
[1]:

[    9.619070] hisi_sas_v3_hw: probe of 0000:74:02.0 failed with error -2

Some details:
- device supports 32 MSI
- min and max MSI for that function are 17 and 32, respectively
- affd pre and post are 16 and 0, respectively

I haven't checked to see what the issue is yet and I think that the 
pci_alloc_irq_vectors_affinity() usage is ok...
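
For reference, the call in question looks roughly like this (a sketch 
based on the numbers above and the hisi_sas v3 usage in [1], not the 
exact driver code):

	struct irq_affinity desc = {
		.pre_vectors = 16,	/* affd pre: the non-managed vectors */
	};
	int vectors;

	/* min 17 = 16 reserved vectors + at least one CQ vector, max 32 */
	vectors = pci_alloc_irq_vectors_affinity(pdev, 17, 32,
						 PCI_IRQ_MSI | PCI_IRQ_AFFINITY,
						 &desc);
	if (vectors < 0)
		return vectors;	/* this is the call that fails for maxcpus=1 */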

[0] 
https://lore.kernel.org/lkml/ea730f9b-c635-317d-c70d-4057590b1d1a@huawei.com/
[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c?h=v5.11#n2388

Cheers,
John

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: PCI MSI issue for maxcpus=1
  2022-01-05 11:23 PCI MSI issue for maxcpus=1 John Garry
@ 2022-01-06 15:49 ` Marc Zyngier
  2022-01-07 11:24   ` John Garry
  0 siblings, 1 reply; 16+ messages in thread
From: Marc Zyngier @ 2022-01-06 15:49 UTC (permalink / raw)
  To: John Garry
  Cc: Thomas Gleixner, chenxiang, Shameer Kolothum, linux-kernel, liuqi (BA)

Hi John,

On Wed, 05 Jan 2022 11:23:47 +0000,
John Garry <john.garry@huawei.com> wrote:
> 
> Hi Marc,
> 
> Just a heads up, I noticed that commit 4c457e8cb75e ("genirq/msi:
> Activate Multi-MSI early when MSI_FLAG_ACTIVATE_EARLY is set") is
> causing an issue on our arm64 D06 board where the SAS driver probe
> fails for maxcpus=1.
> 
> This seems different to issue [0].
> 
> So it's the driver call to pci_alloc_irq_vectors_affinity() which
> errors [1]:
> 
> [    9.619070] hisi_sas_v3_hw: probe of 0000:74:02.0 failed with error -2

Can you log what error is returned from pci_alloc_irq_vectors_affinity()?

> Some details:
> - device supports 32 MSI
> - min and max msi for that function is 17 and 32, respect.

This 17 is a bit odd, owing to the fact that MultiMSI can only deal
with powers of 2. You will always allocate 32 in this case. Not sure
why that'd cause an issue though. Unless...

> - affd pre and post are 16 and 0, respect.
> 
> I haven't checked to see what the issue is yet and I think that the
> pci_alloc_irq_vectors_affinity() usage is ok...

... we really end-up with desc->nvec_used == 32 and try to activate
past vector 17 (which is likely to fail). Could you please check this?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: PCI MSI issue for maxcpus=1
  2022-01-06 15:49 ` Marc Zyngier
@ 2022-01-07 11:24   ` John Garry
  2022-01-16 12:07     ` Marc Zyngier
  0 siblings, 1 reply; 16+ messages in thread
From: John Garry @ 2022-01-07 11:24 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Thomas Gleixner, chenxiang, Shameer Kolothum, linux-kernel, liuqi (BA)

Hi Marc,

>> So it's the driver call to pci_alloc_irq_vectors_affinity() which
>> errors [1]:
>>
>> [    9.619070] hisi_sas_v3_hw: probe of 0000:74:02.0 failed with error -2
> Can you log what error is returned from pci_alloc_irq_vectors_affinity()?

-EINVAL

> 
>> Some details:
>> - device supports 32 MSI
>> - min and max msi for that function is 17 and 32, respect.
> This 17 is a bit odd, owing to the fact that MultiMSI can only deal
> with powers of 2. You will always allocate 32 in this case. Not sure
> why that'd cause an issue though. Unless...

Even though 17 is the min, we still try for nvec=32 in 
msi_capability_init(), as the number of possible CPUs is 96.
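
For reference, the power-of-2 behaviour you mention comes from the MSI 
Multiple Message Enable field, which encodes log2 of the enabled vector 
count. A minimal illustration (not the actual msi_capability_init() code):

	/* Multi-MSI can only enable power-of-two vector counts, so a
	 * non-power-of-two request is rounded up when programming the
	 * capability, e.g. 17 -> 32. */
	unsigned int vectors_enabled = roundup_pow_of_two(nvec);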

> 
>> - affd pre and post are 16 and 0, respect.
>>
>> I haven't checked to see what the issue is yet and I think that the
>> pci_alloc_irq_vectors_affinity() usage is ok...
> ... we really end-up with desc->nvec_used == 32 and try to activate
> past vector 17 (which is likely to fail). Could you please check this?

Yeah, that looks to fail. The reason is that in the GIC ITS driver, when 
we try to activate the irq for this managed interrupt, all CPUs in the 
affinity mask are offline. Calling its_irq_domain_activate() -> 
its_select_cpu() gives cpu=nr_cpu_ids. The affinity mask for that 
interrupt is 24-29.
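
To spell out the failing path, this is roughly the managed-interrupt leg 
of its_select_cpu() as it stands (a condensed sketch from memory, not a 
patch):

	} else {
		/* Managed interrupt: only consider online CPUs in the mask */
		cpumask_and(tmpmask, irq_data_get_affinity_mask(d), cpu_online_mask);
		cpu = cpumask_pick_least_loaded(d, tmpmask);
	}

With maxcpus=1 only CPU0 is online and the managed mask is 24-29, so 
tmpmask ends up empty, cpumask_pick_least_loaded() returns nr_cpu_ids, 
and its_irq_domain_activate() turns that into -EINVAL.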

Thanks,
John

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: PCI MSI issue for maxcpus=1
  2022-01-07 11:24   ` John Garry
@ 2022-01-16 12:07     ` Marc Zyngier
  2022-01-17  9:14       ` Marc Zyngier
  0 siblings, 1 reply; 16+ messages in thread
From: Marc Zyngier @ 2022-01-16 12:07 UTC (permalink / raw)
  To: John Garry
  Cc: Thomas Gleixner, chenxiang, Shameer Kolothum, linux-kernel, liuqi (BA)

On Fri, 07 Jan 2022 11:24:38 +0000,
John Garry <john.garry@huawei.com> wrote:
> 
> Hi Marc,
> 
> >> So it's the driver call to pci_alloc_irq_vectors_affinity() which
> >> errors [1]:
> >> 
> >> [    9.619070] hisi_sas_v3_hw: probe of 0000:74:02.0 failed with error -2
> > Can you log what error is returned from pci_alloc_irq_vectors_affinity()?
> 
> -EINVAL
> 
> > 
> >> Some details:
> >> - device supports 32 MSI
> >> - min and max msi for that function is 17 and 32, respect.
> > This 17 is a bit odd, owing to the fact that MultiMSI can only deal
> > with powers of 2. You will always allocate 32 in this case. Not sure
> > why that'd cause an issue though. Unless...
> 
> Even though 17 is the min, we still try for nvec=32 in
> msi_capability_init() as possible CPUs is 96.
> 
> > 
> >> - affd pre and post are 16 and 0, respect.
> >> 
> >> I haven't checked to see what the issue is yet and I think that the
> >> pci_alloc_irq_vectors_affinity() usage is ok...
> > ... we really end-up with desc->nvec_used == 32 and try to activate
> > past vector 17 (which is likely to fail). Could you please check this?
> 
> Yeah, that looks to fail. Reason being that in the GIC ITS driver when
> we try to activate the irq for this managed interrupt all cpus in the
> affinity mask are offline. Calling its_irq_domain_activate() ->
> its_select_cpu() it gives cpu=nr_cpu_ids. The affinity mask for that
> interrupt is 24-29.

I guess that for managed interrupts, it shouldn't matter, as these
interrupts should only be used when the relevant CPUs come online.

Would something like below help? Totally untested, as I don't have a
Multi-MSI capable device that I can plug in a GICv3 system (maybe I
should teach that to a virtio device...).

Thanks,

	M.

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index d25b7a864bbb..850407294adb 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1632,6 +1632,10 @@ static int its_select_cpu(struct irq_data *d,
 			cpumask_and(tmpmask, tmpmask, cpumask_of_node(node));
 
 		cpu = cpumask_pick_least_loaded(d, tmpmask);
+
+		/* If all the possible CPUs are offline, just pick a victim. */
+		if (cpu == nr_cpu_ids)
+			cpu = cpumask_pick_least_loaded(d, irq_data_get_affinity_mask(d));
 	}
 out:
 	free_cpumask_var(tmpmask);

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: PCI MSI issue for maxcpus=1
  2022-01-16 12:07     ` Marc Zyngier
@ 2022-01-17  9:14       ` Marc Zyngier
  2022-01-17 11:59         ` John Garry
  0 siblings, 1 reply; 16+ messages in thread
From: Marc Zyngier @ 2022-01-17  9:14 UTC (permalink / raw)
  To: John Garry
  Cc: Thomas Gleixner, chenxiang, Shameer Kolothum, linux-kernel, liuqi (BA)

On Sun, 16 Jan 2022 12:07:59 +0000,
Marc Zyngier <maz@kernel.org> wrote:
> 
> On Fri, 07 Jan 2022 11:24:38 +0000,
> John Garry <john.garry@huawei.com> wrote:
> > 
> > Hi Marc,
> > 
> > >> So it's the driver call to pci_alloc_irq_vectors_affinity() which
> > >> errors [1]:
> > >> 
> > >> [    9.619070] hisi_sas_v3_hw: probe of 0000:74:02.0 failed with error -2
> > > Can you log what error is returned from pci_alloc_irq_vectors_affinity()?
> > 
> > -EINVAL
> > 
> > > 
> > >> Some details:
> > >> - device supports 32 MSI
> > >> - min and max msi for that function is 17 and 32, respect.
> > > This 17 is a bit odd, owing to the fact that MultiMSI can only deal
> > > with powers of 2. You will always allocate 32 in this case. Not sure
> > > why that'd cause an issue though. Unless...
> > 
> > Even though 17 is the min, we still try for nvec=32 in
> > msi_capability_init() as possible CPUs is 96.
> > 
> > > 
> > >> - affd pre and post are 16 and 0, respect.
> > >> 
> > >> I haven't checked to see what the issue is yet and I think that the
> > >> pci_alloc_irq_vectors_affinity() usage is ok...
> > > ... we really end-up with desc->nvec_used == 32 and try to activate
> > > past vector 17 (which is likely to fail). Could you please check this?
> > 
> > Yeah, that looks to fail. Reason being that in the GIC ITS driver when
> > we try to activate the irq for this managed interrupt all cpus in the
> > affinity mask are offline. Calling its_irq_domain_activate() ->
> > its_select_cpu() it gives cpu=nr_cpu_ids. The affinity mask for that
> > interrupt is 24-29.
> 
> I guess that for managed interrupts, it shouldn't matter, as these
> interrupts should only be used when the relevant CPUs come online.
> 
> Would something like below help? Totally untested, as I don't have a
> Multi-MSI capable device that I can plug in a GICv3 system (maybe I
> should teach that to a virtio device...).

Actually, if the CPU online status doesn't matter for managed affinity
interrupts, then the correct fix is this:

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index d25b7a864bbb..af4e72a6be63 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1624,7 +1624,7 @@ static int its_select_cpu(struct irq_data *d,
 
 		cpu = cpumask_pick_least_loaded(d, tmpmask);
 	} else {
-		cpumask_and(tmpmask, irq_data_get_affinity_mask(d), cpu_online_mask);
+		cpumask_copy(tmpmask, irq_data_get_affinity_mask(d));
 
 		/* If we cannot cross sockets, limit the search to that node */
 		if ((its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144) &&

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: PCI MSI issue for maxcpus=1
  2022-01-17  9:14       ` Marc Zyngier
@ 2022-01-17 11:59         ` John Garry
  2022-01-24 11:22           ` Marc Zyngier
  2022-03-04 12:53           ` John Garry
  0 siblings, 2 replies; 16+ messages in thread
From: John Garry @ 2022-01-17 11:59 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Thomas Gleixner, chenxiang, Shameer Kolothum, linux-kernel, liuqi (BA)

On 17/01/2022 09:14, Marc Zyngier wrote:
>> I guess that for managed interrupts, it shouldn't matter, as these
>> interrupts should only be used when the relevant CPUs come online.
>>
>> Would something like below help? Totally untested, as I don't have a
>> Multi-MSI capable device that I can plug in a GICv3 system (maybe I
>> should teach that to a virtio device...).

JFYI, NVMe PCI uses the same API (pci_alloc_irq_vectors_affinity()), but 
does not suffer from this issue - for maxcpus=1 that driver looks to only 
want a single vector.

> Actually, if the CPU online status doesn't matter for managed affinity
> interrupts, then the correct fix is this:
> 
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index d25b7a864bbb..af4e72a6be63 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -1624,7 +1624,7 @@ static int its_select_cpu(struct irq_data *d,
>   
>   		cpu = cpumask_pick_least_loaded(d, tmpmask);
>   	} else {
> -		cpumask_and(tmpmask, irq_data_get_affinity_mask(d), cpu_online_mask);
> +		cpumask_copy(tmpmask, irq_data_get_affinity_mask(d));
>   
>   		/* If we cannot cross sockets, limit the search to that node */
>   		if ((its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144) &&

That produces a warn:

[ 7.833025] ------------[ cut here ]------------
[ 7.837634] WARNING: CPU: 0 PID: 44 at 
drivers/irqchip/irq-gic-v3-its.c:298 valid_col+0x14/0x24
[ 7.846324] Modules linked in:
[ 7.849368] CPU: 0 PID: 44 Comm: kworker/0:3 Not tainted 5.16.0-dirty #119
[ 7.856230] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 
- V1.16.01 03/15/2019
[ 7.864740] Workqueue: events work_for_cpu_fn
[ 7.869088] pstate: 804000c9 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 7.876037] pc : valid_col+0x14/0x24
[ 7.879600] lr : its_build_mapti_cmd+0x84/0x90

...

[ 7.961007]  valid_col+0x14/0x24
[ 7.964223]  its_send_single_command+0x4c/0x150
[ 7.968741]  its_irq_domain_activate+0xc8/0x104
[ 7.973259]  __irq_domain_activate_irq+0x5c/0xac
[ 7.977865]  __irq_domain_activate_irq+0x38/0xac
[ 7.982471]  irq_domain_activate_irq+0x3c/0x64
[ 7.986902]  __msi_domain_alloc_irqs+0x1a8/0x2f4
[ 7.991507]  msi_domain_alloc_irqs+0x20/0x2c
[ 7.995764]  __pci_enable_msi_range+0x2ec/0x590
[ 8.000284]  pci_alloc_irq_vectors_affinity+0xe0/0x140
[ 8.005410]  hisi_sas_v3_probe+0x300/0xbe0
[ 8.009494]  local_pci_probe+0x44/0xb0
[ 8.013232]  work_for_cpu_fn+0x20/0x34
[ 8.016969]  process_one_work+0x1d0/0x354
[ 8.020966]  worker_thread+0x2c0/0x470
[ 8.024703]  kthread+0x17c/0x190
[ 8.027920]  ret_from_fork+0x10/0x20
[ 8.031485] ---[ end trace bb67cfc7eded7361 ]---

Apart from this, if another CPU in the affinity mask comes online later, 
I would figure that we want to retarget the irq to that CPU (which I 
think we would not do here).
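
For what it's worth, the generic code does have a hotplug hook for this: 
the CPUHP_AP_IRQ_AFFINITY_ONLINE callback, irq_affinity_online_cpu(), 
walks the interrupts when a CPU comes up and, for managed interrupts 
whose mask now contains an online CPU, either restores the affinity or 
starts them up if they were left in managed-shutdown state. A rough 
sketch of that behaviour, from memory, not the exact code:

	/* per managed interrupt, when @cpu comes online */
	if (irqd_affinity_is_managed(data) &&
	    cpumask_test_cpu(cpu, irq_data_get_affinity_mask(data)) &&
	    irqd_is_managed_and_shutdown(data))
		irq_startup(desc, IRQ_RESEND, IRQ_START_COND);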

Cheers,
John



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: PCI MSI issue for maxcpus=1
  2022-01-17 11:59         ` John Garry
@ 2022-01-24 11:22           ` Marc Zyngier
  2022-03-04 12:53           ` John Garry
  1 sibling, 0 replies; 16+ messages in thread
From: Marc Zyngier @ 2022-01-24 11:22 UTC (permalink / raw)
  To: John Garry
  Cc: Thomas Gleixner, chenxiang, Shameer Kolothum, linux-kernel, liuqi (BA)

On Mon, 17 Jan 2022 11:59:58 +0000,
John Garry <john.garry@huawei.com> wrote:
> 
> On 17/01/2022 09:14, Marc Zyngier wrote:
> >> I guess that for managed interrupts, it shouldn't matter, as these
> >> interrupts should only be used when the relevant CPUs come online.
> >> 
> >> Would something like below help? Totally untested, as I don't have a
> >> Multi-MSI capable device that I can plug in a GICv3 system (maybe I
> >> should teach that to a virtio device...).
> 
> JFYI, NVMe PCI uses the same API (pci_alloc_irq_vectors_affinity()),
> but does not suffer from this issue - for maxcpus=1 the driver looks
> to only want 1x vector
> 
> > Actually, if the CPU online status doesn't matter for managed affinity
> > interrupts, then the correct fix is this:
> > 
> > diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> > index d25b7a864bbb..af4e72a6be63 100644
> > --- a/drivers/irqchip/irq-gic-v3-its.c
> > +++ b/drivers/irqchip/irq-gic-v3-its.c
> > @@ -1624,7 +1624,7 @@ static int its_select_cpu(struct irq_data *d,
> >     		cpu = cpumask_pick_least_loaded(d, tmpmask);
> >   	} else {
> > -		cpumask_and(tmpmask, irq_data_get_affinity_mask(d), cpu_online_mask);
> > +		cpumask_copy(tmpmask, irq_data_get_affinity_mask(d));
> >     		/* If we cannot cross sockets, limit the search to
> > that node */
> >   		if ((its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144) &&
> 
> That produces a warn:
> 
> [ 7.833025] ------------[ cut here ]------------
> [ 7.837634] WARNING: CPU: 0 PID: 44 at
> drivers/irqchip/irq-gic-v3-its.c:298 valid_col+0x14/0x24
> [ 7.846324] Modules linked in:
> [ 7.849368] CPU: 0 PID: 44 Comm: kworker/0:3 Not tainted 5.16.0-dirty #119
> [ 7.856230] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI
> RC0 - V1.16.01 03/15/2019
> [ 7.864740] Workqueue: events work_for_cpu_fn
> [ 7.869088] pstate: 804000c9 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 7.876037] pc : valid_col+0x14/0x24
> [ 7.879600] lr : its_build_mapti_cmd+0x84/0x90

Ah, of course. The CPU hasn't booted yet, so its collection isn't
mapped. I was hoping that the core code would keep the interrupt in
shutdown state, but it doesn't seem to be the case...
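
For context, the "leave it shut down" logic does exist in irq_startup()
for managed interrupts; roughly, from memory:

	switch (__irq_startup_managed(desc, aff, force)) {
	case IRQ_STARTUP_NORMAL:
		ret = __irq_startup(desc);
		break;
	case IRQ_STARTUP_MANAGED:
		irq_do_set_affinity(d, aff, false);
		ret = __irq_startup(desc);
		break;
	case IRQ_STARTUP_ABORT:
		/* no online CPU in the managed mask: leave it shut down */
		irqd_set_managed_shutdown(d);
		return 0;
	}

But the early activation at MSI allocation time (what 4c457e8cb75e
enables) happens before any irq_startup(), so it never goes through
this path.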

> Apart from this, I assume that if another cpu comes online later in
> the affinity mask I would figure that we want to target the irq to
> that cpu (which I think we would not do here).

That's probably also something that should come from core code, as
we're not really in a position to decide this in the ITS driver.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: PCI MSI issue for maxcpus=1
  2022-01-17 11:59         ` John Garry
  2022-01-24 11:22           ` Marc Zyngier
@ 2022-03-04 12:53           ` John Garry
  2022-03-05 15:40             ` Marc Zyngier
  1 sibling, 1 reply; 16+ messages in thread
From: John Garry @ 2022-03-04 12:53 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Thomas Gleixner, chenxiang, Shameer Kolothum, linux-kernel,
	liuqi (BA),
	wangxiongfeng2

> ...

> 
> [ 7.961007]  valid_col+0x14/0x24
> [ 7.964223]  its_send_single_command+0x4c/0x150
> [ 7.968741]  its_irq_domain_activate+0xc8/0x104
> [ 7.973259]  __irq_domain_activate_irq+0x5c/0xac
> [ 7.977865]  __irq_domain_activate_irq+0x38/0xac
> [ 7.982471]  irq_domain_activate_irq+0x3c/0x64
> [ 7.986902]  __msi_domain_alloc_irqs+0x1a8/0x2f4
> [ 7.991507]  msi_domain_alloc_irqs+0x20/0x2c
> [ 7.995764]  __pci_enable_msi_range+0x2ec/0x590
> [ 8.000284]  pci_alloc_irq_vectors_affinity+0xe0/0x140
> [ 8.005410]  hisi_sas_v3_probe+0x300/0xbe0
> [ 8.009494]  local_pci_probe+0x44/0xb0
> [ 8.013232]  work_for_cpu_fn+0x20/0x34
> [ 8.016969]  process_one_work+0x1d0/0x354
> [ 8.020966]  worker_thread+0x2c0/0x470
> [ 8.024703]  kthread+0x17c/0x190
> [ 8.027920]  ret_from_fork+0x10/0x20
> [ 8.031485] ---[ end trace bb67cfc7eded7361 ]---
> 

...

> Ah, of course. the CPU hasn't booted yet, so its collection isn't
> mapped. I was hoping that the core code would keep the interrupt in
> shutdown state, but it doesn't seem to be the case...
> 
>  > Apart from this, I assume that if another cpu comes online later in
>  > the affinity mask I would figure that we want to target the irq to
>  > that cpu (which I think we would not do here).
> 
> That's probably also something that should come from core code, as
> we're not really in a position to decide this in the ITS driver.
> .


Hi Marc,

Have you had a chance to consider this issue further?

So I think that x86 avoids this issue as it uses matrix.c, which handles 
CPUs being offline when selecting target CPUs for managed interrupts.

So is your idea still that core code should keep the interrupt in 
shutdown state (for no CPUs online in affinity mask)?

Thanks,
John




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: PCI MSI issue for maxcpus=1
  2022-03-04 12:53           ` John Garry
@ 2022-03-05 15:40             ` Marc Zyngier
  2022-03-07 13:48               ` John Garry
  0 siblings, 1 reply; 16+ messages in thread
From: Marc Zyngier @ 2022-03-05 15:40 UTC (permalink / raw)
  To: John Garry
  Cc: Thomas Gleixner, chenxiang, Shameer Kolothum, linux-kernel,
	liuqi (BA),
	wangxiongfeng2, David Decotigny

[+ David, who was chasing something similar]

Hi John,

On Fri, 04 Mar 2022 12:53:31 +0000,
John Garry <john.garry@huawei.com> wrote:
> 
> > ...
> 
> > 
> > [ 7.961007]  valid_col+0x14/0x24
> > [ 7.964223]  its_send_single_command+0x4c/0x150
> > [ 7.968741]  its_irq_domain_activate+0xc8/0x104
> > [ 7.973259]  __irq_domain_activate_irq+0x5c/0xac
> > [ 7.977865]  __irq_domain_activate_irq+0x38/0xac
> > [ 7.982471]  irq_domain_activate_irq+0x3c/0x64
> > [ 7.986902]  __msi_domain_alloc_irqs+0x1a8/0x2f4
> > [ 7.991507]  msi_domain_alloc_irqs+0x20/0x2c
> > [ 7.995764]  __pci_enable_msi_range+0x2ec/0x590
> > [ 8.000284]  pci_alloc_irq_vectors_affinity+0xe0/0x140
> > [ 8.005410]  hisi_sas_v3_probe+0x300/0xbe0
> > [ 8.009494]  local_pci_probe+0x44/0xb0
> > [ 8.013232]  work_for_cpu_fn+0x20/0x34
> > [ 8.016969]  process_one_work+0x1d0/0x354
> > [ 8.020966]  worker_thread+0x2c0/0x470
> > [ 8.024703]  kthread+0x17c/0x190
> > [ 8.027920]  ret_from_fork+0x10/0x20
> > [ 8.031485] ---[ end trace bb67cfc7eded7361 ]---
> > 
> 
> ...
> 
> > Ah, of course. the CPU hasn't booted yet, so its collection isn't
> > mapped. I was hoping that the core code would keep the interrupt in
> > shutdown state, but it doesn't seem to be the case...
> > 
> >  > Apart from this, I assume that if another cpu comes online later in
> >  > the affinity mask I would figure that we want to target the irq to
> >  > that cpu (which I think we would not do here).
> > 
> > That's probably also something that should come from core code, as
> > we're not really in a position to decide this in the ITS driver.
> > .
> 
> 
> Hi Marc,
> 
> Have you had a chance to consider this issue further?
> 
> So I think that x86 avoids this issue as it uses matrix.c, which
> handles CPUs being offline when selecting target CPUs for managed
> interrupts.
> 
> So is your idea still that core code should keep the interrupt in
> shutdown state (for no CPUs online in affinity mask)?

Yup. I came up with this:

diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 2bdfce5edafd..97e9eb9aecc6 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -823,6 +823,19 @@ static int msi_init_virq(struct irq_domain *domain, int virq, unsigned int vflag
 	if (!(vflags & VIRQ_ACTIVATE))
 		return 0;
 
+	if (!(vflags & VIRQ_CAN_RESERVE)) {
+		/*
+		 * If the interrupt is managed but no CPU is available
+		 * to service it, shut it down until better times.
+		 */
+		if (irqd_affinity_is_managed(irqd) &&
+		    !cpumask_intersects(irq_data_get_affinity_mask(irqd),
+					cpu_online_mask)) {
+			irqd_set_managed_shutdown(irqd);
+			return 0;
+		}
+	}
+
 	ret = irq_domain_activate_irq(irqd, vflags & VIRQ_CAN_RESERVE);
 	if (ret)
 		return ret;

With this in place, I get the following results (VM booted with 4
vcpus and maxcpus=1, the virtio device is using managed interrupts):

root@debian:~# cat /proc/interrupts 
           CPU0       
 10:       2298     GICv3  27 Level     arch_timer
 12:         84     GICv3  33 Level     uart-pl011
 49:          0     GICv3  41 Edge      ACPI:Ged
 50:          0   ITS-MSI 16384 Edge      virtio0-config
 51:       2088   ITS-MSI 16385 Edge      virtio0-req.0
 52:          0   ITS-MSI 16386 Edge      virtio0-req.1
 53:          0   ITS-MSI 16387 Edge      virtio0-req.2
 54:          0   ITS-MSI 16388 Edge      virtio0-req.3
 55:      11641   ITS-MSI 32768 Edge      xhci_hcd
 56:          0   ITS-MSI 32769 Edge      xhci_hcd
IPI0:         0       Rescheduling interrupts
IPI1:         0       Function call interrupts
IPI2:         0       CPU stop interrupts
IPI3:         0       CPU stop (for crash dump) interrupts
IPI4:         0       Timer broadcast interrupts
IPI5:         0       IRQ work interrupts
IPI6:         0       CPU wake-up interrupts
Err:          0
root@debian:~# echo 1 >/sys/devices/system/cpu/cpu2/online 
root@debian:~# cat /proc/interrupts 
           CPU0       CPU2       
 10:       2530         90     GICv3  27 Level     arch_timer
 12:        103          0     GICv3  33 Level     uart-pl011
 49:          0          0     GICv3  41 Edge      ACPI:Ged
 50:          0          0   ITS-MSI 16384 Edge      virtio0-config
 51:       2097          0   ITS-MSI 16385 Edge      virtio0-req.0
 52:          0          0   ITS-MSI 16386 Edge      virtio0-req.1
 53:          0         12   ITS-MSI 16387 Edge      virtio0-req.2
 54:          0          0   ITS-MSI 16388 Edge      virtio0-req.3
 55:      13487          0   ITS-MSI 32768 Edge      xhci_hcd
 56:          0          0   ITS-MSI 32769 Edge      xhci_hcd
IPI0:        38         45       Rescheduling interrupts
IPI1:         3          3       Function call interrupts
IPI2:         0          0       CPU stop interrupts
IPI3:         0          0       CPU stop (for crash dump) interrupts
IPI4:         0          0       Timer broadcast interrupts
IPI5:         0          0       IRQ work interrupts
IPI6:         0          0       CPU wake-up interrupts
Err:          0

Would this solve your problem?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: PCI MSI issue for maxcpus=1
  2022-03-05 15:40             ` Marc Zyngier
@ 2022-03-07 13:48               ` John Garry
  2022-03-07 14:01                 ` Marc Zyngier
  2022-03-08  3:57                 ` Xiongfeng Wang
  0 siblings, 2 replies; 16+ messages in thread
From: John Garry @ 2022-03-07 13:48 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Thomas Gleixner, chenxiang, Shameer Kolothum, linux-kernel,
	liuqi (BA),
	wangxiongfeng2, David Decotigny

Hi Marc,

> 
> diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
> index 2bdfce5edafd..97e9eb9aecc6 100644
> --- a/kernel/irq/msi.c
> +++ b/kernel/irq/msi.c
> @@ -823,6 +823,19 @@ static int msi_init_virq(struct irq_domain *domain, int virq, unsigned int vflag
>   	if (!(vflags & VIRQ_ACTIVATE))
>   		return 0;
>   
> +	if (!(vflags & VIRQ_CAN_RESERVE)) {
> +		/*
> +		 * If the interrupt is managed but no CPU is available
> +		 * to service it, shut it down until better times.
> +		 */
> +		if (irqd_affinity_is_managed(irqd) &&
> +		    !cpumask_intersects(irq_data_get_affinity_mask(irqd),
> +					cpu_online_mask)) {
> +			irqd_set_managed_shutdown(irqd);
> +			return 0;
> +		}
> +	}
> +
>   	ret = irq_domain_activate_irq(irqd, vflags & VIRQ_CAN_RESERVE);
>   	if (ret)
>   		return ret;
> 

Yeah, that seems to solve the issue. I will test it a bit more.

We need to check the isolcpus cmdline issue as well - wang xiongfeng, 
please assist here. I assume that this feature just never worked for 
arm64 since it was added.

> With this in place, I get the following results (VM booted with 4
> vcpus and maxcpus=1, the virtio device is using managed interrupts):
> 
> root@debian:~# cat /proc/interrupts
>             CPU0
>   10:       2298     GICv3  27 Level     arch_timer
>   12:         84     GICv3  33 Level     uart-pl011
>   49:          0     GICv3  41 Edge      ACPI:Ged
>   50:          0   ITS-MSI 16384 Edge      virtio0-config
>   51:       2088   ITS-MSI 16385 Edge      virtio0-req.0
>   52:          0   ITS-MSI 16386 Edge      virtio0-req.1
>   53:          0   ITS-MSI 16387 Edge      virtio0-req.2
>   54:          0   ITS-MSI 16388 Edge      virtio0-req.3
>   55:      11641   ITS-MSI 32768 Edge      xhci_hcd
>   56:          0   ITS-MSI 32769 Edge      xhci_hcd
> IPI0:         0       Rescheduling interrupts
> IPI1:         0       Function call interrupts
> IPI2:         0       CPU stop interrupts
> IPI3:         0       CPU stop (for crash dump) interrupts
> IPI4:         0       Timer broadcast interrupts
> IPI5:         0       IRQ work interrupts
> IPI6:         0       CPU wake-up interrupts
> Err:          0
> root@debian:~# echo 1 >/sys/devices/system/cpu/cpu2/online
> root@debian:~# cat /proc/interrupts
>             CPU0       CPU2
>   10:       2530         90     GICv3  27 Level     arch_timer
>   12:        103          0     GICv3  33 Level     uart-pl011
>   49:          0          0     GICv3  41 Edge      ACPI:Ged
>   50:          0          0   ITS-MSI 16384 Edge      virtio0-config
>   51:       2097          0   ITS-MSI 16385 Edge      virtio0-req.0
>   52:          0          0   ITS-MSI 16386 Edge      virtio0-req.1
>   53:          0         12   ITS-MSI 16387 Edge      virtio0-req.2
>   54:          0          0   ITS-MSI 16388 Edge      virtio0-req.3
>   55:      13487          0   ITS-MSI 32768 Edge      xhci_hcd
>   56:          0          0   ITS-MSI 32769 Edge      xhci_hcd
> IPI0:        38         45       Rescheduling interrupts
> IPI1:         3          3       Function call interrupts
> IPI2:         0          0       CPU stop interrupts
> IPI3:         0          0       CPU stop (for crash dump) interrupts
> IPI4:         0          0       Timer broadcast interrupts
> IPI5:         0          0       IRQ work interrupts
> IPI6:         0          0       CPU wake-up interrupts
> Err:          0
> 

Out of interest, is the virtio managed interrupts support just in your 
sandbox? You did mention earlier in the thread that you were considering 
adding this feature.

Thanks,
John

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: PCI MSI issue for maxcpus=1
  2022-03-07 13:48               ` John Garry
@ 2022-03-07 14:01                 ` Marc Zyngier
  2022-03-07 14:03                   ` Marc Zyngier
  2022-03-08  3:57                 ` Xiongfeng Wang
  1 sibling, 1 reply; 16+ messages in thread
From: Marc Zyngier @ 2022-03-07 14:01 UTC (permalink / raw)
  To: John Garry
  Cc: Thomas Gleixner, chenxiang, Shameer Kolothum, linux-kernel,
	liuqi (BA),
	wangxiongfeng2, David Decotigny

Hi John,

On Mon, 07 Mar 2022 13:48:11 +0000,
John Garry <john.garry@huawei.com> wrote:
> 
> Hi Marc,
> 
> > 
> > diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
> > index 2bdfce5edafd..97e9eb9aecc6 100644
> > --- a/kernel/irq/msi.c
> > +++ b/kernel/irq/msi.c
> > @@ -823,6 +823,19 @@ static int msi_init_virq(struct irq_domain *domain, int virq, unsigned int vflag
> >   	if (!(vflags & VIRQ_ACTIVATE))
> >   		return 0;
> >   +	if (!(vflags & VIRQ_CAN_RESERVE)) {
> > +		/*
> > +		 * If the interrupt is managed but no CPU is available
> > +		 * to service it, shut it down until better times.
> > +		 */
> > +		if (irqd_affinity_is_managed(irqd) &&
> > +		    !cpumask_intersects(irq_data_get_affinity_mask(irqd),
> > +					cpu_online_mask)) {
> > +			irqd_set_managed_shutdown(irqd);
> > +			return 0;
> > +		}
> > +	}
> > +
> >   	ret = irq_domain_activate_irq(irqd, vflags & VIRQ_CAN_RESERVE);
> >   	if (ret)
> >   		return ret;
> > 
> 
> Yeah, that seems to solve the issue. I will test it a bit more.

Thanks. For the record, I have pushed a branch at [1]. The patch is
extremely similar, just moved up a tiny bit to avoid duplicating the
!VIRQ_CAN_RESERVE case.

> We need to check the isolcpus cmdline issue as well - wang xiongfeng,
> please assist here. I assume that this feature just never worked for
> arm64 since it was added.

That one is still on my list. isolcpus certainly has had as little
testing as you can imagine.

> Out of interest, is the virtio managed interrupts support just in
> your sandbox? You did mention earlier in the thread that you were
> considering adding this feature.

As it turns out, QEMU's non-legacy virtio support allows the kernel to
do the right thing (multi-queue support and affinity management).
Using kvmtool, I only get a single interrupt although the device
pretends to support some MQ extension. I haven't dug into it yet.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: PCI MSI issue for maxcpus=1
  2022-03-07 14:01                 ` Marc Zyngier
@ 2022-03-07 14:03                   ` Marc Zyngier
  2022-03-08  1:37                     ` David Decotigny
  0 siblings, 1 reply; 16+ messages in thread
From: Marc Zyngier @ 2022-03-07 14:03 UTC (permalink / raw)
  To: John Garry
  Cc: Thomas Gleixner, chenxiang, Shameer Kolothum, linux-kernel,
	liuqi (BA),
	wangxiongfeng2, David Decotigny

On Mon, 07 Mar 2022 14:01:02 +0000,
Marc Zyngier <maz@kernel.org> wrote:
> 
> Hi John,
> 
> On Mon, 07 Mar 2022 13:48:11 +0000,
> John Garry <john.garry@huawei.com> wrote:
> > 
> > Hi Marc,
> > 
> > > 
> > > diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
> > > index 2bdfce5edafd..97e9eb9aecc6 100644
> > > --- a/kernel/irq/msi.c
> > > +++ b/kernel/irq/msi.c
> > > @@ -823,6 +823,19 @@ static int msi_init_virq(struct irq_domain *domain, int virq, unsigned int vflag
> > >   	if (!(vflags & VIRQ_ACTIVATE))
> > >   		return 0;
> > >   +	if (!(vflags & VIRQ_CAN_RESERVE)) {
> > > +		/*
> > > +		 * If the interrupt is managed but no CPU is available
> > > +		 * to service it, shut it down until better times.
> > > +		 */
> > > +		if (irqd_affinity_is_managed(irqd) &&
> > > +		    !cpumask_intersects(irq_data_get_affinity_mask(irqd),
> > > +					cpu_online_mask)) {
> > > +			irqd_set_managed_shutdown(irqd);
> > > +			return 0;
> > > +		}
> > > +	}
> > > +
> > >   	ret = irq_domain_activate_irq(irqd, vflags & VIRQ_CAN_RESERVE);
> > >   	if (ret)
> > >   		return ret;
> > > 
> > 
> > Yeah, that seems to solve the issue. I will test it a bit more.
> 
> Thanks. For the record, I have pushed a branch at [1]. The patch is
> extremely similar, just moved up a tiny bit to avoid duplicating the
> !VIRQ_CAN_RESERVE case.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=irq/msi-shutdown-on-init

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: PCI MSI issue for maxcpus=1
  2022-03-07 14:03                   ` Marc Zyngier
@ 2022-03-08  1:37                     ` David Decotigny
  0 siblings, 0 replies; 16+ messages in thread
From: David Decotigny @ 2022-03-08  1:37 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: John Garry, Thomas Gleixner, chenxiang, Shameer Kolothum,
	linux-kernel, liuqi (BA),
	wangxiongfeng2

Thanks, Marc! That solved the issue on our end as well.

On Mon, Mar 7, 2022 at 6:03 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Mon, 07 Mar 2022 14:01:02 +0000,
> Marc Zyngier <maz@kernel.org> wrote:
> >
> > Hi John,
> >
> > On Mon, 07 Mar 2022 13:48:11 +0000,
> > John Garry <john.garry@huawei.com> wrote:
> > >
> > > Hi Marc,
> > >
> > > >
> > > > diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
> > > > index 2bdfce5edafd..97e9eb9aecc6 100644
> > > > --- a/kernel/irq/msi.c
> > > > +++ b/kernel/irq/msi.c
> > > > @@ -823,6 +823,19 @@ static int msi_init_virq(struct irq_domain *domain, int virq, unsigned int vflag
> > > >           if (!(vflags & VIRQ_ACTIVATE))
> > > >                   return 0;
> > > >   +       if (!(vflags & VIRQ_CAN_RESERVE)) {
> > > > +         /*
> > > > +          * If the interrupt is managed but no CPU is available
> > > > +          * to service it, shut it down until better times.
> > > > +          */
> > > > +         if (irqd_affinity_is_managed(irqd) &&
> > > > +             !cpumask_intersects(irq_data_get_affinity_mask(irqd),
> > > > +                                 cpu_online_mask)) {
> > > > +                 irqd_set_managed_shutdown(irqd);
> > > > +                 return 0;
> > > > +         }
> > > > + }
> > > > +
> > > >           ret = irq_domain_activate_irq(irqd, vflags & VIRQ_CAN_RESERVE);
> > > >           if (ret)
> > > >                   return ret;
> > > >
> > >
> > > Yeah, that seems to solve the issue. I will test it a bit more.
> >
> > Thanks. For the record, I have pushed a branch at [1]. The patch is
> > extremely similar, just moved up a tiny bit to avoid duplicating the
> > !VIRQ_CAN_RESERVE case.
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=irq/msi-shutdown-on-init
>
> --
> Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: PCI MSI issue for maxcpus=1
  2022-03-07 13:48               ` John Garry
  2022-03-07 14:01                 ` Marc Zyngier
@ 2022-03-08  3:57                 ` Xiongfeng Wang
       [not found]                   ` <87zgm0zfw7.wl-maz@kernel.org>
  1 sibling, 1 reply; 16+ messages in thread
From: Xiongfeng Wang @ 2022-03-08  3:57 UTC (permalink / raw)
  To: John Garry, Marc Zyngier
  Cc: Thomas Gleixner, chenxiang, Shameer Kolothum, linux-kernel,
	liuqi (BA),
	David Decotigny

Hi,

On 2022/3/7 21:48, John Garry wrote:
> Hi Marc,
> 
>>
>> diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
>> index 2bdfce5edafd..97e9eb9aecc6 100644
>> --- a/kernel/irq/msi.c
>> +++ b/kernel/irq/msi.c
>> @@ -823,6 +823,19 @@ static int msi_init_virq(struct irq_domain *domain, int
>> virq, unsigned int vflag
>>       if (!(vflags & VIRQ_ACTIVATE))
>>           return 0;
>>   +    if (!(vflags & VIRQ_CAN_RESERVE)) {
>> +        /*
>> +         * If the interrupt is managed but no CPU is available
>> +         * to service it, shut it down until better times.
>> +         */
>> +        if (irqd_affinity_is_managed(irqd) &&
>> +            !cpumask_intersects(irq_data_get_affinity_mask(irqd),
>> +                    cpu_online_mask)) {
>> +            irqd_set_managed_shutdown(irqd);
>> +            return 0;
>> +        }
>> +    }
>> +
>>       ret = irq_domain_activate_irq(irqd, vflags & VIRQ_CAN_RESERVE);
>>       if (ret)
>>           return ret;
>>

I applied the above modification and added the kernel parameter 'maxcpus=1'. It
can boot successfully on D06.

Then I removed 'maxcpus=1' and added 'nohz_full=5-127
isolcpus=nohz,domain,managed_irq,5-127'. The 'effective_affinity' of the
kernel-managed irq is not correct:
[root@localhost wxf]# cat /proc/interrupts | grep 350
350:          0          0          0          0          0        522
(ignored info)
0          0                  0   ITS-MSI 60882972 Edge      hisi_sas_v3_hw cq
[root@localhost wxf]# cat /proc/irq/350/smp_affinity
00000000,00000000,00000000,000000ff
[root@localhost wxf]# cat /proc/irq/350/effective_affinity
00000000,00000000,00000000,00000020

Then I applied the following modification.
Refer to https://lore.kernel.org/all/87a6fl8jgb.wl-maz@kernel.org/
The 'effective_affinity' is correct now.

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index eb0882d15366..0cea46bdaf99 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1620,7 +1620,7 @@ static int its_select_cpu(struct irq_data *d,

 		cpu = cpumask_pick_least_loaded(d, tmpmask);
 	} else {
-		cpumask_and(tmpmask, irq_data_get_affinity_mask(d), cpu_online_mask);
+		cpumask_copy(tmpmask, aff_mask);

 		/* If we cannot cross sockets, limit the search to that node */
 		if ((its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144) &&

Then I added both kernel parameters:
nohz_full=1-127 isolcpus=nohz,domain,managed_irq,1-127 maxcpus=1
It crashed with the following message.
[   51.813803][T21132] cma_alloc: 29 callbacks suppressed
[   51.813809][T21132] cma: cma_alloc: reserved: alloc failed, req-size: 4
pages, ret: -12
[   51.897537][T21132] cma: cma_alloc: reserved: alloc failed, req-size: 8
pages, ret: -12
[   52.014432][T21132] cma: cma_alloc: reserved: alloc failed, req-size: 4
pages, ret: -12
[   52.067313][T21132] cma: cma_alloc: reserved: alloc failed, req-size: 8
pages, ret: -12
[   52.180011][T21132] cma: cma_alloc: reserved: alloc failed, req-size: 4
pages, ret: -12
[   52.270846][    T0] Detected VIPT I-cache on CPU1
[   52.275541][    T0] GICv3: CPU1: found redistributor 80100 region
1:0x00000000ae140000
[   52.283425][    T0] GICv3: CPU1: using allocated LPI pending table
@0x00000040808b0000
[   52.291381][    T0] CPU1: Booted secondary processor 0x0000080100 [0x481fd010]
[   52.432971][    T0] Detected VIPT I-cache on CPU101
[   52.437914][    T0] GICv3: CPU101: found redistributor 390100 region
101:0x00002000aa240000
[   52.446233][    T0] GICv3: CPU101: using allocated LPI pending table
@0x0000004081170000
[   52.ULL pointer dereference at virtual address 00000000000000a0
[   52.471539][T24563] Mem abort info:
[   52.475011][T24563]   ESR = 0x96000044
[   52.478742][T24563]   EC = 0x25: DABT (current EL), IL = 32 bits
[   52.484721][T24563]   SET = 0, FnV = 0
[   52.488451][T24563]   EA = 0, S1PTW = 0
[   52.492269][T24563]   FSC = 0x04: level 0 translation fault
[   52.497815][T24563] Data abort info:
[   52.501374][T24563]   ISV = 0, ISS = 0x00000044
[   52.505884][T24563]   CM = 0, WnR = 1
[   52.509530][T24563] [00000000000000a0] user address but active_mm is swapper
[   52.516548][T24563] Internal error: Oops: 96000044 [#1] SMP
[   52.522096][T24563] Modules linked in: ghash_ce sha2_ce sha256_arm64 sha1_ce
sbsa_gwdt hns_roce_hw_v2 vfat fat ib_uverbs ib_core ipmi_ssif sg acpi_ipmi
ipmi_si ipmi_devintf ipmi_msghandler hisi_uncore_hha_pmu hisi_uncore_ddrc_pmu
hisi_uncore_l3c_pmu hisi_uncore_pmu ip_tables xfs libcrc32c sd_mod realtek hclge
nvme hisi_sas_v3_hw nvme_core hisi_sas_main t10_pi libsas ahci libahci hns3
scsi_transport_sas libata hnae3 i2c_designware_platform i2c_designware_core nfit
libnvdimm dm_mirror dm_region_hash dm_log dm_mod
[   52.567181][T24563] CPU: 101 PID: 24563 Comm: cpuhp/101 Not tainted
5.17.0-rc7+ #5
[   52.574716][T24563] Hardware name: Huawei TaiShan 200 (Model 5280)/BC82AMDD,
BIOS 1.79 08/21/2021
[   52.583547][T24563] pstate: 204000c9 (nzCv daIF +PAN -UAO -TCO -DIT -SSBS
BTYPE=--)
[   52.591170][T24563] pc : lpi_update_config+0xe0/0x300
[   52.59620000ce6bb90 x28: 0000000000000000 x27: 0000000000000060
[   52.613021][T24563] x26: ffff20800798b818 x25: 0000000000002781 x24:
ffff80000962f460
[   52.620815][T24563] x23: 0000000000000000 x22: 0000000000000060 x21:
ffff80000962ec58
[   52.628610][T24563] x20: ffff20800633b540 x19: ffff208007946e00 x18:
0000000000000000
[   52.636404][T24563] x17: 3731313830343030 x16: 3030303078304020 x15:
0000000000000000
[   52.644199][T24563] x14: 0000000000000000 x13: 0000000000000000 x12:
0000000000000000
[   52.651993][T24563] x11: 0000000000000000 x10: 0000000000000000 x9 :
ffff80000867a99c
[   52.659788][T24563] x8 : 0000000000000000 x7 : 0000000000000000 x6 :
ffff800008d3dda0
[   52.667582][T24563] x5 : ffff800028e00000 x4 : 0000000000000000 x3 :
ffff20be7f837780
[   52.675376][T24563] x2 : 0000000000000001 x1 : 00000000000000a0 x0 :
0000000000000000
[   52.683170][T24563] Call trace:
[   52.686298][T24563]  lpi_update_config+0xe0/0x300
[   52.690982][T24563]  its_unmask_irq+0x34/0x68
[   52.695318][T24563]  irq_chip_unmask_parent+0x20/0x28
[   52.700349][T24563]  its_unmask_msi_irq+0x24/0x30
[   52.705032][T24563]  unmask_irq.part.0+0x2c/0x48
[   52.709630][T24563]  irq_enable+0x70/0x80
[   52.713623][T24563]  __irq_startup+0x7c/0xa8
[   52.717875][T24563]  irq_startup+0x134/0x158
[   52.722127][T24563]  irq_affinity_online_cpu+0x1c0/0x210
[   52.727415][T24563]  cpuhp_invoke_callback+0x14c/0x590
[   52.732533][T24563]  cpuhp_thread_fun+0xd4/0x188
[   52.737130][T24563]   52.749890][T24563] Code: f94002a0 8b000020 f9400400
91028001 (f9000039)
[   52.756649][T24563] ---[ end trace 0000000000000000 ]---
[   52.787287][T24563] Kernel panic - not syncing: Oops: Fatal exception
[   52.793701][T24563] SMP: stopping secondary CPUs
[   52.798309][T24563] Kernel Offset: 0xb0000 from 0xffff800008000000
[   52.804462][T24563] PHYS_OFFSET: 0x0
[   52.808021][T24563] CPU features: 0x00,00000803,46402c40
[   52.813308][T24563] Memory Limit: none
[   52.841424][T24563] ---[ end Kernel panic - not syncing: Oops: Fatal
exception ]---

Then I added only the kernel parameter 'maxcpus=1'. It also crashes with the same Call Trace.

Then I added the cpu_online_mask check like below and added both kernel parameters.
It doesn't crash now.
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index d25b7a864bbb..17c15d3b2784 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1624,7 +1624,10 @@ static int its_select_cpu(struct irq_data *d,

 		cpu = cpumask_pick_least_loaded(d, tmpmask);
 	} else {
-		cpumask_and(tmpmask, irq_data_get_affinity_mask(d), cpu_online_mask);
+		cpumask_and(tmpmask, aff_mask, cpu_online_mask);
+		if (cpumask_empty(tmpmask))
+			cpumask_and(tmpmask, irq_data_get_affinity_mask(d),
+				    cpu_online_mask);

 		/* If we cannot cross sockets, limit the search to that node */
 		if ((its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144) &&

Thanks,
Xiongfeng


> 
> Yeah, that seems to solve the issue. I will test it a bit more.
> 
> We need to check the isolcpus cmdline issue as well - wang xiongfeng, please
> assist here. I assume that this feature just never worked for arm64 since it was
> added.
> 
>> With this in place, I get the following results (VM booted with 4
>> vcpus and maxcpus=1, the virtio device is using managed interrupts):
>>
>> root@debian:~# cat /proc/interrupts
>>             CPU0
>>   10:       2298     GICv3  27 Level     arch_timer
>>   12:         84     GICv3  33 Level     uart-pl011
>>   49:          0     GICv3  41 Edge      ACPI:Ged
>>   50:          0   ITS-MSI 16384 Edge      virtio0-config
>>   51:       2088   ITS-MSI 16385 Edge      virtio0-req.0
>>   52:          0   ITS-MSI 16386 Edge      virtio0-req.1
>>   53:          0   ITS-MSI 16387 Edge      virtio0-req.2
>>   54:          0   ITS-MSI 16388 Edge      virtio0-req.3
>>   55:      11641   ITS-MSI 32768 Edge      xhci_hcd
>>   56:          0   ITS-MSI 32769 Edge      xhci_hcd
>> IPI0:         0       Rescheduling interrupts
>> IPI1:         0       Function call interrupts
>> IPI2:         0       CPU stop interrupts
>> IPI3:         0       CPU stop (for crash dump) interrupts
>> IPI4:         0       Timer broadcast interrupts
>> IPI5:         0       IRQ work interrupts
>> IPI6:         0       CPU wake-up interrupts
>> Err:          0
>> root@debian:~# echo 1 >/sys/devices/system/cpu/cpu2/online
>> root@debian:~# cat /proc/interrupts
>>             CPU0       CPU2
>>   10:       2530         90     GICv3  27 Level     arch_timer
>>   12:        103          0     GICv3  33 Level     uart-pl011
>>   49:          0          0     GICv3  41 Edge      ACPI:Ged
>>   50:          0          0   ITS-MSI 16384 Edge      virtio0-config
>>   51:       2097          0   ITS-MSI 16385 Edge      virtio0-req.0
>>   52:          0          0   ITS-MSI 16386 Edge      virtio0-req.1
>>   53:          0         12   ITS-MSI 16387 Edge      virtio0-req.2
>>   54:          0          0   ITS-MSI 16388 Edge      virtio0-req.3
>>   55:      13487          0   ITS-MSI 32768 Edge      xhci_hcd
>>   56:          0          0   ITS-MSI 32769 Edge      xhci_hcd
>> IPI0:        38         45       Rescheduling interrupts
>> IPI1:         3          3       Function call interrupts
>> IPI2:         0          0       CPU stop interrupts
>> IPI3:         0          0       CPU stop (for crash dump) interrupts
>> IPI4:         0          0       Timer broadcast interrupts
>> IPI5:         0          0       IRQ work interrupts
>> IPI6:         0          0       CPU wake-up interrupts
>> Err:          0
>>
> 
> Out of interest, is the virtio managed interrupts support just in your sandbox?
> You did mention earlier in the thread that you were considering adding this
> feature.
> 
> Thanks,
> John
> .

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: PCI MSI issue for maxcpus=1
       [not found]                   ` <87zgm0zfw7.wl-maz@kernel.org>
@ 2022-03-10  3:19                     ` Xiongfeng Wang
       [not found]                       ` <87o82eyxmz.wl-maz@kernel.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Xiongfeng Wang @ 2022-03-10  3:19 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: John Garry, Thomas Gleixner, chenxiang, Shameer Kolothum,
	linux-kernel, liuqi (BA),
	David Decotigny

Hi,

On 2022/3/8 22:18, Marc Zyngier wrote:
> On Tue, 08 Mar 2022 03:57:33 +0000,
> Xiongfeng Wang <wangxiongfeng2@huawei.com> wrote:
>>
>> Hi,
>>
>> On 2022/3/7 21:48, John Garry wrote:
>>> Hi Marc,
>>>
>>>>
>>>> diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
>>>> index 2bdfce5edafd..97e9eb9aecc6 100644
>>>> --- a/kernel/irq/msi.c
>>>> +++ b/kernel/irq/msi.c
>>>> @@ -823,6 +823,19 @@ static int msi_init_virq(struct irq_domain *domain, int
>>>> virq, unsigned int vflag
>>>>       if (!(vflags & VIRQ_ACTIVATE))
>>>>           return 0;
>>>>   +    if (!(vflags & VIRQ_CAN_RESERVE)) {
>>>> +        /*
>>>> +         * If the interrupt is managed but no CPU is available
>>>> +         * to service it, shut it down until better times.
>>>> +         */
>>>> +        if (irqd_affinity_is_managed(irqd) &&
>>>> +            !cpumask_intersects(irq_data_get_affinity_mask(irqd),
>>>> +                    cpu_online_mask)) {
>>>> +            irqd_set_managed_shutdown(irqd);
>>>> +            return 0;
>>>> +        }
>>>> +    }
>>>> +
>>>>       ret = irq_domain_activate_irq(irqd, vflags & VIRQ_CAN_RESERVE);
>>>>       if (ret)
>>>>           return ret;
>>>>
>>
>> I applied the above modification and add kernel parameter 'maxcpus=1'. It can
>> boot successfully on D06.
>>
>> Then I remove 'maxcpus=1' and add 'nohz_full=5-127
>> isolcpus=nohz,domain,managed_irq,5-127'. The 'effective_affinity' of the kernel
>> managed irq is not correct.
>> [root@localhost wxf]# cat /proc/interrupts | grep 350
>> 350:          0          0          0          0          0        522
>> (ignored info)
>> 0          0                  0   ITS-MSI 60882972 Edge      hisi_sas_v3_hw cq
>> [root@localhost wxf]# cat /proc/irq/350/smp_affinity
>> 00000000,00000000,00000000,000000ff
>> [root@localhost wxf]# cat /proc/irq/350/effective_affinity
>> 00000000,00000000,00000000,00000020
>>
>> Then I apply the following modification.
>> Refer to https://lore.kernel.org/all/87a6fl8jgb.wl-maz@kernel.org/
>> The 'effective_affinity' is correct now.
>>
>> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
>> index eb0882d15366..0cea46bdaf99 100644
>> --- a/drivers/irqchip/irq-gic-v3-its.c
>> +++ b/drivers/irqchip/irq-gic-v3-its.c
>> @@ -1620,7 +1620,7 @@ static int its_select_cpu(struct irq_data *d,
>>
>>  		cpu = cpumask_pick_least_loaded(d, tmpmask);
>>  	} else {
>> -		cpumask_and(tmpmask, irq_data_get_affinity_mask(d), cpu_online_mask);
>> +		cpumask_copy(tmpmask, aff_mask);
>>
>>  		/* If we cannot cross sockets, limit the search to that node */
>>  		if ((its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144) &&
>>
>> Then I add both kernel parameters.
>> nohz_full=1-127 isolcpus=nohz,domain,managed_irq,1-127 maxcpus=1
>> It crashed with the following message.
>> [   51.813803][T21132] cma_alloc: 29 callbacks suppressed
>> [   51.813809][T21132] cma: cma_alloc: reserved: alloc failed, req-size: 4
>> pages, ret: -12
>> [   51.897537][T21132] cma: cma_alloc: reserved: alloc failed, req-size: 8
>> pages, ret: -12
>> [   52.014432][T21132] cma: cma_alloc: reserved: alloc failed, req-size: 4
>> pages, ret: -12
>> [   52.067313][T21132] cma: cma_alloc: reserved: alloc failed, req-size: 8
>> pages, ret: -12
>> [   52.180011][T21132] cma: cma_alloc: reserved: alloc failed, req-size: 4
>> pages, ret: -12
>> [   52.270846][    T0] Detected VIPT I-cache on CPU1
>> [   52.275541][    T0] GICv3: CPU1: found redistributor 80100 region
>> 1:0x00000000ae140000
>> [   52.283425][    T0] GICv3: CPU1: using allocated LPI pending table
>> @0x00000040808b0000
>> [   52.291381][    T0] CPU1: Booted secondary processor 0x0000080100 [0x481fd010]
>> [   52.432971][    T0] Detected VIPT I-cache on CPU101
>> [   52.437914][    T0] GICv3: CPU101: found redistributor 390100 region
>> 101:0x00002000aa240000
>> [   52.446233][    T0] GICv3: CPU101: using allocated LPI pending table
>> @0x0000004081170000
>> [   52.ULL pointer dereference at virtual address 00000000000000a0
> 
> This is pretty odd. If you passed maxcpus=1, how come CPU1 and 101
> are booting right from the beginning? Or is it userspace doing that?

Yes, it is userspace that onlines all the CPUs.

> 
>> [   52.471539][T24563] Mem abort info:
>> [   52.475011][T24563]   ESR = 0x96000044
>> [   52.478742][T24563]   EC = 0x25: DABT (current EL), IL = 32 bits
>> [   52.484721][T24563]   SET = 0, FnV = 0
>> [   52.488451][T24563]   EA = 0, S1PTW = 0
>> [   52.492269][T24563]   FSC = 0x04: level 0 translation fault
>> [   52.497815][T24563] Data abort info:
>> [   52.501374][T24563]   ISV = 0, ISS = 0x00000044
>> [   52.505884][T24563]   CM = 0, WnR = 1
>> [   52.509530][T24563] [00000000000000a0] user address but active_mm is swapper
>> [   52.516548][T24563] Internal error: Oops: 96000044 [#1] SMP
>> [   52.522096][T24563] Modules linked in: ghash_ce sha2_ce sha256_arm64 sha1_ce
>> sbsa_gwdt hns_roce_hw_v2 vfat fat ib_uverbs ib_core ipmi_ssif sg acpi_ipmi
>> ipmi_si ipmi_devintf ipmi_msghandler hisi_uncore_hha_pmu hisi_uncore_ddrc_pmu
>> hisi_uncore_l3c_pmu hisi_uncore_pmu ip_tables xfs libcrc32c sd_mod realtek hclge
>> nvme hisi_sas_v3_hw nvme_core hisi_sas_main t10_pi libsas ahci libahci hns3
>> scsi_transport_sas libata hnae3 i2c_designware_platform i2c_designware_core nfit
>> libnvdimm dm_mirror dm_region_hash dm_log dm_mod
>> [   52.567181][T24563] CPU: 101 PID: 24563 Comm: cpuhp/101 Not tainted
>> 5.17.0-rc7+ #5
>> [   52.574716][T24563] Hardware name: Huawei TaiShan 200 (Model 5280)/BC82AMDD,
>> BIOS 1.79 08/21/2021
>> [   52.583547][T24563] pstate: 204000c9 (nzCv daIF +PAN -UAO -TCO -DIT -SSBS
>> BTYPE=--)
>> [   52.591170][T24563] pc : lpi_update_config+0xe0/0x300
> 
> Can you please feed this to scripts/faddr2line? All this should do is
> to update the property table, which is global. If you get a NULL
> pointer there, something is really bad.

I found the Call Trace is the same as the following one I got before:
https://lkml.org/lkml/2022/1/25/529

        gic_write_lpir(val, rdbase + GICR_INVLPIR);
    56bc:       91028001        add     x1, x0, #0xa0
    56c0:       f9000039        str     x25, [x1]
The fault instruction is 'str     x25, [x1]'. I think it may be because the
'rdbase' is null.
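
That would also fit the fault address: GICR_INVLPIR lives at offset 0xa0
in the redistributor frame, so a NULL rdbase faults at 0xa0. Roughly, the
direct-LPI invalidation leg does (a sketch from memory, not the exact
code):

	/* poke the redistributor of the CPU the LPI is routed to */
	rdbase = per_cpu_ptr(gic_rdists->rdist, cpu)->rd_base;
	gic_write_lpir(val, rdbase + GICR_INVLPIR);	/* GICR_INVLPIR == 0x00a0 */

rd_base for a CPU is only discovered when that CPU boots
(gic_populate_rdist()), so targeting a CPU that has never been online
dereferences NULL + 0xa0.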

> 
> I also can't reproduce it locally, but that doesn't mean much.
> 
>> [   52.59620000ce6bb90 x28: 0000000000000000 x27: 0000000000000060
>> [   52.613021][T24563] x26: ffff20800798b818 x25: 0000000000002781 x24:
>> ffff80000962f460
>> [   52.620815][T24563] x23: 0000000000000000 x22: 0000000000000060 x21:
>> ffff80000962ec58
>> [   52.628610][T24563] x20: ffff20800633b540 x19: ffff208007946e00 x18:
>> 0000000000000000
>> [   52.636404][T24563] x17: 3731313830343030 x16: 3030303078304020 x15:
>> 0000000000000000
>> [   52.644199][T24563] x14: 0000000000000000 x13: 0000000000000000 x12:
>> 0000000000000000
>> [   52.651993][T24563] x11: 0000000000000000 x10: 0000000000000000 x9 :
>> ffff80000867a99c
>> [   52.659788][T24563] x8 : 0000000000000000 x7 : 0000000000000000 x6 :
>> ffff800008d3dda0
>> [   52.667582][T24563] x5 : ffff800028e00000 x4 : 0000000000000000 x3 :
>> ffff20be7f837780
>> [   52.675376][T24563] x2 : 0000000000000001 x1 : 00000000000000a0 x0 :
>> 0000000000000000
>> [   52.683170][T24563] Call trace:
>> [   52.686298][T24563]  lpi_update_config+0xe0/0x300
>> [   52.690982][T24563]  its_unmask_irq+0x34/0x68
>> [   52.695318][T24563]  irq_chip_unmask_parent+0x20/0x28
>> [   52.700349][T24563]  its_unmask_msi_irq+0x24/0x30
>> [   52.705032][T24563]  unmask_irq.part.0+0x2c/0x48
>> [   52.709630][T24563]  irq_enable+0x70/0x80
>> [   52.713623][T24563]  __irq_startup+0x7c/0xa8
>> [   52.717875][T24563]  irq_startup+0x134/0x158
>> [   52.722127][T24563]  irq_affinity_online_cpu+0x1c0/0x210
>> [   52.727415][T24563]  cpuhp_invoke_callback+0x14c/0x590
>> [   52.732533][T24563]  cpuhp_thread_fun+0xd4/0x188
>> [   52.737130][T24563]   52.749890][T24563] Code: f94002a0 8b000020 f9400400
>> 91028001 (f9000039)
>> [   52.756649][T24563] ---[ end trace 0000000000000000 ]---
>> [   52.787287][T24563] Kernel panic - not syncing: Oops: Fatal exception
>> [   52.793701][T24563] SMP: stopping secondary CPUs
>> [   52.798309][T24563] Kernel Offset: 0xb0000 from 0xffff800008000000
>> [   52.804462][T24563] PHYS_OFFSET: 0x0
>> [   52.808021][T24563] CPU features: 0x00,00000803,46402c40
>> [   52.813308][T24563] Memory Limit: none
>> [   52.841424][T24563] ---[ end Kernel panic - not syncing: Oops: Fatal
>> exception ]---
>>
>> Then I only add kernel parameter 'maxcpus=1. It also crash with the same Call Trace.
>>
>> Then I add the cpu_online_mask check like below. Add both kernel parameters. It
>> won't crash now.
>> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
>> index d25b7a864bbb..17c15d3b2784 100644
>> --- a/drivers/irqchip/irq-gic-v3-its.c
>> +++ b/drivers/irqchip/irq-gic-v3-its.c
>> @@ -1624,7 +1624,10 @@ static int its_select_cpu(struct irq_data *d,
>>
>>  		cpu = cpumask_pick_least_loaded(d, tmpmask);
>>  	} else {
>> -		cpumask_and(tmpmask, irq_data_get_affinity_mask(d), cpu_online_mask);
>> +		cpumask_and(tmpmask, aff_mask, cpu_online_mask);
>> +		if (cpumask_empty(tmpmask))
>> +			cpumask_and(tmpmask, irq_data_get_affinity_mask(d),
>> +				    cpu_online_mask);
> 
> I don't get what this is trying to do.
> 
> For a managed interrupt, we really should never reach set_affinity if
> no CPUs that are able to deal with this interrupt are online. The
> current ITS code is buggy in that respect (it really should ignore
> cpu_online_mask), but I don't think we should end-up here the first
> place (this should all be core code).

Thanks, I got it.

Thanks,
Xiongfeng

> 
> Thanks,
> 
> 	M.
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: PCI MSI issue for maxcpus=1
       [not found]                       ` <87o82eyxmz.wl-maz@kernel.org>
@ 2022-03-10 12:58                         ` Xiongfeng Wang
  0 siblings, 0 replies; 16+ messages in thread
From: Xiongfeng Wang @ 2022-03-10 12:58 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: John Garry, Thomas Gleixner, chenxiang, Shameer Kolothum,
	linux-kernel, liuqi (BA),
	David Decotigny



On 2022/3/10 17:17, Marc Zyngier wrote:
> On Thu, 10 Mar 2022 03:19:52 +0000,
> Xiongfeng Wang <wangxiongfeng2@huawei.com> wrote:
>>
>> Hi,
>>
>> On 2022/3/8 22:18, Marc Zyngier wrote:
>>> On Tue, 08 Mar 2022 03:57:33 +0000,
>>> Xiongfeng Wang <wangxiongfeng2@huawei.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 2022/3/7 21:48, John Garry wrote:
>>>>> Hi Marc,
>>>>>
>>>>>>
>>>>>> diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
>>>>>> index 2bdfce5edafd..97e9eb9aecc6 100644
>>>>>> --- a/kernel/irq/msi.c
>>>>>> +++ b/kernel/irq/msi.c
>>>>>> @@ -823,6 +823,19 @@ static int msi_init_virq(struct irq_domain *domain, int
>>>>>> virq, unsigned int vflag
>>>>>>       if (!(vflags & VIRQ_ACTIVATE))
>>>>>>           return 0;
>>>>>>   +    if (!(vflags & VIRQ_CAN_RESERVE)) {
>>>>>> +        /*
>>>>>> +         * If the interrupt is managed but no CPU is available
>>>>>> +         * to service it, shut it down until better times.
>>>>>> +         */
>>>>>> +        if (irqd_affinity_is_managed(irqd) &&
>>>>>> +            !cpumask_intersects(irq_data_get_affinity_mask(irqd),
>>>>>> +                    cpu_online_mask)) {
>>>>>> +            irqd_set_managed_shutdown(irqd);
>>>>>> +            return 0;
>>>>>> +        }
>>>>>> +    }
>>>>>> +
>>>>>>       ret = irq_domain_activate_irq(irqd, vflags & VIRQ_CAN_RESERVE);
>>>>>>       if (ret)
>>>>>>           return ret;
>>>>>>
>>>>
>>>> I applied the above modification and added the kernel parameter 'maxcpus=1'. It
>>>> boots successfully on D06.
>>>>
>>>> Then I removed 'maxcpus=1' and added 'nohz_full=5-127
>>>> isolcpus=nohz,domain,managed_irq,5-127'. The 'effective_affinity' of the
>>>> kernel-managed irq is not correct.
>>>> [root@localhost wxf]# cat /proc/interrupts | grep 350
>>>> 350:          0          0          0          0          0        522
>>>> (ignored info)
>>>> 0          0                  0   ITS-MSI 60882972 Edge      hisi_sas_v3_hw cq
>>>> [root@localhost wxf]# cat /proc/irq/350/smp_affinity
>>>> 00000000,00000000,00000000,000000ff
>>>> [root@localhost wxf]# cat /proc/irq/350/effective_affinity
>>>> 00000000,00000000,00000000,00000020
>>>>
>>>> Then I applied the following modification,
>>>> referring to https://lore.kernel.org/all/87a6fl8jgb.wl-maz@kernel.org/.
>>>> The 'effective_affinity' is correct now.
>>>>
>>>> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
>>>> index eb0882d15366..0cea46bdaf99 100644
>>>> --- a/drivers/irqchip/irq-gic-v3-its.c
>>>> +++ b/drivers/irqchip/irq-gic-v3-its.c
>>>> @@ -1620,7 +1620,7 @@ static int its_select_cpu(struct irq_data *d,
>>>>
>>>>  		cpu = cpumask_pick_least_loaded(d, tmpmask);
>>>>  	} else {
>>>> -		cpumask_and(tmpmask, irq_data_get_affinity_mask(d), cpu_online_mask);
>>>> +		cpumask_copy(tmpmask, aff_mask);
>>>>
>>>>  		/* If we cannot cross sockets, limit the search to that node */
>>>>  		if ((its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144) &&
>>>>
>>>> Then I added both kernel parameters:
>>>> nohz_full=1-127 isolcpus=nohz,domain,managed_irq,1-127 maxcpus=1
>>>> It crashed with the following message.
>>>> [   51.813803][T21132] cma_alloc: 29 callbacks suppressed
>>>> [   51.813809][T21132] cma: cma_alloc: reserved: alloc failed, req-size: 4
>>>> pages, ret: -12
>>>> [   51.897537][T21132] cma: cma_alloc: reserved: alloc failed, req-size: 8
>>>> pages, ret: -12
>>>> [   52.014432][T21132] cma: cma_alloc: reserved: alloc failed, req-size: 4
>>>> pages, ret: -12
>>>> [   52.067313][T21132] cma: cma_alloc: reserved: alloc failed, req-size: 8
>>>> pages, ret: -12
>>>> [   52.180011][T21132] cma: cma_alloc: reserved: alloc failed, req-size: 4
>>>> pages, ret: -12
>>>> [   52.270846][    T0] Detected VIPT I-cache on CPU1
>>>> [   52.275541][    T0] GICv3: CPU1: found redistributor 80100 region
>>>> 1:0x00000000ae140000
>>>> [   52.283425][    T0] GICv3: CPU1: using allocated LPI pending table
>>>> @0x00000040808b0000
>>>> [   52.291381][    T0] CPU1: Booted secondary processor 0x0000080100 [0x481fd010]
>>>> [   52.432971][    T0] Detected VIPT I-cache on CPU101
>>>> [   52.437914][    T0] GICv3: CPU101: found redistributor 390100 region
>>>> 101:0x00002000aa240000
>>>> [   52.446233][    T0] GICv3: CPU101: using allocated LPI pending table
>>>> @0x0000004081170000
>>>> [   52.ULL pointer dereference at virtual address 00000000000000a0
>>>
>>> This is pretty odd. If you passed maxcpus=1, how come CPU1 and CPU101
>>> are booting right from the beginning? Or is it userspace doing that?
>>
>> Yes, it is userspace that onlines all the CPUs.
>>
>>>
>>>> [   52.471539][T24563] Mem abort info:
>>>> [   52.475011][T24563]   ESR = 0x96000044
>>>> [   52.478742][T24563]   EC = 0x25: DABT (current EL), IL = 32 bits
>>>> [   52.484721][T24563]   SET = 0, FnV = 0
>>>> [   52.488451][T24563]   EA = 0, S1PTW = 0
>>>> [   52.492269][T24563]   FSC = 0x04: level 0 translation fault
>>>> [   52.497815][T24563] Data abort info:
>>>> [   52.501374][T24563]   ISV = 0, ISS = 0x00000044
>>>> [   52.505884][T24563]   CM = 0, WnR = 1
>>>> [   52.509530][T24563] [00000000000000a0] user address but active_mm is swapper
>>>> [   52.516548][T24563] Internal error: Oops: 96000044 [#1] SMP
>>>> [   52.522096][T24563] Modules linked in: ghash_ce sha2_ce sha256_arm64 sha1_ce
>>>> sbsa_gwdt hns_roce_hw_v2 vfat fat ib_uverbs ib_core ipmi_ssif sg acpi_ipmi
>>>> ipmi_si ipmi_devintf ipmi_msghandler hisi_uncore_hha_pmu hisi_uncore_ddrc_pmu
>>>> hisi_uncore_l3c_pmu hisi_uncore_pmu ip_tables xfs libcrc32c sd_mod realtek hclge
>>>> nvme hisi_sas_v3_hw nvme_core hisi_sas_main t10_pi libsas ahci libahci hns3
>>>> scsi_transport_sas libata hnae3 i2c_designware_platform i2c_designware_core nfit
>>>> libnvdimm dm_mirror dm_region_hash dm_log dm_mod
>>>> [   52.567181][T24563] CPU: 101 PID: 24563 Comm: cpuhp/101 Not tainted
>>>> 5.17.0-rc7+ #5
>>>> [   52.574716][T24563] Hardware name: Huawei TaiShan 200 (Model 5280)/BC82AMDD,
>>>> BIOS 1.79 08/21/2021
>>>> [   52.583547][T24563] pstate: 204000c9 (nzCv daIF +PAN -UAO -TCO -DIT -SSBS
>>>> BTYPE=--)
>>>> [   52.591170][T24563] pc : lpi_update_config+0xe0/0x300
>>>
>>> Can you please feed this to scripts/faddr2line? All this should do is
>>> to update the property table, which is global. If you get a NULL
>>> pointer there, something is really bad.
>>
>> I found that the call trace is the same as this one I got before:
>> https://lkml.org/lkml/2022/1/25/529.
>>
>>         gic_write_lpir(val, rdbase + GICR_INVLPIR);
>>     56bc:       91028001        add     x1, x0, #0xa0
>>     56c0:       f9000039        str     x25, [x1]
>> The faulting instruction is 'str     x25, [x1]'. I think this may be because
>> 'rdbase' is NULL.
> 
> Ah, you're of course using direct invalidation, which is why I
> couldn't get this to explode in a VM. Maybe I should add support for
> this in KVM, if only as an option.
> 
> I'll try and work out what goes wrong.

Thanks a lot!

I added some debug info and got the dmesg below; hope it is helpful.
It seems that irq_to_cpuid_lock() returns an offline CPU, and then the crash occurs.


diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index cd77297..f9fc953 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1453,6 +1453,7 @@ static void direct_lpi_inv(struct irq_data *d)

        /* Target the redistributor this LPI is currently routed to */
        cpu = irq_to_cpuid_lock(d, &flags);
+       pr_info("direct_lpi_inv CPU%d current CPU%d\n", cpu, smp_processor_id());
        raw_spin_lock(&gic_data_rdist_cpu(cpu)->rd_lock);
        rdbase = per_cpu_ptr(gic_rdists->rdist, cpu)->rd_base;
        gic_write_lpir(val, rdbase + GICR_INVLPIR);


[   16.052692][ T2280] direct_lpi_inv CPU0 current CPU0
[   16.058914][  T336] hns3 0000:7d:00.0: hclge driver initialization finished.
[   16.066711][    T7] direct_lpi_inv CPU0 current CPU0
[   16.072089][ T2280] nvme nvme1: Shutdown timeout set to 8 seconds
[   16.080703][ T2280] direct_lpi_inv CPU0 current CPU0
[   16.087717][    T7] direct_lpi_inv CPU1 current CPU0
[   16.092663][    T7] Unable to handle kernel NULL pointer dereference at
virtual address 00000000000000a0
[   16.102097][    T7] Mem abort info:
[   16.105569][    T7]   ESR = 0x96000044
[   16.109301][    T7]   EC = 0x25: DABT (current EL), IL = 32 bits
[   16.115280][    T7]   SET = 0, FnV = 0
[   16.119012][    T7]   EA = 0, S1PTW = 0
[   16.122830][    T7]   FSC = 0x04: level 0 translation fault
[   16.128377][    T7] Data abort info:
[   16.131934][    T7]   ISV = 0, ISS = 0x00000044
[   16.136443][    T7]   CM = 0, WnR = 1
[   16.140089][    T7] user pgtable: 4k pages, 48-bit VAs, pgdp=000000409a00f000
[   16.147191][    T7] [00000000000000a0] pgd=0000000000000000, p4d=0000000000000000
[   16.154642][    T7] Internal error: Oops: 96000044 [#1] SMP
[   16.160189][    T7] Modules linked in: nvmec_designware_platform
i2c_designware_core nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod
[   16.183467][    T7] CPU: 0 PID: 7 Comm: kworker/u256:0 Not tainted 5.17.0-rc7+ #8
[   16.190916][    T7] Hardware name: Huawei TaiShan 200 (Model 5280)/BC82AMDD,
BIOS 1.79 08/21/2021
[   16.199746][    T7] Workqueue: nvme-reset-wq nvme_reset_work [nvme]
[   16.205990][    T7] pstate: 404000c9 (nZcv daIF +PAN -UAO -TCO -DIT -SSBS
BTYPE=--)
[   16.213611][    T7] pc : lpi_update_config+0x10c/0x320
[   16.218729][    T7] lr : lpi_update_config+0xb4/0x320
[   16.223758][    T7] sp : ffff80000a9f3950
[   16.227748][    T7] x29: ffff80000a9f3950 x28: 0000000000000030 x27:
ffff0040a0b81680
[   16.235543][    T7] x26: 0000000000000000 x25: 0000000000000001 x24:
0000000000000000
[   16.243337][    T7] x23: 00000000000028bb x22: ffff8000095cf460 x21:
ffff004087612380
[   16.251131][    T7] x20: ffff8000095cec58 x19: ffff0040a076e600 x18:
0000000000000000
[   16.258925][    T7] x17: ffff807ef6793000 x16: ffff800008004000 x15:
ffff004087612ac8
[   16.266719][    T7] x14: 0000000000000000 x13: 205d375420202020 x12:
5b5d373137373830
[   16.274513][    T7] x11: ffff800009983388 x10: ffff8000098c3348 x9 :
ffff80000825c408
[   16.282306][    T7] x8 : 00000000000bffe8 x7 : c0000000ffff7fff x6 :
0000000000000001
[   16.290100][    T7] x5 : 0000000000000000 x4 : 0000000000000000 x3 :
ffff007effa56780
[   16.297894][    T7] x2 : 0000000000000001 x1 : 00000000000000a0 x0 :
0000000000000000
[   0x34/0x68
[   16.317920][    T7]  irq_chip_unmask_parent+0x20/0x28
[   16.322950][    T7]  its_unmask_msi_irq+0x24/0x30
[   16.327632][    T7]  unmask_irq.part.0+0x2c/0x48
[   16.332228][    T7]  irq_enable+0x70/0x80
[   16.336220][    T7]  __irq_startup+0x7c/0xa8
[   16.340472][    T7]  irq_startup+0x134/0x158
[   16.344724][    T7]  __setup_irq+0x808/0x940
[   16.348973][    T7]  request_threaded_irq+0xf0/0x1a8
[   16.353915][    T7]  pci_request_irq+0xbc/0x108
[   16.358426][    T7]  queue_request_irq+0x70/0x78 [nvme]
[   16.363629][    T7]  nvme_create_io_queues+0x208/0x368 [nvme]
[   16.369350][    T7]  nvme_reset_work+0x828/0xdd8 [nvme]
[   16.374552][    T7]  process_one_work+0x1dc/0x478
[   16.379236][    T7]  worker_thread+0x150/0x4f0
[   16.383660][    T7]  kthread+0xd0/0xe0
[   16.387393][    T7]  ret_from_fork+0x10/0x20
[   16.391648][    T7] Code: f9400280 8b000020 f9400400 91028001 (f9000037)
[   16.398404][    T7] ---[ end trace 0000000000000000 ]---
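
Just to illustrate the failure mode (untested, and probably papering over
the real problem of the LPI being routed to an offline CPU in the first
place), a guard along these lines in direct_lpi_inv() would avoid the
dereference, assuming rd_base is only set once the CPU's redistributor has
been initialised:

        cpu = irq_to_cpuid_lock(d, &flags);
        raw_spin_lock(&gic_data_rdist_cpu(cpu)->rd_lock);
        rdbase = per_cpu_ptr(gic_rdists->rdist, cpu)->rd_base;
        if (WARN_ON_ONCE(!rdbase)) {
                /* LPI targets a CPU whose redistributor is not up */
                raw_spin_unlock(&gic_data_rdist_cpu(cpu)->rd_lock);
                irq_to_cpuid_unlock(d, flags);
                return;
        }
        gic_write_lpir(val, rdbase + GICR_INVLPIR);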

Thanks,
Xiongfeng

> 
> Thanks,
> 
> 	M.
> 

^ permalink raw reply related	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2022-03-10 12:58 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-05 11:23 PCI MSI issue for maxcpus=1 John Garry
2022-01-06 15:49 ` Marc Zyngier
2022-01-07 11:24   ` John Garry
2022-01-16 12:07     ` Marc Zyngier
2022-01-17  9:14       ` Marc Zyngier
2022-01-17 11:59         ` John Garry
2022-01-24 11:22           ` Marc Zyngier
2022-03-04 12:53           ` John Garry
2022-03-05 15:40             ` Marc Zyngier
2022-03-07 13:48               ` John Garry
2022-03-07 14:01                 ` Marc Zyngier
2022-03-07 14:03                   ` Marc Zyngier
2022-03-08  1:37                     ` David Decotigny
2022-03-08  3:57                 ` Xiongfeng Wang
     [not found]                   ` <87zgm0zfw7.wl-maz@kernel.org>
2022-03-10  3:19                     ` Xiongfeng Wang
     [not found]                       ` <87o82eyxmz.wl-maz@kernel.org>
2022-03-10 12:58                         ` Xiongfeng Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).