Re: PCI MSI issue for maxcpus=1

From: Xiongfeng Wang <wangxiongfeng2@huawei.com>
To: John Garry <john.garry@huawei.com>, Marc Zyngier <maz@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	chenxiang <chenxiang66@hisilicon.com>,
	Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"liuqi (BA)" <liuqi115@huawei.com>,
	"David Decotigny" <decot@google.com>
Subject: Re: PCI MSI issue for maxcpus=1
Date: Tue, 8 Mar 2022 11:57:33 +0800
Message-ID: <645767eb-c5a5-cafa-eb1e-b8d999484ea8@huawei.com>
In-Reply-To: <452d97ed-459f-7936-99e4-600380608615@huawei.com>

Hi,

On 2022/3/7 21:48, John Garry wrote:
> Hi Marc,
> 
>>
>> diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
>> index 2bdfce5edafd..97e9eb9aecc6 100644
>> --- a/kernel/irq/msi.c
>> +++ b/kernel/irq/msi.c
>> @@ -823,6 +823,19 @@ static int msi_init_virq(struct irq_domain *domain, int
>> virq, unsigned int vflag
>>       if (!(vflags & VIRQ_ACTIVATE))
>>           return 0;
>>   +    if (!(vflags & VIRQ_CAN_RESERVE)) {
>> +        /*
>> +         * If the interrupt is managed but no CPU is available
>> +         * to service it, shut it down until better times.
>> +         */
>> +        if (irqd_affinity_is_managed(irqd) &&
>> +            !cpumask_intersects(irq_data_get_affinity_mask(irqd),
>> +                    cpu_online_mask)) {
>> +            irqd_set_managed_shutdown(irqd);
>> +            return 0;
>> +        }
>> +    }
>> +
>>       ret = irq_domain_activate_irq(irqd, vflags & VIRQ_CAN_RESERVE);
>>       if (ret)
>>           return ret;
>>

I applied the above modification and added the kernel parameter 'maxcpus=1'.
The kernel boots successfully on D06.
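
As a rough userspace illustration (not kernel code) of the decision the quoted
msi.c hunk makes: a managed interrupt whose affinity mask contains no online CPU
is parked in the "managed shutdown" state instead of being activated. The names
cpumask64, enum activation and decide_activation() are made up for this sketch,
and a 64-bit word stands in for a real cpumask, so it assumes at most 64 CPUs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One bit per CPU. Assumption: at most 64 CPUs, unlike a real cpumask. */
typedef uint64_t cpumask64;

enum activation { ACTIVATE, MANAGED_SHUTDOWN };

/*
 * Mirrors the shape of the check in the quoted hunk: only when the
 * interrupt cannot be reserved, is managed, and has no online CPU in
 * its affinity mask do we skip activation.
 */
enum activation decide_activation(bool can_reserve, bool managed,
				  cpumask64 affinity, cpumask64 online)
{
	assert(online != 0);	/* at least the boot CPU is online */

	if (!can_reserve && managed && !(affinity & online))
		return MANAGED_SHUTDOWN;

	return ACTIVATE;
}
```

With maxcpus=1 only bit 0 is set in the online mask, so a managed queue
interrupt bound to, say, CPUs 2-3 (mask 0xc) takes the shutdown path, while one
whose mask includes CPU0 is activated as before.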

Then I removed 'maxcpus=1' and added 'nohz_full=5-127
isolcpus=nohz,domain,managed_irq,5-127'. The 'effective_affinity' of the
kernel-managed IRQ is not correct: it targets CPU5, which is isolated.
[root@localhost wxf]# cat /proc/interrupts | grep 350
350:          0          0          0          0          0        522
(ignored info)
0          0                  0   ITS-MSI 60882972 Edge      hisi_sas_v3_hw cq
[root@localhost wxf]# cat /proc/irq/350/smp_affinity
00000000,00000000,00000000,000000ff
[root@localhost wxf]# cat /proc/irq/350/effective_affinity
00000000,00000000,00000000,00000020
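
For readers checking such masks by hand, here is a minimal userspace sketch that
decodes the comma-separated hex format printed above (32-bit groups, most
significant group first) and reports the first CPU of a mask that falls inside
a given range. The function names and the MAX_WORDS limit are assumptions made
for this illustration, not an existing API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORDS 8	/* up to 256 CPUs; arbitrary limit for this sketch */

/*
 * Parse "hhhhhhhh,hhhhhhhh,..." into words[], where words[0] holds
 * CPUs 0-31, words[1] CPUs 32-63, and so on.
 * Returns the number of groups parsed, or -1 on malformed input.
 */
int parse_affinity(const char *s, uint32_t words[MAX_WORDS])
{
	uint32_t groups[MAX_WORDS];
	int n = 0;

	for (;;) {
		char *end;
		unsigned long v = strtoul(s, &end, 16);

		if (end == s || v > 0xffffffffUL || n == MAX_WORDS)
			return -1;
		groups[n++] = (uint32_t)v;
		if (*end != ',')
			break;
		s = end + 1;
	}

	/* The text lists the most significant group first; reverse it. */
	memset(words, 0, MAX_WORDS * sizeof(words[0]));
	for (int i = 0; i < n; i++)
		words[i] = groups[n - 1 - i];
	return n;
}

/* Return the lowest CPU in [lo, hi] that is set in the mask, or -1. */
int first_cpu_in_range(const uint32_t words[MAX_WORDS], int lo, int hi)
{
	assert(lo >= 0 && lo <= hi);

	for (int cpu = lo; cpu <= hi && cpu < MAX_WORDS * 32; cpu++)
		if (words[cpu / 32] & (UINT32_C(1) << (cpu % 32)))
			return cpu;
	return -1;
}
```

Applied to the values above, the effective affinity 0x20 decodes to CPU5 only.
CPU5 lies in the isolated range 5-127, while the housekeeping CPUs 0-4 that are
also present in smp_affinity (0xff) are not used.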

Then I applied the following modification (see
https://lore.kernel.org/all/87a6fl8jgb.wl-maz@kernel.org/).
The 'effective_affinity' is correct now.

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index eb0882d15366..0cea46bdaf99 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1620,7 +1620,7 @@ static int its_select_cpu(struct irq_data *d,

 		cpu = cpumask_pick_least_loaded(d, tmpmask);
 	} else {
-		cpumask_and(tmpmask, irq_data_get_affinity_mask(d), cpu_online_mask);
+		cpumask_copy(tmpmask, aff_mask);

 		/* If we cannot cross sockets, limit the search to that node */
 		if ((its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144) &&

Then I added both sets of kernel parameters:
nohz_full=1-127 isolcpus=nohz,domain,managed_irq,1-127 maxcpus=1
The kernel crashed with the following message.
[   51.813803][T21132] cma_alloc: 29 callbacks suppressed
[   51.813809][T21132] cma: cma_alloc: reserved: alloc failed, req-size: 4
pages, ret: -12
[   51.897537][T21132] cma: cma_alloc: reserved: alloc failed, req-size: 8
pages, ret: -12
[   52.014432][T21132] cma: cma_alloc: reserved: alloc failed, req-size: 4
pages, ret: -12
[   52.067313][T21132] cma: cma_alloc: reserved: alloc failed, req-size: 8
pages, ret: -12
[   52.180011][T21132] cma: cma_alloc: reserved: alloc failed, req-size: 4
pages, ret: -12
[   52.270846][    T0] Detected VIPT I-cache on CPU1
[   52.275541][    T0] GICv3: CPU1: found redistributor 80100 region
1:0x00000000ae140000
[   52.283425][    T0] GICv3: CPU1: using allocated LPI pending table
@0x00000040808b0000
[   52.291381][    T0] CPU1: Booted secondary processor 0x0000080100 [0x481fd010]
[   52.432971][    T0] Detected VIPT I-cache on CPU101
[   52.437914][    T0] GICv3: CPU101: found redistributor 390100 region
101:0x00002000aa240000
[   52.446233][    T0] GICv3: CPU101: using allocated LPI pending table
@0x0000004081170000
[   52.[...] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000a0
[   52.471539][T24563] Mem abort info:
[   52.475011][T24563]   ESR = 0x96000044
[   52.478742][T24563]   EC = 0x25: DABT (current EL), IL = 32 bits
[   52.484721][T24563]   SET = 0, FnV = 0
[   52.488451][T24563]   EA = 0, S1PTW = 0
[   52.492269][T24563]   FSC = 0x04: level 0 translation fault
[   52.497815][T24563] Data abort info:
[   52.501374][T24563]   ISV = 0, ISS = 0x00000044
[   52.505884][T24563]   CM = 0, WnR = 1
[   52.509530][T24563] [00000000000000a0] user address but active_mm is swapper
[   52.516548][T24563] Internal error: Oops: 96000044 [#1] SMP
[   52.522096][T24563] Modules linked in: ghash_ce sha2_ce sha256_arm64 sha1_ce
sbsa_gwdt hns_roce_hw_v2 vfat fat ib_uverbs ib_core ipmi_ssif sg acpi_ipmi
ipmi_si ipmi_devintf ipmi_msghandler hisi_uncore_hha_pmu hisi_uncore_ddrc_pmu
hisi_uncore_l3c_pmu hisi_uncore_pmu ip_tables xfs libcrc32c sd_mod realtek hclge
nvme hisi_sas_v3_hw nvme_core hisi_sas_main t10_pi libsas ahci libahci hns3
scsi_transport_sas libata hnae3 i2c_designware_platform i2c_designware_core nfit
libnvdimm dm_mirror dm_region_hash dm_log dm_mod
[   52.567181][T24563] CPU: 101 PID: 24563 Comm: cpuhp/101 Not tainted
5.17.0-rc7+ #5
[   52.574716][T24563] Hardware name: Huawei TaiShan 200 (Model 5280)/BC82AMDD,
BIOS 1.79 08/21/2021
[   52.583547][T24563] pstate: 204000c9 (nzCv daIF +PAN -UAO -TCO -DIT -SSBS
BTYPE=--)
[   52.591170][T24563] pc : lpi_update_config+0xe0/0x300
[   52.5962[...]0000ce6bb90 x28: 0000000000000000 x27: 0000000000000060
[   52.613021][T24563] x26: ffff20800798b818 x25: 0000000000002781 x24:
ffff80000962f460
[   52.620815][T24563] x23: 0000000000000000 x22: 0000000000000060 x21:
ffff80000962ec58
[   52.628610][T24563] x20: ffff20800633b540 x19: ffff208007946e00 x18:
0000000000000000
[   52.636404][T24563] x17: 3731313830343030 x16: 3030303078304020 x15:
0000000000000000
[   52.644199][T24563] x14: 0000000000000000 x13: 0000000000000000 x12:
0000000000000000
[   52.651993][T24563] x11: 0000000000000000 x10: 0000000000000000 x9 :
ffff80000867a99c
[   52.659788][T24563] x8 : 0000000000000000 x7 : 0000000000000000 x6 :
ffff800008d3dda0
[   52.667582][T24563] x5 : ffff800028e00000 x4 : 0000000000000000 x3 :
ffff20be7f837780
[   52.675376][T24563] x2 : 0000000000000001 x1 : 00000000000000a0 x0 :
0000000000000000
[   52.683170][T24563] Call trace:
[   52.686298][T24563]  lpi_update_config+0xe0/0x300
[   52.690982][T24563]  its_unmask_irq+0x34/0x68
[   52.695318][T24563]  irq_chip_unmask_parent+0x20/0x28
[   52.700349][T24563]  its_unmask_msi_irq+0x24/0x30
[   52.705032][T24563]  unmask_irq.part.0+0x2c/0x48
[   52.709630][T24563]  irq_enable+0x70/0x80
[   52.713623][T24563]  __irq_startup+0x7c/0xa8
[   52.717875][T24563]  irq_startup+0x134/0x158
[   52.722127][T24563]  irq_affinity_online_cpu+0x1c0/0x210
[   52.727415][T24563]  cpuhp_invoke_callback+0x14c/0x590
[   52.732533][T24563]  cpuhp_thread_fun+0xd4/0x188
[   52.737130][T24563]  [...]
[   52.749890][T24563] Code: f94002a0 8b000020 f9400400
91028001 (f9000039)
[   52.756649][T24563] ---[ end trace 0000000000000000 ]---
[   52.787287][T24563] Kernel panic - not syncing: Oops: Fatal exception
[   52.793701][T24563] SMP: stopping secondary CPUs
[   52.798309][T24563] Kernel Offset: 0xb0000 from 0xffff800008000000
[   52.804462][T24563] PHYS_OFFSET: 0x0
[   52.808021][T24563] CPU features: 0x00,00000803,46402c40
[   52.813308][T24563] Memory Limit: none
[   52.841424][T24563] ---[ end Kernel panic - not syncing: Oops: Fatal
exception ]---

Then I added only the kernel parameter 'maxcpus=1'. It also crashes with the same call trace.

Then I added the cpu_online_mask check shown below and booted with both sets of
kernel parameters. It no longer crashes.
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index d25b7a864bbb..17c15d3b2784 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1624,7 +1624,10 @@ static int its_select_cpu(struct irq_data *d,

 		cpu = cpumask_pick_least_loaded(d, tmpmask);
 	} else {
-		cpumask_and(tmpmask, irq_data_get_affinity_mask(d), cpu_online_mask);
+		cpumask_and(tmpmask, aff_mask, cpu_online_mask);
+		if (cpumask_empty(tmpmask))
+			cpumask_and(tmpmask, irq_data_get_affinity_mask(d),
+				    cpu_online_mask);

 		/* If we cannot cross sockets, limit the search to that node */
 		if ((its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144) &&
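
The fallback in the hunk above can be sketched in userspace as follows. This is
only an illustration of the mask arithmetic, not the driver code: cpumask64 and
pick_candidates() are hypothetical names, and a 64-bit word replaces the real
cpumask, so it assumes at most 64 CPUs.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per CPU. Assumption: at most 64 CPUs, unlike a real cpumask. */
typedef uint64_t cpumask64;

/*
 * Prefer the online CPUs of the requested mask (aff_mask in the diff).
 * If none of them is online, fall back to the online CPUs of the
 * interrupt's own affinity mask, as the added cpumask_empty() branch does.
 * The result may still be empty if neither mask has an online CPU.
 */
cpumask64 pick_candidates(cpumask64 aff_mask, cpumask64 irq_affinity,
			  cpumask64 online)
{
	cpumask64 tmp;

	assert(online != 0);	/* at least the boot CPU is online */

	tmp = aff_mask & online;
	if (!tmp)
		tmp = irq_affinity & online;

	return tmp;
}
```

For example, with an interrupt affinity of CPUs 0-7 (0xff) and a requested mask
of CPUs 0-4 (0x1f), the candidates are CPUs 0-4 when they are online. If only
CPUs 5-6 of that set are online, the first intersection is empty and the
fallback yields 0x60.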

Thanks,
Xiongfeng


> 
> Yeah, that seems to solve the issue. I will test it a bit more.
> 
> We need to check the isolcpus cmdline issue as well - wang xiongfeng, please
> assist here. I assume that this feature just never worked for arm64 since it was
> added.
> 
>> With this in place, I get the following results (VM booted with 4
>> vcpus and maxcpus=1, the virtio device is using managed interrupts):
>>
>> root@debian:~# cat /proc/interrupts
>>             CPU0
>>   10:       2298     GICv3  27 Level     arch_timer
>>   12:         84     GICv3  33 Level     uart-pl011
>>   49:          0     GICv3  41 Edge      ACPI:Ged
>>   50:          0   ITS-MSI 16384 Edge      virtio0-config
>>   51:       2088   ITS-MSI 16385 Edge      virtio0-req.0
>>   52:          0   ITS-MSI 16386 Edge      virtio0-req.1
>>   53:          0   ITS-MSI 16387 Edge      virtio0-req.2
>>   54:          0   ITS-MSI 16388 Edge      virtio0-req.3
>>   55:      11641   ITS-MSI 32768 Edge      xhci_hcd
>>   56:          0   ITS-MSI 32769 Edge      xhci_hcd
>> IPI0:         0       Rescheduling interrupts
>> IPI1:         0       Function call interrupts
>> IPI2:         0       CPU stop interrupts
>> IPI3:         0       CPU stop (for crash dump) interrupts
>> IPI4:         0       Timer broadcast interrupts
>> IPI5:         0       IRQ work interrupts
>> IPI6:         0       CPU wake-up interrupts
>> Err:          0
>> root@debian:~# echo 1 >/sys/devices/system/cpu/cpu2/online
>> root@debian:~# cat /proc/interrupts
>>             CPU0       CPU2
>>   10:       2530         90     GICv3  27 Level     arch_timer
>>   12:        103          0     GICv3  33 Level     uart-pl011
>>   49:          0          0     GICv3  41 Edge      ACPI:Ged
>>   50:          0          0   ITS-MSI 16384 Edge      virtio0-config
>>   51:       2097          0   ITS-MSI 16385 Edge      virtio0-req.0
>>   52:          0          0   ITS-MSI 16386 Edge      virtio0-req.1
>>   53:          0         12   ITS-MSI 16387 Edge      virtio0-req.2
>>   54:          0          0   ITS-MSI 16388 Edge      virtio0-req.3
>>   55:      13487          0   ITS-MSI 32768 Edge      xhci_hcd
>>   56:          0          0   ITS-MSI 32769 Edge      xhci_hcd
>> IPI0:        38         45       Rescheduling interrupts
>> IPI1:         3          3       Function call interrupts
>> IPI2:         0          0       CPU stop interrupts
>> IPI3:         0          0       CPU stop (for crash dump) interrupts
>> IPI4:         0          0       Timer broadcast interrupts
>> IPI5:         0          0       IRQ work interrupts
>> IPI6:         0          0       CPU wake-up interrupts
>> Err:          0
>>
> 
> Out of interest, is the virtio managed interrupts support just in your sandbox?
> You did mention earlier in the thread that you were considering adding this
> feature.
> 
> Thanks,
> John
> .