From: Xiongfeng Wang <wangxiongfeng2@huawei.com>
To: Marc Zyngier <maz@kernel.org>
Cc: John Garry <john.garry@huawei.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	chenxiang <chenxiang66@hisilicon.com>,
	Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"liuqi (BA)" <liuqi115@huawei.com>,
	"David Decotigny" <decot@google.com>
Subject: Re: PCI MSI issue for maxcpus=1
Date: Thu, 10 Mar 2022 20:58:23 +0800
Message-ID: <b2da939e-9bf0-72a0-8566-9efa60310159@huawei.com>
In-Reply-To: <87o82eyxmz.wl-maz@kernel.org>



On 2022/3/10 17:17, Marc Zyngier wrote:
> On Thu, 10 Mar 2022 03:19:52 +0000,
> Xiongfeng Wang <wangxiongfeng2@huawei.com> wrote:
>>
>> Hi,
>>
>> On 2022/3/8 22:18, Marc Zyngier wrote:
>>> On Tue, 08 Mar 2022 03:57:33 +0000,
>>> Xiongfeng Wang <wangxiongfeng2@huawei.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 2022/3/7 21:48, John Garry wrote:
>>>>> Hi Marc,
>>>>>
>>>>>>
>>>>>> diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
>>>>>> index 2bdfce5edafd..97e9eb9aecc6 100644
>>>>>> --- a/kernel/irq/msi.c
>>>>>> +++ b/kernel/irq/msi.c
>>>>>> @@ -823,6 +823,19 @@ static int msi_init_virq(struct irq_domain *domain, int virq, unsigned int vflags)
>>>>>>       if (!(vflags & VIRQ_ACTIVATE))
>>>>>>           return 0;
>>>>>>
>>>>>> +    if (!(vflags & VIRQ_CAN_RESERVE)) {
>>>>>> +        /*
>>>>>> +         * If the interrupt is managed but no CPU is available
>>>>>> +         * to service it, shut it down until better times.
>>>>>> +         */
>>>>>> +        if (irqd_affinity_is_managed(irqd) &&
>>>>>> +            !cpumask_intersects(irq_data_get_affinity_mask(irqd),
>>>>>> +                    cpu_online_mask)) {
>>>>>> +            irqd_set_managed_shutdown(irqd);
>>>>>> +            return 0;
>>>>>> +        }
>>>>>> +    }
>>>>>> +
>>>>>>       ret = irq_domain_activate_irq(irqd, vflags & VIRQ_CAN_RESERVE);
>>>>>>       if (ret)
>>>>>>           return ret;
>>>>>>
>>>>
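>>>> (For context, if I read kernel/irq/msi.c right: 'irqd' is the irq_data for
>>>> 'virq', looked up earlier in msi_init_virq(); the new branch marks a managed
>>>> interrupt as "managed shutdown" instead of activating it when its affinity
>>>> mask contains no online CPU.)
>>>>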
>>>> I applied the above modification and added the kernel parameter 'maxcpus=1'.
>>>> It boots successfully on D06.
>>>>
>>>> Then I removed 'maxcpus=1' and added 'nohz_full=5-127
>>>> isolcpus=nohz,domain,managed_irq,5-127'. The 'effective_affinity' of the
>>>> kernel-managed irq is not correct.
>>>> [root@localhost wxf]# cat /proc/interrupts | grep 350
>>>> 350:          0          0          0          0          0        522
>>>> (ignored info)
>>>> 0          0                  0   ITS-MSI 60882972 Edge      hisi_sas_v3_hw cq
>>>> [root@localhost wxf]# cat /proc/irq/350/smp_affinity
>>>> 00000000,00000000,00000000,000000ff
>>>> [root@localhost wxf]# cat /proc/irq/350/effective_affinity
>>>> 00000000,00000000,00000000,00000020
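>>>>
>>>> (Reading the masks, assuming my interpretation is right: smp_affinity 0xff
>>>> spans CPUs 0-7, and with CPUs 5-127 isolated the housekeeping subset of that
>>>> is CPUs 0-4; yet effective_affinity 0x20 is CPU 5, an isolated CPU, which is
>>>> why it is "not correct".)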
>>>>
>>>> Then I applied the following modification, referring to
>>>> https://lore.kernel.org/all/87a6fl8jgb.wl-maz@kernel.org/.
>>>> The 'effective_affinity' is now correct.
>>>>
>>>> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
>>>> index eb0882d15366..0cea46bdaf99 100644
>>>> --- a/drivers/irqchip/irq-gic-v3-its.c
>>>> +++ b/drivers/irqchip/irq-gic-v3-its.c
>>>> @@ -1620,7 +1620,7 @@ static int its_select_cpu(struct irq_data *d,
>>>>
>>>>  		cpu = cpumask_pick_least_loaded(d, tmpmask);
>>>>  	} else {
>>>> -		cpumask_and(tmpmask, irq_data_get_affinity_mask(d), cpu_online_mask);
>>>> +		cpumask_copy(tmpmask, aff_mask);
>>>>
>>>>  		/* If we cannot cross sockets, limit the search to that node */
>>>>  		if ((its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144) &&
>>>>
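>>>> (A guess at the rationale: this 'else' branch handles managed interrupts, and
>>>> 'aff_mask' as passed in by irq_do_set_affinity() has already been restricted
>>>> to housekeeping CPUs for managed-irq isolation, while
>>>> irq_data_get_affinity_mask() is the unfiltered mask -- which is how the
>>>> search could land on an isolated CPU.)
>>>>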
>>>> Then I added both sets of kernel parameters:
>>>> nohz_full=1-127 isolcpus=nohz,domain,managed_irq,1-127 maxcpus=1
>>>> It crashed with the following message.
>>>> [   51.813803][T21132] cma_alloc: 29 callbacks suppressed
>>>> [   51.813809][T21132] cma: cma_alloc: reserved: alloc failed, req-size: 4
>>>> pages, ret: -12
>>>> [   51.897537][T21132] cma: cma_alloc: reserved: alloc failed, req-size: 8
>>>> pages, ret: -12
>>>> [   52.014432][T21132] cma: cma_alloc: reserved: alloc failed, req-size: 4
>>>> pages, ret: -12
>>>> [   52.067313][T21132] cma: cma_alloc: reserved: alloc failed, req-size: 8
>>>> pages, ret: -12
>>>> [   52.180011][T21132] cma: cma_alloc: reserved: alloc failed, req-size: 4
>>>> pages, ret: -12
>>>> [   52.270846][    T0] Detected VIPT I-cache on CPU1
>>>> [   52.275541][    T0] GICv3: CPU1: found redistributor 80100 region
>>>> 1:0x00000000ae140000
>>>> [   52.283425][    T0] GICv3: CPU1: using allocated LPI pending table
>>>> @0x00000040808b0000
>>>> [   52.291381][    T0] CPU1: Booted secondary processor 0x0000080100 [0x481fd010]
>>>> [   52.432971][    T0] Detected VIPT I-cache on CPU101
>>>> [   52.437914][    T0] GICv3: CPU101: found redistributor 390100 region
>>>> 101:0x00002000aa240000
>>>> [   52.446233][    T0] GICv3: CPU101: using allocated LPI pending table
>>>> @0x0000004081170000
>>>> [   52.ULL pointer dereference at virtual address 00000000000000a0
>>>
>>> This is pretty odd. If you passed maxcpus=1, how come CPU1 and CPU101
>>> are booting right from the beginning? Or is it userspace doing that?
>>
>> Yes, it is userspace that onlines all the CPUs (via
>> /sys/devices/system/cpu/cpuN/online).
>>
>>>
>>>> [   52.471539][T24563] Mem abort info:
>>>> [   52.475011][T24563]   ESR = 0x96000044
>>>> [   52.478742][T24563]   EC = 0x25: DABT (current EL), IL = 32 bits
>>>> [   52.484721][T24563]   SET = 0, FnV = 0
>>>> [   52.488451][T24563]   EA = 0, S1PTW = 0
>>>> [   52.492269][T24563]   FSC = 0x04: level 0 translation fault
>>>> [   52.497815][T24563] Data abort info:
>>>> [   52.501374][T24563]   ISV = 0, ISS = 0x00000044
>>>> [   52.505884][T24563]   CM = 0, WnR = 1
>>>> [   52.509530][T24563] [00000000000000a0] user address but active_mm is swapper
>>>> [   52.516548][T24563] Internal error: Oops: 96000044 [#1] SMP
>>>> [   52.522096][T24563] Modules linked in: ghash_ce sha2_ce sha256_arm64 sha1_ce
>>>> sbsa_gwdt hns_roce_hw_v2 vfat fat ib_uverbs ib_core ipmi_ssif sg acpi_ipmi
>>>> ipmi_si ipmi_devintf ipmi_msghandler hisi_uncore_hha_pmu hisi_uncore_ddrc_pmu
>>>> hisi_uncore_l3c_pmu hisi_uncore_pmu ip_tables xfs libcrc32c sd_mod realtek hclge
>>>> nvme hisi_sas_v3_hw nvme_core hisi_sas_main t10_pi libsas ahci libahci hns3
>>>> scsi_transport_sas libata hnae3 i2c_designware_platform i2c_designware_core nfit
>>>> libnvdimm dm_mirror dm_region_hash dm_log dm_mod
>>>> [   52.567181][T24563] CPU: 101 PID: 24563 Comm: cpuhp/101 Not tainted
>>>> 5.17.0-rc7+ #5
>>>> [   52.574716][T24563] Hardware name: Huawei TaiShan 200 (Model 5280)/BC82AMDD,
>>>> BIOS 1.79 08/21/2021
>>>> [   52.583547][T24563] pstate: 204000c9 (nzCv daIF +PAN -UAO -TCO -DIT -SSBS
>>>> BTYPE=--)
>>>> [   52.591170][T24563] pc : lpi_update_config+0xe0/0x300
>>>
>>> Can you please feed this to scripts/faddr2line? All this should do is
>>> to update the property table, which is global. If you get a NULL
>>> pointer there, something is really bad.
>>
>> I found the call trace is the same as the one I reported earlier:
>> https://lkml.org/lkml/2022/1/25/529.
>>
>>         gic_write_lpir(val, rdbase + GICR_INVLPIR);
>>     56bc:       91028001        add     x1, x0, #0xa0
>>     56c0:       f9000039        str     x25, [x1]
>> The faulting instruction is 'str     x25, [x1]'. I think it is because 'rdbase'
>> is NULL: the faulting address 0xa0 is exactly GICR_INVLPIR (0x00A0) added to a
>> NULL base, matching the 'add x1, x0, #0xa0' above.
> 
> Ah, you're of course using direct invalidation, which is why I
> couldn't get this to explode in a VM. Maybe I should add support for
> this in KVM, if only as an option.
> 
> I'll try and work out what goes wrong.

Thanks a lot!

I added some debug info and got the dmesg below. Hope it is helpful.
It seems that irq_to_cpuid_lock() returns a not-yet-online CPU (CPU1 below),
and the crash follows.


diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index cd77297..f9fc953 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1453,6 +1453,7 @@ static void direct_lpi_inv(struct irq_data *d)

        /* Target the redistributor this LPI is currently routed to */
        cpu = irq_to_cpuid_lock(d, &flags);
+       pr_info("direct_lpi_inv CPU%d current CPU%d\n", cpu, smp_processor_id());
        raw_spin_lock(&gic_data_rdist_cpu(cpu)->rd_lock);
        rdbase = per_cpu_ptr(gic_rdists->rdist, cpu)->rd_base;
        gic_write_lpir(val, rdbase + GICR_INVLPIR);
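
(Purely as an illustration -- not a proposed fix -- a defensive check at that
spot might look like the sketch below. It assumes, from my reading of
irq-gic-v3.c, that rd_base is only populated once gic_populate_rdist() has run
on the target CPU, so it stays NULL for a CPU that never booted; the real fix
presumably has to target a valid redistributor rather than just bail out:)

        cpu = irq_to_cpuid_lock(d, &flags);
        raw_spin_lock(&gic_data_rdist_cpu(cpu)->rd_lock);
        rdbase = per_cpu_ptr(gic_rdists->rdist, cpu)->rd_base;
        if (WARN_ON_ONCE(!rdbase)) {
                /* No redistributor mapped for this CPU yet: it has not
                 * booted, so there is nothing to invalidate on. */
                raw_spin_unlock(&gic_data_rdist_cpu(cpu)->rd_lock);
                irq_to_cpuid_unlock(d, flags);
                return;
        }
        gic_write_lpir(val, rdbase + GICR_INVLPIR);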


[   16.052692][ T2280] direct_lpi_inv CPU0 current CPU0
[   16.058914][  T336] hns3 0000:7d:00.0: hclge driver initialization finished.
[   16.066711][    T7] direct_lpi_inv CPU0 current CPU0
[   16.072089][ T2280] nvme nvme1: Shutdown timeout set to 8 seconds
[   16.080703][ T2280] direct_lpi_inv CPU0 current CPU0
[   16.087717][    T7] direct_lpi_inv CPU1 current CPU0
[   16.092663][    T7] Unable to handle kernel NULL pointer dereference at
virtual address 00000000000000a0
[   16.102097][    T7] Mem abort info:
[   16.105569][    T7]   ESR = 0x96000044
[   16.109301][    T7]   EC = 0x25: DABT (current EL), IL = 32 bits
[   16.115280][    T7]   SET = 0, FnV = 0
[   16.119012][    T7]   EA = 0, S1PTW = 0
[   16.122830][    T7]   FSC = 0x04: level 0 translation fault
[   16.128377][    T7] Data abort info:
[   16.131934][    T7]   ISV = 0, ISS = 0x00000044
[   16.136443][    T7]   CM = 0, WnR = 1
[   16.140089][    T7] user pgtable: 4k pages, 48-bit VAs, pgdp=000000409a00f000
[   16.147191][    T7] [00000000000000a0] pgd=0000000000000000, p4d=0000000000000000
[   16.154642][    T7] Internal error: Oops: 96000044 [#1] SMP
[   16.160189][    T7] Modules linked in: nvmec_designware_platform
i2c_designware_core nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod
[   16.183467][    T7] CPU: 0 PID: 7 Comm: kworker/u256:0 Not tainted 5.17.0-rc7+ #8
[   16.190916][    T7] Hardware name: Huawei TaiShan 200 (Model 5280)/BC82AMDD,
BIOS 1.79 08/21/2021
[   16.199746][    T7] Workqueue: nvme-reset-wq nvme_reset_work [nvme]
[   16.205990][    T7] pstate: 404000c9 (nZcv daIF +PAN -UAO -TCO -DIT -SSBS
BTYPE=--)
[   16.213611][    T7] pc : lpi_update_config+0x10c/0x320
[   16.218729][    T7] lr : lpi_update_config+0xb4/0x320
[   16.223758][    T7] sp : ffff80000a9f3950
[   16.227748][    T7] x29: ffff80000a9f3950 x28: 0000000000000030 x27:
ffff0040a0b81680
[   16.235543][    T7] x26: 0000000000000000 x25: 0000000000000001 x24:
0000000000000000
[   16.243337][    T7] x23: 00000000000028bb x22: ffff8000095cf460 x21:
ffff004087612380
[   16.251131][    T7] x20: ffff8000095cec58 x19: ffff0040a076e600 x18:
0000000000000000
[   16.258925][    T7] x17: ffff807ef6793000 x16: ffff800008004000 x15:
ffff004087612ac8
[   16.266719][    T7] x14: 0000000000000000 x13: 205d375420202020 x12:
5b5d373137373830
[   16.274513][    T7] x11: ffff800009983388 x10: ffff8000098c3348 x9 :
ffff80000825c408
[   16.282306][    T7] x8 : 00000000000bffe8 x7 : c0000000ffff7fff x6 :
0000000000000001
[   16.290100][    T7] x5 : 0000000000000000 x4 : 0000000000000000 x3 :
ffff007effa56780
[   16.297894][    T7] x2 : 0000000000000001 x1 : 00000000000000a0 x0 :
0000000000000000
[   0x34/0x68
[   16.317920][    T7]  irq_chip_unmask_parent+0x20/0x28
[   16.322950][    T7]  its_unmask_msi_irq+0x24/0x30
[   16.327632][    T7]  unmask_irq.part.0+0x2c/0x48
[   16.332228][    T7]  irq_enable+0x70/0x80
[   16.336220][    T7]  __irq_startup+0x7c/0xa8
[   16.340472][    T7]  irq_startup+0x134/0x158
[   16.344724][    T7]  __setup_irq+0x808/0x940
[   16.348973][    T7]  request_threaded_irq+0xf0/0x1a8
[   16.353915][    T7]  pci_request_irq+0xbc/0x108
[   16.358426][    T7]  queue_request_irq+0x70/0x78 [nvme]
[   16.363629][    T7]  nvme_create_io_queues+0x208/0x368 [nvme]
[   16.369350][    T7]  nvme_reset_work+0x828/0xdd8 [nvme]
[   16.374552][    T7]  process_one_work+0x1dc/0x478
[   16.379236][    T7]  worker_thread+0x150/0x4f0
[   16.383660][    T7]  kthread+0xd0/0xe0
[   16.387393][    T7]  ret_from_fork+0x10/0x20
[   16.391648][    T7] Code: f9400280 8b000020 f9400400 91028001 (f9000037)
[   16.398404][    T7] ---[ end trace 0000000000000000 ]---

Thanks,
Xiongfeng

> 
> Thanks,
> 
> 	M.
> 

Thread overview: 16+ messages
2022-01-05 11:23 PCI MSI issue for maxcpus=1 John Garry
2022-01-06 15:49 ` Marc Zyngier
2022-01-07 11:24   ` John Garry
2022-01-16 12:07     ` Marc Zyngier
2022-01-17  9:14       ` Marc Zyngier
2022-01-17 11:59         ` John Garry
2022-01-24 11:22           ` Marc Zyngier
2022-03-04 12:53           ` John Garry
2022-03-05 15:40             ` Marc Zyngier
2022-03-07 13:48               ` John Garry
2022-03-07 14:01                 ` Marc Zyngier
2022-03-07 14:03                   ` Marc Zyngier
2022-03-08  1:37                     ` David Decotigny
2022-03-08  3:57                 ` Xiongfeng Wang
     [not found]                   ` <87zgm0zfw7.wl-maz@kernel.org>
2022-03-10  3:19                     ` Xiongfeng Wang
     [not found]                       ` <87o82eyxmz.wl-maz@kernel.org>
2022-03-10 12:58                         ` Xiongfeng Wang [this message]
