From: dongli.zhang@oracle.com (Dongli Zhang)
Subject: A kernel warning when entering suspend
Date: Sat, 6 Apr 2019 23:31:51 +0800 [thread overview]
Message-ID: <5088f9f9-678d-4e67-daf0-c8f9629baeea@oracle.com> (raw)
In-Reply-To: <a72ed150-8210-2dad-fa14-423ae5fd4589@oracle.com>
On 04/06/2019 10:29 PM, Dongli Zhang wrote:
>
>
> On 04/05/2019 06:19 AM, Ming Lei wrote:
>> Hi Thomas,
>>
>> On Thu, Apr 04, 2019@06:18:16PM +0200, Thomas Gleixner wrote:
>>> On Thu, 4 Apr 2019, Dongli Zhang wrote:
>>>>
>>>>
>>>> On 04/04/2019 04:55 PM, Ming Lei wrote:
>>>>> On Thu, Apr 04, 2019@08:23:59AM +0000, fin4478 fin4478 wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I do not use suspend/resume but noticed this kernel warning when testing it. This warning is present in earlier kernels too. My system works fine after resume.
>>>>>> If there is a patch to fix this, I can test it.
>>>>>>
>>>>>> [ 53.403033] PM: suspend entry (deep)
>>>>>> [ 53.403034] PM: Syncing filesystems ... done.
>>>>>> [ 53.404775] Freezing user space processes ... (elapsed 0.001 seconds) done.
>>>>>> [ 53.405972] OOM killer disabled.
>>>>>> [ 53.405973] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
>>>>>> [ 53.407036] printk: Suspending console(s) (use no_console_suspend to debug)
>>>>>> [ 53.407491] ACPI Debug: "RRIO"
>>>>>> [ 53.407505] serial 00:03: disabled
>>>>>> [ 53.407560] r8169 0000:07:00.0 enp7s0: Link is Down
>>>>>> [ 53.415042] sd 5:0:0:0: [sda] Synchronizing SCSI cache
>>>>>> [ 53.415065] sd 5:0:0:0: [sda] Stopping disk
>>>>>> [ 53.428943] WARNING: CPU: 10 PID: 3127 at kernel/irq/chip.c:210 irq_startup+0xd6/0xe0
>>>>>
>>>>> Looks like the 'WARN_ON_ONCE(force)' in irq_startup() is a bit too strict.
>>>>>
>>>>> irq_build_affinity_masks() doesn't guarantee that each IRQ's affinity
>>>>> includes at least one online CPU.
>>>>>
>>>> As the warning is triggered by line 203,
>>>
>>> Yeah, that's obvious.
>>>
>>>> I can reproduce it on purpose with qemu (and nvme) with some CPUs offline.
>>>> Here is the related qemu cmdline:
>>>>
>>>> -machine q35 -smp 2,maxcpus=4 \
>>>> -device nvme,drive=nvmedrive,serial=deadbeaf1 \
>>>> -drive file=test.img,if=none,id=nvmedrive \
>>>> -monitor stdio \
>>>>
>>>> As there are only 2 CPUs online initially, line 203 may return >= nr_cpu_ids
>>>> during suspend/resume.
>>>>
>>>> 1. Boot VM with above qemu cmdline.
>>>>
>>>> 2. In the VM, run "echo freeze > /sys/power/state" and the VM (with nvme) will suspend.
>>>>
>>>> 3. Run "system_powerdown" in the qemu monitor to wake up the VM.
>>>>
>>>> There will be below logs:
>>>>
>>>> [ 50.956594] PM: suspend entry (s2idle)
>>>> [ 50.956595] PM: Syncing filesystems ... done.
>>>> [ 51.112756] Freezing user space processes ... (elapsed 0.001 seconds) done.
>>>> [ 51.113772] OOM killer disabled.
>>>> [ 51.113772] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
>>>> [ 51.114776] printk: Suspending console(s) (use no_console_suspend to debug)
>>>> [ 51.115549] e1000e: EEE TX LPI TIMER: 00000000
>>>> [ 51.118829] sd 0:0:0:0: [sda] Synchronizing SCSI cache
>>>> [ 51.119221] WARNING: CPU: 0 PID: 2517 at kernel/irq/chip.c:210
>>>> irq_startup+0xdc/0xf0
>>>> [ 51.119222] Modules linked in:
>>>> [ 51.119223] CPU: 0 PID: 2517 Comm: kworker/u8:9 Not tainted 5.1.0-rc2+ #1
>>>> [ 51.119224] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
>>>> rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
>>>> [ 51.119225] Workqueue: events_unbound async_run_entry_fn
>>>> [ 51.119227] RIP: 0010:irq_startup+0xdc/0xf0
>>>> [ 51.119228] Code: 31 f6 4c 89 ef e8 24 2d 00 00 85 c0 75 20 48 89 ee 31 d2 4c
>>>> 89 ef e8 d3 d3 ff ff 48 89 df e8 cb fe ff ff 89 c5 e9 55 ff ff ff <0f> 0b eb b7
>>>> 0f 0b eb b3 66 90 66 2e 0f 1f 84 00 00 00 00 00 55 53
>>>> [ 51.119229] RSP: 0018:ffffb7c6808a7c30 EFLAGS: 00010002
>>>> [ 51.119229] RAX: 0000000000000040 RBX: ffff98933701b400 RCX: 0000000000000040
>>>> [ 51.119230] RDX: 0000000000000040 RSI: ffffffff98340d60 RDI: ffff98933701b418
>>>> [ 51.119230] RBP: ffff98933701b418 R08: 0000000000000000 R09: 0000000000000000
>>>> [ 51.119231] R10: 0000000000000000 R11: ffffffff9824b1a8 R12: 0000000000000001
>>>> [ 51.119231] R13: 0000000000000001 R14: ffff989336959800 R15: ffff989336d2c900
>>>> [ 51.119232] FS: 0000000000000000(0000) GS:ffff989338200000(0000)
>>>> knlGS:0000000000000000
>>>> [ 51.119234] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [ 51.119235] CR2: 000055d6ae984288 CR3: 000000013360e000 CR4: 00000000000006f0
>>>> [ 51.119235] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> [ 51.119236] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>> [ 51.119236] Call Trace:
>>>> [ 51.119244] ? __irq_get_desc_lock+0x46/0x80
>>>> [ 51.119245] enable_irq+0x41/0x80
>>>
>>> Why is this enabling an interrupt which is managed and bound to an offline
>>> CPU?
>>
>> I understand it is hard for a driver to check whether an interrupt is bound
>> to any online CPU, because CPU hotplug may happen at any time. Especially
>> now that managed IRQs take a static affinity (mapping), almost no driver
>> handles CPU hotplug events.
>>
>
> To suspend, the driver would reclaim and process all completed requests from
> the CQ ring buffer between disable_irq() and enable_irq(), as on lines 1096
> to 1098 below.
>
> 1080 static int nvme_poll_irqdisable(struct nvme_queue *nvmeq, unsigned int tag)
> ...
> 1091 if (nvmeq->cq_vector == -1) {
> 1092 spin_lock(&nvmeq->cq_poll_lock);
> 1093 found = nvme_process_cq(nvmeq, &start, &end, tag);
> 1094 spin_unlock(&nvmeq->cq_poll_lock);
> 1095 } else {
> 1096 disable_irq(pci_irq_vector(pdev, nvmeq->cq_vector));
> 1097 found = nvme_process_cq(nvmeq, &start, &end, tag);
> 1098 enable_irq(pci_irq_vector(pdev, nvmeq->cq_vector));
> 1099 }
>
> As Ming mentioned, it is hard for the driver to check whether an interrupt is
> bound to any online CPU, because CPU hotplug may happen at any time.
>
> If there were a way to know that, more specifically, a way to know whether
> all CPUs in the affinity of one I/O queue are offline, would there be no need
> to disable_irq()/enable_irq() in order to call nvme_process_cq() for that I/O
> queue?
Eero pointed to this patch set: https://patchwork.kernel.org/patch/10708055/

It seems to be due to the commit below, which removed the fast-path cqe_pending check:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0b2a8a9f4b564c7d923597828d93cd1f69ce40e0

I am no longer able to reproduce with the patch below:
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a90cf5d..90f6e80 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1083,6 +1083,9 @@ static int nvme_poll_irqdisable(struct nvme_queue *nvmeq, unsigned int tag)
 	u16 start, end;
 	int found;
+	if (!nvme_cqe_pending(nvmeq))
+		return 0;
+
 	/*
 	 * For a poll queue we need to protect against the polling thread
 	 * using the CQ lock. For normal interrupt driven threads we have
Dongli Zhang
Thread overview: 15+ messages
2019-04-04 8:23 A kernel warning when entering suspend fin4478 fin4478
2019-04-04 8:55 ` Ming Lei
2019-04-04 11:14 ` Thomas Gleixner
2019-12-23 14:52 ` Maxim Levitsky
2020-01-16 13:58 ` Maxim Levitsky
2019-04-04 15:37 ` Dongli Zhang
2019-04-04 16:18 ` Thomas Gleixner
2019-04-04 22:19 ` Ming Lei
2019-04-04 22:29 ` Keith Busch
2019-04-04 22:50 ` Ming Lei
2019-04-05 11:35 ` Thomas Gleixner
2019-04-06 14:29 ` Dongli Zhang
2019-04-06 15:31 ` Dongli Zhang [this message]
2019-04-06 16:12 ` fin4478 fin4478
2019-04-08 15:07 ` Keith Busch