All of lore.kernel.org
* A kernel warning when entering suspend
@ 2019-04-04  8:23 fin4478 fin4478
  2019-04-04  8:55 ` Ming Lei
  0 siblings, 1 reply; 17+ messages in thread
From: fin4478 fin4478 @ 2019-04-04  8:23 UTC (permalink / raw)


Hi,

I do not use suspend/resume but noticed this kernel warning when testing it. This warning is present in earlier kernels too. My system works fine after resume.
If there is a patch to fix this, I can test it.

[   53.403033] PM: suspend entry (deep)
[   53.403034] PM: Syncing filesystems ... done.
[   53.404775] Freezing user space processes ... (elapsed 0.001 seconds) done.
[   53.405972] OOM killer disabled.
[   53.405973] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[   53.407036] printk: Suspending console(s) (use no_console_suspend to debug)
[   53.407491] ACPI Debug:  "RRIO"
[   53.407505] serial 00:03: disabled
[   53.407560] r8169 0000:07:00.0 enp7s0: Link is Down
[   53.415042] sd 5:0:0:0: [sda] Synchronizing SCSI cache
[   53.415065] sd 5:0:0:0: [sda] Stopping disk
[   53.428943] WARNING: CPU: 10 PID: 3127 at kernel/irq/chip.c:210 irq_startup+0xd6/0xe0
[   53.428944] Modules linked in: rtl2832_sdr videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev cmac r820t cxd2841er rtl2832 i2c_mux bnep btusb btrtl btbcm btintel bluetooth dvb_usb_rtl28xxu dvb_usb_v2 dvb_core ecdh_generic input_leds cfg80211 8021q amdgpu snd_hda_codec_realtek snd_hda_codec_generic mfd_core snd_hda_codec_hdmi chash kvm_amd amd_iommu_v2 gpu_sched snd_hda_intel kvm snd_hda_codec snd_hda_core irqbypass k10temp nft_masq mq_deadline nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink snd_aloop snd_pcm ip_tables x_tables ipv6 r8169 realtek xhci_pci nvme xhci_hcd nvme_core
[   53.428961] CPU: 10 PID: 3127 Comm: kworker/u64:22 Not tainted 5.1.0-rc3 #1
[   53.428962] Hardware name: System manufacturer System Product Name/PRIME B350M-K, BIOS 4602 03/07/2019
[   53.428964] Workqueue: events_unbound async_run_entry_fn
[   53.428966] RIP: 0010:irq_startup+0xd6/0xe0
[   53.428967] Code: 31 f6 4c 89 ef e8 1a 2a 00 00 85 c0 75 20 48 89 ee 31 d2 4c 89 ef e8 39 d6 ff ff 48 89 df e8 d1 fe ff ff 89 c5 e9 57 ff ff ff <0f> 0b eb b9 0f 0b eb b5 66 90 55 48 89 fd 53 48 8b 47 38 89 f3 8b
[   53.428967] RSP: 0018:ffffc900012c7c90 EFLAGS: 00010002
[   53.428968] RAX: 0000000000000040 RBX: ffff888214b2aa00 RCX: 0000000000000040
[   53.428969] RDX: 0000000000000000 RSI: ffffffff822a5e20 RDI: ffff888214b2aa18
[   53.428969] RBP: ffff888214b2aa18 R08: 0000000000000000 R09: ffff888215c008a8
[   53.428970] R10: 0000000000000000 R11: ffffffff822396a8 R12: 0000000000000001
[   53.428970] R13: 0000000000000001 R14: 0000000000000000 R15: ffff88821413e800
[   53.428971] FS:  0000000000000000(0000) GS:ffff888216880000(0000) knlGS:0000000000000000
[   53.428971] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   53.428972] CR2: 000055ed5a3d2486 CR3: 00000001d83aa000 CR4: 00000000003406e0
[   53.428972] Call Trace:
[   53.428975]  enable_irq+0x41/0x80
[   53.428978]  nvme_poll_irqdisable+0xd4/0x2d0 [nvme]
[   53.428980]  __nvme_disable_io_queues+0x1ae/0x1e0 [nvme]
[   53.428981]  ? nvme_del_queue_end+0x20/0x20 [nvme]
[   53.428982]  nvme_dev_disable+0x1b6/0x1d0 [nvme]
[   53.428984]  nvme_suspend+0x11/0x20 [nvme]
[   53.428986]  pci_pm_suspend+0x6b/0x1b0
[   53.428987]  ? pci_pm_suspend_noirq+0x270/0x270
[   53.428989]  dpm_run_callback.isra.20+0xf/0x30
[   53.428990]  __device_suspend+0xdd/0x2f0
[   53.428991]  async_suspend+0x15/0x90
[   53.428992]  async_run_entry_fn+0x2c/0xd0
[   53.428994]  process_one_work+0x1c9/0x360
[   53.428995]  worker_thread+0x28/0x3c0
[   53.428996]  ? process_one_work+0x360/0x360
[   53.428997]  kthread+0x106/0x120
[   53.428998]  ? kthread_create_on_node+0x60/0x60
[   53.429000]  ret_from_fork+0x1f/0x30
[   53.429001] ---[ end trace 57f482be77daad27 ]---
[   53.764133] ACPI: Preparing to enter system sleep state S3
[   54.073201] ACPI Debug:  "SIOS"

I have an Asgard 256 GB NVMe drive and am using a non-debug 1000 Hz timer
kernel from kernel.org. My system:

Kernel:    5.1.0-rc3 x86_64 bits: 64
Desktop:   Xfce 4.13.2 Distro: Debian GNU/Linux buster/sid
Machine:   Type: Desktop Mobo: ASUSTeK model: PRIME B350M-K v: Rev X.0x
           serial: <root required> UEFI [Legacy]: American Megatrends v: 4602 date: 03/07/2019
CPU:       6-Core: AMD Ryzen 5 1600 type: MT MCP speed: 3392 MHz
Graphics:  Device-1: Advanced Micro Devices [AMD/ATI] Ellesmere [Radeon RX 470/480] driver: amdgpu v: kernel
           Display: x11 server: X.Org 1.20.4 driver: amdgpu resolution: 3840x2160~60Hz
           OpenGL: renderer: Radeon RX 570 Series (POLARIS10 DRM 3.30.0 5.1.0-rc3 LLVM 9.0.0)
           v: 4.5 Mesa 19.1.0-devel - padoka PPA
Network:   Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet driver: r8169
Drives:    Local Storage: total: 476.95 GiB used: 282.82 GiB (59.3%)
Info:      Processes: 235 Uptime: 26m Memory: 7.79 GiB used: 1.29 GiB (16.6%) Shell: bash inxi: 3.0.32

^ permalink raw reply	[flat|nested] 17+ messages in thread

* A kernel warning when entering suspend
  2019-04-04  8:23 A kernel warning when entering suspend fin4478 fin4478
@ 2019-04-04  8:55 ` Ming Lei
  2019-04-04 11:14   ` Thomas Gleixner
  2019-04-04 15:37   ` Dongli Zhang
  0 siblings, 2 replies; 17+ messages in thread
From: Ming Lei @ 2019-04-04  8:55 UTC (permalink / raw)


On Thu, Apr 04, 2019 at 08:23:59AM +0000, fin4478 fin4478 wrote:
> Hi,
> 
> I do not use suspend/resume but noticed this kernel warning when testing it. This warning is present in earlier kernels too. My system works fine after resume.
> If there is a patch to fix this, I can test it.
> 
> [   53.403033] PM: suspend entry (deep)
> [   53.403034] PM: Syncing filesystems ... done.
> [   53.404775] Freezing user space processes ... (elapsed 0.001 seconds) done.
> [   53.405972] OOM killer disabled.
> [   53.405973] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> [   53.407036] printk: Suspending console(s) (use no_console_suspend to debug)
> [   53.407491] ACPI Debug:  "RRIO"
> [   53.407505] serial 00:03: disabled
> [   53.407560] r8169 0000:07:00.0 enp7s0: Link is Down
> [   53.415042] sd 5:0:0:0: [sda] Synchronizing SCSI cache
> [   53.415065] sd 5:0:0:0: [sda] Stopping disk
> [   53.428943] WARNING: CPU: 10 PID: 3127 at kernel/irq/chip.c:210 irq_startup+0xd6/0xe0

Looks the 'WARN_ON_ONCE(force)' in irq_startup() is a bit too strict.

irq_build_affinity_masks() doesn't guarantee that each IRQ's affinity
can include at least one online CPU.


Thanks,
Ming


* A kernel warning when entering suspend
  2019-04-04  8:55 ` Ming Lei
@ 2019-04-04 11:14   ` Thomas Gleixner
  2019-12-23 14:52     ` Maxim Levitsky
  2019-04-04 15:37   ` Dongli Zhang
  1 sibling, 1 reply; 17+ messages in thread
From: Thomas Gleixner @ 2019-04-04 11:14 UTC (permalink / raw)


On Thu, 4 Apr 2019, Ming Lei wrote:

> On Thu, Apr 04, 2019 at 08:23:59AM +0000, fin4478 fin4478 wrote:
> > Hi,
> > 
> > I do not use suspend/resume but noticed this kernel warning when testing it. This warning is present in earlier kernels too. My system works fine after resume.
> > If there is a patch to fix this, I can test it.
> > 
> > [   53.403033] PM: suspend entry (deep)
> > [   53.403034] PM: Syncing filesystems ... done.
> > [   53.404775] Freezing user space processes ... (elapsed 0.001 seconds) done.
> > [   53.405972] OOM killer disabled.
> > [   53.405973] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> > [   53.407036] printk: Suspending console(s) (use no_console_suspend to debug)
> > [   53.407491] ACPI Debug:  "RRIO"
> > [   53.407505] serial 00:03: disabled
> > [   53.407560] r8169 0000:07:00.0 enp7s0: Link is Down
> > [   53.415042] sd 5:0:0:0: [sda] Synchronizing SCSI cache
> > [   53.415065] sd 5:0:0:0: [sda] Stopping disk
> > [   53.428943] WARNING: CPU: 10 PID: 3127 at kernel/irq/chip.c:210 irq_startup+0xd6/0xe0
> 
> Looks the 'WARN_ON_ONCE(force)' in irq_startup() is a bit too strict.

Why?

> irq_build_affinity_masks() doesn't guarantee that each IRQ's affinity
> can include at least one online CPU.

Then why is it tried to start up an interrupt which has no online cpu in
the mask? I'm missing context obviously.

Thanks,

	tglx


* A kernel warning when entering suspend
  2019-04-04  8:55 ` Ming Lei
  2019-04-04 11:14   ` Thomas Gleixner
@ 2019-04-04 15:37   ` Dongli Zhang
  2019-04-04 16:18     ` Thomas Gleixner
  1 sibling, 1 reply; 17+ messages in thread
From: Dongli Zhang @ 2019-04-04 15:37 UTC (permalink / raw)




On 04/04/2019 04:55 PM, Ming Lei wrote:
> On Thu, Apr 04, 2019 at 08:23:59AM +0000, fin4478 fin4478 wrote:
>> Hi,
>>
>> I do not use suspend/resume but noticed this kernel warning when testing it. This warning is present in earlier kernels too. My system works fine after resume.
>> If there is a patch to fix this, I can test it.
>>
>> [   53.403033] PM: suspend entry (deep)
>> [   53.403034] PM: Syncing filesystems ... done.
>> [   53.404775] Freezing user space processes ... (elapsed 0.001 seconds) done.
>> [   53.405972] OOM killer disabled.
>> [   53.405973] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
>> [   53.407036] printk: Suspending console(s) (use no_console_suspend to debug)
>> [   53.407491] ACPI Debug:  "RRIO"
>> [   53.407505] serial 00:03: disabled
>> [   53.407560] r8169 0000:07:00.0 enp7s0: Link is Down
>> [   53.415042] sd 5:0:0:0: [sda] Synchronizing SCSI cache
>> [   53.415065] sd 5:0:0:0: [sda] Stopping disk
>> [   53.428943] WARNING: CPU: 10 PID: 3127 at kernel/irq/chip.c:210 irq_startup+0xd6/0xe0
> 
> Looks the 'WARN_ON_ONCE(force)' in irq_startup() is a bit too strict.
> 
> irq_build_affinity_masks() doesn't guarantee that each IRQ's affinity
> can include at least one online CPU.
> 


As the warning is triggered by below line 203,

194 __irq_startup_managed(struct irq_desc *desc, struct cpumask *aff, bool force)
... ...
203         if (cpumask_any_and(aff, cpu_online_mask) >= nr_cpu_ids) {
204                 /*
205                  * Catch code which fiddles with enable_irq() on a managed
206                  * and potentially shutdown IRQ. Chained interrupt
207                  * installment or irq auto probing should not happen on
208                  * managed irqs either.
209                  */
210                 if (WARN_ON_ONCE(force))
211                         return IRQ_STARTUP_ABORT;
212                 /*
213                  * The interrupt was requested, but there is no online CPU
214                  * in it's affinity mask. Put it into managed shutdown
215                  * state and let the cpu hotplug mechanism start it up once
216                  * a CPU in the mask becomes available.
217                  */
218                 return IRQ_STARTUP_ABORT;
219         }


I can reproduce on purpose with qemu (and nvme) with some cpu offline. Here is
related qemu cmdline:

-machine q35 -smp 2,maxcpus=4 \
-device nvme,drive=nvmedrive,serial=deadbeaf1 \
-drive file=test.img,if=none,id=nvmedrive \
-monitor stdio \

As there are only 2 cpus online initially, line 203 may return >= nr_cpu_ids
during suspend/resume.

1. Boot VM with above qemu cmdline.

2. In VM, run "echo freeze > /sys/power/state" and VM (with nvme) will suspend.

3. Run "system_powerdown" in qemu monitor to wakeup VM.

There will be below logs:

[   50.956594] PM: suspend entry (s2idle)
[   50.956595] PM: Syncing filesystems ... done.
[   51.112756] Freezing user space processes ... (elapsed 0.001 seconds) done.
[   51.113772] OOM killer disabled.
[   51.113772] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[   51.114776] printk: Suspending console(s) (use no_console_suspend to debug)
[   51.115549] e1000e: EEE TX LPI TIMER: 00000000
[   51.118829] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[   51.119221] WARNING: CPU: 0 PID: 2517 at kernel/irq/chip.c:210
irq_startup+0xdc/0xf0
[   51.119222] Modules linked in:
[   51.119223] CPU: 0 PID: 2517 Comm: kworker/u8:9 Not tainted 5.1.0-rc2+ #1
[   51.119224] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[   51.119225] Workqueue: events_unbound async_run_entry_fn
[   51.119227] RIP: 0010:irq_startup+0xdc/0xf0
[   51.119228] Code: 31 f6 4c 89 ef e8 24 2d 00 00 85 c0 75 20 48 89 ee 31 d2 4c
89 ef e8 d3 d3 ff ff 48 89 df e8 cb fe ff ff 89 c5 e9 55 ff ff ff <0f> 0b eb b7
0f 0b eb b3 66 90 66 2e 0f 1f 84 00 00 00 00 00 55 53
[   51.119229] RSP: 0018:ffffb7c6808a7c30 EFLAGS: 00010002
[   51.119229] RAX: 0000000000000040 RBX: ffff98933701b400 RCX: 0000000000000040
[   51.119230] RDX: 0000000000000040 RSI: ffffffff98340d60 RDI: ffff98933701b418
[   51.119230] RBP: ffff98933701b418 R08: 0000000000000000 R09: 0000000000000000
[   51.119231] R10: 0000000000000000 R11: ffffffff9824b1a8 R12: 0000000000000001
[   51.119231] R13: 0000000000000001 R14: ffff989336959800 R15: ffff989336d2c900
[   51.119232] FS:  0000000000000000(0000) GS:ffff989338200000(0000)
knlGS:0000000000000000
[   51.119234] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   51.119235] CR2: 000055d6ae984288 CR3: 000000013360e000 CR4: 00000000000006f0
[   51.119235] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   51.119236] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   51.119236] Call Trace:
[   51.119244]  ? __irq_get_desc_lock+0x46/0x80
[   51.119245]  enable_irq+0x41/0x80
[   51.119247]  nvme_poll_irqdisable+0x122/0x390
[   51.119249]  ? wake_up_q+0x70/0x70
[   51.119251]  __nvme_disable_io_queues+0x167/0x1b0
[   51.119252]  ? nvme_del_queue_end+0x20/0x20
[   51.119254]  nvme_dev_disable+0x1b7/0x1d0
[   51.119255]  nvme_suspend+0x11/0x20
[   51.119257]  pci_pm_suspend+0x7a/0x150
[   51.119258]  ? pci_pm_freeze+0xd0/0xd0
[   51.119259]  dpm_run_callback+0x3c/0x160
[   51.119260]  __device_suspend+0x117/0x450
[   51.119261]  async_suspend+0x15/0x90
[   51.119262]  async_run_entry_fn+0x32/0xe0
[   51.119264]  process_one_work+0x14d/0x3e0
[   51.119265]  worker_thread+0x44/0x410
[   51.119267]  ? create_worker+0x190/0x190
[   51.119268]  kthread+0xf3/0x130
[   51.119270]  ? kthread_destroy_worker+0x40/0x40
[   51.119271]  ret_from_fork+0x35/0x40
[   51.119273] ---[ end trace 7069b223b9a2c9c9 ]---
[   51.139379] sd 0:0:0:0: [sda] Stopping disk
[   59.121791] sd 0:0:0:0: [sda] Starting disk
[   59.124066] nvme nvme0: 4/0/0 default/read/poll queues
[   59.426226] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[   59.426750] ata6: SATA link down (SStatus 0 SControl 300)
[   59.427040] ata2: SATA link down (SStatus 0 SControl 300)
[   59.427307] ata5: SATA link down (SStatus 0 SControl 300)
[   59.427569] ata4: SATA link down (SStatus 0 SControl 300)
[   59.428226] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[   59.428833] ata3.00: configured for UDMA/100
[   59.429420] ata1.00: configured for UDMA/100
[   59.430248] OOM killer enabled.
[   59.430249] Restarting tasks ... done.
[   59.439243] PM: suspend exit
[   61.719297] e1000e: enp0s2 NIC Link is Up 1000 Mbps Full Duplex, Flow
Control: Rx/Tx

I cannot reproduce this with virtio-blk in a similar situation.

Dongli Zhang


* A kernel warning when entering suspend
  2019-04-04 15:37   ` Dongli Zhang
@ 2019-04-04 16:18     ` Thomas Gleixner
  2019-04-04 22:19       ` Ming Lei
  0 siblings, 1 reply; 17+ messages in thread
From: Thomas Gleixner @ 2019-04-04 16:18 UTC (permalink / raw)


On Thu, 4 Apr 2019, Dongli Zhang wrote:
> 
> 
> On 04/04/2019 04:55 PM, Ming Lei wrote:
> > On Thu, Apr 04, 2019 at 08:23:59AM +0000, fin4478 fin4478 wrote:
> >> Hi,
> >>
> >> I do not use suspend/resume but noticed this kernel warning when testing it. This warning is present in earlier kernels too. My system works fine after resume.
> >> If there is a patch to fix this, I can test it.
> >>
> >> [   53.403033] PM: suspend entry (deep)
> >> [   53.403034] PM: Syncing filesystems ... done.
> >> [   53.404775] Freezing user space processes ... (elapsed 0.001 seconds) done.
> >> [   53.405972] OOM killer disabled.
> >> [   53.405973] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> >> [   53.407036] printk: Suspending console(s) (use no_console_suspend to debug)
> >> [   53.407491] ACPI Debug:  "RRIO"
> >> [   53.407505] serial 00:03: disabled
> >> [   53.407560] r8169 0000:07:00.0 enp7s0: Link is Down
> >> [   53.415042] sd 5:0:0:0: [sda] Synchronizing SCSI cache
> >> [   53.415065] sd 5:0:0:0: [sda] Stopping disk
> >> [   53.428943] WARNING: CPU: 10 PID: 3127 at kernel/irq/chip.c:210 irq_startup+0xd6/0xe0
> > 
> > Looks the 'WARN_ON_ONCE(force)' in irq_startup() is a bit too strict.
> > 
> > irq_build_affinity_masks() doesn't guarantee that each IRQ's affinity
> > can include at least one online CPU.
> > 
> As the warning is triggered by below line 203,

Yeah, that's obvious.

> I can reproduce on purpose with qemu (and nvme) with some cpu offline. Here is
> related qemu cmdline:
> 
> -machine q35 -smp 2,maxcpus=4 \
> -device nvme,drive=nvmedrive,serial=deadbeaf1 \
> -drive file=test.img,if=none,id=nvmedrive \
> -monitor stdio \
> 
> As there are only 2 cpus online initially, line 203 may return >= nr_cpu_ids
> during suspend/resume.
> 
> 1. Boot VM with above qemu cmdline.
> 
> 2. In VM, run "echo freeze > /sys/power/state" and VM (with nvme) will suspend.
> 
> 3. Run "system_powerdown" in qemu monitor to wakeup VM.
> 
> There will be below logs:
> 
> [   50.956594] PM: suspend entry (s2idle)
> [   50.956595] PM: Syncing filesystems ... done.
> [   51.112756] Freezing user space processes ... (elapsed 0.001 seconds) done.
> [   51.113772] OOM killer disabled.
> [   51.113772] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> [   51.114776] printk: Suspending console(s) (use no_console_suspend to debug)
> [   51.115549] e1000e: EEE TX LPI TIMER: 00000000
> [   51.118829] sd 0:0:0:0: [sda] Synchronizing SCSI cache
> [   51.119221] WARNING: CPU: 0 PID: 2517 at kernel/irq/chip.c:210
> irq_startup+0xdc/0xf0
> [   51.119222] Modules linked in:
> [   51.119223] CPU: 0 PID: 2517 Comm: kworker/u8:9 Not tainted 5.1.0-rc2+ #1
> [   51.119224] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
> [   51.119225] Workqueue: events_unbound async_run_entry_fn
> [   51.119227] RIP: 0010:irq_startup+0xdc/0xf0
> [   51.119228] Code: 31 f6 4c 89 ef e8 24 2d 00 00 85 c0 75 20 48 89 ee 31 d2 4c
> 89 ef e8 d3 d3 ff ff 48 89 df e8 cb fe ff ff 89 c5 e9 55 ff ff ff <0f> 0b eb b7
> 0f 0b eb b3 66 90 66 2e 0f 1f 84 00 00 00 00 00 55 53
> [   51.119229] RSP: 0018:ffffb7c6808a7c30 EFLAGS: 00010002
> [   51.119229] RAX: 0000000000000040 RBX: ffff98933701b400 RCX: 0000000000000040
> [   51.119230] RDX: 0000000000000040 RSI: ffffffff98340d60 RDI: ffff98933701b418
> [   51.119230] RBP: ffff98933701b418 R08: 0000000000000000 R09: 0000000000000000
> [   51.119231] R10: 0000000000000000 R11: ffffffff9824b1a8 R12: 0000000000000001
> [   51.119231] R13: 0000000000000001 R14: ffff989336959800 R15: ffff989336d2c900
> [   51.119232] FS:  0000000000000000(0000) GS:ffff989338200000(0000)
> knlGS:0000000000000000
> [   51.119234] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   51.119235] CR2: 000055d6ae984288 CR3: 000000013360e000 CR4: 00000000000006f0
> [   51.119235] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   51.119236] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   51.119236] Call Trace:
> [   51.119244]  ? __irq_get_desc_lock+0x46/0x80
> [   51.119245]  enable_irq+0x41/0x80

Why is this enabling an interrupt which is managed and bound to an offline
CPU?

This is just wrong and the check rightfully catches this.

Thanks,

	tglx


* A kernel warning when entering suspend
  2019-04-04 16:18     ` Thomas Gleixner
@ 2019-04-04 22:19       ` Ming Lei
  2019-04-04 22:29         ` Keith Busch
  2019-04-06 14:29         ` Dongli Zhang
  0 siblings, 2 replies; 17+ messages in thread
From: Ming Lei @ 2019-04-04 22:19 UTC (permalink / raw)


Hi Thomas,

On Thu, Apr 04, 2019 at 06:18:16PM +0200, Thomas Gleixner wrote:
> On Thu, 4 Apr 2019, Dongli Zhang wrote:
> > 
> > 
> > On 04/04/2019 04:55 PM, Ming Lei wrote:
> > > On Thu, Apr 04, 2019 at 08:23:59AM +0000, fin4478 fin4478 wrote:
> > >> Hi,
> > >>
> > >> I do not use suspend/resume but noticed this kernel warning when testing it. This warning is present in earlier kernels too. My system works fine after resume.
> > >> If there is a patch to fix this, I can test it.
> > >>
> > >> [   53.403033] PM: suspend entry (deep)
> > >> [   53.403034] PM: Syncing filesystems ... done.
> > >> [   53.404775] Freezing user space processes ... (elapsed 0.001 seconds) done.
> > >> [   53.405972] OOM killer disabled.
> > >> [   53.405973] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> > >> [   53.407036] printk: Suspending console(s) (use no_console_suspend to debug)
> > >> [   53.407491] ACPI Debug:  "RRIO"
> > >> [   53.407505] serial 00:03: disabled
> > >> [   53.407560] r8169 0000:07:00.0 enp7s0: Link is Down
> > >> [   53.415042] sd 5:0:0:0: [sda] Synchronizing SCSI cache
> > >> [   53.415065] sd 5:0:0:0: [sda] Stopping disk
> > >> [   53.428943] WARNING: CPU: 10 PID: 3127 at kernel/irq/chip.c:210 irq_startup+0xd6/0xe0
> > > 
> > > Looks the 'WARN_ON_ONCE(force)' in irq_startup() is a bit too strict.
> > > 
> > > irq_build_affinity_masks() doesn't guarantee that each IRQ's affinity
> > > can include at least one online CPU.
> > > 
> > As the warning is triggered by below line 203,
> 
> Yeah, that's obvious.
> 
> > I can reproduce on purpose with qemu (and nvme) with some cpu offline. Here is
> > related qemu cmdline:
> > 
> > -machine q35 -smp 2,maxcpus=4 \
> > -device nvme,drive=nvmedrive,serial=deadbeaf1 \
> > -drive file=test.img,if=none,id=nvmedrive \
> > -monitor stdio \
> > 
> > As there are only 2 cpus online initially, line 203 may return >= nr_cpu_ids
> > during suspend/resume.
> > 
> > 1. Boot VM with above qemu cmdline.
> > 
> > 2. In VM, run "echo freeze > /sys/power/state" and VM (with nvme) will suspend.
> > 
> > 3. Run "system_powerdown" in qemu monitor to wakeup VM.
> > 
> > There will be below logs:
> > 
> > [   50.956594] PM: suspend entry (s2idle)
> > [   50.956595] PM: Syncing filesystems ... done.
> > [   51.112756] Freezing user space processes ... (elapsed 0.001 seconds) done.
> > [   51.113772] OOM killer disabled.
> > [   51.113772] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> > [   51.114776] printk: Suspending console(s) (use no_console_suspend to debug)
> > [   51.115549] e1000e: EEE TX LPI TIMER: 00000000
> > [   51.118829] sd 0:0:0:0: [sda] Synchronizing SCSI cache
> > [   51.119221] WARNING: CPU: 0 PID: 2517 at kernel/irq/chip.c:210
> > irq_startup+0xdc/0xf0
> > [   51.119222] Modules linked in:
> > [   51.119223] CPU: 0 PID: 2517 Comm: kworker/u8:9 Not tainted 5.1.0-rc2+ #1
> > [   51.119224] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> > rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
> > [   51.119225] Workqueue: events_unbound async_run_entry_fn
> > [   51.119227] RIP: 0010:irq_startup+0xdc/0xf0
> > [   51.119228] Code: 31 f6 4c 89 ef e8 24 2d 00 00 85 c0 75 20 48 89 ee 31 d2 4c
> > 89 ef e8 d3 d3 ff ff 48 89 df e8 cb fe ff ff 89 c5 e9 55 ff ff ff <0f> 0b eb b7
> > 0f 0b eb b3 66 90 66 2e 0f 1f 84 00 00 00 00 00 55 53
> > [   51.119229] RSP: 0018:ffffb7c6808a7c30 EFLAGS: 00010002
> > [   51.119229] RAX: 0000000000000040 RBX: ffff98933701b400 RCX: 0000000000000040
> > [   51.119230] RDX: 0000000000000040 RSI: ffffffff98340d60 RDI: ffff98933701b418
> > [   51.119230] RBP: ffff98933701b418 R08: 0000000000000000 R09: 0000000000000000
> > [   51.119231] R10: 0000000000000000 R11: ffffffff9824b1a8 R12: 0000000000000001
> > [   51.119231] R13: 0000000000000001 R14: ffff989336959800 R15: ffff989336d2c900
> > [   51.119232] FS:  0000000000000000(0000) GS:ffff989338200000(0000)
> > knlGS:0000000000000000
> > [   51.119234] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   51.119235] CR2: 000055d6ae984288 CR3: 000000013360e000 CR4: 00000000000006f0
> > [   51.119235] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [   51.119236] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [   51.119236] Call Trace:
> > [   51.119244]  ? __irq_get_desc_lock+0x46/0x80
> > [   51.119245]  enable_irq+0x41/0x80
> 
> Why is this enabling an interrupt which is managed and bound to an offline
> CPU?

I understand it is hard for a driver to check whether an interrupt is bound
to any online CPU, because CPU hotplug may happen at any time. Especially
now that managed IRQs take a static affinity mapping, almost no driver
handles CPU hotplug events.

Also in current blk-mq implementation, one irq may become shutdown
because of CPU hotplug even though when there is in-flight request
on the queue served by the irq. Then we depend on timeout handler to
cover this case, and this irq may be enabled in the timeout handler too,
please see nvme_poll_irqdisable().

Thanks,
Ming


* A kernel warning when entering suspend
  2019-04-04 22:19       ` Ming Lei
@ 2019-04-04 22:29         ` Keith Busch
  2019-04-04 22:50             ` Ming Lei
  2019-04-06 14:29         ` Dongli Zhang
  1 sibling, 1 reply; 17+ messages in thread
From: Keith Busch @ 2019-04-04 22:29 UTC (permalink / raw)


On Fri, Apr 05, 2019 at 06:19:50AM +0800, Ming Lei wrote:
> Also in current blk-mq implementation, one irq may become shutdown
> because of CPU hotplug even though when there is in-flight request
> on the queue served by the irq. Then we depend on timeout handler to
> cover this case, and this irq may be enabled in the timeout handler too,
> please see nvme_poll_irqdisable().

Right, but when the last CPU mapped to an hctx is taken offline, we really
ought to have blk-mq wait for that hctx to reap all outstanding requests
before letting the notifier continue with offlining that CPU. We just
don't have the infrastructure to freeze an individual hctx yet.


* Re: A kernel warning when entering suspend
  2019-04-04 22:29         ` Keith Busch
@ 2019-04-04 22:50             ` Ming Lei
  0 siblings, 0 replies; 17+ messages in thread
From: Ming Lei @ 2019-04-04 22:50 UTC (permalink / raw)
  To: Keith Busch
  Cc: Thomas Gleixner, Dongli Zhang, fin4478 fin4478, keith.busch,
	axboe, linux-nvme, linux-kernel, linux-block

On Thu, Apr 04, 2019 at 04:29:56PM -0600, Keith Busch wrote:
> On Fri, Apr 05, 2019 at 06:19:50AM +0800, Ming Lei wrote:
> > Also in current blk-mq implementation, one irq may become shutdown
> > because of CPU hotplug even though when there is in-flight request
> > on the queue served by the irq. Then we depend on timeout handler to
> > cover this case, and this irq may be enabled in the timeout handler too,
> > please see nvme_poll_irqdisable().
> 
> Right, but when the last CPU mapped to an hctx is taken offline, we really
> ought to have blk-mq wait for that hctx to reap all outstanding requests
> before letting the notifier continue with offlining that CPU. We just
> don't have the infrastructure to freeze an individual hctx yet.

Looks this issue isn't unique for storage device, anyone knows how other
device drivers deal with this situation? For example, one network packet is
submitted to NIC controller and not got completed, then the interrupt
may become down because of CPU hotplug.

Thanks,
Ming



* Re: A kernel warning when entering suspend
  2019-04-04 22:50             ` Ming Lei
@ 2019-04-05 11:35               ` Thomas Gleixner
  0 siblings, 0 replies; 17+ messages in thread
From: Thomas Gleixner @ 2019-04-05 11:35 UTC (permalink / raw)
  To: Ming Lei
  Cc: Keith Busch, Dongli Zhang, fin4478 fin4478, keith.busch, axboe,
	linux-nvme, linux-kernel, linux-block

On Fri, 5 Apr 2019, Ming Lei wrote:
> On Thu, Apr 04, 2019 at 04:29:56PM -0600, Keith Busch wrote:
> > On Fri, Apr 05, 2019 at 06:19:50AM +0800, Ming Lei wrote:
> > > Also in current blk-mq implementation, one irq may become shutdown
> > > because of CPU hotplug even though when there is in-flight request
> > > on the queue served by the irq. Then we depend on timeout handler to
> > > cover this case, and this irq may be enabled in the timeout handler too,
> > > please see nvme_poll_irqdisable().
> > 
> > Right, but when the last CPU mapped to an hctx is taken offline, we really
> > ought to have blk-mq wait for that hctx to reap all outstanding requests
> > before letting the notifier continue with offlining that CPU. We just
> > don't have the infrastructure to freeze an individual hctx yet.
> 
> Looks this issue isn't unique for storage device, anyone knows how other
> device drivers deal with this situation? For example, one network packet is
> submitted to NIC controller and not got completed, then the interrupt
> may become down because of CPU hotplug.

If the interrupt is managed yes.

That was the constraint of managed interrupts from the very beginning:

  The driver/subsystem has to quiesce the interrupt line and the associated
  queue _before_ it gets shutdown in CPU unplug and not fiddle with it
  until it's restarted by the core when the CPU is plugged in again.

Thanks,

	tglx



* A kernel warning when entering suspend
  2019-04-04 22:19       ` Ming Lei
  2019-04-04 22:29         ` Keith Busch
@ 2019-04-06 14:29         ` Dongli Zhang
  2019-04-06 15:31           ` Dongli Zhang
  1 sibling, 1 reply; 17+ messages in thread
From: Dongli Zhang @ 2019-04-06 14:29 UTC (permalink / raw)




On 04/05/2019 06:19 AM, Ming Lei wrote:
> Hi Thomas,
> 
> On Thu, Apr 04, 2019@06:18:16PM +0200, Thomas Gleixner wrote:
>> On Thu, 4 Apr 2019, Dongli Zhang wrote:
>>>
>>>
>>> On 04/04/2019 04:55 PM, Ming Lei wrote:
>>>> On Thu, Apr 04, 2019@08:23:59AM +0000, fin4478 fin4478 wrote:
>>>>> Hi,
>>>>>
>>>>> I do not use suspend/resume but noticed this kernel warning when testing it. This warning is present in earlier kernels too. My system works fine after resume.
>>>>> If there is a patch to fix this, I can test it.
>>>>>
>>>>> [   53.403033] PM: suspend entry (deep)
>>>>> [   53.403034] PM: Syncing filesystems ... done.
>>>>> [   53.404775] Freezing user space processes ... (elapsed 0.001 seconds) done.
>>>>> [   53.405972] OOM killer disabled.
>>>>> [   53.405973] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
>>>>> [   53.407036] printk: Suspending console(s) (use no_console_suspend to debug)
>>>>> [   53.407491] ACPI Debug:  "RRIO"
>>>>> [   53.407505] serial 00:03: disabled
>>>>> [   53.407560] r8169 0000:07:00.0 enp7s0: Link is Down
>>>>> [   53.415042] sd 5:0:0:0: [sda] Synchronizing SCSI cache
>>>>> [   53.415065] sd 5:0:0:0: [sda] Stopping disk
>>>>> [   53.428943] WARNING: CPU: 10 PID: 3127 at kernel/irq/chip.c:210 irq_startup+0xd6/0xe0
>>>>
>>>> Looks the 'WARN_ON_ONCE(force)' in irq_startup() is a bit too strict.
>>>>
>>>> irq_build_affinity_masks() doesn't guarantee that each IRQ's affinity
>>>> can include at least one online CPU.
>>>>
>>> As the warning is triggered by below line 203,
>>
>> Yeah, that's obvious.
>>
>>> I can reproduce on purpose with qemu (and nvme) with some cpu offline. Here is
>>> related qemu cmdline:
>>>
>>> -machine q35 -smp 2,maxcpus=4 \
>>> -device nvme,drive=nvmedrive,serial=deadbeaf1 \
>>> -drive file=test.img,if=none,id=nvmedrive \
>>> -monitor stdio \
>>>
>>> As there are only 2 cpus online initially, line 203 may return >= nr_cpu_ids
>>> during suspend/resume.
>>>
>>> 1. Boot VM with above qemu cmdline.
>>>
>>> 2. In VM, run "echo freeze > /sys/power/state" and VM (with nvme) will suspend.
>>>
>>> 3. Run "system_powerdown" in qemu monitor to wakeup VM.
>>>
>>> There will be below logs:
>>>
>>> [   50.956594] PM: suspend entry (s2idle)
>>> [   50.956595] PM: Syncing filesystems ... done.
>>> [   51.112756] Freezing user space processes ... (elapsed 0.001 seconds) done.
>>> [   51.113772] OOM killer disabled.
>>> [   51.113772] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
>>> [   51.114776] printk: Suspending console(s) (use no_console_suspend to debug)
>>> [   51.115549] e1000e: EEE TX LPI TIMER: 00000000
>>> [   51.118829] sd 0:0:0:0: [sda] Synchronizing SCSI cache
>>> [   51.119221] WARNING: CPU: 0 PID: 2517 at kernel/irq/chip.c:210
>>> irq_startup+0xdc/0xf0
>>> [   51.119222] Modules linked in:
>>> [   51.119223] CPU: 0 PID: 2517 Comm: kworker/u8:9 Not tainted 5.1.0-rc2+ #1
>>> [   51.119224] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
>>> rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
>>> [   51.119225] Workqueue: events_unbound async_run_entry_fn
>>> [   51.119227] RIP: 0010:irq_startup+0xdc/0xf0
>>> [   51.119228] Code: 31 f6 4c 89 ef e8 24 2d 00 00 85 c0 75 20 48 89 ee 31 d2 4c
>>> 89 ef e8 d3 d3 ff ff 48 89 df e8 cb fe ff ff 89 c5 e9 55 ff ff ff <0f> 0b eb b7
>>> 0f 0b eb b3 66 90 66 2e 0f 1f 84 00 00 00 00 00 55 53
>>> [   51.119229] RSP: 0018:ffffb7c6808a7c30 EFLAGS: 00010002
>>> [   51.119229] RAX: 0000000000000040 RBX: ffff98933701b400 RCX: 0000000000000040
>>> [   51.119230] RDX: 0000000000000040 RSI: ffffffff98340d60 RDI: ffff98933701b418
>>> [   51.119230] RBP: ffff98933701b418 R08: 0000000000000000 R09: 0000000000000000
>>> [   51.119231] R10: 0000000000000000 R11: ffffffff9824b1a8 R12: 0000000000000001
>>> [   51.119231] R13: 0000000000000001 R14: ffff989336959800 R15: ffff989336d2c900
>>> [   51.119232] FS:  0000000000000000(0000) GS:ffff989338200000(0000)
>>> knlGS:0000000000000000
>>> [   51.119234] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [   51.119235] CR2: 000055d6ae984288 CR3: 000000013360e000 CR4: 00000000000006f0
>>> [   51.119235] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> [   51.119236] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> [   51.119236] Call Trace:
>>> [   51.119244]  ? __irq_get_desc_lock+0x46/0x80
>>> [   51.119245]  enable_irq+0x41/0x80
>>
>> Why is this enabling an interrupt which is managed and bound to an offline
>> CPU?
> 
> I understand it is hard for driver to check if one interrupt is bound to any
> online CPU because CPU hotplug may happen any time. Especially now
> managed irq takes static affinity(mapping), almost every driver doesn't
> handle CPU hotplug event.
> 

To suspend, the driver reclaims requests and should process all completed requests
from the CQ ring buffer between disable_irq() and enable_irq(), i.e. lines 1096 to
1098 below.

1080 static int nvme_poll_irqdisable(struct nvme_queue *nvmeq, unsigned int tag)
...
1091         if (nvmeq->cq_vector == -1) {
1092                 spin_lock(&nvmeq->cq_poll_lock);
1093                 found = nvme_process_cq(nvmeq, &start, &end, tag);
1094                 spin_unlock(&nvmeq->cq_poll_lock);
1095         } else {
1096                 disable_irq(pci_irq_vector(pdev, nvmeq->cq_vector));
1097                 found = nvme_process_cq(nvmeq, &start, &end, tag);
1098                 enable_irq(pci_irq_vector(pdev, nvmeq->cq_vector));
1099         }

As Ming mentioned, it is hard for the driver to check whether an interrupt is bound to
any online CPU, because CPU hotplug may happen at any time.

If there were a way to know that, more specifically a way to know whether all the
CPUs in the affinity of an I/O queue are offline, would there be no need to
disable_irq()/enable_irq() in order to call nvme_process_cq() for that I/O queue?


I could not reproduce this with virtio-blk. It looks like virtio-blk just quiesces the
queue without reclaiming any pending responses from the ring buffer?

 949 #ifdef CONFIG_PM_SLEEP
 950 static int virtblk_freeze(struct virtio_device *vdev)
 951 {
 952         struct virtio_blk *vblk = vdev->priv;
 953
 954         /* Ensure we don't receive any more interrupts */
 955         vdev->config->reset(vdev);
 956
 957         /* Make sure no work handler is accessing the device. */
 958         flush_work(&vblk->config_work);
 959
 960         blk_mq_quiesce_queue(vblk->disk->queue);
 961
 962         vdev->config->del_vqs(vdev);
 963         return 0;
 964 }

Dongli Zhang

^ permalink raw reply	[flat|nested] 17+ messages in thread

* A kernel warning when entering suspend
  2019-04-06 14:29         ` Dongli Zhang
@ 2019-04-06 15:31           ` Dongli Zhang
  2019-04-06 16:12             ` fin4478 fin4478
  0 siblings, 1 reply; 17+ messages in thread
From: Dongli Zhang @ 2019-04-06 15:31 UTC (permalink / raw)




On 04/06/2019 10:29 PM, Dongli Zhang wrote:
> 
> 
> On 04/05/2019 06:19 AM, Ming Lei wrote:
>> Hi Thomas,
>>
>> On Thu, Apr 04, 2019@06:18:16PM +0200, Thomas Gleixner wrote:
>>> On Thu, 4 Apr 2019, Dongli Zhang wrote:
>>>>
>>>>
>>>> On 04/04/2019 04:55 PM, Ming Lei wrote:
>>>>> On Thu, Apr 04, 2019@08:23:59AM +0000, fin4478 fin4478 wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I do not use suspend/resume but noticed this kernel warning when testing it. This warning is present in earlier kernels too. My system works fine after resume.
>>>>>> If there is a patch to fix this, I can test it.
>>>>>>
>>>>>> [   53.403033] PM: suspend entry (deep)
>>>>>> [   53.403034] PM: Syncing filesystems ... done.
>>>>>> [   53.404775] Freezing user space processes ... (elapsed 0.001 seconds) done.
>>>>>> [   53.405972] OOM killer disabled.
>>>>>> [   53.405973] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
>>>>>> [   53.407036] printk: Suspending console(s) (use no_console_suspend to debug)
>>>>>> [   53.407491] ACPI Debug:  "RRIO"
>>>>>> [   53.407505] serial 00:03: disabled
>>>>>> [   53.407560] r8169 0000:07:00.0 enp7s0: Link is Down
>>>>>> [   53.415042] sd 5:0:0:0: [sda] Synchronizing SCSI cache
>>>>>> [   53.415065] sd 5:0:0:0: [sda] Stopping disk
>>>>>> [   53.428943] WARNING: CPU: 10 PID: 3127 at kernel/irq/chip.c:210 irq_startup+0xd6/0xe0
>>>>>
>>>>> Looks the 'WARN_ON_ONCE(force)' in irq_startup() is a bit too strict.
>>>>>
>>>>> irq_build_affinity_masks() doesn't guarantee that each IRQ's affinity
>>>>> can include at least one online CPU.
>>>>>
>>>> As the warning is triggered by below line 203,
>>>
>>> Yeah, that's obvious.
>>>
>>>> I can reproduce on purpose with qemu (and nvme) with some cpu offline. Here is
>>>> related qemu cmdline:
>>>>
>>>> -machine q35 -smp 2,maxcpus=4 \
>>>> -device nvme,drive=nvmedrive,serial=deadbeaf1 \
>>>> -drive file=test.img,if=none,id=nvmedrive \
>>>> -monitor stdio \
>>>>
>>>> As there are only 2 cpus online initially, line 203 may return >= nr_cpu_ids
>>>> during suspend/resume.
>>>>
>>>> 1. Boot VM with above qemu cmdline.
>>>>
>>>> 2. In VM, run "echo freeze > /sys/power/state" and VM (with nvme) will suspend.
>>>>
>>>> 3. Run "system_powerdown" in qemu monitor to wakeup VM.
>>>>
>>>> There will be below logs:
>>>>
>>>> [   50.956594] PM: suspend entry (s2idle)
>>>> [   50.956595] PM: Syncing filesystems ... done.
>>>> [   51.112756] Freezing user space processes ... (elapsed 0.001 seconds) done.
>>>> [   51.113772] OOM killer disabled.
>>>> [   51.113772] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
>>>> [   51.114776] printk: Suspending console(s) (use no_console_suspend to debug)
>>>> [   51.115549] e1000e: EEE TX LPI TIMER: 00000000
>>>> [   51.118829] sd 0:0:0:0: [sda] Synchronizing SCSI cache
>>>> [   51.119221] WARNING: CPU: 0 PID: 2517 at kernel/irq/chip.c:210
>>>> irq_startup+0xdc/0xf0
>>>> [   51.119222] Modules linked in:
>>>> [   51.119223] CPU: 0 PID: 2517 Comm: kworker/u8:9 Not tainted 5.1.0-rc2+ #1
>>>> [   51.119224] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
>>>> rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
>>>> [   51.119225] Workqueue: events_unbound async_run_entry_fn
>>>> [   51.119227] RIP: 0010:irq_startup+0xdc/0xf0
>>>> [   51.119228] Code: 31 f6 4c 89 ef e8 24 2d 00 00 85 c0 75 20 48 89 ee 31 d2 4c
>>>> 89 ef e8 d3 d3 ff ff 48 89 df e8 cb fe ff ff 89 c5 e9 55 ff ff ff <0f> 0b eb b7
>>>> 0f 0b eb b3 66 90 66 2e 0f 1f 84 00 00 00 00 00 55 53
>>>> [   51.119229] RSP: 0018:ffffb7c6808a7c30 EFLAGS: 00010002
>>>> [   51.119229] RAX: 0000000000000040 RBX: ffff98933701b400 RCX: 0000000000000040
>>>> [   51.119230] RDX: 0000000000000040 RSI: ffffffff98340d60 RDI: ffff98933701b418
>>>> [   51.119230] RBP: ffff98933701b418 R08: 0000000000000000 R09: 0000000000000000
>>>> [   51.119231] R10: 0000000000000000 R11: ffffffff9824b1a8 R12: 0000000000000001
>>>> [   51.119231] R13: 0000000000000001 R14: ffff989336959800 R15: ffff989336d2c900
>>>> [   51.119232] FS:  0000000000000000(0000) GS:ffff989338200000(0000)
>>>> knlGS:0000000000000000
>>>> [   51.119234] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [   51.119235] CR2: 000055d6ae984288 CR3: 000000013360e000 CR4: 00000000000006f0
>>>> [   51.119235] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> [   51.119236] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>> [   51.119236] Call Trace:
>>>> [   51.119244]  ? __irq_get_desc_lock+0x46/0x80
>>>> [   51.119245]  enable_irq+0x41/0x80
>>>
>>> Why is this enabling an interrupt which is managed and bound to an offline
>>> CPU?
>>
>> I understand it is hard for the driver to check whether an interrupt is bound to any
>> online CPU, because CPU hotplug may happen at any time. Especially since managed
>> irqs now take a static affinity (mapping), almost no driver handles CPU
>> hotplug events.
>>
>>
> 
> To suspend, the driver reclaims requests and should process all completed requests
> from the CQ ring buffer between disable_irq() and enable_irq(), i.e. lines 1096 to
> 1098 below.
> 
> 1080 static int nvme_poll_irqdisable(struct nvme_queue *nvmeq, unsigned int tag)
> ...
> 1091         if (nvmeq->cq_vector == -1) {
> 1092                 spin_lock(&nvmeq->cq_poll_lock);
> 1093                 found = nvme_process_cq(nvmeq, &start, &end, tag);
> 1094                 spin_unlock(&nvmeq->cq_poll_lock);
> 1095         } else {
> 1096                 disable_irq(pci_irq_vector(pdev, nvmeq->cq_vector));
> 1097                 found = nvme_process_cq(nvmeq, &start, &end, tag);
> 1098                 enable_irq(pci_irq_vector(pdev, nvmeq->cq_vector));
> 1099         }
> 
> As Ming mentioned, it is hard for the driver to check whether an interrupt is bound to
> any online CPU, because CPU hotplug may happen at any time.
> 
> If there were a way to know that, more specifically a way to know whether all the
> CPUs in the affinity of an I/O queue are offline, would there be no need to
> disable_irq()/enable_irq() in order to call nvme_process_cq() for that I/O queue?


Eero pointed to this patch set: https://patchwork.kernel.org/patch/10708055/

It seems to be due to the commit below, which removed the fast-path cqe_pending check.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0b2a8a9f4b564c7d923597828d93cd1f69ce40e0


I am not able to reproduce it anymore with the patch below:

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a90cf5d..90f6e80 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1083,6 +1083,9 @@ static int nvme_poll_irqdisable(struct nvme_queue *nvmeq,
unsigned int tag)
        u16 start, end;
        int found;

+       if (!nvme_cqe_pending(nvmeq))
+               return 0;
+
        /*
         * For a poll queue we need to protect against the polling thread
         * using the CQ lock.  For normal interrupt driven threads we have

Dongli Zhang

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* A kernel warning when entering suspend
  2019-04-06 15:31           ` Dongli Zhang
@ 2019-04-06 16:12             ` fin4478 fin4478
  2019-04-08 15:07               ` Keith Busch
  0 siblings, 1 reply; 17+ messages in thread
From: fin4478 fin4478 @ 2019-04-06 16:12 UTC (permalink / raw)


On Sat, 6 Apr 2019 23:31:51 +0800
Dongli Zhang <dongli.zhang@oracle.com> wrote:


> I am not able to reproduce anymore with below patch:
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index a90cf5d..90f6e80 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -1083,6 +1083,9 @@ static int nvme_poll_irqdisable(struct nvme_queue *nvmeq, unsigned int tag)
>         u16 start, end;
>         int found;
> 
> +       if (!nvme_cqe_pending(nvmeq))
> +               return 0;
> +
>         /*
>          * For a poll queue we need to protect against the polling thread
>          * using the CQ lock.  For normal interrupt driven threads we have
> 
> Dongli Zhang

Confirming. This bug is fixed with this patch.
[   48.690299] PM: suspend entry (deep)
[   48.690300] PM: Syncing filesystems ... done.
[   48.691215] Freezing user space processes ... (elapsed 0.000 seconds) done.
[   48.692025] OOM killer disabled.
[   48.692025] Freezing remaining freezable tasks ... (elapsed 0.000 seconds) done.
[   48.693033] printk: Suspending console(s) (use no_console_suspend to debug)
[   48.693569] r8169 0000:07:00.0 enp7s0: Link is Down
[   48.693586] serial 00:03: disabled
[   48.702049] sd 5:0:0:0: [sda] Synchronizing SCSI cache
[   48.702076] sd 5:0:0:0: [sda] Stopping disk
[   49.049134] ACPI: Preparing to enter system sleep state S3
[   49.353355] PM: Saving platform NVS memory
[   49.353362] Disabling non-boot CPUs ...
[   49.354764] smpboot: CPU 1 is now offline

Thank You.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* A kernel warning when entering suspend
  2019-04-06 16:12             ` fin4478 fin4478
@ 2019-04-08 15:07               ` Keith Busch
  0 siblings, 0 replies; 17+ messages in thread
From: Keith Busch @ 2019-04-08 15:07 UTC (permalink / raw)


On Sat, Apr 06, 2019@09:12:12AM -0700, fin4478 fin4478 wrote:
> On Sat, 6 Apr 2019 23:31:51 +0800
> Dongli Zhang <dongli.zhang@oracle.com> wrote:
> 
> 
> > I am not able to reproduce anymore with below patch:
> > 
> > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > index a90cf5d..90f6e80 100644
> > --- a/drivers/nvme/host/pci.c
> > +++ b/drivers/nvme/host/pci.c
> > @@ -1083,6 +1083,9 @@ static int nvme_poll_irqdisable(struct nvme_queue *nvmeq, unsigned int tag)
> >         u16 start, end;
> >         int found;
> > 
> > +       if (!nvme_cqe_pending(nvmeq))
> > +               return 0;
> > +
> >         /*
> >          * For a poll queue we need to protect against the polling thread
> >          * using the CQ lock.  For normal interrupt driven threads we have
> > 
> > Dongli Zhang
> 
> Confirming. This bug is fixed with this patch.

I think the patch makes sense, though you'd still have a problem if
there is a cqe pending when you reach this point.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: A kernel warning when entering suspend
  2019-04-04 11:14   ` Thomas Gleixner
@ 2019-12-23 14:52     ` Maxim Levitsky
  2020-01-16 13:58       ` Maxim Levitsky
  0 siblings, 1 reply; 17+ messages in thread
From: Maxim Levitsky @ 2019-12-23 14:52 UTC (permalink / raw)
  To: Thomas Gleixner, Ming Lei; +Cc: fin4478 fin4478, keith.busch, axboe, linux-nvme

On Thu, 2019-04-04 at 13:14 +0200, Thomas Gleixner wrote:
> On Thu, 4 Apr 2019, Ming Lei wrote:
> 
> > On Thu, Apr 04, 2019 at 08:23:59AM +0000, fin4478 fin4478 wrote:
> > > Hi,
> > > 
> > > I do not use suspend/resume but noticed this kernel warning when testing it. This warning is present in earlier kernels too. My system works fine after resume.
> > > If there is a patch to fix this, I can test it.
> > > 
> > > [   53.403033] PM: suspend entry (deep)
> > > [   53.403034] PM: Syncing filesystems ... done.
> > > [   53.404775] Freezing user space processes ... (elapsed 0.001 seconds) done.
> > > [   53.405972] OOM killer disabled.
> > > [   53.405973] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> > > [   53.407036] printk: Suspending console(s) (use no_console_suspend to debug)
> > > [   53.407491] ACPI Debug:  "RRIO"
> > > [   53.407505] serial 00:03: disabled
> > > [   53.407560] r8169 0000:07:00.0 enp7s0: Link is Down
> > > [   53.415042] sd 5:0:0:0: [sda] Synchronizing SCSI cache
> > > [   53.415065] sd 5:0:0:0: [sda] Stopping disk
> > > [   53.428943] WARNING: CPU: 10 PID: 3127 at kernel/irq/chip.c:210 irq_startup+0xd6/0xe0
> > 
> > Looks the 'WARN_ON_ONCE(force)' in irq_startup() is a bit too strict.
> 
> Why?
> 
> > irq_build_affinity_masks() doesn't guarantee that each IRQ's affinity
> > can include at least one online CPU.
> 
> Then why is it tried to start up an interrupt which has no online cpu in
> the mask? I'm missing context obviously.

Hi!

The problem here is that recently an optimization was added to avoid taking a spinlock in
the interrupt handler, at the expense of enable_irq()/disable_irq() around the place where the race can happen.

The patch is https://patchwork.kernel.org/patch/10708055/

And the problem is that if the nvme driver allocates more interrupts than the number of
currently online CPUs (which happens when the number of possible CPUs is larger than the
number of CPUs the system was booted with), then some of the nvme queues exist but are
effectively disabled (until the relevant CPUs are plugged in, which can happen at any time).

I think that this is a valid use case, and since, other than triggering the warning, enable_irq()
in this case doesn't really enable the interrupt, maybe the warning should be removed?

We have a bug report about this issue, and I was told that the warning breaks some QE scripts:
https://bugzilla.redhat.com/show_bug.cgi?id=1779247


Best regards,
	Maxim Levitsky


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: A kernel warning when entering suspend
  2019-12-23 14:52     ` Maxim Levitsky
@ 2020-01-16 13:58       ` Maxim Levitsky
  0 siblings, 0 replies; 17+ messages in thread
From: Maxim Levitsky @ 2020-01-16 13:58 UTC (permalink / raw)
  To: Thomas Gleixner, Ming Lei
  Cc: fin4478 fin4478, keith.busch, axboe, linux-nvme, Christoph Hellwig

On Mon, 2019-12-23 at 16:52 +0200, Maxim Levitsky wrote:
> On Thu, 2019-04-04 at 13:14 +0200, Thomas Gleixner wrote:
> > On Thu, 4 Apr 2019, Ming Lei wrote:
> > 
> > > On Thu, Apr 04, 2019 at 08:23:59AM +0000, fin4478 fin4478 wrote:
> > > > Hi,
> > > > 
> > > > I do not use suspend/resume but noticed this kernel warning when testing it. This warning is present in earlier kernels too. My system works fine after resume.
> > > > If there is a patch to fix this, I can test it.
> > > > 
> > > > [   53.403033] PM: suspend entry (deep)
> > > > [   53.403034] PM: Syncing filesystems ... done.
> > > > [   53.404775] Freezing user space processes ... (elapsed 0.001 seconds) done.
> > > > [   53.405972] OOM killer disabled.
> > > > [   53.405973] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> > > > [   53.407036] printk: Suspending console(s) (use no_console_suspend to debug)
> > > > [   53.407491] ACPI Debug:  "RRIO"
> > > > [   53.407505] serial 00:03: disabled
> > > > [   53.407560] r8169 0000:07:00.0 enp7s0: Link is Down
> > > > [   53.415042] sd 5:0:0:0: [sda] Synchronizing SCSI cache
> > > > [   53.415065] sd 5:0:0:0: [sda] Stopping disk
> > > > [   53.428943] WARNING: CPU: 10 PID: 3127 at kernel/irq/chip.c:210 irq_startup+0xd6/0xe0
> > > 
> > > Looks the 'WARN_ON_ONCE(force)' in irq_startup() is a bit too strict.
> > 
> > Why?
> > 
> > > irq_build_affinity_masks() doesn't guarantee that each IRQ's affinity
> > > can include at least one online CPU.
> > 
> > Then why is it tried to start up an interrupt which has no online cpu in
> > the mask? I'm missing context obviously.
> 
> Hi!
> 
> The problem here is that recently an optimization was added to avoid taking a spinlock in
> the interrupt handler, at the expense of enable_irq()/disable_irq() around the place where the race can happen.
> 
> The patch is https://patchwork.kernel.org/patch/10708055/
> 
> And the problem is that if the nvme driver allocates more interrupts than the number of
> currently online CPUs (which happens when the number of possible CPUs is larger than the
> number of CPUs the system was booted with), then some of the nvme queues exist but are
> effectively disabled (until the relevant CPUs are plugged in, which can happen at any time).
> 
> I think that this is a valid use case, and since, other than triggering the warning, enable_irq()
> in this case doesn't really enable the interrupt, maybe the warning should be removed?
> 
> We have a bug report about this issue, and I was told that the warning breaks some QE scripts:
> https://bugzilla.redhat.com/show_bug.cgi?id=1779247
> 

Ping.

Best regards,
	Maxim Levitsky



^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2020-01-16 13:59 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-04  8:23 A kernel warning when entering suspend fin4478 fin4478
2019-04-04  8:55 ` Ming Lei
2019-04-04 11:14   ` Thomas Gleixner
2019-12-23 14:52     ` Maxim Levitsky
2020-01-16 13:58       ` Maxim Levitsky
2019-04-04 15:37   ` Dongli Zhang
2019-04-04 16:18     ` Thomas Gleixner
2019-04-04 22:19       ` Ming Lei
2019-04-04 22:29         ` Keith Busch
2019-04-04 22:50           ` Ming Lei
2019-04-04 22:50             ` Ming Lei
2019-04-05 11:35             ` Thomas Gleixner
2019-04-05 11:35               ` Thomas Gleixner
2019-04-06 14:29         ` Dongli Zhang
2019-04-06 15:31           ` Dongli Zhang
2019-04-06 16:12             ` fin4478 fin4478
2019-04-08 15:07               ` Keith Busch

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.