linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* "Plug non-maskable MSI affinity race" triggers a warning with CPU hotplugs
@ 2020-02-12  1:56 Qian Cai
  2020-02-12 11:19 ` Thomas Gleixner
  0 siblings, 1 reply; 3+ messages in thread
From: Qian Cai @ 2020-02-12  1:56 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Evan Green, Rajat Jain, Bjorn Helgaas, linux-pci,
	Linux Kernel Mailing List, the arch/x86 maintainers,
	Marc Zyngier

The linux-next commit 6f1a4891a592 (“x86/apic/msi: Plug non-maskable MSI affinity race”)
Introduced a bug which is always triggered during the CPU hotplugs on this server. See
the trace and line numbers below.

[30801.862014][ T5760] LTP: starting hugeshmget05 (hugeshmget05 -i 10) 
[30803.097105][ T5760] LTP: starting cpuhotplug02 (cpuhotplug02.sh -c 1 -l 1) 
[30809.441006][   T16] IRQ 104: no longer affine to CPU1 
[30809.466544][   T16] IRQ 125: no longer affine to CPU1 
[30809.492799][   T16] process 21134 (cpuhotplug_do_s) no longer affine to cpu1 
[30809.494344][T21112] smpboot: CPU 1 is now offline 
[30811.071179][T21112] x86: Booting SMP configuration: 
[30811.097476][T21112] smpboot: Booting Node 0 Processor 1 APIC 0x2 
[30813.213453][ T5760] LTP: starting cpuhotplug03 (cpuhotplug03.sh -c 1 -l 1) 
[30817.456909][T21146] smpboot: CPU 1 is now offline 
[30828.916751][T21146] x86: Booting SMP configuration: 
[30828.942396][T21146] smpboot: Booting Node 0 Processor 1 APIC 0x2 
[-- MARK -- Tue Feb 11 22:40:00 2020] 
[30855.147353][ T5760] LTP: starting cpuhotplug04 (cpuhotplug04.sh -l 1) 
[30859.353961][   T16] IRQ 119: no longer affine to CPU1 
[30859.381931][T22008] smpboot: CPU 1 is now offline 
[30859.557430][   T21] IRQ 106: no longer affine to CPU2 
[30859.582708][T22008] smpboot: CPU 2 is now offline 
[30860.042902][   T26] IRQ 113: no longer affine to CPU3 
[30860.069520][T22008] smpboot: CPU 3 is now offline 
[30860.464752][T22008] smpboot: CPU 4 is now offline 
[30860.987206][   T36] IRQ 124: no longer affine to CPU5 
[30861.015866][T22008] smpboot: CPU 5 is now offline 
[30861.632409][T22008] smpboot: CPU 6 is now offline 
[30862.051962][   T46] IRQ 125: no longer affine to CPU7 
[30862.076156][T22008] smpboot: CPU 7 is now offline 
[30862.577746][T22008] smpboot: CPU 8 is now offline 
[30863.014959][   T56] IRQ 123: no longer affine to CPU9 
[30863.040760][T22008] smpboot: CPU 9 is now offline 
[30863.494227][T22008] smpboot: CPU 10 is now offline 
[30863.922548][   T66] IRQ 121: no longer affine to CPU11 
[30863.949053][T22008] smpboot: CPU 11 is now offline 
[30865.734181][T22008] smpboot: CPU 12 is now offline 
[30867.883120][T22008] smpboot: CPU 13 is now offline 
[30870.013237][T22008] smpboot: CPU 14 is now offline 
[30872.123324][T22008] smpboot: CPU 15 is now offline 
[30875.003486][T22008] smpboot: CPU 16 is now offline 
[30877.233482][T22008] smpboot: CPU 17 is now offline 
[30879.383349][T22008] smpboot: CPU 18 is now offline 
[30881.233183][T22008] smpboot: CPU 19 is now offline 
[30883.473628][T22008] smpboot: CPU 20 is now offline 
[30886.093474][T22008] smpboot: CPU 21 is now offline 
[30888.085684][T22008] smpboot: CPU 22 is now offline 
[30890.165790][T22008] smpboot: CPU 23 is now offline 
[30890.556773][  T132] IRQ 122: no longer affine to CPU24 
[30890.582921][T22008] smpboot: CPU 24 is now offline 
[30890.803132][  T137] IRQ 119: no longer affine to CPU25 
[30890.833943][T22008] smpboot: CPU 25 is now offline 
[30891.252422][  T142] IRQ 109: no longer affine to CPU26 
[30891.275535][  T142] IRQ 116: no longer affine to CPU26 
[30891.302447][T22008] smpboot: CPU 26 is now offline 
[30891.580237][  T147] IRQ 115: no longer affine to CPU27 
[30891.607772][  T147] IRQ fixup: irq 123 move in progress, old vector 34 
[30891.642439][T22008] smpboot: CPU 27 is now offline 
[30891.893758][  T152] IRQ 107: no longer affine to CPU28 
[30891.917209][  T152] IRQ 114: no longer affine to CPU28 
[30891.940381][  T152] IRQ fixup: irq 121 move in progress, old vector 34 
[30891.971105][T22008] smpboot: CPU 28 is now offline 
[30892.245251][  T157] IRQ 108: no longer affine to CPU29 
[30892.272343][  T157] IRQ 113: no longer affine to CPU29 
[30892.300690][T22008] smpboot: CPU 29 is now offline 
[30892.574296][  T162] IRQ 106: no longer affine to CPU30 
[30892.598705][T22008] smpboot: CPU 30 is now offline 
[30892.801901][T22008] smpboot: CPU 31 is now offline 
[30893.246486][T22008] smpboot: CPU 32 is now offline 
[30893.549519][T22008] smpboot: CPU 33 is now offline 
[30893.908288][T22008] smpboot: CPU 34 is now offline 
[30894.091385][T22008] smpboot: CPU 35 is now offline 
[30894.472537][ T2794] ------------[ cut here ]------------ 
[30894.498223][ T2794] WARNING: CPU: 0 PID: 2794 at arch/x86/kernel/apic/msi.c:103 msi_set_affinity+0x278/0x330 
[30894.543016][ T2794] Modules linked in: brd ext4 crc16 mbcache jbd2 loop nls_iso8859_1 nls_cp437 vfat fat kvm_intel kvm irqbypass intel_cstate intel_uncore dax_pmem dax_pmem_core intel_rapl_perf efivars ip_tables x_tables xfs sd_mod ahci libahci libata igb hpsa i2c_algo_bit scsi_transport_sas i2c_core tg3 firmware_class libphy dm_mirror dm_region_hash dm_log dm_mod efivarfs [last unloaded: binfmt_misc] 
[30894.709559][ T2794] CPU: 0 PID: 2794 Comm: irqbalance Tainted: G             L    5.6.0-rc1-next-20200211 #1 
[30894.753946][ T2794] Hardware name: HP ProLiant XL450 Gen9 Server/ProLiant XL450 Gen9 Server, BIOS U21 05/05/2016 
[30894.799739][ T2794] RIP: 0010:msi_set_affinity+0x278/0x330 
[30894.824950][ T2794] Code: 49 c7 44 05 00 00 00 00 00 44 89 f0 48 8b 4d d0 65 48 33 0c 25 28 00 00 00 75 6b 48 83 ec 80 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b 4c 89 e6 48 89 df e8 9b fc ff ff eb bd 48 8d 7b 18 e8 20 05 
[30894.912256][ T2794] RSP: 0018:ffffc90005e4fa58 EFLAGS: 00010093 
[30894.939043][ T2794] RAX: 0000000000000000 RBX: ffff88836f0b0428 RCX: ffffffffa5e8ef91 
[30894.974228][ T2794] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffff88836dfe8c40 
[30895.009472][ T2794] RBP: ffffc90005e4fb00 R08: fffffbfff4e98461 R09: fffffbfff4e98461 
[30895.044982][ T2794] R10: fffffbfff4e98460 R11: ffffffffa74c2303 R12: ffff88836dfe8c40 
[30895.080170][ T2794] R13: 1ffff92000bc9f4f R14: 0000000000000000 R15: ffff88836dfe8c44 
[30895.115503][ T2794] FS:  00007f30e37a3e80(0000) GS:ffff888452600000(0000) knlGS:0000000000000000 
[30895.160815][ T2794] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[30895.189818][ T2794] CR2: 00007ffdac9e4f20 CR3: 00000007268d0002 CR4: 00000000001606f0 
[30895.225606][ T2794] Call Trace: 
[30895.239777][ T2794]  ? irq_msi_update_msg+0xe0/0xe0 
[30895.261672][ T2794]  ? rwlock_bug.part.1+0x60/0x60 
[30895.284071][ T2794]  irq_do_set_affinity+0x75/0x180 
irq_do_set_affinity+0x75/0x180:
irq_do_set_affinity at kernel/irq/manage.c:259
[30895.306065][ T2794]  irq_setup_affinity+0x167/0x250 
irq_setup_affinity at kernel/irq/manage.c:474
[30895.328157][ T2794]  irq_select_affinity_usr+0x30/0x50 
irq_select_affinity_usr at kernel/irq/manage.c:496
[30895.351507][ T2794]  write_irq_affinity.isra.0+0x137/0x1e0 
[30895.376891][ T2794]  ? irq_node_proc_show+0x50/0x50 
[30895.398821][ T2794]  ? match_held_lock+0x35/0x250 
[30895.419982][ T2794]  irq_affinity_proc_write+0x19/0x20 
[30895.444225][ T2794]  proc_reg_write+0x12e/0x190 
[30895.464538][ T2794]  ? proc_reg_read+0x190/0x190 
[30895.485249][ T2794]  ? rcu_read_lock_held+0xc0/0xc0 
[30895.507212][ T2794]  __vfs_write+0x50/0xa0 
[30895.526323][ T2794]  vfs_write+0x105/0x280 
[30895.545621][ T2794]  ksys_write+0xc6/0x160 
[30895.564993][ T2794]  ? __x64_sys_read+0x50/0x50 
[30895.585337][ T2794]  ? do_syscall_64+0x79/0xaec 
[30895.605686][ T2794]  ? do_syscall_64+0x79/0xaec 
[30895.626066][ T2794]  __x64_sys_write+0x43/0x50 
[30895.648476][ T2794]  do_syscall_64+0xcc/0xaec 
[30895.671473][ T2794]  ? trace_hardirqs_on_thunk+0x1a/0x1c 
[30895.696452][ T2794]  ? syscall_return_slowpath+0x580/0x580 
[30895.721226][ T2794]  ? entry_SYSCALL_64_after_hwframe+0x3e/0xbe 
[30895.748025][ T2794]  ? trace_hardirqs_off_caller+0x3a/0x150 
[30895.773033][ T2794]  ? trace_hardirqs_off_thunk+0x1a/0x1c 
[30895.797308][ T2794]  entry_SYSCALL_64_after_hwframe+0x49/0xbe 
[30895.823118][ T2794] RIP: 0033:0x7f30e23a1e4f 
[30895.842371][ T2794] Code: 00 00 00 41 54 49 89 d4 55 48 89 f5 53 89 fb 48 83 ec 10 e8 23 ca 01 00 4c 89 e2 48 89 ee 89 df 41 89 c0 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 5c ca 01 00 48 
[30895.930359][ T2794] RSP: 002b:00007ffdac9e1760 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 
[30895.967557][ T2794] RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 00007f30e23a1e4f 
[30896.002786][ T2794] RDX: 0000000000000011 RSI: 0000556488e3d310 RDI: 0000000000000007 
[30896.038063][ T2794] RBP: 0000556488e3d310 R08: 0000000000000000 R09: 0000556486f3b200 
[30896.073471][ T2794] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000011 
[30896.108344][ T2794] R13: 0000000000000011 R14: 00007f30e266e740 R15: 0000000000000011 
[30896.143550][ T2794] irq event stamp: 378868746 
[30896.165644][ T2794] hardirqs last  enabled at (378868745): [<ffffffffa6210123>] quarantine_put+0x73/0x270 
[30896.214017][ T2794] hardirqs last disabled at (378868746): [<ffffffffa6892748>] _raw_spin_lock_irqsave+0x18/0x50 
[30896.261385][ T2794] softirqs last  enabled at (378868364): [<ffffffffa6c00447>] __do_softirq+0x447/0x766 
[30896.303183][ T2794] softirqs last disabled at (378867251): [<ffffffffa5ed1da6>] irq_exit+0xd6/0xf0 
[30896.343633][ T2794] ---[ end trace a0a160876a02c150 ]--- 
[30897.964246][T22008] smpboot: CPU 36 is now offline 
[30906.533125][T22008] smpboot: CPU 37 is now offline 
[30914.823295][T22008] smpboot: CPU 38 is now offline 
[30920.594545][T22008] smpboot: CPU 39 is now offline 
[30926.504299][T22008] smpboot: CPU 40 is now offline 
[30931.962884][T22008] smpboot: CPU 41 is now offline 
[30937.492970][T22008] smpboot: CPU 42 is now offline 
[30942.822984][T22008] smpboot: CPU 43 is now offline 
[30948.543231][T22008] smpboot: CPU 44 is now offline 
[30954.005524][T22008] smpboot: CPU 45 is now offline 
[30962.765295][T22008] smpboot: CPU 46 is now offline 
[30963.112374][  T247] irq_migrate_all_off_this_cpu: 10 callbacks suppressed 
[30963.112380][  T247] IRQ 3: no longer affine to CPU47



^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-02-13  0:44 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-12  1:56 "Plug non-maskable MSI affinity race" triggers a warning with CPU hotplugs Qian Cai
2020-02-12 11:19 ` Thomas Gleixner
2020-02-13  0:44   ` Qian Cai

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).