All of lore.kernel.org
 help / color / mirror / Atom feed
* "Plug non-maskable MSI affinity race" triggers a warning with CPU hotplugs
@ 2020-02-12  1:56 Qian Cai
  2020-02-12 11:19 ` Thomas Gleixner
  0 siblings, 1 reply; 4+ messages in thread
From: Qian Cai @ 2020-02-12  1:56 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Evan Green, Rajat Jain, Bjorn Helgaas, linux-pci,
	Linux Kernel Mailing List, the arch/x86 maintainers,
	Marc Zyngier

The linux-next commit 6f1a4891a592 (“x86/apic/msi: Plug non-maskable MSI affinity race”)
Introduced a bug which is always triggered during the CPU hotplugs on this server. See
the trace and line numbers below.

[30801.862014][ T5760] LTP: starting hugeshmget05 (hugeshmget05 -i 10) 
[30803.097105][ T5760] LTP: starting cpuhotplug02 (cpuhotplug02.sh -c 1 -l 1) 
[30809.441006][   T16] IRQ 104: no longer affine to CPU1 
[30809.466544][   T16] IRQ 125: no longer affine to CPU1 
[30809.492799][   T16] process 21134 (cpuhotplug_do_s) no longer affine to cpu1 
[30809.494344][T21112] smpboot: CPU 1 is now offline 
[30811.071179][T21112] x86: Booting SMP configuration: 
[30811.097476][T21112] smpboot: Booting Node 0 Processor 1 APIC 0x2 
[30813.213453][ T5760] LTP: starting cpuhotplug03 (cpuhotplug03.sh -c 1 -l 1) 
[30817.456909][T21146] smpboot: CPU 1 is now offline 
[30828.916751][T21146] x86: Booting SMP configuration: 
[30828.942396][T21146] smpboot: Booting Node 0 Processor 1 APIC 0x2 
[-- MARK -- Tue Feb 11 22:40:00 2020] 
[30855.147353][ T5760] LTP: starting cpuhotplug04 (cpuhotplug04.sh -l 1) 
[30859.353961][   T16] IRQ 119: no longer affine to CPU1 
[30859.381931][T22008] smpboot: CPU 1 is now offline 
[30859.557430][   T21] IRQ 106: no longer affine to CPU2 
[30859.582708][T22008] smpboot: CPU 2 is now offline 
[30860.042902][   T26] IRQ 113: no longer affine to CPU3 
[30860.069520][T22008] smpboot: CPU 3 is now offline 
[30860.464752][T22008] smpboot: CPU 4 is now offline 
[30860.987206][   T36] IRQ 124: no longer affine to CPU5 
[30861.015866][T22008] smpboot: CPU 5 is now offline 
[30861.632409][T22008] smpboot: CPU 6 is now offline 
[30862.051962][   T46] IRQ 125: no longer affine to CPU7 
[30862.076156][T22008] smpboot: CPU 7 is now offline 
[30862.577746][T22008] smpboot: CPU 8 is now offline 
[30863.014959][   T56] IRQ 123: no longer affine to CPU9 
[30863.040760][T22008] smpboot: CPU 9 is now offline 
[30863.494227][T22008] smpboot: CPU 10 is now offline 
[30863.922548][   T66] IRQ 121: no longer affine to CPU11 
[30863.949053][T22008] smpboot: CPU 11 is now offline 
[30865.734181][T22008] smpboot: CPU 12 is now offline 
[30867.883120][T22008] smpboot: CPU 13 is now offline 
[30870.013237][T22008] smpboot: CPU 14 is now offline 
[30872.123324][T22008] smpboot: CPU 15 is now offline 
[30875.003486][T22008] smpboot: CPU 16 is now offline 
[30877.233482][T22008] smpboot: CPU 17 is now offline 
[30879.383349][T22008] smpboot: CPU 18 is now offline 
[30881.233183][T22008] smpboot: CPU 19 is now offline 
[30883.473628][T22008] smpboot: CPU 20 is now offline 
[30886.093474][T22008] smpboot: CPU 21 is now offline 
[30888.085684][T22008] smpboot: CPU 22 is now offline 
[30890.165790][T22008] smpboot: CPU 23 is now offline 
[30890.556773][  T132] IRQ 122: no longer affine to CPU24 
[30890.582921][T22008] smpboot: CPU 24 is now offline 
[30890.803132][  T137] IRQ 119: no longer affine to CPU25 
[30890.833943][T22008] smpboot: CPU 25 is now offline 
[30891.252422][  T142] IRQ 109: no longer affine to CPU26 
[30891.275535][  T142] IRQ 116: no longer affine to CPU26 
[30891.302447][T22008] smpboot: CPU 26 is now offline 
[30891.580237][  T147] IRQ 115: no longer affine to CPU27 
[30891.607772][  T147] IRQ fixup: irq 123 move in progress, old vector 34 
[30891.642439][T22008] smpboot: CPU 27 is now offline 
[30891.893758][  T152] IRQ 107: no longer affine to CPU28 
[30891.917209][  T152] IRQ 114: no longer affine to CPU28 
[30891.940381][  T152] IRQ fixup: irq 121 move in progress, old vector 34 
[30891.971105][T22008] smpboot: CPU 28 is now offline 
[30892.245251][  T157] IRQ 108: no longer affine to CPU29 
[30892.272343][  T157] IRQ 113: no longer affine to CPU29 
[30892.300690][T22008] smpboot: CPU 29 is now offline 
[30892.574296][  T162] IRQ 106: no longer affine to CPU30 
[30892.598705][T22008] smpboot: CPU 30 is now offline 
[30892.801901][T22008] smpboot: CPU 31 is now offline 
[30893.246486][T22008] smpboot: CPU 32 is now offline 
[30893.549519][T22008] smpboot: CPU 33 is now offline 
[30893.908288][T22008] smpboot: CPU 34 is now offline 
[30894.091385][T22008] smpboot: CPU 35 is now offline 
[30894.472537][ T2794] ------------[ cut here ]------------ 
[30894.498223][ T2794] WARNING: CPU: 0 PID: 2794 at arch/x86/kernel/apic/msi.c:103 msi_set_affinity+0x278/0x330 
[30894.543016][ T2794] Modules linked in: brd ext4 crc16 mbcache jbd2 loop nls_iso8859_1 nls_cp437 vfat fat kvm_intel kvm irqbypass intel_cstate intel_uncore dax_pmem dax_pmem_core intel_rapl_perf efivars ip_tables x_tables xfs sd_mod ahci libahci libata igb hpsa i2c_algo_bit scsi_transport_sas i2c_core tg3 firmware_class libphy dm_mirror dm_region_hash dm_log dm_mod efivarfs [last unloaded: binfmt_misc] 
[30894.709559][ T2794] CPU: 0 PID: 2794 Comm: irqbalance Tainted: G             L    5.6.0-rc1-next-20200211 #1 
[30894.753946][ T2794] Hardware name: HP ProLiant XL450 Gen9 Server/ProLiant XL450 Gen9 Server, BIOS U21 05/05/2016 
[30894.799739][ T2794] RIP: 0010:msi_set_affinity+0x278/0x330 
[30894.824950][ T2794] Code: 49 c7 44 05 00 00 00 00 00 44 89 f0 48 8b 4d d0 65 48 33 0c 25 28 00 00 00 75 6b 48 83 ec 80 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b 4c 89 e6 48 89 df e8 9b fc ff ff eb bd 48 8d 7b 18 e8 20 05 
[30894.912256][ T2794] RSP: 0018:ffffc90005e4fa58 EFLAGS: 00010093 
[30894.939043][ T2794] RAX: 0000000000000000 RBX: ffff88836f0b0428 RCX: ffffffffa5e8ef91 
[30894.974228][ T2794] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffff88836dfe8c40 
[30895.009472][ T2794] RBP: ffffc90005e4fb00 R08: fffffbfff4e98461 R09: fffffbfff4e98461 
[30895.044982][ T2794] R10: fffffbfff4e98460 R11: ffffffffa74c2303 R12: ffff88836dfe8c40 
[30895.080170][ T2794] R13: 1ffff92000bc9f4f R14: 0000000000000000 R15: ffff88836dfe8c44 
[30895.115503][ T2794] FS:  00007f30e37a3e80(0000) GS:ffff888452600000(0000) knlGS:0000000000000000 
[30895.160815][ T2794] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[30895.189818][ T2794] CR2: 00007ffdac9e4f20 CR3: 00000007268d0002 CR4: 00000000001606f0 
[30895.225606][ T2794] Call Trace: 
[30895.239777][ T2794]  ? irq_msi_update_msg+0xe0/0xe0 
[30895.261672][ T2794]  ? rwlock_bug.part.1+0x60/0x60 
[30895.284071][ T2794]  irq_do_set_affinity+0x75/0x180 
irq_do_set_affinity+0x75/0x180:
irq_do_set_affinity at kernel/irq/manage.c:259
[30895.306065][ T2794]  irq_setup_affinity+0x167/0x250 
irq_setup_affinity at kernel/irq/manage.c:474
[30895.328157][ T2794]  irq_select_affinity_usr+0x30/0x50 
irq_select_affinity_usr at kernel/irq/manage.c:496
[30895.351507][ T2794]  write_irq_affinity.isra.0+0x137/0x1e0 
[30895.376891][ T2794]  ? irq_node_proc_show+0x50/0x50 
[30895.398821][ T2794]  ? match_held_lock+0x35/0x250 
[30895.419982][ T2794]  irq_affinity_proc_write+0x19/0x20 
[30895.444225][ T2794]  proc_reg_write+0x12e/0x190 
[30895.464538][ T2794]  ? proc_reg_read+0x190/0x190 
[30895.485249][ T2794]  ? rcu_read_lock_held+0xc0/0xc0 
[30895.507212][ T2794]  __vfs_write+0x50/0xa0 
[30895.526323][ T2794]  vfs_write+0x105/0x280 
[30895.545621][ T2794]  ksys_write+0xc6/0x160 
[30895.564993][ T2794]  ? __x64_sys_read+0x50/0x50 
[30895.585337][ T2794]  ? do_syscall_64+0x79/0xaec 
[30895.605686][ T2794]  ? do_syscall_64+0x79/0xaec 
[30895.626066][ T2794]  __x64_sys_write+0x43/0x50 
[30895.648476][ T2794]  do_syscall_64+0xcc/0xaec 
[30895.671473][ T2794]  ? trace_hardirqs_on_thunk+0x1a/0x1c 
[30895.696452][ T2794]  ? syscall_return_slowpath+0x580/0x580 
[30895.721226][ T2794]  ? entry_SYSCALL_64_after_hwframe+0x3e/0xbe 
[30895.748025][ T2794]  ? trace_hardirqs_off_caller+0x3a/0x150 
[30895.773033][ T2794]  ? trace_hardirqs_off_thunk+0x1a/0x1c 
[30895.797308][ T2794]  entry_SYSCALL_64_after_hwframe+0x49/0xbe 
[30895.823118][ T2794] RIP: 0033:0x7f30e23a1e4f 
[30895.842371][ T2794] Code: 00 00 00 41 54 49 89 d4 55 48 89 f5 53 89 fb 48 83 ec 10 e8 23 ca 01 00 4c 89 e2 48 89 ee 89 df 41 89 c0 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 5c ca 01 00 48 
[30895.930359][ T2794] RSP: 002b:00007ffdac9e1760 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 
[30895.967557][ T2794] RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 00007f30e23a1e4f 
[30896.002786][ T2794] RDX: 0000000000000011 RSI: 0000556488e3d310 RDI: 0000000000000007 
[30896.038063][ T2794] RBP: 0000556488e3d310 R08: 0000000000000000 R09: 0000556486f3b200 
[30896.073471][ T2794] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000011 
[30896.108344][ T2794] R13: 0000000000000011 R14: 00007f30e266e740 R15: 0000000000000011 
[30896.143550][ T2794] irq event stamp: 378868746 
[30896.165644][ T2794] hardirqs last  enabled at (378868745): [<ffffffffa6210123>] quarantine_put+0x73/0x270 
[30896.214017][ T2794] hardirqs last disabled at (378868746): [<ffffffffa6892748>] _raw_spin_lock_irqsave+0x18/0x50 
[30896.261385][ T2794] softirqs last  enabled at (378868364): [<ffffffffa6c00447>] __do_softirq+0x447/0x766 
[30896.303183][ T2794] softirqs last disabled at (378867251): [<ffffffffa5ed1da6>] irq_exit+0xd6/0xf0 
[30896.343633][ T2794] ---[ end trace a0a160876a02c150 ]--- 
[30897.964246][T22008] smpboot: CPU 36 is now offline 
[30906.533125][T22008] smpboot: CPU 37 is now offline 
[30914.823295][T22008] smpboot: CPU 38 is now offline 
[30920.594545][T22008] smpboot: CPU 39 is now offline 
[30926.504299][T22008] smpboot: CPU 40 is now offline 
[30931.962884][T22008] smpboot: CPU 41 is now offline 
[30937.492970][T22008] smpboot: CPU 42 is now offline 
[30942.822984][T22008] smpboot: CPU 43 is now offline 
[30948.543231][T22008] smpboot: CPU 44 is now offline 
[30954.005524][T22008] smpboot: CPU 45 is now offline 
[30962.765295][T22008] smpboot: CPU 46 is now offline 
[30963.112374][  T247] irq_migrate_all_off_this_cpu: 10 callbacks suppressed 
[30963.112380][  T247] IRQ 3: no longer affine to CPU47



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-02-14  8:48 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-12  1:56 "Plug non-maskable MSI affinity race" triggers a warning with CPU hotplugs Qian Cai
2020-02-12 11:19 ` Thomas Gleixner
2020-02-13  0:44   ` Qian Cai
2020-02-14  8:48   ` [tip: irq/urgent] genirq/proc: Reject invalid affinity masks (again) tip-bot2 for Thomas Gleixner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.