* system hung up when offlining CPUs
@ 2017-08-08 19:25 YASUAKI ISHIMATSU
2017-08-09 11:42 ` Marc Zyngier
0 siblings, 1 reply; 43+ messages in thread
From: YASUAKI ISHIMATSU @ 2017-08-08 19:25 UTC (permalink / raw)
To: tglx; +Cc: axboe, marc.zyngier, mpe, keith.busch, peterz, yasu.isimatu, LKML
Hi Thomas,
When offlining all CPUs except cpu0, the system hung up with the following message.
[...] INFO: task kworker/u384:1:1234 blocked for more than 120 seconds.
[...] Not tainted 4.12.0-rc6+ #19
[...] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[...] kworker/u384:1 D 0 1234 2 0x00000000
[...] Workqueue: writeback wb_workfn (flush-253:0)
[...] Call Trace:
[...] __schedule+0x28a/0x880
[...] schedule+0x36/0x80
[...] schedule_timeout+0x249/0x300
[...] ? __schedule+0x292/0x880
[...] __down_common+0xfc/0x132
[...] ? _xfs_buf_find+0x2bb/0x510 [xfs]
[...] __down+0x1d/0x1f
[...] down+0x41/0x50
[...] xfs_buf_lock+0x3c/0xf0 [xfs]
[...] _xfs_buf_find+0x2bb/0x510 [xfs]
[...] xfs_buf_get_map+0x2a/0x280 [xfs]
[...] xfs_buf_read_map+0x2d/0x180 [xfs]
[...] xfs_trans_read_buf_map+0xf5/0x310 [xfs]
[...] xfs_btree_read_buf_block.constprop.35+0x78/0xc0 [xfs]
[...] xfs_btree_lookup_get_block+0x88/0x160 [xfs]
[...] xfs_btree_lookup+0xd0/0x3b0 [xfs]
[...] ? xfs_allocbt_init_cursor+0x41/0xe0 [xfs]
[...] xfs_alloc_ag_vextent_near+0xaf/0xaa0 [xfs]
[...] xfs_alloc_ag_vextent+0x13c/0x150 [xfs]
[...] xfs_alloc_vextent+0x425/0x590 [xfs]
[...] xfs_bmap_btalloc+0x448/0x770 [xfs]
[...] xfs_bmap_alloc+0xe/0x10 [xfs]
[...] xfs_bmapi_write+0x61d/0xc10 [xfs]
[...] ? kmem_zone_alloc+0x96/0x100 [xfs]
[...] xfs_iomap_write_allocate+0x199/0x3a0 [xfs]
[...] xfs_map_blocks+0x1e8/0x260 [xfs]
[...] xfs_do_writepage+0x1ca/0x680 [xfs]
[...] write_cache_pages+0x26f/0x510
[...] ? xfs_vm_set_page_dirty+0x1d0/0x1d0 [xfs]
[...] ? blk_mq_dispatch_rq_list+0x305/0x410
[...] ? deadline_remove_request+0x7d/0xc0
[...] xfs_vm_writepages+0xb6/0xd0 [xfs]
[...] do_writepages+0x1c/0x70
[...] __writeback_single_inode+0x45/0x320
[...] writeback_sb_inodes+0x280/0x570
[...] __writeback_inodes_wb+0x8c/0xc0
[...] wb_writeback+0x276/0x310
[...] ? get_nr_dirty_inodes+0x4d/0x80
[...] wb_workfn+0x2d4/0x3b0
[...] process_one_work+0x149/0x360
[...] worker_thread+0x4d/0x3c0
[...] kthread+0x109/0x140
[...] ? rescuer_thread+0x380/0x380
[...] ? kthread_park+0x60/0x60
[...] ret_from_fork+0x25/0x30
I bisected the upstream kernel and found that the following commit
introduced the issue.
commit c5cb83bb337c25caae995d992d1cdf9b317f83de
Author: Thomas Gleixner <tglx@linutronix.de>
Date: Tue Jun 20 01:37:51 2017 +0200
genirq/cpuhotplug: Handle managed IRQs on CPU hotplug
Thanks,
Yasuaki Ishimatsu
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: system hung up when offlining CPUs
2017-08-08 19:25 system hung up when offlining CPUs YASUAKI ISHIMATSU
@ 2017-08-09 11:42 ` Marc Zyngier
2017-08-09 19:09 ` YASUAKI ISHIMATSU
0 siblings, 1 reply; 43+ messages in thread
From: Marc Zyngier @ 2017-08-09 11:42 UTC (permalink / raw)
To: YASUAKI ISHIMATSU; +Cc: tglx, axboe, mpe, keith.busch, peterz, LKML
On Tue, 8 Aug 2017 15:25:35 -0400
YASUAKI ISHIMATSU <yasu.isimatu@gmail.com> wrote:
> Hi Thomas,
>
> When offlining all CPUs except cpu0, the system hung up with the following message.
>
> [...] INFO: task kworker/u384:1:1234 blocked for more than 120 seconds.
> [...] Not tainted 4.12.0-rc6+ #19
> [...] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [...] kworker/u384:1 D 0 1234 2 0x00000000
> [...] Workqueue: writeback wb_workfn (flush-253:0)
> [...] Call Trace:
> [...] __schedule+0x28a/0x880
> [...] schedule+0x36/0x80
> [...] schedule_timeout+0x249/0x300
> [...] ? __schedule+0x292/0x880
> [...] __down_common+0xfc/0x132
> [...] ? _xfs_buf_find+0x2bb/0x510 [xfs]
> [...] __down+0x1d/0x1f
> [...] down+0x41/0x50
> [...] xfs_buf_lock+0x3c/0xf0 [xfs]
> [...] _xfs_buf_find+0x2bb/0x510 [xfs]
> [...] xfs_buf_get_map+0x2a/0x280 [xfs]
> [...] xfs_buf_read_map+0x2d/0x180 [xfs]
> [...] xfs_trans_read_buf_map+0xf5/0x310 [xfs]
> [...] xfs_btree_read_buf_block.constprop.35+0x78/0xc0 [xfs]
> [...] xfs_btree_lookup_get_block+0x88/0x160 [xfs]
> [...] xfs_btree_lookup+0xd0/0x3b0 [xfs]
> [...] ? xfs_allocbt_init_cursor+0x41/0xe0 [xfs]
> [...] xfs_alloc_ag_vextent_near+0xaf/0xaa0 [xfs]
> [...] xfs_alloc_ag_vextent+0x13c/0x150 [xfs]
> [...] xfs_alloc_vextent+0x425/0x590 [xfs]
> [...] xfs_bmap_btalloc+0x448/0x770 [xfs]
> [...] xfs_bmap_alloc+0xe/0x10 [xfs]
> [...] xfs_bmapi_write+0x61d/0xc10 [xfs]
> [...] ? kmem_zone_alloc+0x96/0x100 [xfs]
> [...] xfs_iomap_write_allocate+0x199/0x3a0 [xfs]
> [...] xfs_map_blocks+0x1e8/0x260 [xfs]
> [...] xfs_do_writepage+0x1ca/0x680 [xfs]
> [...] write_cache_pages+0x26f/0x510
> [...] ? xfs_vm_set_page_dirty+0x1d0/0x1d0 [xfs]
> [...] ? blk_mq_dispatch_rq_list+0x305/0x410
> [...] ? deadline_remove_request+0x7d/0xc0
> [...] xfs_vm_writepages+0xb6/0xd0 [xfs]
> [...] do_writepages+0x1c/0x70
> [...] __writeback_single_inode+0x45/0x320
> [...] writeback_sb_inodes+0x280/0x570
> [...] __writeback_inodes_wb+0x8c/0xc0
> [...] wb_writeback+0x276/0x310
> [...] ? get_nr_dirty_inodes+0x4d/0x80
> [...] wb_workfn+0x2d4/0x3b0
> [...] process_one_work+0x149/0x360
> [...] worker_thread+0x4d/0x3c0
> [...] kthread+0x109/0x140
> [...] ? rescuer_thread+0x380/0x380
> [...] ? kthread_park+0x60/0x60
> [...] ret_from_fork+0x25/0x30
>
>
> I bisected the upstream kernel and found that the following commit
> introduced the issue.
>
> commit c5cb83bb337c25caae995d992d1cdf9b317f83de
> Author: Thomas Gleixner <tglx@linutronix.de>
> Date: Tue Jun 20 01:37:51 2017 +0200
>
> genirq/cpuhotplug: Handle managed IRQs on CPU hotplug
Can you please post your /proc/interrupts and details of which
interrupt you think goes wrong? This backtrace is not telling us much
in terms of where to start looking...
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
* Re: system hung up when offlining CPUs
2017-08-09 11:42 ` Marc Zyngier
@ 2017-08-09 19:09 ` YASUAKI ISHIMATSU
2017-08-10 11:54 ` Marc Zyngier
0 siblings, 1 reply; 43+ messages in thread
From: YASUAKI ISHIMATSU @ 2017-08-09 19:09 UTC (permalink / raw)
To: Marc Zyngier; +Cc: tglx, axboe, mpe, keith.busch, peterz, LKML, yasu.isimatu
Hi Marc,
On 08/09/2017 07:42 AM, Marc Zyngier wrote:
> On Tue, 8 Aug 2017 15:25:35 -0400
> YASUAKI ISHIMATSU <yasu.isimatu@gmail.com> wrote:
>
>> Hi Thomas,
>>
>> When offlining all CPUs except cpu0, the system hung up with the following message.
>>
>> [...] INFO: task kworker/u384:1:1234 blocked for more than 120 seconds.
>> [...] Not tainted 4.12.0-rc6+ #19
>> [...] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [...] kworker/u384:1 D 0 1234 2 0x00000000
>> [...] Workqueue: writeback wb_workfn (flush-253:0)
>> [...] Call Trace:
>> [...] __schedule+0x28a/0x880
>> [...] schedule+0x36/0x80
>> [...] schedule_timeout+0x249/0x300
>> [...] ? __schedule+0x292/0x880
>> [...] __down_common+0xfc/0x132
>> [...] ? _xfs_buf_find+0x2bb/0x510 [xfs]
>> [...] __down+0x1d/0x1f
>> [...] down+0x41/0x50
>> [...] xfs_buf_lock+0x3c/0xf0 [xfs]
>> [...] _xfs_buf_find+0x2bb/0x510 [xfs]
>> [...] xfs_buf_get_map+0x2a/0x280 [xfs]
>> [...] xfs_buf_read_map+0x2d/0x180 [xfs]
>> [...] xfs_trans_read_buf_map+0xf5/0x310 [xfs]
>> [...] xfs_btree_read_buf_block.constprop.35+0x78/0xc0 [xfs]
>> [...] xfs_btree_lookup_get_block+0x88/0x160 [xfs]
>> [...] xfs_btree_lookup+0xd0/0x3b0 [xfs]
>> [...] ? xfs_allocbt_init_cursor+0x41/0xe0 [xfs]
>> [...] xfs_alloc_ag_vextent_near+0xaf/0xaa0 [xfs]
>> [...] xfs_alloc_ag_vextent+0x13c/0x150 [xfs]
>> [...] xfs_alloc_vextent+0x425/0x590 [xfs]
>> [...] xfs_bmap_btalloc+0x448/0x770 [xfs]
>> [...] xfs_bmap_alloc+0xe/0x10 [xfs]
>> [...] xfs_bmapi_write+0x61d/0xc10 [xfs]
>> [...] ? kmem_zone_alloc+0x96/0x100 [xfs]
>> [...] xfs_iomap_write_allocate+0x199/0x3a0 [xfs]
>> [...] xfs_map_blocks+0x1e8/0x260 [xfs]
>> [...] xfs_do_writepage+0x1ca/0x680 [xfs]
>> [...] write_cache_pages+0x26f/0x510
>> [...] ? xfs_vm_set_page_dirty+0x1d0/0x1d0 [xfs]
>> [...] ? blk_mq_dispatch_rq_list+0x305/0x410
>> [...] ? deadline_remove_request+0x7d/0xc0
>> [...] xfs_vm_writepages+0xb6/0xd0 [xfs]
>> [...] do_writepages+0x1c/0x70
>> [...] __writeback_single_inode+0x45/0x320
>> [...] writeback_sb_inodes+0x280/0x570
>> [...] __writeback_inodes_wb+0x8c/0xc0
>> [...] wb_writeback+0x276/0x310
>> [...] ? get_nr_dirty_inodes+0x4d/0x80
>> [...] wb_workfn+0x2d4/0x3b0
>> [...] process_one_work+0x149/0x360
>> [...] worker_thread+0x4d/0x3c0
>> [...] kthread+0x109/0x140
>> [...] ? rescuer_thread+0x380/0x380
>> [...] ? kthread_park+0x60/0x60
>> [...] ret_from_fork+0x25/0x30
>>
>>
>> I bisected the upstream kernel and found that the following commit
>> introduced the issue.
>>
>> commit c5cb83bb337c25caae995d992d1cdf9b317f83de
>> Author: Thomas Gleixner <tglx@linutronix.de>
>> Date: Tue Jun 20 01:37:51 2017 +0200
>>
>> genirq/cpuhotplug: Handle managed IRQs on CPU hotplug
>
> Can you please post your /proc/interrupts and details of which
> interrupt you think goes wrong? This backtrace is not telling us much
> in terms of where to start looking...
Thank you for the advice.
The issue is easily reproduced on a physical or virtual machine by offlining CPUs other than cpu0.
Here is /proc/interrupts on a KVM guest before reproducing the issue. When offlining
cpu1, the issue occurred; but when offlining cpu0, it did not.
CPU0 CPU1
0: 127 0 IO-APIC 2-edge timer
1: 10 0 IO-APIC 1-edge i8042
4: 227 0 IO-APIC 4-edge ttyS0
6: 3 0 IO-APIC 6-edge floppy
8: 0 0 IO-APIC 8-edge rtc0
9: 0 0 IO-APIC 9-fasteoi acpi
10: 10822 0 IO-APIC 10-fasteoi ehci_hcd:usb1, uhci_hcd:usb2, virtio3
11: 23 0 IO-APIC 11-fasteoi uhci_hcd:usb3, uhci_hcd:usb4, qxl
12: 15 0 IO-APIC 12-edge i8042
14: 218 0 IO-APIC 14-edge ata_piix
15: 0 0 IO-APIC 15-edge ata_piix
24: 0 0 PCI-MSI 49152-edge virtio0-config
25: 359 0 PCI-MSI 49153-edge virtio0-input.0
26: 1 0 PCI-MSI 49154-edge virtio0-output.0
27: 0 0 PCI-MSI 114688-edge virtio2-config
28: 1 3639 PCI-MSI 114689-edge virtio2-req.0
29: 0 0 PCI-MSI 98304-edge virtio1-config
30: 4 0 PCI-MSI 98305-edge virtio1-virtqueues
31: 189 0 PCI-MSI 65536-edge snd_hda_intel:card0
NMI: 0 0 Non-maskable interrupts
LOC: 16115 12845 Local timer interrupts
SPU: 0 0 Spurious interrupts
PMI: 0 0 Performance monitoring interrupts
IWI: 0 0 IRQ work interrupts
RTR: 0 0 APIC ICR read retries
RES: 3016 2135 Rescheduling interrupts
CAL: 3666 557 Function call interrupts
TLB: 65 12 TLB shootdowns
TRM: 0 0 Thermal event interrupts
THR: 0 0 Threshold APIC interrupts
DFR: 0 0 Deferred Error APIC interrupts
MCE: 0 0 Machine check exceptions
MCP: 1 1 Machine check polls
ERR: 0
MIS: 0
PIN: 0 0 Posted-interrupt notification event
NPI: 0 0 Nested posted-interrupt event
PIW: 0 0 Posted-interrupt wakeup event
Thanks,
Yasuaki Ishimatsu
>
> Thanks,
>
> M.
>
* Re: system hung up when offlining CPUs
2017-08-09 19:09 ` YASUAKI ISHIMATSU
@ 2017-08-10 11:54 ` Marc Zyngier
2017-08-21 12:07 ` Christoph Hellwig
2017-08-21 13:18 ` Christoph Hellwig
0 siblings, 2 replies; 43+ messages in thread
From: Marc Zyngier @ 2017-08-10 11:54 UTC (permalink / raw)
To: YASUAKI ISHIMATSU
Cc: tglx, axboe, mpe, keith.busch, peterz, LKML, Christoph Hellwig
+ Christoph, since he's the one who came up with the idea
On 09/08/17 20:09, YASUAKI ISHIMATSU wrote:
> Hi Marc,
>
> On 08/09/2017 07:42 AM, Marc Zyngier wrote:
>> On Tue, 8 Aug 2017 15:25:35 -0400
>> YASUAKI ISHIMATSU <yasu.isimatu@gmail.com> wrote:
>>
>>> Hi Thomas,
>>>
>>> When offlining all CPUs except cpu0, the system hung up with the following message.
>>>
>>> [...] INFO: task kworker/u384:1:1234 blocked for more than 120 seconds.
>>> [...] Not tainted 4.12.0-rc6+ #19
>>> [...] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> [...] kworker/u384:1 D 0 1234 2 0x00000000
>>> [...] Workqueue: writeback wb_workfn (flush-253:0)
>>> [...] Call Trace:
>>> [...] __schedule+0x28a/0x880
>>> [...] schedule+0x36/0x80
>>> [...] schedule_timeout+0x249/0x300
>>> [...] ? __schedule+0x292/0x880
>>> [...] __down_common+0xfc/0x132
>>> [...] ? _xfs_buf_find+0x2bb/0x510 [xfs]
>>> [...] __down+0x1d/0x1f
>>> [...] down+0x41/0x50
>>> [...] xfs_buf_lock+0x3c/0xf0 [xfs]
>>> [...] _xfs_buf_find+0x2bb/0x510 [xfs]
>>> [...] xfs_buf_get_map+0x2a/0x280 [xfs]
>>> [...] xfs_buf_read_map+0x2d/0x180 [xfs]
>>> [...] xfs_trans_read_buf_map+0xf5/0x310 [xfs]
>>> [...] xfs_btree_read_buf_block.constprop.35+0x78/0xc0 [xfs]
>>> [...] xfs_btree_lookup_get_block+0x88/0x160 [xfs]
>>> [...] xfs_btree_lookup+0xd0/0x3b0 [xfs]
>>> [...] ? xfs_allocbt_init_cursor+0x41/0xe0 [xfs]
>>> [...] xfs_alloc_ag_vextent_near+0xaf/0xaa0 [xfs]
>>> [...] xfs_alloc_ag_vextent+0x13c/0x150 [xfs]
>>> [...] xfs_alloc_vextent+0x425/0x590 [xfs]
>>> [...] xfs_bmap_btalloc+0x448/0x770 [xfs]
>>> [...] xfs_bmap_alloc+0xe/0x10 [xfs]
>>> [...] xfs_bmapi_write+0x61d/0xc10 [xfs]
>>> [...] ? kmem_zone_alloc+0x96/0x100 [xfs]
>>> [...] xfs_iomap_write_allocate+0x199/0x3a0 [xfs]
>>> [...] xfs_map_blocks+0x1e8/0x260 [xfs]
>>> [...] xfs_do_writepage+0x1ca/0x680 [xfs]
>>> [...] write_cache_pages+0x26f/0x510
>>> [...] ? xfs_vm_set_page_dirty+0x1d0/0x1d0 [xfs]
>>> [...] ? blk_mq_dispatch_rq_list+0x305/0x410
>>> [...] ? deadline_remove_request+0x7d/0xc0
>>> [...] xfs_vm_writepages+0xb6/0xd0 [xfs]
>>> [...] do_writepages+0x1c/0x70
>>> [...] __writeback_single_inode+0x45/0x320
>>> [...] writeback_sb_inodes+0x280/0x570
>>> [...] __writeback_inodes_wb+0x8c/0xc0
>>> [...] wb_writeback+0x276/0x310
>>> [...] ? get_nr_dirty_inodes+0x4d/0x80
>>> [...] wb_workfn+0x2d4/0x3b0
>>> [...] process_one_work+0x149/0x360
>>> [...] worker_thread+0x4d/0x3c0
>>> [...] kthread+0x109/0x140
>>> [...] ? rescuer_thread+0x380/0x380
>>> [...] ? kthread_park+0x60/0x60
>>> [...] ret_from_fork+0x25/0x30
>>>
>>>
>>> I bisected the upstream kernel and found that the following commit
>>> introduced the issue.
>>>
>>> commit c5cb83bb337c25caae995d992d1cdf9b317f83de
>>> Author: Thomas Gleixner <tglx@linutronix.de>
>>> Date: Tue Jun 20 01:37:51 2017 +0200
>>>
>>> genirq/cpuhotplug: Handle managed IRQs on CPU hotplug
>>
>> Can you please post your /proc/interrupts and details of which
>> interrupt you think goes wrong? This backtrace is not telling us much
>> in terms of where to start looking...
>
> Thank you for the advice.
>
> The issue is easily reproduced on a physical or virtual machine by offlining CPUs other than cpu0.
> Here is /proc/interrupts on a KVM guest before reproducing the issue. When offlining
> cpu1, the issue occurred; but when offlining cpu0, it did not.
>
> CPU0 CPU1
> 0: 127 0 IO-APIC 2-edge timer
> 1: 10 0 IO-APIC 1-edge i8042
> 4: 227 0 IO-APIC 4-edge ttyS0
> 6: 3 0 IO-APIC 6-edge floppy
> 8: 0 0 IO-APIC 8-edge rtc0
> 9: 0 0 IO-APIC 9-fasteoi acpi
> 10: 10822 0 IO-APIC 10-fasteoi ehci_hcd:usb1, uhci_hcd:usb2, virtio3
> 11: 23 0 IO-APIC 11-fasteoi uhci_hcd:usb3, uhci_hcd:usb4, qxl
> 12: 15 0 IO-APIC 12-edge i8042
> 14: 218 0 IO-APIC 14-edge ata_piix
> 15: 0 0 IO-APIC 15-edge ata_piix
> 24: 0 0 PCI-MSI 49152-edge virtio0-config
> 25: 359 0 PCI-MSI 49153-edge virtio0-input.0
> 26: 1 0 PCI-MSI 49154-edge virtio0-output.0
> 27: 0 0 PCI-MSI 114688-edge virtio2-config
> 28: 1 3639 PCI-MSI 114689-edge virtio2-req.0
> 29: 0 0 PCI-MSI 98304-edge virtio1-config
> 30: 4 0 PCI-MSI 98305-edge virtio1-virtqueues
> 31: 189 0 PCI-MSI 65536-edge snd_hda_intel:card0
> NMI: 0 0 Non-maskable interrupts
> LOC: 16115 12845 Local timer interrupts
> SPU: 0 0 Spurious interrupts
> PMI: 0 0 Performance monitoring interrupts
> IWI: 0 0 IRQ work interrupts
> RTR: 0 0 APIC ICR read retries
> RES: 3016 2135 Rescheduling interrupts
> CAL: 3666 557 Function call interrupts
> TLB: 65 12 TLB shootdowns
> TRM: 0 0 Thermal event interrupts
> THR: 0 0 Threshold APIC interrupts
> DFR: 0 0 Deferred Error APIC interrupts
> MCE: 0 0 Machine check exceptions
> MCP: 1 1 Machine check polls
> ERR: 0
> MIS: 0
> PIN: 0 0 Posted-interrupt notification event
> NPI: 0 0 Nested posted-interrupt event
> PIW: 0 0 Posted-interrupt wakeup event
I was able to reproduce this with an arm64 VM pretty easily. The issue
is that (in the above case), IRQ27 is affine to CPU0, and IRQ28 to CPU1.
If you have more CPUs, IRQ27 is affine to the first half of the CPUs,
and IRQ28 to the others.
When CPU1 is offlined, the fact that we have a "managed" interrupt
affinity prevents the interrupt from being moved to CPU0, and you lose
your disk. I don't think that's the expected effect... The opposite case
(offlining CPU0) only "works" because you're not getting any config
interrupt (IOW, you're lucky).
I'm not sure how this is supposed to work. Shutting down the interrupt
in migrate_one_irq() just breaks everything (unless the driver somehow
knows about it, which doesn't seem to be the case).
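The decision being described can be condensed into a small model. This is purely illustrative (CPU masks as 64-bit integers, names invented here), not the actual migrate_one_irq() code in kernel/irq/cpuhotplug.c:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum irq_action { IRQ_MIGRATE, IRQ_SHUTDOWN };

/* Toy model of the CPU-hotplug decision for an interrupt: if an online
 * CPU remains in the affinity mask, the interrupt can be migrated; if
 * not, a managed interrupt is shut down instead of having its affinity
 * broken, while a non-managed one is moved to any surviving CPU. */
static enum irq_action offline_cpu_irq_action(uint64_t affinity,
                                              uint64_t online,
                                              bool managed)
{
    if (affinity & online)              /* a legal target CPU is left */
        return IRQ_MIGRATE;
    return managed ? IRQ_SHUTDOWN : IRQ_MIGRATE;
}
```

With IRQ28's mask containing only CPU1, taking CPU1 offline yields IRQ_SHUTDOWN in this model, which is why the disk queue goes dead; a non-managed interrupt would simply be moved to a surviving CPU.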
Christoph?
M.
--
Jazz is not dead. It just smells funny...
* Re: system hung up when offlining CPUs
2017-08-10 11:54 ` Marc Zyngier
@ 2017-08-21 12:07 ` Christoph Hellwig
2017-08-21 13:18 ` Christoph Hellwig
1 sibling, 0 replies; 43+ messages in thread
From: Christoph Hellwig @ 2017-08-21 12:07 UTC (permalink / raw)
To: Marc Zyngier
Cc: YASUAKI ISHIMATSU, tglx, axboe, mpe, keith.busch, peterz, LKML,
Christoph Hellwig
Hi Marc,
In general the driver should know not to use the queue / irq,
as blk-mq will never schedule I/O to queues that have no online
cpus.
The real bug seems to be that we're using affinity for a device
that only has one real queue (as the config queue should not
have affinity). Let me dig into what's going on here with virtio.
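A rough sketch of why the pre_vectors adjustment matters: pci_alloc_irq_vectors_affinity() spreads only the vectors outside the first pre_vectors across the CPUs. The model below is invented for illustration (the real irq_create_affinity_masks() also spreads across NUMA nodes), but it shows how counting the config vector into the spread pins the single request-queue vector to half of the CPUs:

```c
#include <assert.h>
#include <stdint.h>

#define ALL_CPUS(n) ((n) >= 64 ? ~0ULL : (1ULL << (n)) - 1)

/* Toy model of affinity spreading: the first `pre` vectors keep the
 * default "any CPU" mask (e.g. the virtio config interrupt), and the
 * remaining vectors each get an equal contiguous chunk of CPUs. */
static void spread_vectors(uint64_t *mask, int nvec, int pre, int ncpus)
{
    int spread = nvec - pre;
    for (int v = 0; v < pre; v++)
        mask[v] = ALL_CPUS(ncpus);
    for (int v = 0; v < spread; v++) {
        int first = v * ncpus / spread;
        int last  = (v + 1) * ncpus / spread;   /* exclusive */
        mask[pre + v] = 0;
        for (int c = first; c < last; c++)
            mask[pre + v] |= 1ULL << c;
    }
}
```

With two vectors on a two-CPU guest, pre_vectors = 0 (the regression) pins the config vector to CPU0 and the request-queue vector to CPU1 only; pre_vectors = 1 (the fix) lets the request queue span both CPUs.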
* Re: system hung up when offlining CPUs
2017-08-10 11:54 ` Marc Zyngier
2017-08-21 12:07 ` Christoph Hellwig
@ 2017-08-21 13:18 ` Christoph Hellwig
2017-08-21 13:37 ` Marc Zyngier
1 sibling, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2017-08-21 13:18 UTC (permalink / raw)
To: Marc Zyngier
Cc: YASUAKI ISHIMATSU, tglx, axboe, mpe, keith.busch, peterz, LKML,
Christoph Hellwig
Can you try the patch below please?
---
From d5f59cb7a629de8439b318e1384660e6b56e7dd8 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Mon, 21 Aug 2017 14:24:11 +0200
Subject: virtio_pci: fix cpu affinity support
Commit 0b0f9dc5 ("Revert "virtio_pci: use shared interrupts for
virtqueues"") removed the adjustment of the pre_vectors for the virtio
MSI-X vector allocation which was added in commit fb5e31d9 ("virtio:
allow drivers to request IRQ affinity when creating VQs"). This will
lead to an incorrect assignment of MSI-X vectors, and potential
deadlocks when offlining cpus.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Fixes: 0b0f9dc5 ("Revert "virtio_pci: use shared interrupts for virtqueues")
Reported-by: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
---
drivers/virtio/virtio_pci_common.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index 007a4f366086..1c4797e53f68 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -107,6 +107,7 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
{
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
const char *name = dev_name(&vp_dev->vdev.dev);
+ unsigned flags = PCI_IRQ_MSIX;
unsigned i, v;
int err = -ENOMEM;
@@ -126,10 +127,13 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
GFP_KERNEL))
goto error;
+ if (desc) {
+ flags |= PCI_IRQ_AFFINITY;
+ desc->pre_vectors++; /* virtio config vector */
+ }
+
err = pci_alloc_irq_vectors_affinity(vp_dev->pci_dev, nvectors,
- nvectors, PCI_IRQ_MSIX |
- (desc ? PCI_IRQ_AFFINITY : 0),
- desc);
+ nvectors, flags, desc);
if (err < 0)
goto error;
vp_dev->msix_enabled = 1;
--
2.11.0
* Re: system hung up when offlining CPUs
2017-08-21 13:18 ` Christoph Hellwig
@ 2017-08-21 13:37 ` Marc Zyngier
2017-09-07 20:23 ` YASUAKI ISHIMATSU
0 siblings, 1 reply; 43+ messages in thread
From: Marc Zyngier @ 2017-08-21 13:37 UTC (permalink / raw)
To: Christoph Hellwig
Cc: YASUAKI ISHIMATSU, tglx, axboe, mpe, keith.busch, peterz, LKML
On 21/08/17 14:18, Christoph Hellwig wrote:
> Can you try the patch below please?
>
> ---
> From d5f59cb7a629de8439b318e1384660e6b56e7dd8 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig <hch@lst.de>
> Date: Mon, 21 Aug 2017 14:24:11 +0200
> Subject: virtio_pci: fix cpu affinity support
>
> Commit 0b0f9dc5 ("Revert "virtio_pci: use shared interrupts for
> virtqueues"") removed the adjustment of the pre_vectors for the virtio
> MSI-X vector allocation which was added in commit fb5e31d9 ("virtio:
> allow drivers to request IRQ affinity when creating VQs"). This will
> lead to an incorrect assignment of MSI-X vectors, and potential
> deadlocks when offlining cpus.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Fixes: 0b0f9dc5 ("Revert "virtio_pci: use shared interrupts for virtqueues")
> Reported-by: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
Just gave it a go on an arm64 VM, and the behaviour seems much saner
(the virtio queue affinity now spans the whole system).
Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Thanks,
M.
--
Jazz is not dead. It just smells funny...
* Re: system hung up when offlining CPUs
2017-08-21 13:37 ` Marc Zyngier
@ 2017-09-07 20:23 ` YASUAKI ISHIMATSU
2017-09-12 18:15 ` YASUAKI ISHIMATSU
0 siblings, 1 reply; 43+ messages in thread
From: YASUAKI ISHIMATSU @ 2017-09-07 20:23 UTC (permalink / raw)
To: Marc Zyngier, Christoph Hellwig
Cc: tglx, axboe, mpe, keith.busch, peterz, LKML, yasu.isimatu
Hi Marc and Christoph,
Sorry for the late reply. I appreciate that you fixed the issue in the KVM environment.
But the issue still occurs on a physical server.
Here is the megasas IRQ information that I summarized from /proc/interrupts
and /proc/irq/*/smp_affinity_list on my server:
---
IRQ affinity_list IRQ_TYPE
42 0-5 IR-PCI-MSI 1048576-edge megasas
43 0-5 IR-PCI-MSI 1048577-edge megasas
44 0-5 IR-PCI-MSI 1048578-edge megasas
45 0-5 IR-PCI-MSI 1048579-edge megasas
46 0-5 IR-PCI-MSI 1048580-edge megasas
47 0-5 IR-PCI-MSI 1048581-edge megasas
48 0-5 IR-PCI-MSI 1048582-edge megasas
49 0-5 IR-PCI-MSI 1048583-edge megasas
50 0-5 IR-PCI-MSI 1048584-edge megasas
51 0-5 IR-PCI-MSI 1048585-edge megasas
52 0-5 IR-PCI-MSI 1048586-edge megasas
53 0-5 IR-PCI-MSI 1048587-edge megasas
54 0-5 IR-PCI-MSI 1048588-edge megasas
55 0-5 IR-PCI-MSI 1048589-edge megasas
56 0-5 IR-PCI-MSI 1048590-edge megasas
57 0-5 IR-PCI-MSI 1048591-edge megasas
58 0-5 IR-PCI-MSI 1048592-edge megasas
59 0-5 IR-PCI-MSI 1048593-edge megasas
60 0-5 IR-PCI-MSI 1048594-edge megasas
61 0-5 IR-PCI-MSI 1048595-edge megasas
62 0-5 IR-PCI-MSI 1048596-edge megasas
63 0-5 IR-PCI-MSI 1048597-edge megasas
64 0-5 IR-PCI-MSI 1048598-edge megasas
65 0-5 IR-PCI-MSI 1048599-edge megasas
66 24-29 IR-PCI-MSI 1048600-edge megasas
67 24-29 IR-PCI-MSI 1048601-edge megasas
68 24-29 IR-PCI-MSI 1048602-edge megasas
69 24-29 IR-PCI-MSI 1048603-edge megasas
70 24-29 IR-PCI-MSI 1048604-edge megasas
71 24-29 IR-PCI-MSI 1048605-edge megasas
72 24-29 IR-PCI-MSI 1048606-edge megasas
73 24-29 IR-PCI-MSI 1048607-edge megasas
74 24-29 IR-PCI-MSI 1048608-edge megasas
75 24-29 IR-PCI-MSI 1048609-edge megasas
76 24-29 IR-PCI-MSI 1048610-edge megasas
77 24-29 IR-PCI-MSI 1048611-edge megasas
78 24-29 IR-PCI-MSI 1048612-edge megasas
79 24-29 IR-PCI-MSI 1048613-edge megasas
80 24-29 IR-PCI-MSI 1048614-edge megasas
81 24-29 IR-PCI-MSI 1048615-edge megasas
82 24-29 IR-PCI-MSI 1048616-edge megasas
83 24-29 IR-PCI-MSI 1048617-edge megasas
84 24-29 IR-PCI-MSI 1048618-edge megasas
85 24-29 IR-PCI-MSI 1048619-edge megasas
86 24-29 IR-PCI-MSI 1048620-edge megasas
87 24-29 IR-PCI-MSI 1048621-edge megasas
88 24-29 IR-PCI-MSI 1048622-edge megasas
89 24-29 IR-PCI-MSI 1048623-edge megasas
---
On my server, IRQs #66-89 are routed to CPUs #24-29. If I offline CPUs #24-29,
I/O stops working, showing the following messages.
---
[...] sd 0:2:0:0: [sda] tag#1 task abort called for scmd(ffff8820574d7560)
[...] sd 0:2:0:0: [sda] tag#1 CDB: Read(10) 28 00 0d e8 cf 78 00 00 08 00
[...] sd 0:2:0:0: task abort: FAILED scmd(ffff8820574d7560)
[...] sd 0:2:0:0: [sda] tag#0 task abort called for scmd(ffff882057426560)
[...] sd 0:2:0:0: [sda] tag#0 CDB: Write(10) 2a 00 0d 58 37 00 00 00 08 00
[...] sd 0:2:0:0: task abort: FAILED scmd(ffff882057426560)
[...] sd 0:2:0:0: target reset called for scmd(ffff8820574d7560)
[...] sd 0:2:0:0: [sda] tag#1 megasas: target reset FAILED!!
[...] sd 0:2:0:0: [sda] tag#0 Controller reset is requested due to IO timeout
[...] SCSI command pointer: (ffff882057426560) SCSI host state: 5 SCSI
[...] IO request frame:
[...]
<snip>
[...]
[...] megaraid_sas 0000:02:00.0: [ 0]waiting for 2 commands to complete for scsi0
[...] INFO: task auditd:1200 blocked for more than 120 seconds.
[...] Not tainted 4.13.0+ #15
[...] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[...] auditd D 0 1200 1 0x00000000
[...] Call Trace:
[...] __schedule+0x28d/0x890
[...] schedule+0x36/0x80
[...] io_schedule+0x16/0x40
[...] wait_on_page_bit_common+0x109/0x1c0
[...] ? page_cache_tree_insert+0xf0/0xf0
[...] __filemap_fdatawait_range+0x127/0x190
[...] ? __filemap_fdatawrite_range+0xd1/0x100
[...] file_write_and_wait_range+0x60/0xb0
[...] xfs_file_fsync+0x67/0x1d0 [xfs]
[...] vfs_fsync_range+0x3d/0xb0
[...] do_fsync+0x3d/0x70
[...] SyS_fsync+0x10/0x20
[...] entry_SYSCALL_64_fastpath+0x1a/0xa5
[...] RIP: 0033:0x7f0bd9633d2d
[...] RSP: 002b:00007f0bd751ed30 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
[...] RAX: ffffffffffffffda RBX: 00005590566d0080 RCX: 00007f0bd9633d2d
[...] RDX: 00005590566d1260 RSI: 0000000000000000 RDI: 0000000000000005
[...] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000017
[...] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
[...] R13: 00007f0bd751f9c0 R14: 00007f0bd751f700 R15: 0000000000000000
---
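The same mask arithmetic explains the log above: with IRQs 66-89 managed and affine only to CPUs 24-29, offlining that block leaves every one of them without an online CPU, so the second half of the controller's queues goes dead. A hypothetical model (not driver code; layout mirrors the table above):

```c
#include <assert.h>
#include <stdint.h>

/* Build a CPU mask covering an inclusive range of CPU numbers. */
static uint64_t cpu_range(int first, int last)
{
    uint64_t m = 0;
    for (int c = first; c <= last; c++)
        m |= 1ULL << c;
    return m;
}

/* Count queue interrupts whose entire affinity mask is offline, i.e.
 * the queues that a managed-IRQ shutdown would leave dead. */
static int orphaned_queues(const uint64_t *affinity, int nirqs,
                           uint64_t online)
{
    int dead = 0;
    for (int i = 0; i < nirqs; i++)
        if (!(affinity[i] & online))
            dead++;
    return dead;
}
```

Modeling 24 IRQs affine to CPUs 0-5 and 24 affine to CPUs 24-29, offlining CPUs 24-29 orphans exactly the latter 24 queue interrupts.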
Thanks,
Yasuaki Ishimatsu
On 08/21/2017 09:37 AM, Marc Zyngier wrote:
> On 21/08/17 14:18, Christoph Hellwig wrote:
>> Can you try the patch below please?
>>
>> ---
>> From d5f59cb7a629de8439b318e1384660e6b56e7dd8 Mon Sep 17 00:00:00 2001
>> From: Christoph Hellwig <hch@lst.de>
>> Date: Mon, 21 Aug 2017 14:24:11 +0200
>> Subject: virtio_pci: fix cpu affinity support
>>
>> Commit 0b0f9dc5 ("Revert "virtio_pci: use shared interrupts for
>> virtqueues"") removed the adjustment of the pre_vectors for the virtio
>> MSI-X vector allocation which was added in commit fb5e31d9 ("virtio:
>> allow drivers to request IRQ affinity when creating VQs"). This will
>> lead to an incorrect assignment of MSI-X vectors, and potential
>> deadlocks when offlining cpus.
>>
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>> Fixes: 0b0f9dc5 ("Revert "virtio_pci: use shared interrupts for virtqueues")
>> Reported-by: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
>
> Just gave it a go on an arm64 VM, and the behaviour seems much saner
> (the virtio queue affinity now spans the whole system).
>
> Tested-by: Marc Zyngier <marc.zyngier@arm.com>
>
> Thanks,
>
> M.
>
* Re: system hung up when offlining CPUs
2017-09-07 20:23 ` YASUAKI ISHIMATSU
@ 2017-09-12 18:15 ` YASUAKI ISHIMATSU
2017-09-13 11:13 ` Hannes Reinecke
0 siblings, 1 reply; 43+ messages in thread
From: YASUAKI ISHIMATSU @ 2017-09-12 18:15 UTC (permalink / raw)
To: Marc Zyngier, Christoph Hellwig
Cc: tglx, axboe, mpe, keith.busch, peterz, LKML, linux-scsi,
kashyap.desai, sumit.saxena, shivasharan.srikanteshwara
+ linux-scsi and maintainers of megasas
When offlining CPUs, I/O stops. Do you have any ideas?
On 09/07/2017 04:23 PM, YASUAKI ISHIMATSU wrote:
> Hi Marc and Christoph,
>
> Sorry for the late reply. I appreciate that you fixed the issue in the KVM environment.
> But the issue still occurs on a physical server.
>
> Here is the megasas IRQ information that I summarized from /proc/interrupts
> and /proc/irq/*/smp_affinity_list on my server:
>
> ---
> IRQ affinity_list IRQ_TYPE
> 42 0-5 IR-PCI-MSI 1048576-edge megasas
> 43 0-5 IR-PCI-MSI 1048577-edge megasas
> 44 0-5 IR-PCI-MSI 1048578-edge megasas
> 45 0-5 IR-PCI-MSI 1048579-edge megasas
> 46 0-5 IR-PCI-MSI 1048580-edge megasas
> 47 0-5 IR-PCI-MSI 1048581-edge megasas
> 48 0-5 IR-PCI-MSI 1048582-edge megasas
> 49 0-5 IR-PCI-MSI 1048583-edge megasas
> 50 0-5 IR-PCI-MSI 1048584-edge megasas
> 51 0-5 IR-PCI-MSI 1048585-edge megasas
> 52 0-5 IR-PCI-MSI 1048586-edge megasas
> 53 0-5 IR-PCI-MSI 1048587-edge megasas
> 54 0-5 IR-PCI-MSI 1048588-edge megasas
> 55 0-5 IR-PCI-MSI 1048589-edge megasas
> 56 0-5 IR-PCI-MSI 1048590-edge megasas
> 57 0-5 IR-PCI-MSI 1048591-edge megasas
> 58 0-5 IR-PCI-MSI 1048592-edge megasas
> 59 0-5 IR-PCI-MSI 1048593-edge megasas
> 60 0-5 IR-PCI-MSI 1048594-edge megasas
> 61 0-5 IR-PCI-MSI 1048595-edge megasas
> 62 0-5 IR-PCI-MSI 1048596-edge megasas
> 63 0-5 IR-PCI-MSI 1048597-edge megasas
> 64 0-5 IR-PCI-MSI 1048598-edge megasas
> 65 0-5 IR-PCI-MSI 1048599-edge megasas
> 66 24-29 IR-PCI-MSI 1048600-edge megasas
> 67 24-29 IR-PCI-MSI 1048601-edge megasas
> 68 24-29 IR-PCI-MSI 1048602-edge megasas
> 69 24-29 IR-PCI-MSI 1048603-edge megasas
> 70 24-29 IR-PCI-MSI 1048604-edge megasas
> 71 24-29 IR-PCI-MSI 1048605-edge megasas
> 72 24-29 IR-PCI-MSI 1048606-edge megasas
> 73 24-29 IR-PCI-MSI 1048607-edge megasas
> 74 24-29 IR-PCI-MSI 1048608-edge megasas
> 75 24-29 IR-PCI-MSI 1048609-edge megasas
> 76 24-29 IR-PCI-MSI 1048610-edge megasas
> 77 24-29 IR-PCI-MSI 1048611-edge megasas
> 78 24-29 IR-PCI-MSI 1048612-edge megasas
> 79 24-29 IR-PCI-MSI 1048613-edge megasas
> 80 24-29 IR-PCI-MSI 1048614-edge megasas
> 81 24-29 IR-PCI-MSI 1048615-edge megasas
> 82 24-29 IR-PCI-MSI 1048616-edge megasas
> 83 24-29 IR-PCI-MSI 1048617-edge megasas
> 84 24-29 IR-PCI-MSI 1048618-edge megasas
> 85 24-29 IR-PCI-MSI 1048619-edge megasas
> 86 24-29 IR-PCI-MSI 1048620-edge megasas
> 87 24-29 IR-PCI-MSI 1048621-edge megasas
> 88 24-29 IR-PCI-MSI 1048622-edge megasas
> 89 24-29 IR-PCI-MSI 1048623-edge megasas
> ---
>
> On my server, IRQs #66-89 are routed to CPUs #24-29. If I offline CPUs #24-29,
> I/O stops working, showing the following messages.
>
> ---
> [...] sd 0:2:0:0: [sda] tag#1 task abort called for scmd(ffff8820574d7560)
> [...] sd 0:2:0:0: [sda] tag#1 CDB: Read(10) 28 00 0d e8 cf 78 00 00 08 00
> [...] sd 0:2:0:0: task abort: FAILED scmd(ffff8820574d7560)
> [...] sd 0:2:0:0: [sda] tag#0 task abort called for scmd(ffff882057426560)
> [...] sd 0:2:0:0: [sda] tag#0 CDB: Write(10) 2a 00 0d 58 37 00 00 00 08 00
> [...] sd 0:2:0:0: task abort: FAILED scmd(ffff882057426560)
> [...] sd 0:2:0:0: target reset called for scmd(ffff8820574d7560)
> [...] sd 0:2:0:0: [sda] tag#1 megasas: target reset FAILED!!
> [...] sd 0:2:0:0: [sda] tag#0 Controller reset is requested due to IO timeout
> [...] SCSI command pointer: (ffff882057426560) SCSI host state: 5 SCSI
> [...] IO request frame:
> [...]
> <snip>
> [...]
> [...] megaraid_sas 0000:02:00.0: [ 0]waiting for 2 commands to complete for scsi0
> [...] INFO: task auditd:1200 blocked for more than 120 seconds.
> [...] Not tainted 4.13.0+ #15
> [...] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [...] auditd D 0 1200 1 0x00000000
> [...] Call Trace:
> [...] __schedule+0x28d/0x890
> [...] schedule+0x36/0x80
> [...] io_schedule+0x16/0x40
> [...] wait_on_page_bit_common+0x109/0x1c0
> [...] ? page_cache_tree_insert+0xf0/0xf0
> [...] __filemap_fdatawait_range+0x127/0x190
> [...] ? __filemap_fdatawrite_range+0xd1/0x100
> [...] file_write_and_wait_range+0x60/0xb0
> [...] xfs_file_fsync+0x67/0x1d0 [xfs]
> [...] vfs_fsync_range+0x3d/0xb0
> [...] do_fsync+0x3d/0x70
> [...] SyS_fsync+0x10/0x20
> [...] entry_SYSCALL_64_fastpath+0x1a/0xa5
> [...] RIP: 0033:0x7f0bd9633d2d
> [...] RSP: 002b:00007f0bd751ed30 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
> [...] RAX: ffffffffffffffda RBX: 00005590566d0080 RCX: 00007f0bd9633d2d
> [...] RDX: 00005590566d1260 RSI: 0000000000000000 RDI: 0000000000000005
> [...] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000017
> [...] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
> [...] R13: 00007f0bd751f9c0 R14: 00007f0bd751f700 R15: 0000000000000000
> ---
>
> Thanks,
> Yasuaki Ishimatsu
>
> On 08/21/2017 09:37 AM, Marc Zyngier wrote:
>> On 21/08/17 14:18, Christoph Hellwig wrote:
>>> Can you try the patch below please?
>>>
>>> ---
>>> From d5f59cb7a629de8439b318e1384660e6b56e7dd8 Mon Sep 17 00:00:00 2001
>>> From: Christoph Hellwig <hch@lst.de>
>>> Date: Mon, 21 Aug 2017 14:24:11 +0200
>>> Subject: virtio_pci: fix cpu affinity support
>>>
>>> Commit 0b0f9dc5 ("Revert "virtio_pci: use shared interrupts for
>>> virtqueues"") removed the adjustment of the pre_vectors for the virtio
>>> MSI-X vector allocation which was added in commit fb5e31d9 ("virtio:
>>> allow drivers to request IRQ affinity when creating VQs"). This will
>>> lead to an incorrect assignment of MSI-X vectors, and potential
>>> deadlocks when offlining cpus.
>>>
>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>> Fixes: 0b0f9dc5 ("Revert "virtio_pci: use shared interrupts for virtqueues")
>>> Reported-by: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
>>
>> Just gave it a go on an arm64 VM, and the behaviour seems much saner
>> (the virtio queue affinity now spans the whole system).
>>
>> Tested-by: Marc Zyngier <marc.zyngier@arm.com>
>>
>> Thanks,
>>
>> M.
>>
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: system hung up when offlining CPUs
2017-09-12 18:15 ` YASUAKI ISHIMATSU
@ 2017-09-13 11:13 ` Hannes Reinecke
2017-09-13 11:35 ` Kashyap Desai
0 siblings, 1 reply; 43+ messages in thread
From: Hannes Reinecke @ 2017-09-13 11:13 UTC (permalink / raw)
To: YASUAKI ISHIMATSU, Marc Zyngier, Christoph Hellwig
Cc: tglx, axboe, mpe, keith.busch, peterz, LKML, linux-scsi,
kashyap.desai, sumit.saxena, shivasharan.srikanteshwara
On 09/12/2017 08:15 PM, YASUAKI ISHIMATSU wrote:
> + linux-scsi and maintainers of megasas
>
> When offlining CPU, I/O stops. Do you have any ideas?
>
> On 09/07/2017 04:23 PM, YASUAKI ISHIMATSU wrote:
>> Hi Marc and Christoph,
>>
>> Sorry for the late reply. I appreciate that you fixed the issue in the kvm environment.
>> But the issue still occurs on a physical server.
>>
>> Here is the irq information for megasas that I summarized from /proc/interrupts
>> and /proc/irq/*/smp_affinity_list on my server:
>>
>> ---
>> IRQ affinity_list IRQ_TYPE
>> 42 0-5 IR-PCI-MSI 1048576-edge megasas
>> 43 0-5 IR-PCI-MSI 1048577-edge megasas
>> 44 0-5 IR-PCI-MSI 1048578-edge megasas
>> 45 0-5 IR-PCI-MSI 1048579-edge megasas
>> 46 0-5 IR-PCI-MSI 1048580-edge megasas
>> 47 0-5 IR-PCI-MSI 1048581-edge megasas
>> 48 0-5 IR-PCI-MSI 1048582-edge megasas
>> 49 0-5 IR-PCI-MSI 1048583-edge megasas
>> 50 0-5 IR-PCI-MSI 1048584-edge megasas
>> 51 0-5 IR-PCI-MSI 1048585-edge megasas
>> 52 0-5 IR-PCI-MSI 1048586-edge megasas
>> 53 0-5 IR-PCI-MSI 1048587-edge megasas
>> 54 0-5 IR-PCI-MSI 1048588-edge megasas
>> 55 0-5 IR-PCI-MSI 1048589-edge megasas
>> 56 0-5 IR-PCI-MSI 1048590-edge megasas
>> 57 0-5 IR-PCI-MSI 1048591-edge megasas
>> 58 0-5 IR-PCI-MSI 1048592-edge megasas
>> 59 0-5 IR-PCI-MSI 1048593-edge megasas
>> 60 0-5 IR-PCI-MSI 1048594-edge megasas
>> 61 0-5 IR-PCI-MSI 1048595-edge megasas
>> 62 0-5 IR-PCI-MSI 1048596-edge megasas
>> 63 0-5 IR-PCI-MSI 1048597-edge megasas
>> 64 0-5 IR-PCI-MSI 1048598-edge megasas
>> 65 0-5 IR-PCI-MSI 1048599-edge megasas
>> 66 24-29 IR-PCI-MSI 1048600-edge megasas
>> 67 24-29 IR-PCI-MSI 1048601-edge megasas
>> 68 24-29 IR-PCI-MSI 1048602-edge megasas
>> 69 24-29 IR-PCI-MSI 1048603-edge megasas
>> 70 24-29 IR-PCI-MSI 1048604-edge megasas
>> 71 24-29 IR-PCI-MSI 1048605-edge megasas
>> 72 24-29 IR-PCI-MSI 1048606-edge megasas
>> 73 24-29 IR-PCI-MSI 1048607-edge megasas
>> 74 24-29 IR-PCI-MSI 1048608-edge megasas
>> 75 24-29 IR-PCI-MSI 1048609-edge megasas
>> 76 24-29 IR-PCI-MSI 1048610-edge megasas
>> 77 24-29 IR-PCI-MSI 1048611-edge megasas
>> 78 24-29 IR-PCI-MSI 1048612-edge megasas
>> 79 24-29 IR-PCI-MSI 1048613-edge megasas
>> 80 24-29 IR-PCI-MSI 1048614-edge megasas
>> 81 24-29 IR-PCI-MSI 1048615-edge megasas
>> 82 24-29 IR-PCI-MSI 1048616-edge megasas
>> 83 24-29 IR-PCI-MSI 1048617-edge megasas
>> 84 24-29 IR-PCI-MSI 1048618-edge megasas
>> 85 24-29 IR-PCI-MSI 1048619-edge megasas
>> 86 24-29 IR-PCI-MSI 1048620-edge megasas
>> 87 24-29 IR-PCI-MSI 1048621-edge megasas
>> 88 24-29 IR-PCI-MSI 1048622-edge megasas
>> 89 24-29 IR-PCI-MSI 1048623-edge megasas
>> ---
>>
>> In my server, IRQ#66-89 are sent to CPU#24-29. And if I offline CPU#24-29,
>> I/O does not work, showing the following messages.
>>
>> ---
>> [...] sd 0:2:0:0: [sda] tag#1 task abort called for scmd(ffff8820574d7560)
>> [...] sd 0:2:0:0: [sda] tag#1 CDB: Read(10) 28 00 0d e8 cf 78 00 00 08 00
>> [...] sd 0:2:0:0: task abort: FAILED scmd(ffff8820574d7560)
>> [...] sd 0:2:0:0: [sda] tag#0 task abort called for scmd(ffff882057426560)
>> [...] sd 0:2:0:0: [sda] tag#0 CDB: Write(10) 2a 00 0d 58 37 00 00 00 08 00
>> [...] sd 0:2:0:0: task abort: FAILED scmd(ffff882057426560)
>> [...] sd 0:2:0:0: target reset called for scmd(ffff8820574d7560)
>> [...] sd 0:2:0:0: [sda] tag#1 megasas: target reset FAILED!!
>> [...] sd 0:2:0:0: [sda] tag#0 Controller reset is requested due to IO timeout
>> [...] SCSI command pointer: (ffff882057426560) SCSI host state: 5 SCSI
>> [...] IO request frame:
>> [...]
>> <snip>
>> [...]
>> [...] megaraid_sas 0000:02:00.0: [ 0]waiting for 2 commands to complete for scsi0
>> [...] INFO: task auditd:1200 blocked for more than 120 seconds.
>> [...] Not tainted 4.13.0+ #15
>> [...] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [...] auditd D 0 1200 1 0x00000000
>> [...] Call Trace:
>> [...] __schedule+0x28d/0x890
>> [...] schedule+0x36/0x80
>> [...] io_schedule+0x16/0x40
>> [...] wait_on_page_bit_common+0x109/0x1c0
>> [...] ? page_cache_tree_insert+0xf0/0xf0
>> [...] __filemap_fdatawait_range+0x127/0x190
>> [...] ? __filemap_fdatawrite_range+0xd1/0x100
>> [...] file_write_and_wait_range+0x60/0xb0
>> [...] xfs_file_fsync+0x67/0x1d0 [xfs]
>> [...] vfs_fsync_range+0x3d/0xb0
>> [...] do_fsync+0x3d/0x70
>> [...] SyS_fsync+0x10/0x20
>> [...] entry_SYSCALL_64_fastpath+0x1a/0xa5
>> [...] RIP: 0033:0x7f0bd9633d2d
>> [...] RSP: 002b:00007f0bd751ed30 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
>> [...] RAX: ffffffffffffffda RBX: 00005590566d0080 RCX: 00007f0bd9633d2d
>> [...] RDX: 00005590566d1260 RSI: 0000000000000000 RDI: 0000000000000005
>> [...] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000017
>> [...] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
>> [...] R13: 00007f0bd751f9c0 R14: 00007f0bd751f700 R15: 0000000000000000
>> ---
>>
>> Thanks,
>> Yasuaki Ishimatsu
>>
This indeed looks like a problem.
We're going to great lengths to submit and complete I/O on the same CPU,
so if the CPU is offlined while I/O is in flight we won't be getting a
completion for this particular I/O.
However, the megasas driver should be able to cope with this situation;
after all, the firmware maintains completion queues, so it would be
dead easy to look at _other_ completion queues, too, if a timeout occurs.
Also the IRQ affinity looks bogus (we should spread IRQs to _all_ CPUs,
not just a subset), and the driver should make sure to receive
completions even if the respective CPUs are offlined.
Alternatively it should not try to submit a command abort via an
offlined CPU; that's guaranteed to run into the same problems.
So it looks more like a driver issue to me...
Cheers,
Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
hare@suse.de +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
^ permalink raw reply [flat|nested] 43+ messages in thread
* RE: system hung up when offlining CPUs
2017-09-13 11:13 ` Hannes Reinecke
@ 2017-09-13 11:35 ` Kashyap Desai
0 siblings, 0 replies; 43+ messages in thread
From: Kashyap Desai @ 2017-09-13 11:35 UTC (permalink / raw)
To: Hannes Reinecke, YASUAKI ISHIMATSU, Marc Zyngier, Christoph Hellwig
Cc: tglx, axboe, mpe, keith.busch, peterz, LKML, linux-scsi,
Sumit Saxena, Shivasharan Srikanteshwara
>
> On 09/12/2017 08:15 PM, YASUAKI ISHIMATSU wrote:
> > + linux-scsi and maintainers of megasas
> >
> > When offlining CPU, I/O stops. Do you have any ideas?
> >
> > On 09/07/2017 04:23 PM, YASUAKI ISHIMATSU wrote:
> >> Hi Mark and Christoph,
> >>
> >> Sorry for the late reply. I appreciated that you fixed the issue on kvm
> environment.
> >> But the issue still occurs on physical server.
> >>
> >> Here ares irq information that I summarized megasas irqs from
> >> /proc/interrupts and /proc/irq/*/smp_affinity_list on my server:
> >>
> >> ---
> >> IRQ affinity_list IRQ_TYPE
> >> 42 0-5 IR-PCI-MSI 1048576-edge megasas
> >> 43 0-5 IR-PCI-MSI 1048577-edge megasas
> >> 44 0-5 IR-PCI-MSI 1048578-edge megasas
> >> 45 0-5 IR-PCI-MSI 1048579-edge megasas
> >> 46 0-5 IR-PCI-MSI 1048580-edge megasas
> >> 47 0-5 IR-PCI-MSI 1048581-edge megasas
> >> 48 0-5 IR-PCI-MSI 1048582-edge megasas
> >> 49 0-5 IR-PCI-MSI 1048583-edge megasas
> >> 50 0-5 IR-PCI-MSI 1048584-edge megasas
> >> 51 0-5 IR-PCI-MSI 1048585-edge megasas
> >> 52 0-5 IR-PCI-MSI 1048586-edge megasas
> >> 53 0-5 IR-PCI-MSI 1048587-edge megasas
> >> 54 0-5 IR-PCI-MSI 1048588-edge megasas
> >> 55 0-5 IR-PCI-MSI 1048589-edge megasas
> >> 56 0-5 IR-PCI-MSI 1048590-edge megasas
> >> 57 0-5 IR-PCI-MSI 1048591-edge megasas
> >> 58 0-5 IR-PCI-MSI 1048592-edge megasas
> >> 59 0-5 IR-PCI-MSI 1048593-edge megasas
> >> 60 0-5 IR-PCI-MSI 1048594-edge megasas
> >> 61 0-5 IR-PCI-MSI 1048595-edge megasas
> >> 62 0-5 IR-PCI-MSI 1048596-edge megasas
> >> 63 0-5 IR-PCI-MSI 1048597-edge megasas
> >> 64 0-5 IR-PCI-MSI 1048598-edge megasas
> >> 65 0-5 IR-PCI-MSI 1048599-edge megasas
> >> 66 24-29 IR-PCI-MSI 1048600-edge megasas
> >> 67 24-29 IR-PCI-MSI 1048601-edge megasas
> >> 68 24-29 IR-PCI-MSI 1048602-edge megasas
> >> 69 24-29 IR-PCI-MSI 1048603-edge megasas
> >> 70 24-29 IR-PCI-MSI 1048604-edge megasas
> >> 71 24-29 IR-PCI-MSI 1048605-edge megasas
> >> 72 24-29 IR-PCI-MSI 1048606-edge megasas
> >> 73 24-29 IR-PCI-MSI 1048607-edge megasas
> >> 74 24-29 IR-PCI-MSI 1048608-edge megasas
> >> 75 24-29 IR-PCI-MSI 1048609-edge megasas
> >> 76 24-29 IR-PCI-MSI 1048610-edge megasas
> >> 77 24-29 IR-PCI-MSI 1048611-edge megasas
> >> 78 24-29 IR-PCI-MSI 1048612-edge megasas
> >> 79 24-29 IR-PCI-MSI 1048613-edge megasas
> >> 80 24-29 IR-PCI-MSI 1048614-edge megasas
> >> 81 24-29 IR-PCI-MSI 1048615-edge megasas
> >> 82 24-29 IR-PCI-MSI 1048616-edge megasas
> >> 83 24-29 IR-PCI-MSI 1048617-edge megasas
> >> 84 24-29 IR-PCI-MSI 1048618-edge megasas
> >> 85 24-29 IR-PCI-MSI 1048619-edge megasas
> >> 86 24-29 IR-PCI-MSI 1048620-edge megasas
> >> 87 24-29 IR-PCI-MSI 1048621-edge megasas
> >> 88 24-29 IR-PCI-MSI 1048622-edge megasas
> >> 89 24-29 IR-PCI-MSI 1048623-edge megasas
> >> ---
> >>
> >> In my server, IRQ#66-89 are sent to CPU#24-29. And if I offline
> >> CPU#24-29, I/O does not work, showing the following messages.
> >>
> >> ---
> >> [...] sd 0:2:0:0: [sda] tag#1 task abort called for
> >> scmd(ffff8820574d7560) [...] sd 0:2:0:0: [sda] tag#1 CDB: Read(10) 28
> >> 00 0d e8 cf 78 00 00 08 00 [...] sd 0:2:0:0: task abort: FAILED
> >> scmd(ffff8820574d7560) [...] sd 0:2:0:0: [sda] tag#0 task abort
> >> called for scmd(ffff882057426560) [...] sd 0:2:0:0: [sda] tag#0 CDB:
> >> Write(10) 2a 00 0d 58 37 00 00 00 08 00 [...] sd 0:2:0:0: task abort:
> >> FAILED scmd(ffff882057426560) [...] sd 0:2:0:0: target reset called
> >> for scmd(ffff8820574d7560) [...] sd 0:2:0:0: [sda] tag#1 megasas:
> >> target
> reset FAILED!!
> >> [...] sd 0:2:0:0: [sda] tag#0 Controller reset is requested due to IO
> >> timeout
> >> [...] SCSI command pointer: (ffff882057426560) SCSI host state: 5
> >> SCSI
> >> [...] IO request frame:
> >> [...]
> >> <snip>
> >> [...]
> >> [...] megaraid_sas 0000:02:00.0: [ 0]waiting for 2 commands to
> >> complete for scsi0 [...] INFO: task auditd:1200 blocked for more than
> >> 120
> seconds.
> >> [...] Not tainted 4.13.0+ #15
> >> [...] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
> message.
> >> [...] auditd D 0 1200 1 0x00000000
> >> [...] Call Trace:
> >> [...] __schedule+0x28d/0x890
> >> [...] schedule+0x36/0x80
> >> [...] io_schedule+0x16/0x40
> >> [...] wait_on_page_bit_common+0x109/0x1c0
> >> [...] ? page_cache_tree_insert+0xf0/0xf0 [...]
> >> __filemap_fdatawait_range+0x127/0x190
> >> [...] ? __filemap_fdatawrite_range+0xd1/0x100
> >> [...] file_write_and_wait_range+0x60/0xb0
> >> [...] xfs_file_fsync+0x67/0x1d0 [xfs] [...]
> >> vfs_fsync_range+0x3d/0xb0 [...] do_fsync+0x3d/0x70 [...]
> >> SyS_fsync+0x10/0x20 [...] entry_SYSCALL_64_fastpath+0x1a/0xa5
> >> [...] RIP: 0033:0x7f0bd9633d2d
> >> [...] RSP: 002b:00007f0bd751ed30 EFLAGS: 00000293 ORIG_RAX:
> >> 000000000000004a [...] RAX: ffffffffffffffda RBX: 00005590566d0080
> >> RCX: 00007f0bd9633d2d [...] RDX: 00005590566d1260 RSI:
> >> 0000000000000000 RDI: 0000000000000005 [...] RBP: 0000000000000000
> >> R08: 0000000000000000 R09: 0000000000000017 [...] R10:
> >> 0000000000000000 R11: 0000000000000293 R12: 0000000000000000 [...]
> >> R13: 00007f0bd751f9c0 R14: 00007f0bd751f700 R15: 0000000000000000
> >> ---
> >>
> >> Thanks,
> >> Yasuaki Ishimatsu
> >>
>
> This indeed looks like a problem.
> We're going to great lengths to submit and complete I/O on the same CPU,
> so
> if the CPU is offlined while I/O is in flight we won't be getting a
> completion for
> this particular I/O.
> However, the megasas driver should be able to cope with this situation;
> after
> all, the firmware maintains completions queues, so it would be dead easy
> to
> look at _other_ completions queues, too, if a timeout occurs.
In case of an IO timeout, the megaraid_sas driver checks the other queues as well.
That is why the IO was completed in this case and further IOs were resumed.
The driver completes commands via the code below, executed from
megasas_wait_for_outstanding_fusion():
for (MSIxIndex = 0; MSIxIndex < count; MSIxIndex++)
        complete_cmd_fusion(instance, MSIxIndex);
Because the above code is executed in the driver, we see only one print,
shown below, in these logs.
megaraid_sas 0000:02:00.0: [ 0]waiting for 2 commands to complete for scsi0
As per the link below, CPU hotplug should take care of this: "All interrupts
targeted to this CPU are migrated to a new CPU"
https://www.kernel.org/doc/html/v4.11/core-api/cpu_hotplug.html
BTW - we are also able to reproduce this issue locally. The reason for the IO
timeout is that the IO is completed, but the corresponding interrupt did not
arrive on an online CPU; it was possibly missed because the CPU was in the
transient state of being offlined. I am not sure which component should take
care of this.
Question - what happens once __cpu_disable is called and some of the queued
interrupts have affinity to that particular CPU?
I assume those pending/queued interrupts should ideally be migrated to the
remaining online CPUs. They should not go unhandled if we want to avoid such
IO timeouts.
Kashyap
> Also the IRQ affinity looks bogus (we should spread IRQs to _all_ CPUs,
> not
> just a subset), and the driver should make sure to receive completions
> even if
> the respective CPUs are offlined.
> Alternatively it should not try to submit a command abort via an offlined
> CPUs; that's guaranteed to run into the same problems.
>
> So it looks more like a driver issue to me...
>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke Teamlead Storage & Networking
> hare@suse.de +49 911 74053 688
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284
> (AG Nürnberg)
^ permalink raw reply [flat|nested] 43+ messages in thread
* RE: system hung up when offlining CPUs
2017-09-13 11:35 ` Kashyap Desai
@ 2017-09-13 13:33 ` Thomas Gleixner
0 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2017-09-13 13:33 UTC (permalink / raw)
To: Kashyap Desai
Cc: Hannes Reinecke, YASUAKI ISHIMATSU, Marc Zyngier,
Christoph Hellwig, axboe, mpe, keith.busch, peterz, LKML,
linux-scsi, Sumit Saxena, Shivasharan Srikanteshwara
On Wed, 13 Sep 2017, Kashyap Desai wrote:
> > On 09/12/2017 08:15 PM, YASUAKI ISHIMATSU wrote:
> > > + linux-scsi and maintainers of megasas
> > >> In my server, IRQ#66-89 are sent to CPU#24-29. And if I offline
> > >> CPU#24-29, I/O does not work, showing the following messages.
....
> > This indeed looks like a problem.
> > We're going to great lengths to submit and complete I/O on the same CPU,
> > so
> > if the CPU is offlined while I/O is in flight we won't be getting a
> > completion for
> > this particular I/O.
> > However, the megasas driver should be able to cope with this situation;
> > after
> > all, the firmware maintains completions queues, so it would be dead easy
> > to
> > look at _other_ completions queues, too, if a timeout occurs.
> In case of IO timeout, megaraid_sas driver is checking other queues as well.
> That is why IO was completed in this case and further IOs were resumed.
>
> Driver complete commands as below code executed from
> megasas_wait_for_outstanding_fusion().
> for (MSIxIndex = 0 ; MSIxIndex < count; MSIxIndex++)
> complete_cmd_fusion(instance, MSIxIndex);
>
> Because of above code executed in driver, we see only one print as below in
> this logs.
> megaraid_sas 0000:02:00.0: [ 0]waiting for 2 commands to complete for scsi0
>
> As per below link CPU hotplug will take care- "All interrupts targeted to
> this CPU are migrated to a new CPU"
> https://www.kernel.org/doc/html/v4.11/core-api/cpu_hotplug.html
>
> BTW - We are also able reproduce this issue locally. Reason for IO timeout
> is -" IO is completed, but corresponding interrupt did not arrived on Online
> CPU. Either missed due to CPU is in transient state of being OFFLINED. I am
> not sure which component should take care this."
>
> Question - "what happens once __cpu_disable is called and some of the queued
> interrupt has affinity to that particular CPU ?"
> I assume ideally those pending/queued Interrupt should be migrated to
> remaining online CPUs. It should not be unhandled if we want to avoid such
> IO timeout.
Can you please provide the following information, before and after
offlining the last CPU in the affinity set:
# cat /proc/irq/$IRQNUM/smp_affinity_list
# cat /proc/irq/$IRQNUM/effective_affinity
# cat /sys/kernel/debug/irq/irqs/$IRQNUM
The last one requires: CONFIG_GENERIC_IRQ_DEBUGFS=y
Thanks,
tglx
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: system hung up when offlining CPUs
2017-09-13 13:33 ` Thomas Gleixner
@ 2017-09-14 16:28 ` YASUAKI ISHIMATSU
0 siblings, 0 replies; 43+ messages in thread
From: YASUAKI ISHIMATSU @ 2017-09-14 16:28 UTC (permalink / raw)
To: Thomas Gleixner, Kashyap Desai
Cc: Hannes Reinecke, Marc Zyngier, Christoph Hellwig, axboe, mpe,
keith.busch, peterz, LKML, linux-scsi, Sumit Saxena,
Shivasharan Srikanteshwara, yasu.isimatu
On 09/13/2017 09:33 AM, Thomas Gleixner wrote:
> On Wed, 13 Sep 2017, Kashyap Desai wrote:
>>> On 09/12/2017 08:15 PM, YASUAKI ISHIMATSU wrote:
>>>> + linux-scsi and maintainers of megasas
>
>>>>> In my server, IRQ#66-89 are sent to CPU#24-29. And if I offline
>>>>> CPU#24-29, I/O does not work, showing the following messages.
>
> ....
>
>>> This indeed looks like a problem.
>>> We're going to great lengths to submit and complete I/O on the same CPU,
>>> so
>>> if the CPU is offlined while I/O is in flight we won't be getting a
>>> completion for
>>> this particular I/O.
>>> However, the megasas driver should be able to cope with this situation;
>>> after
>>> all, the firmware maintains completions queues, so it would be dead easy
>>> to
>>> look at _other_ completions queues, too, if a timeout occurs.
>> In case of IO timeout, megaraid_sas driver is checking other queues as well.
>> That is why IO was completed in this case and further IOs were resumed.
>>
>> Driver complete commands as below code executed from
>> megasas_wait_for_outstanding_fusion().
>> for (MSIxIndex = 0 ; MSIxIndex < count; MSIxIndex++)
>> complete_cmd_fusion(instance, MSIxIndex);
>>
>> Because of above code executed in driver, we see only one print as below in
>> this logs.
>> megaraid_sas 0000:02:00.0: [ 0]waiting for 2 commands to complete for scsi0
>>
>> As per below link CPU hotplug will take care- "All interrupts targeted to
>> this CPU are migrated to a new CPU"
>> https://www.kernel.org/doc/html/v4.11/core-api/cpu_hotplug.html
>>
>> BTW - We are also able reproduce this issue locally. Reason for IO timeout
>> is -" IO is completed, but corresponding interrupt did not arrived on Online
>> CPU. Either missed due to CPU is in transient state of being OFFLINED. I am
>> not sure which component should take care this."
>>
>> Question - "what happens once __cpu_disable() is called and some of the
>> queued interrupts have affinity to that particular CPU?"
>> I assume those pending/queued interrupts should ideally be migrated to the
>> remaining online CPUs. They should not go unhandled if we want to avoid
>> such IO timeouts.
>
> Can you please provide the following information, before and after
> offlining the last CPU in the affinity set:
>
> # cat /proc/irq/$IRQNUM/smp_affinity_list
> # cat /proc/irq/$IRQNUM/effective_affinity
> # cat /sys/kernel/debug/irq/irqs/$IRQNUM
>
> The last one requires: CONFIG_GENERIC_IRQ_DEBUGFS=y
Here are one irq's info of megasas:
- Before offline CPU
/proc/irq/70/smp_affinity_list
24-29
/proc/irq/70/effective_affinity
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,3f000000
/sys/kernel/debug/irq/irqs/70
handler: handle_edge_irq
status: 0x00004000
istate: 0x00000000
ddepth: 0
wdepth: 0
dstate: 0x00609200
IRQD_ACTIVATED
IRQD_IRQ_STARTED
IRQD_MOVE_PCNTXT
IRQD_AFFINITY_SET
IRQD_AFFINITY_MANAGED
node: 1
affinity: 24-29
effectiv: 24-29
pending:
domain: INTEL-IR-MSI-0-2
hwirq: 0x100018
chip: IR-PCI-MSI
flags: 0x10
IRQCHIP_SKIP_SET_WAKE
parent:
domain: INTEL-IR-0
hwirq: 0x400000
chip: INTEL-IR
flags: 0x0
parent:
domain: VECTOR
hwirq: 0x46
chip: APIC
flags: 0x0
- After offline CPU#24-29
/proc/irq/70/smp_affinity_list
29
/proc/irq/70/effective_affinity
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,20000000
/sys/kernel/debug/irq/irqs/70
handler: handle_edge_irq
status: 0x00004000
istate: 0x00000000
ddepth: 1
wdepth: 0
dstate: 0x00a39000
IRQD_IRQ_DISABLED
IRQD_IRQ_MASKED
IRQD_MOVE_PCNTXT
IRQD_AFFINITY_SET
IRQD_AFFINITY_MANAGED
IRQD_MANAGED_SHUTDOWN
node: 1
affinity: 29
effectiv: 29
pending:
domain: INTEL-IR-MSI-0-2
hwirq: 0x100018
chip: IR-PCI-MSI
flags: 0x10
IRQCHIP_SKIP_SET_WAKE
parent:
domain: INTEL-IR-0
hwirq: 0x400000
chip: INTEL-IR
flags: 0x0
parent:
domain: VECTOR
hwirq: 0x46
chip: APIC
flags: 0x0
Thanks,
Yasuaki Ishimatsu
>
> Thanks,
>
> tglx
>
^ permalink raw reply [flat|nested] 43+ messages in thread
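The effective_affinity values above are comma-separated 32-bit hex words, most significant word first, so `3f000000` in the lowest word means CPUs 24-29 and `20000000` means CPU 29. A small userspace helper - an illustrative sketch, not part of any kernel tooling - can decode such a mask into a CPU list:

```python
def decode_affinity_mask(mask: str) -> list[int]:
    """Decode a /proc/irq/*/effective_affinity style bitmask.

    The file prints comma-separated 32-bit hex words with the most
    significant word first; the rightmost word covers CPUs 0-31.
    """
    words = mask.strip().split(",")
    cpus = []
    # Reverse so that word index i covers CPUs 32*i .. 32*i+31.
    for i, word in enumerate(reversed(words)):
        value = int(word, 16)
        for bit in range(32):
            if value & (1 << bit):
                cpus.append(32 * i + bit)
    return sorted(cpus)


print(decode_affinity_mask("3f000000"))  # the mask shown before offlining
```

Running it on the two masks from the report yields CPUs 24-29 before offlining and CPU 29 afterwards, matching the `affinity:`/`effectiv:` lines in the debugfs dump.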
* Re: system hung up when offlining CPUs
2017-09-14 16:28 ` YASUAKI ISHIMATSU
@ 2017-09-16 10:15 ` Thomas Gleixner
-1 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2017-09-16 10:15 UTC (permalink / raw)
To: YASUAKI ISHIMATSU
Cc: Kashyap Desai, Hannes Reinecke, Marc Zyngier, Christoph Hellwig,
axboe, mpe, keith.busch, peterz, LKML, linux-scsi, Sumit Saxena,
Shivasharan Srikanteshwara
On Thu, 14 Sep 2017, YASUAKI ISHIMATSU wrote:
> On 09/13/2017 09:33 AM, Thomas Gleixner wrote:
> >> Question - "what happens once __cpu_disable() is called and some of the
> >> queued interrupts have affinity to that particular CPU?"
> >> I assume those pending/queued interrupts should ideally be migrated to the
> >> remaining online CPUs. They should not go unhandled if we want to avoid
> >> such IO timeouts.
> >
> > Can you please provide the following information, before and after
> > offlining the last CPU in the affinity set:
> >
> > # cat /proc/irq/$IRQNUM/smp_affinity_list
> > # cat /proc/irq/$IRQNUM/effective_affinity
> > # cat /sys/kernel/debug/irq/irqs/$IRQNUM
> >
> > The last one requires: CONFIG_GENERIC_IRQ_DEBUGFS=y
>
> Here are one irq's info of megasas:
>
> - Before offline CPU
> /proc/irq/70/smp_affinity_list
> 24-29
>
> /proc/irq/70/effective_affinity
> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,3f000000
>
> /sys/kernel/debug/irq/irqs/70
> handler: handle_edge_irq
> status: 0x00004000
> istate: 0x00000000
> ddepth: 0
> wdepth: 0
> dstate: 0x00609200
> IRQD_ACTIVATED
> IRQD_IRQ_STARTED
> IRQD_MOVE_PCNTXT
> IRQD_AFFINITY_SET
> IRQD_AFFINITY_MANAGED
So this uses managed affinity, which means that once the last CPU in the
affinity mask goes offline, the interrupt is shut down by the irq core
code, which is the case:
> dstate: 0x00a39000
> IRQD_IRQ_DISABLED
> IRQD_IRQ_MASKED
> IRQD_MOVE_PCNTXT
> IRQD_AFFINITY_SET
> IRQD_AFFINITY_MANAGED
> IRQD_MANAGED_SHUTDOWN <---------------
So the irq core code works as expected, but something in the
driver/scsi/block stack seems to fiddle with that shut down queue.
I only can tell about the inner workings of the irq code, but I have no
clue about the rest.
Thanks,
tglx
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: system hung up when offlining CPUs
2017-09-16 10:15 ` Thomas Gleixner
@ 2017-09-16 15:02 ` Thomas Gleixner
-1 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2017-09-16 15:02 UTC (permalink / raw)
To: YASUAKI ISHIMATSU
Cc: Kashyap Desai, Hannes Reinecke, Marc Zyngier, Christoph Hellwig,
axboe, mpe, keith.busch, peterz, LKML, linux-scsi, Sumit Saxena,
Shivasharan Srikanteshwara
On Sat, 16 Sep 2017, Thomas Gleixner wrote:
> On Thu, 14 Sep 2017, YASUAKI ISHIMATSU wrote:
> > Here are one irq's info of megasas:
> >
> > - Before offline CPU
> > /proc/irq/70/smp_affinity_list
> > 24-29
> >
> > /proc/irq/70/effective_affinity
> > 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,3f000000
> >
> > /sys/kernel/debug/irq/irqs/70
> > handler: handle_edge_irq
> > status: 0x00004000
> > istate: 0x00000000
> > ddepth: 0
> > wdepth: 0
> > dstate: 0x00609200
> > IRQD_ACTIVATED
> > IRQD_IRQ_STARTED
> > IRQD_MOVE_PCNTXT
> > IRQD_AFFINITY_SET
> > IRQD_AFFINITY_MANAGED
>
> So this uses managed affinity, which means that once the last CPU in the
> affinity mask goes offline, the interrupt is shut down by the irq core
> code, which is the case:
>
> > dstate: 0x00a39000
> > IRQD_IRQ_DISABLED
> > IRQD_IRQ_MASKED
> > IRQD_MOVE_PCNTXT
> > IRQD_AFFINITY_SET
> > IRQD_AFFINITY_MANAGED
> > IRQD_MANAGED_SHUTDOWN <---------------
>
> So the irq core code works as expected, but something in the
> driver/scsi/block stack seems to fiddle with that shut down queue.
>
> I only can tell about the inner workings of the irq code, but I have no
> clue about the rest.
Though there is something wrong here:
> affinity: 24-29
> effectiv: 24-29
and after offlining:
> affinity: 29
> effectiv: 29
But that should be:
affinity: 24-29
effectiv: 29
because the irq core code preserves 'affinity'. It merely updates
'effective', which is where your interrupts are routed to.
Is the driver issuing any set_affinity() calls? If so, that's wrong.
Which driver are we talking about?
Thanks,
tglx
^ permalink raw reply [flat|nested] 43+ messages in thread
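The semantics described above - 'affinity' preserved across hotplug, 'effective' shrinking to the still-online CPUs, and the core shutting the interrupt down once the last CPU of the managed set goes away - can be sketched as a toy model. This is pure illustration: the field names mirror the debugfs output, but none of this is kernel code, and the real core routes the interrupt to a single CPU rather than a whole set.

```python
class ManagedIrq:
    """Toy model of a managed interrupt's affinity bookkeeping."""

    def __init__(self, affinity):
        self.affinity = set(affinity)   # preserved across hotplug
        self.effective = set(affinity)  # where interrupts are routed
        self.shutdown = False           # models IRQD_MANAGED_SHUTDOWN

    def cpu_offline(self, online):
        """Recompute state after a CPU left; 'online' excludes it already."""
        self.effective = self.affinity & online
        if not self.effective:
            # Last CPU of the managed set went offline:
            # the irq core shuts the interrupt down instead of moving it.
            self.shutdown = True


irq = ManagedIrq(range(24, 30))
online = set(range(48))
for cpu in range(24, 30):
    online.discard(cpu)
    irq.cpu_offline(online)
```

After the loop, `irq.affinity` is still {24..29} while the interrupt is shut down - which is exactly the discrepancy Thomas points at: the reported state showed 'affinity' collapsing to 29 as well, which the core code alone would never do.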
* Re: system hung up when offlining CPUs
2017-09-16 15:02 ` Thomas Gleixner
@ 2017-10-02 16:36 ` YASUAKI ISHIMATSU
-1 siblings, 0 replies; 43+ messages in thread
From: YASUAKI ISHIMATSU @ 2017-10-02 16:36 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Kashyap Desai, Hannes Reinecke, Marc Zyngier, Christoph Hellwig,
axboe, mpe, keith.busch, peterz, LKML, linux-scsi, Sumit Saxena,
Shivasharan Srikanteshwara
On 09/16/2017 11:02 AM, Thomas Gleixner wrote:
> On Sat, 16 Sep 2017, Thomas Gleixner wrote:
>> On Thu, 14 Sep 2017, YASUAKI ISHIMATSU wrote:
>>> Here are one irq's info of megasas:
>>>
>>> - Before offline CPU
>>> /proc/irq/70/smp_affinity_list
>>> 24-29
>>>
>>> /proc/irq/70/effective_affinity
>>> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,3f000000
>>>
>>> /sys/kernel/debug/irq/irqs/70
>>> handler: handle_edge_irq
>>> status: 0x00004000
>>> istate: 0x00000000
>>> ddepth: 0
>>> wdepth: 0
>>> dstate: 0x00609200
>>> IRQD_ACTIVATED
>>> IRQD_IRQ_STARTED
>>> IRQD_MOVE_PCNTXT
>>> IRQD_AFFINITY_SET
>>> IRQD_AFFINITY_MANAGED
>>
>> So this uses managed affinity, which means that once the last CPU in the
>> affinity mask goes offline, the interrupt is shut down by the irq core
>> code, which is the case:
>>
>>> dstate: 0x00a39000
>>> IRQD_IRQ_DISABLED
>>> IRQD_IRQ_MASKED
>>> IRQD_MOVE_PCNTXT
>>> IRQD_AFFINITY_SET
>>> IRQD_AFFINITY_MANAGED
>>> IRQD_MANAGED_SHUTDOWN <---------------
>>
>> So the irq core code works as expected, but something in the
>> driver/scsi/block stack seems to fiddle with that shut down queue.
>>
>> I only can tell about the inner workings of the irq code, but I have no
>> clue about the rest.
>
> Though there is something wrong here:
>
>> affinity: 24-29
>> effectiv: 24-29
>
> and after offlining:
>
>> affinity: 29
>> effectiv: 29
>
> But that should be:
>
> affinity: 24-29
> effectiv: 29
>
> because the irq core code preserves 'affinity'. It merely updates
> 'effective', which is where your interrupts are routed to.
>
> Is the driver issuing any set_affinity() calls? If so, that's wrong.
>
> Which driver are we talking about?
We are talking about the megasas driver,
so I added linux-scsi and the megasas maintainers to the thread.
Thanks,
Yasuaki Ishimatsu
>
> Thanks,
>
> tglx
>
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: system hung up when offlining CPUs
2017-10-02 16:36 ` YASUAKI ISHIMATSU
@ 2017-10-03 21:44 ` Thomas Gleixner
-1 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2017-10-03 21:44 UTC (permalink / raw)
To: YASUAKI ISHIMATSU
Cc: Kashyap Desai, Hannes Reinecke, Marc Zyngier, Christoph Hellwig,
axboe, mpe, keith.busch, peterz, LKML, linux-scsi, Sumit Saxena,
Shivasharan Srikanteshwara
On Mon, 2 Oct 2017, YASUAKI ISHIMATSU wrote:
> On 09/16/2017 11:02 AM, Thomas Gleixner wrote:
> > Which driver are we talking about?
>
> We are talking about the megasas driver.
Can you please apply the debug patch below?
After booting, enable stack traces for the tracer:
# echo 1 >/sys/kernel/debug/tracing/options/stacktrace
Then offline CPUs 24-29. After that do
# cat /sys/kernel/debug/tracing/trace >somefile
Please compress the file and upload it somewhere; if you have no place to
upload it, send it to me in private mail.
Thanks,
tglx
8<------------
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -171,11 +171,16 @@ void irq_set_thread_affinity(struct irq_
int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
bool force)
{
+ const struct cpumask *eff = irq_data_get_effective_affinity_mask(data);
struct irq_desc *desc = irq_data_to_desc(data);
struct irq_chip *chip = irq_data_get_irq_chip(data);
int ret;
ret = chip->irq_set_affinity(data, mask, force);
+
+ trace_printk("irq: %u ret %d mask: %*pbl eff: %*pbl\n", data->irq, ret,
+ cpumask_pr_args(mask), cpumask_pr_args(eff));
+
switch (ret) {
case IRQ_SET_MASK_OK:
case IRQ_SET_MASK_OK_DONE:
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: system hung up when offlining CPUs
2017-10-03 21:44 ` Thomas Gleixner
@ 2017-10-04 21:04 ` Thomas Gleixner
-1 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2017-10-04 21:04 UTC (permalink / raw)
To: YASUAKI ISHIMATSU
Cc: Kashyap Desai, Hannes Reinecke, Marc Zyngier, Christoph Hellwig,
axboe, mpe, keith.busch, peterz, LKML, linux-scsi, Sumit Saxena,
Shivasharan Srikanteshwara
On Tue, 3 Oct 2017, Thomas Gleixner wrote:
> Can you please apply the debug patch below.
I found an issue with managed interrupts when the affinity mask of a
managed interrupt spans multiple CPUs. Explanation in the changelog
below. I'm not sure that this cures the problems you have, but at least I
could prove that it's not doing what it should do. The failure I'm seeing is
fixed, but I can't test the megasas driver due to -ENOHARDWARE.
Can you please apply the patch below on top of Linus tree and retest?
Please send me the outputs I asked you to provide last time in any case
(success or fail).
@block/scsi folks: Can you please run that through your tests as well?
Thanks,
tglx
8<-----------------------
Subject: genirq/cpuhotplug: Enforce affinity setting on startup of managed irqs
From: Thomas Gleixner <tglx@linutronix.de>
Date: Wed, 04 Oct 2017 21:07:38 +0200
Managed interrupts can end up in a stale state on CPU hotplug. If the
interrupt is not targeting a single CPU, i.e. the affinity mask spans
multiple CPUs, then the following can happen:
After boot:
dstate: 0x01601200
IRQD_ACTIVATED
IRQD_IRQ_STARTED
IRQD_SINGLE_TARGET
IRQD_AFFINITY_SET
IRQD_AFFINITY_MANAGED
node: 0
affinity: 24-31
effectiv: 24
pending: 0
After offlining CPUs 31-24:
dstate: 0x01a31000
IRQD_IRQ_DISABLED
IRQD_IRQ_MASKED
IRQD_SINGLE_TARGET
IRQD_AFFINITY_SET
IRQD_AFFINITY_MANAGED
IRQD_MANAGED_SHUTDOWN
node: 0
affinity: 24-31
effectiv: 24
pending: 0
Now CPU 25 gets onlined again, so it should get the effective interrupt
affinity for this interrupt, but due to the x86 interrupt affinity setter
restrictions this ends up after restarting the interrupt with:
dstate: 0x01601300
IRQD_ACTIVATED
IRQD_IRQ_STARTED
IRQD_SINGLE_TARGET
IRQD_AFFINITY_SET
IRQD_SETAFFINITY_PENDING
IRQD_AFFINITY_MANAGED
node: 0
affinity: 24-31
effectiv: 24
pending: 24-31
So the interrupt is still affine to CPU 24, which was the last CPU to go
offline of that affinity set, and the move to an online CPU within 24-31,
in this case 25, is pending. This mechanism is x86/ia64 specific, as those
architectures cannot move interrupts from thread context and instead do so
when an interrupt is actually handled. So the move is set to pending.
What's worse is that offlining CPU 25 again results in:
dstate: 0x01601300
IRQD_ACTIVATED
IRQD_IRQ_STARTED
IRQD_SINGLE_TARGET
IRQD_AFFINITY_SET
IRQD_SETAFFINITY_PENDING
IRQD_AFFINITY_MANAGED
node: 0
affinity: 24-31
effectiv: 24
pending: 24-31
This means the interrupt has not been shut down, because the outgoing CPU
is not in the effective affinity mask, but of course nothing notices that
the effective affinity mask is pointing at an offline CPU.
In the case of restarting a managed interrupt the move restriction does not
apply, so the affinity setting can be made unconditional. This needs to be
done _before_ the interrupt is started up, as otherwise the condition for
moving it from thread context would no longer be fulfilled.
With that change applied onlining CPU 25 after offlining 31-24 results in:
dstate: 0x01600200
IRQD_ACTIVATED
IRQD_IRQ_STARTED
IRQD_SINGLE_TARGET
IRQD_AFFINITY_MANAGED
node: 0
affinity: 24-31
effectiv: 25
pending:
And after offlining CPU 25:
dstate: 0x01a30000
IRQD_IRQ_DISABLED
IRQD_IRQ_MASKED
IRQD_SINGLE_TARGET
IRQD_AFFINITY_MANAGED
IRQD_MANAGED_SHUTDOWN
node: 0
affinity: 24-31
effectiv: 25
pending:
which is the correct and expected result.
To complete that, add some debug code to catch this kind of situation in
the cpu offline code and warn about interrupt chips which allow affinity
setting and do not update the effective affinity mask if that feature is
enabled.
Reported-by: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
kernel/irq/chip.c | 2 +-
kernel/irq/cpuhotplug.c | 28 +++++++++++++++++++++++++++-
kernel/irq/manage.c | 17 +++++++++++++++++
3 files changed, 45 insertions(+), 2 deletions(-)
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -265,8 +265,8 @@ int irq_startup(struct irq_desc *desc, b
irq_setup_affinity(desc);
break;
case IRQ_STARTUP_MANAGED:
+ irq_do_set_affinity(d, aff, false);
ret = __irq_startup(desc);
- irq_set_affinity_locked(d, aff, false);
break;
case IRQ_STARTUP_ABORT:
return 0;
--- a/kernel/irq/cpuhotplug.c
+++ b/kernel/irq/cpuhotplug.c
@@ -18,8 +18,34 @@
static inline bool irq_needs_fixup(struct irq_data *d)
{
const struct cpumask *m = irq_data_get_effective_affinity_mask(d);
+ unsigned int cpu = smp_processor_id();
- return cpumask_test_cpu(smp_processor_id(), m);
+#ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK
+ /*
+ * The cpumask_empty() check is a workaround for interrupt chips,
+ * which do not implement effective affinity, but the architecture has
+ * enabled the config switch. Use the general affinity mask instead.
+ */
+ if (cpumask_empty(m))
+ m = irq_data_get_affinity_mask(d);
+
+ /*
+ * Sanity check. If the mask is not empty when excluding the outgoing
+ * CPU then it must contain at least one online CPU. The outgoing CPU
+ * has been removed from the online mask already.
+ */
+ if (cpumask_any_but(m, cpu) < nr_cpu_ids &&
+ cpumask_any_and(m, cpu_online_mask) >= nr_cpu_ids) {
+ /*
+ * If this happens then there was a missed IRQ fixup at some
+ * point. Warn about it and enforce fixup.
+ */
+ pr_warn("Eff. affinity %*pbl of IRQ %u contains only offline CPUs after offlining CPU %u\n",
+ cpumask_pr_args(m), d->irq, cpu);
+ return true;
+ }
+#endif
+ return cpumask_test_cpu(cpu, m);
}
static bool migrate_one_irq(struct irq_desc *desc)
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -168,6 +168,19 @@ void irq_set_thread_affinity(struct irq_
set_bit(IRQTF_AFFINITY, &action->thread_flags);
}
+static void irq_validate_effective_affinity(struct irq_data *data)
+{
+#ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK
+ const struct cpumask *m = irq_data_get_effective_affinity_mask(data);
+ struct irq_chip *chip = irq_data_get_irq_chip(data);
+
+ if (!cpumask_empty(m))
+ return;
+ pr_warn_once("irq_chip %s did not update eff. affinity mask of irq %u\n",
+ chip->name, data->irq);
+#endif
+}
+
int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
bool force)
{
@@ -175,12 +188,16 @@ int irq_do_set_affinity(struct irq_data
struct irq_chip *chip = irq_data_get_irq_chip(data);
int ret;
+ if (!chip || !chip->irq_set_affinity)
+ return -EINVAL;
+
ret = chip->irq_set_affinity(data, mask, force);
switch (ret) {
case IRQ_SET_MASK_OK:
case IRQ_SET_MASK_OK_DONE:
cpumask_copy(desc->irq_common_data.affinity, mask);
case IRQ_SET_MASK_OK_NOCOPY:
+ irq_validate_effective_affinity(data);
irq_set_thread_affinity(desc);
ret = 0;
}
^ permalink raw reply [flat|nested] 43+ messages in thread
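The ordering change at the heart of the patch - calling irq_do_set_affinity() *before* __irq_startup() instead of irq_set_affinity_locked() afterwards - can be caricatured with a toy model. Everything below is illustrative only: the dictionary keys mirror the changelog's debugfs fields, and the "pending move" behaviour is a simplified stand-in for the x86 deferred-move mechanism, not kernel code.

```python
def restart_managed_irq(irq: dict, target_cpu: int, fix_applied: bool) -> None:
    """Toy model of irq_startup() for a managed irq when a CPU comes online.

    Without the fix, the affinity is applied after startup; on x86 a
    set_affinity from thread context only marks the move as pending, so
    'effective' keeps pointing at the old (now offline) CPU.
    """
    if fix_applied:
        irq["effective"] = {target_cpu}        # irq_do_set_affinity() first
        irq["started"] = True                  # then __irq_startup()
        irq["pending"] = set()
    else:
        irq["started"] = True                  # __irq_startup() first
        irq["pending"] = set(irq["affinity"])  # move deferred: stale effective


def fresh_irq() -> dict:
    # State from the changelog after offlining CPUs 31-24: shut down,
    # effective still pointing at CPU 24.
    return {"affinity": set(range(24, 32)), "effective": {24},
            "started": False, "pending": set()}


buggy = fresh_irq()
restart_managed_irq(buggy, 25, fix_applied=False)   # stale: effective == {24}

fixed = fresh_irq()
restart_managed_irq(fixed, 25, fix_applied=True)    # effective == {25}
```

The `buggy` path reproduces the "effectiv: 24, pending: 24-31" state from the changelog, while the `fixed` path lands in the "effectiv: 25, pending:" state described as correct.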
* Re: system hung up when offlining CPUs
@ 2017-10-04 21:04 ` Thomas Gleixner
0 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2017-10-04 21:04 UTC (permalink / raw)
To: YASUAKI ISHIMATSU
Cc: Kashyap Desai, Hannes Reinecke, Marc Zyngier, Christoph Hellwig,
axboe, mpe, keith.busch, peterz, LKML, linux-scsi, Sumit Saxena,
Shivasharan Srikanteshwara
On Tue, 3 Oct 2017, Thomas Gleixner wrote:
> Can you please apply the debug patch below.
I found an issue with managed interrupts when the affinity mask of an
managed interrupt spawns multiple CPUs. Explanation in the changelog
below. I'm not sure that this cures the problems you have, but at least I
could prove that it's not doing what it should do. The failure I'm seing is
fixed, but I can't test that megasas driver due to -ENOHARDWARE.
Can you please apply the patch below on top of Linus tree and retest?
Please send me the outputs I asked you to provide last time in any case
(success or fail).
@block/scsi folks: Can you please run that through your tests as well?
Thanks,
tglx
8<-----------------------
Subject: genirq/cpuhotplug: Enforce affinity setting on startup of managed irqs
From: Thomas Gleixner <tglx@linutronix.de>
Date: Wed, 04 Oct 2017 21:07:38 +0200
Managed interrupts can end up in a stale state on CPU hotplug. If the
interrupt is not targeting a single CPU, i.e. the affinity mask spawns
multiple CPUs then the following can happen:
After boot:
dstate: 0x01601200
IRQD_ACTIVATED
IRQD_IRQ_STARTED
IRQD_SINGLE_TARGET
IRQD_AFFINITY_SET
IRQD_AFFINITY_MANAGED
node: 0
affinity: 24-31
effectiv: 24
pending: 0
After offlining CPU 31 - 24
dstate: 0x01a31000
IRQD_IRQ_DISABLED
IRQD_IRQ_MASKED
IRQD_SINGLE_TARGET
IRQD_AFFINITY_SET
IRQD_AFFINITY_MANAGED
IRQD_MANAGED_SHUTDOWN
node: 0
affinity: 24-31
effectiv: 24
pending: 0
Now CPU 25 gets onlined again, so it should become the effective interrupt
target for this interrupt, but due to the x86 interrupt affinity setter
restrictions this ends up after restarting the interrupt with:
dstate: 0x01601300
IRQD_ACTIVATED
IRQD_IRQ_STARTED
IRQD_SINGLE_TARGET
IRQD_AFFINITY_SET
IRQD_SETAFFINITY_PENDING
IRQD_AFFINITY_MANAGED
node: 0
affinity: 24-31
effectiv: 24
pending: 24-31
So the interrupt is still affine to CPU 24, which was the last CPU of that
affinity set to go offline, and the move to an online CPU within 24-31,
in this case 25, is pending. This mechanism is x86/ia64 specific, as those
architectures cannot move interrupts from thread context and instead do so
when an interrupt is actually handled. So the move is marked pending.
What's worse is that offlining CPU 25 again results in:
dstate: 0x01601300
IRQD_ACTIVATED
IRQD_IRQ_STARTED
IRQD_SINGLE_TARGET
IRQD_AFFINITY_SET
IRQD_SETAFFINITY_PENDING
IRQD_AFFINITY_MANAGED
node: 0
affinity: 24-31
effectiv: 24
pending: 24-31
This means the interrupt has not been shut down, because the outgoing CPU
is not in the effective affinity mask, but of course nothing notices that
the effective affinity mask is pointing at an offline CPU.
In the case of restarting a managed interrupt the move restriction does not
apply, so the affinity setting can be made unconditional. This needs to be
done _before_ the interrupt is started up, as otherwise the condition for
moving it from thread context would no longer be fulfilled.
With that change applied, onlining CPU 25 after offlining CPUs 31-24 results in:
dstate: 0x01600200
IRQD_ACTIVATED
IRQD_IRQ_STARTED
IRQD_SINGLE_TARGET
IRQD_AFFINITY_MANAGED
node: 0
affinity: 24-31
effectiv: 25
pending:
And after offlining CPU 25:
dstate: 0x01a30000
IRQD_IRQ_DISABLED
IRQD_IRQ_MASKED
IRQD_SINGLE_TARGET
IRQD_AFFINITY_MANAGED
IRQD_MANAGED_SHUTDOWN
node: 0
affinity: 24-31
effectiv: 25
pending:
which is the correct and expected result.
To complete that, add some debug code to catch this kind of situation in
the cpu offline code and warn about interrupt chips which allow affinity
setting and do not update the effective affinity mask if that feature is
enabled.
Reported-by: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
kernel/irq/chip.c | 2 +-
kernel/irq/cpuhotplug.c | 28 +++++++++++++++++++++++++++-
kernel/irq/manage.c | 17 +++++++++++++++++
3 files changed, 45 insertions(+), 2 deletions(-)
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -265,8 +265,8 @@ int irq_startup(struct irq_desc *desc, b
irq_setup_affinity(desc);
break;
case IRQ_STARTUP_MANAGED:
+ irq_do_set_affinity(d, aff, false);
ret = __irq_startup(desc);
- irq_set_affinity_locked(d, aff, false);
break;
case IRQ_STARTUP_ABORT:
return 0;
--- a/kernel/irq/cpuhotplug.c
+++ b/kernel/irq/cpuhotplug.c
@@ -18,8 +18,34 @@
static inline bool irq_needs_fixup(struct irq_data *d)
{
const struct cpumask *m = irq_data_get_effective_affinity_mask(d);
+ unsigned int cpu = smp_processor_id();
- return cpumask_test_cpu(smp_processor_id(), m);
+#ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK
+ /*
+ * The cpumask_empty() check is a workaround for interrupt chips,
+ * which do not implement effective affinity, but the architecture has
+ * enabled the config switch. Use the general affinity mask instead.
+ */
+ if (cpumask_empty(m))
+ m = irq_data_get_affinity_mask(d);
+
+ /*
+ * Sanity check. If the mask is not empty when excluding the outgoing
+ * CPU then it must contain at least one online CPU. The outgoing CPU
+ * has been removed from the online mask already.
+ */
+ if (cpumask_any_but(m, cpu) < nr_cpu_ids &&
+ cpumask_any_and(m, cpu_online_mask) >= nr_cpu_ids) {
+ /*
+ * If this happens then there was a missed IRQ fixup at some
+ * point. Warn about it and enforce fixup.
+ */
+ pr_warn("Eff. affinity %*pbl of IRQ %u contains only offline CPUs after offlining CPU %u\n",
+ cpumask_pr_args(m), d->irq, cpu);
+ return true;
+ }
+#endif
+ return cpumask_test_cpu(cpu, m);
}
static bool migrate_one_irq(struct irq_desc *desc)
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -168,6 +168,19 @@ void irq_set_thread_affinity(struct irq_
set_bit(IRQTF_AFFINITY, &action->thread_flags);
}
+static void irq_validate_effective_affinity(struct irq_data *data)
+{
+#ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK
+ const struct cpumask *m = irq_data_get_effective_affinity_mask(data);
+ struct irq_chip *chip = irq_data_get_irq_chip(data);
+
+ if (!cpumask_empty(m))
+ return;
+ pr_warn_once("irq_chip %s did not update eff. affinity mask of irq %u\n",
+ chip->name, data->irq);
+#endif
+}
+
int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
bool force)
{
@@ -175,12 +188,16 @@ int irq_do_set_affinity(struct irq_data
struct irq_chip *chip = irq_data_get_irq_chip(data);
int ret;
+ if (!chip || !chip->irq_set_affinity)
+ return -EINVAL;
+
ret = chip->irq_set_affinity(data, mask, force);
switch (ret) {
case IRQ_SET_MASK_OK:
case IRQ_SET_MASK_OK_DONE:
cpumask_copy(desc->irq_common_data.affinity, mask);
case IRQ_SET_MASK_OK_NOCOPY:
+ irq_validate_effective_affinity(data);
irq_set_thread_affinity(desc);
ret = 0;
}
^ permalink raw reply [flat|nested] 43+ messages in thread
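The sanity check added to irq_needs_fixup() above can be modeled in userspace with a plain 64-bit bitmask standing in for a cpumask: if the effective mask minus the outgoing CPU is non-empty but intersects no online CPU, a fixup was missed earlier and is enforced now; otherwise a fixup is needed only when the outgoing CPU is in the mask. A hedged sketch of that decision (illustrative, not kernel code; the real helpers are cpumask_any_but()/cpumask_any_and()):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model of the patched irq_needs_fixup(); a uint64_t bit per CPU. */
static bool needs_fixup(uint64_t effective, uint64_t affinity,
			uint64_t online, unsigned int outgoing_cpu,
			bool *warned)
{
	uint64_t m = effective;

	/* Workaround for chips that never filled in the effective mask:
	 * fall back to the general affinity mask. */
	if (m == 0)
		m = affinity;

	/* Sanity check: mask minus the outgoing CPU is non-empty, yet it
	 * contains no online CPU -> a fixup was missed at some point.
	 * Warn and enforce the fixup now. */
	if ((m & ~(1ULL << outgoing_cpu)) != 0 && (m & online) == 0) {
		*warned = true;		/* stands in for pr_warn() */
		return true;
	}

	/* Normal case: fix up only if the outgoing CPU is in the mask. */
	return (m >> outgoing_cpu) & 1;
}
```

In the scenario from the thread's debug output, the stale effective mask {24} with CPUs 24-31 offline trips the warning path when CPU 25 is taken down again.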
* Re: system hung up when offlining CPUs
2017-10-02 16:36 ` YASUAKI ISHIMATSU
@ 2017-10-04 21:10 ` Thomas Gleixner
-1 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2017-10-04 21:10 UTC (permalink / raw)
To: YASUAKI ISHIMATSU
Cc: Kashyap Desai, Hannes Reinecke, Marc Zyngier, Christoph Hellwig,
axboe, mpe, keith.busch, peterz, LKML, linux-scsi, Sumit Saxena,
Shivasharan Srikanteshwara
On Mon, 2 Oct 2017, YASUAKI ISHIMATSU wrote:
>
> We are talking about megasas driver.
> So I added linux-scsi and maintainers of megasas into the thread.
Another question:
Is this the in tree megasas driver and you are observing this on Linus
latest tree, i.e. 4.14-rc3+ ?
Thanks,
tglx
^ permalink raw reply [flat|nested] 43+ messages in thread
* [tip:irq/urgent] genirq/cpuhotplug: Add sanity check for effective affinity mask
2017-10-04 21:04 ` Thomas Gleixner
@ 2017-10-09 11:35 ` tip-bot for Thomas Gleixner
-1 siblings, 0 replies; 43+ messages in thread
From: tip-bot for Thomas Gleixner @ 2017-10-09 11:35 UTC (permalink / raw)
To: linux-tip-commits; +Cc: hpa, tglx, hch, marc.zyngier, mingo, linux-kernel
Commit-ID: 60b09c51bb4fb46e2331fdbb39f91520f31d35f7
Gitweb: https://git.kernel.org/tip/60b09c51bb4fb46e2331fdbb39f91520f31d35f7
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Mon, 9 Oct 2017 12:47:24 +0200
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitDate: Mon, 9 Oct 2017 13:26:48 +0200
genirq/cpuhotplug: Add sanity check for effective affinity mask
The effective affinity mask handling has no safety net when the mask is not
updated by the interrupt chip or the mask contains offline CPUs.
If that happens the CPU unplug code fails to migrate interrupts.
Add sanity checks and emit a warning when the mask contains only offline
CPUs.
Fixes: 415fcf1a2293 ("genirq/cpuhotplug: Use effective affinity mask")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1710042208400.2406@nanos
---
kernel/irq/cpuhotplug.c | 28 +++++++++++++++++++++++++++-
1 file changed, 27 insertions(+), 1 deletion(-)
diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
index 638eb9c..9eb09ae 100644
--- a/kernel/irq/cpuhotplug.c
+++ b/kernel/irq/cpuhotplug.c
@@ -18,8 +18,34 @@
static inline bool irq_needs_fixup(struct irq_data *d)
{
const struct cpumask *m = irq_data_get_effective_affinity_mask(d);
+ unsigned int cpu = smp_processor_id();
- return cpumask_test_cpu(smp_processor_id(), m);
+#ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK
+ /*
+ * The cpumask_empty() check is a workaround for interrupt chips,
+ * which do not implement effective affinity, but the architecture has
+ * enabled the config switch. Use the general affinity mask instead.
+ */
+ if (cpumask_empty(m))
+ m = irq_data_get_affinity_mask(d);
+
+ /*
+ * Sanity check. If the mask is not empty when excluding the outgoing
+ * CPU then it must contain at least one online CPU. The outgoing CPU
+ * has been removed from the online mask already.
+ */
+ if (cpumask_any_but(m, cpu) < nr_cpu_ids &&
+ cpumask_any_and(m, cpu_online_mask) >= nr_cpu_ids) {
+ /*
+ * If this happens then there was a missed IRQ fixup at some
+ * point. Warn about it and enforce fixup.
+ */
+ pr_warn("Eff. affinity %*pbl of IRQ %u contains only offline CPUs after offlining CPU %u\n",
+ cpumask_pr_args(m), d->irq, cpu);
+ return true;
+ }
+#endif
+ return cpumask_test_cpu(cpu, m);
}
static bool migrate_one_irq(struct irq_desc *desc)
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [tip:irq/urgent] genirq/cpuhotplug: Enforce affinity setting on startup of managed irqs
2017-10-04 21:04 ` Thomas Gleixner
@ 2017-10-09 11:35 ` tip-bot for Thomas Gleixner
-1 siblings, 0 replies; 43+ messages in thread
From: tip-bot for Thomas Gleixner @ 2017-10-09 11:35 UTC (permalink / raw)
To: linux-tip-commits
Cc: marc.zyngier, linux-kernel, hare, hpa,
shivasharan.srikanteshwara, hch, yasu.isimatu, mingo, tglx,
sumit.saxena, kashyap.desai
Commit-ID: e43b3b58548051f8809391eb7bec7a27ed3003ea
Gitweb: https://git.kernel.org/tip/e43b3b58548051f8809391eb7bec7a27ed3003ea
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Wed, 4 Oct 2017 21:07:38 +0200
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitDate: Mon, 9 Oct 2017 13:26:48 +0200
genirq/cpuhotplug: Enforce affinity setting on startup of managed irqs
Managed interrupts can end up in a stale state on CPU hotplug. If the
interrupt is not targeting a single CPU, i.e. the affinity mask spans
multiple CPUs, then the following can happen:
After boot:
dstate: 0x01601200
IRQD_ACTIVATED
IRQD_IRQ_STARTED
IRQD_SINGLE_TARGET
IRQD_AFFINITY_SET
IRQD_AFFINITY_MANAGED
node: 0
affinity: 24-31
effectiv: 24
pending: 0
After offlining CPUs 31-24:
dstate: 0x01a31000
IRQD_IRQ_DISABLED
IRQD_IRQ_MASKED
IRQD_SINGLE_TARGET
IRQD_AFFINITY_SET
IRQD_AFFINITY_MANAGED
IRQD_MANAGED_SHUTDOWN
node: 0
affinity: 24-31
effectiv: 24
pending: 0
Now CPU 25 gets onlined again, so it should become the effective interrupt
target for this interrupt, but due to the x86 interrupt affinity setter
restrictions this ends up after restarting the interrupt with:
dstate: 0x01601300
IRQD_ACTIVATED
IRQD_IRQ_STARTED
IRQD_SINGLE_TARGET
IRQD_AFFINITY_SET
IRQD_SETAFFINITY_PENDING
IRQD_AFFINITY_MANAGED
node: 0
affinity: 24-31
effectiv: 24
pending: 24-31
So the interrupt is still affine to CPU 24, which was the last CPU of that
affinity set to go offline, and the move to an online CPU within 24-31,
in this case 25, is pending. This mechanism is x86/ia64 specific, as those
architectures cannot move interrupts from thread context and instead do so
when an interrupt is actually handled. So the move is marked pending.
What's worse is that offlining CPU 25 again results in:
dstate: 0x01601300
IRQD_ACTIVATED
IRQD_IRQ_STARTED
IRQD_SINGLE_TARGET
IRQD_AFFINITY_SET
IRQD_SETAFFINITY_PENDING
IRQD_AFFINITY_MANAGED
node: 0
affinity: 24-31
effectiv: 24
pending: 24-31
This means the interrupt has not been shut down, because the outgoing CPU
is not in the effective affinity mask, but of course nothing notices that
the effective affinity mask is pointing at an offline CPU.
In the case of restarting a managed interrupt the move restriction does not
apply, so the affinity setting can be made unconditional. This needs to be
done _before_ the interrupt is started up, as otherwise the condition for
moving it from thread context would no longer be fulfilled.
With that change applied, onlining CPU 25 after offlining CPUs 31-24 results in:
dstate: 0x01600200
IRQD_ACTIVATED
IRQD_IRQ_STARTED
IRQD_SINGLE_TARGET
IRQD_AFFINITY_MANAGED
node: 0
affinity: 24-31
effectiv: 25
pending:
And after offlining CPU 25:
dstate: 0x01a30000
IRQD_IRQ_DISABLED
IRQD_IRQ_MASKED
IRQD_SINGLE_TARGET
IRQD_AFFINITY_MANAGED
IRQD_MANAGED_SHUTDOWN
node: 0
affinity: 24-31
effectiv: 25
pending:
which is the correct and expected result.
Fixes: 761ea388e8c4 ("genirq: Handle managed irqs gracefully in irq_startup()")
Reported-by: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: axboe@kernel.dk
Cc: linux-scsi@vger.kernel.org
Cc: Sumit Saxena <sumit.saxena@broadcom.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: mpe@ellerman.id.au
Cc: Shivasharan Srikanteshwara <shivasharan.srikanteshwara@broadcom.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: keith.busch@intel.com
Cc: peterz@infradead.org
Cc: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1710042208400.2406@nanos
---
kernel/irq/chip.c | 2 +-
kernel/irq/manage.c | 3 +++
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 6fc89fd..5a2ef92c 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -265,8 +265,8 @@ int irq_startup(struct irq_desc *desc, bool resend, bool force)
irq_setup_affinity(desc);
break;
case IRQ_STARTUP_MANAGED:
+ irq_do_set_affinity(d, aff, false);
ret = __irq_startup(desc);
- irq_set_affinity_locked(d, aff, false);
break;
case IRQ_STARTUP_ABORT:
return 0;
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index ef89f72..4bff6a1 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -188,6 +188,9 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
struct irq_chip *chip = irq_data_get_irq_chip(data);
int ret;
+ if (!chip || !chip->irq_set_affinity)
+ return -EINVAL;
+
ret = chip->irq_set_affinity(data, mask, force);
switch (ret) {
case IRQ_SET_MASK_OK:
^ permalink raw reply related [flat|nested] 43+ messages in thread
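The ordering change in irq_startup() above matters because setting the affinity before __irq_startup() picks the effective target from the managed affinity mask intersected with the online mask, which guarantees an online CPU; the old order restarted the interrupt on the stale pre-hotplug target (CPU 24 in the thread's logs) and merely queued a pending move. A toy model of that selection, under the assumption that the first online CPU of the set is chosen (as cpumask_first() would; illustrative, not kernel code):

```c
#include <assert.h>
#include <stdint.h>

/* Pick the effective target CPU for a managed interrupt: the first CPU
 * that is both in the managed affinity mask and currently online.
 * Returns -1 when the whole set is offline (the IRQ_STARTUP_ABORT case,
 * where the interrupt stays in managed shutdown). */
static int pick_effective(uint64_t affinity, uint64_t online)
{
	uint64_t usable = affinity & online;

	if (usable == 0)
		return -1;			/* no online CPU in the set */
	for (int cpu = 0; cpu < 64; cpu++)
		if ((usable >> cpu) & 1)
			return cpu;		/* first usable CPU */
	return -1;				/* unreachable */
}
```

With affinity 24-31 and only CPU 25 of that set back online, this yields 25, matching the "effectiv: 25" state in the changelog above.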
* Re: system hung up when offlining CPUs
2017-10-04 21:04 ` Thomas Gleixner
@ 2017-10-10 16:30 ` YASUAKI ISHIMATSU
-1 siblings, 0 replies; 43+ messages in thread
From: YASUAKI ISHIMATSU @ 2017-10-10 16:30 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Kashyap Desai, Hannes Reinecke, Marc Zyngier, Christoph Hellwig,
axboe, mpe, keith.busch, peterz, LKML, linux-scsi, Sumit Saxena,
Shivasharan Srikanteshwara, yasu.isimatu
Hi Thomas,
Sorry for the late reply.
I'll apply the patches and retest in this week.
Please wait a while.
Thanks,
Yasuaki Ishimatsu
On 10/04/2017 05:04 PM, Thomas Gleixner wrote:
> On Tue, 3 Oct 2017, Thomas Gleixner wrote:
>> Can you please apply the debug patch below.
>
> I found an issue with managed interrupts when the affinity mask of a
> managed interrupt spans multiple CPUs. Explanation in the changelog
> below. I'm not sure that this cures the problems you have, but at least I
> could prove that it's not doing what it should do. The failure I'm seeing is
> fixed, but I can't test that megasas driver due to -ENOHARDWARE.
>
> Can you please apply the patch below on top of Linus tree and retest?
>
> Please send me the outputs I asked you to provide last time in any case
> (success or fail).
>
> @block/scsi folks: Can you please run that through your tests as well?
>
> Thanks,
>
> tglx
>
> 8<-----------------------
> Subject: genirq/cpuhotplug: Enforce affinity setting on startup of managed irqs
> From: Thomas Gleixner <tglx@linutronix.de>
> Date: Wed, 04 Oct 2017 21:07:38 +0200
>
> Managed interrupts can end up in a stale state on CPU hotplug. If the
> interrupt is not targeting a single CPU, i.e. the affinity mask spans
> multiple CPUs, then the following can happen:
>
> After boot:
>
> dstate: 0x01601200
> IRQD_ACTIVATED
> IRQD_IRQ_STARTED
> IRQD_SINGLE_TARGET
> IRQD_AFFINITY_SET
> IRQD_AFFINITY_MANAGED
> node: 0
> affinity: 24-31
> effectiv: 24
> pending: 0
>
> After offlining CPU 31 - 24
>
> dstate: 0x01a31000
> IRQD_IRQ_DISABLED
> IRQD_IRQ_MASKED
> IRQD_SINGLE_TARGET
> IRQD_AFFINITY_SET
> IRQD_AFFINITY_MANAGED
> IRQD_MANAGED_SHUTDOWN
> node: 0
> affinity: 24-31
> effectiv: 24
> pending: 0
>
> Now CPU 25 gets onlined again, so it should become the effective interrupt
> target for this interrupt, but due to the x86 interrupt affinity setter
> restrictions this ends up after restarting the interrupt with:
>
> dstate: 0x01601300
> IRQD_ACTIVATED
> IRQD_IRQ_STARTED
> IRQD_SINGLE_TARGET
> IRQD_AFFINITY_SET
> IRQD_SETAFFINITY_PENDING
> IRQD_AFFINITY_MANAGED
> node: 0
> affinity: 24-31
> effectiv: 24
> pending: 24-31
>
> So the interrupt is still affine to CPU 24, which was the last CPU to go
> offline of that affinity set and the move to an online CPU within 24-31,
> in this case 25, is pending. This mechanism is x86/ia64 specific as those
> architectures cannot move interrupts from thread context and do this when
> an interrupt is actually handled. So the move is set to pending.
>
> What's worse is that offlining CPU 25 again results in:
>
> dstate: 0x01601300
> IRQD_ACTIVATED
> IRQD_IRQ_STARTED
> IRQD_SINGLE_TARGET
> IRQD_AFFINITY_SET
> IRQD_SETAFFINITY_PENDING
> IRQD_AFFINITY_MANAGED
> node: 0
> affinity: 24-31
> effectiv: 24
> pending: 24-31
>
> This means the interrupt has not been shut down, because the outgoing CPU
> is not in the effective affinity mask, but of course nothing notices that
> the effective affinity mask is pointing at an offline CPU.
>
> In the case of restarting a managed interrupt the move restriction does not
> apply, so the affinity setting can be made unconditional. This needs to be
> done _before_ the interrupt is started up as otherwise the condition for
> moving it from thread context would no longer be fulfilled.
>
> With that change applied onlining CPU 25 after offlining 31-24 results in:
>
> dstate: 0x01600200
> IRQD_ACTIVATED
> IRQD_IRQ_STARTED
> IRQD_SINGLE_TARGET
> IRQD_AFFINITY_MANAGED
> node: 0
> affinity: 24-31
> effectiv: 25
> pending:
>
> And after offlining CPU 25:
>
> dstate: 0x01a30000
> IRQD_IRQ_DISABLED
> IRQD_IRQ_MASKED
> IRQD_SINGLE_TARGET
> IRQD_AFFINITY_MANAGED
> IRQD_MANAGED_SHUTDOWN
> node: 0
> affinity: 24-31
> effectiv: 25
> pending:
>
> which is the correct and expected result.
>
> To complete that, add some debug code to catch this kind of situation in
> the cpu offline code and warn about interrupt chips which allow affinity
> setting and do not update the effective affinity mask if that feature is
> enabled.
>
> Reported-by: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>
> ---
> kernel/irq/chip.c | 2 +-
> kernel/irq/cpuhotplug.c | 28 +++++++++++++++++++++++++++-
> kernel/irq/manage.c | 17 +++++++++++++++++
> 3 files changed, 45 insertions(+), 2 deletions(-)
>
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -265,8 +265,8 @@ int irq_startup(struct irq_desc *desc, b
> irq_setup_affinity(desc);
> break;
> case IRQ_STARTUP_MANAGED:
> + irq_do_set_affinity(d, aff, false);
> ret = __irq_startup(desc);
> - irq_set_affinity_locked(d, aff, false);
> break;
> case IRQ_STARTUP_ABORT:
> return 0;
> --- a/kernel/irq/cpuhotplug.c
> +++ b/kernel/irq/cpuhotplug.c
> @@ -18,8 +18,34 @@
> static inline bool irq_needs_fixup(struct irq_data *d)
> {
> const struct cpumask *m = irq_data_get_effective_affinity_mask(d);
> + unsigned int cpu = smp_processor_id();
>
> - return cpumask_test_cpu(smp_processor_id(), m);
> +#ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK
> + /*
> + * The cpumask_empty() check is a workaround for interrupt chips,
> + * which do not implement effective affinity, but the architecture has
> + * enabled the config switch. Use the general affinity mask instead.
> + */
> + if (cpumask_empty(m))
> + m = irq_data_get_affinity_mask(d);
> +
> + /*
> + * Sanity check. If the mask is not empty when excluding the outgoing
> + * CPU then it must contain at least one online CPU. The outgoing CPU
> + * has been removed from the online mask already.
> + */
> + if (cpumask_any_but(m, cpu) < nr_cpu_ids &&
> + cpumask_any_and(m, cpu_online_mask) >= nr_cpu_ids) {
> + /*
> + * If this happens then there was a missed IRQ fixup at some
> + * point. Warn about it and enforce fixup.
> + */
> + pr_warn("Eff. affinity %*pbl of IRQ %u contains only offline CPUs after offlining CPU %u\n",
> + cpumask_pr_args(m), d->irq, cpu);
> + return true;
> + }
> +#endif
> + return cpumask_test_cpu(cpu, m);
> }
>
> static bool migrate_one_irq(struct irq_desc *desc)
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -168,6 +168,19 @@ void irq_set_thread_affinity(struct irq_
> set_bit(IRQTF_AFFINITY, &action->thread_flags);
> }
>
> +static void irq_validate_effective_affinity(struct irq_data *data)
> +{
> +#ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK
> + const struct cpumask *m = irq_data_get_effective_affinity_mask(data);
> + struct irq_chip *chip = irq_data_get_irq_chip(data);
> +
> + if (!cpumask_empty(m))
> + return;
> + pr_warn_once("irq_chip %s did not update eff. affinity mask of irq %u\n",
> + chip->name, data->irq);
> +#endif
> +}
> +
> int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
> bool force)
> {
> @@ -175,12 +188,16 @@ int irq_do_set_affinity(struct irq_data
> struct irq_chip *chip = irq_data_get_irq_chip(data);
> int ret;
>
> + if (!chip || !chip->irq_set_affinity)
> + return -EINVAL;
> +
> ret = chip->irq_set_affinity(data, mask, force);
> switch (ret) {
> case IRQ_SET_MASK_OK:
> case IRQ_SET_MASK_OK_DONE:
> cpumask_copy(desc->irq_common_data.affinity, mask);
> case IRQ_SET_MASK_OK_NOCOPY:
> + irq_validate_effective_affinity(data);
> irq_set_thread_affinity(desc);
> ret = 0;
> }
>
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: system hung up when offlining CPUs
2017-10-10 16:30 ` YASUAKI ISHIMATSU
@ 2017-10-16 18:59 ` YASUAKI ISHIMATSU
-1 siblings, 0 replies; 43+ messages in thread
From: YASUAKI ISHIMATSU @ 2017-10-16 18:59 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Kashyap Desai, Hannes Reinecke, Marc Zyngier, Christoph Hellwig,
axboe, mpe, keith.busch, peterz, LKML, linux-scsi, Sumit Saxena,
Shivasharan Srikanteshwara, yasu.isimatu
Hi Thomas,
> Can you please apply the patch below on top of Linus tree and retest?
>
> Please send me the outputs I asked you to provide last time in any case
> (success or fail).
The issue still occurs even after applying your patch on top of Linux 4.14.0-rc4.
---
[ ...] INFO: task setroubleshootd:4972 blocked for more than 120 seconds.
[ ...] Not tainted 4.14.0-rc4.thomas.with.irqdebug+ #6
[ ...] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ ...] setroubleshootd D 0 4972 1 0x00000080
[ ...] Call Trace:
[ ...] __schedule+0x28d/0x890
[ ...] ? release_pages+0x16f/0x3f0
[ ...] schedule+0x36/0x80
[ ...] io_schedule+0x16/0x40
[ ...] wait_on_page_bit+0x107/0x150
[ ...] ? page_cache_tree_insert+0xb0/0xb0
[ ...] truncate_inode_pages_range+0x3dd/0x7d0
[ ...] ? schedule_hrtimeout_range_clock+0xad/0x140
[ ...] ? remove_wait_queue+0x59/0x60
[ ...] ? down_write+0x12/0x40
[ ...] ? unmap_mapping_range+0x75/0x130
[ ...] truncate_pagecache+0x47/0x60
[ ...] truncate_setsize+0x32/0x40
[ ...] xfs_setattr_size+0x100/0x300 [xfs]
[ ...] xfs_vn_setattr_size+0x40/0x90 [xfs]
[ ...] xfs_vn_setattr+0x87/0xa0 [xfs]
[ ...] notify_change+0x266/0x440
[ ...] do_truncate+0x75/0xc0
[ ...] path_openat+0xaba/0x13b0
[ ...] ? mem_cgroup_commit_charge+0x31/0x130
[ ...] do_filp_open+0x91/0x100
[ ...] ? __alloc_fd+0x46/0x170
[ ...] do_sys_open+0x124/0x210
[ ...] SyS_open+0x1e/0x20
[ ...] do_syscall_64+0x67/0x1b0
[ ...] entry_SYSCALL64_slow_path+0x25/0x25
[ ...] RIP: 0033:0x7f275e2365bd
[ ...] RSP: 002b:00007ffe29337da0 EFLAGS: 00000293 ORIG_RAX: 0000000000000002
[ ...] RAX: ffffffffffffffda RBX: 00000000040aea00 RCX: 00007f275e2365bd
[ ...] RDX: 00000000000001b6 RSI: 0000000000000241 RDI: 00000000040ae840
[ ...] RBP: 00007ffe29337e00 R08: 00000000040aea06 R09: 0000000000000240
[ ...] R10: 0000000000000024 R11: 0000000000000293 R12: 00000000040eb660
[ ...] R13: 0000000000000004 R14: 00000000040ae840 R15: 000000000186a0a0
[ ...] sd 0:2:0:0: [sda] tag#0 task abort called for scmd(ffff9b4bf2306160)
[ ...] sd 0:2:0:0: [sda] tag#0 CDB: Write(10) 2a 00 0b 3a 82 a0 00 00 20 00
[ ...] sd 0:2:0:0: task abort: FAILED scmd(ffff9b4bf2306160)
[ ...] sd 0:2:0:0: target reset called for scmd(ffff9b4bf2306160)
[ ...] sd 0:2:0:0: [sda] tag#0 megasas: target reset FAILED!!
[ ...] sd 0:2:0:0: [sda] tag#0 Controller reset is requested due to IO timeout
[ ...] SCSI command pointer: (ffff9b4bf2306160) SCSI host state: 5 SCSI
---
I could not prepare the same environment as the one I originally reported,
so I reproduced the issue in the following megasas environment.
---
IRQ affinity_list IRQ_TYPE
34 0-1 IR-PCI-MSI 1048576-edge megasas
35 2-3 IR-PCI-MSI 1048577-edge megasas
36 4-5 IR-PCI-MSI 1048578-edge megasas
37 6-7 IR-PCI-MSI 1048579-edge megasas
38 8-9 IR-PCI-MSI 1048580-edge megasas
39 10-11 IR-PCI-MSI 1048581-edge megasas
40 12-13 IR-PCI-MSI 1048582-edge megasas
41 14-15 IR-PCI-MSI 1048583-edge megasas
42 16-17 IR-PCI-MSI 1048584-edge megasas
43 18-19 IR-PCI-MSI 1048585-edge megasas
44 20-21 IR-PCI-MSI 1048586-edge megasas
45 22-23 IR-PCI-MSI 1048587-edge megasas
46 24-25 IR-PCI-MSI 1048588-edge megasas
47 26-27 IR-PCI-MSI 1048589-edge megasas
48 28-29 IR-PCI-MSI 1048590-edge megasas
49 30-31 IR-PCI-MSI 1048591-edge megasas
50 32-33 IR-PCI-MSI 1048592-edge megasas
51 34-35 IR-PCI-MSI 1048593-edge megasas
52 36-37 IR-PCI-MSI 1048594-edge megasas
53 38-39 IR-PCI-MSI 1048595-edge megasas
54 40-41 IR-PCI-MSI 1048596-edge megasas
55 42-43 IR-PCI-MSI 1048597-edge megasas
56 44-45 IR-PCI-MSI 1048598-edge megasas
57 46-47 IR-PCI-MSI 1048599-edge megasas
58 48-49 IR-PCI-MSI 1048600-edge megasas
59 50-51 IR-PCI-MSI 1048601-edge megasas
60 52-53 IR-PCI-MSI 1048602-edge megasas
61 54-55 IR-PCI-MSI 1048603-edge megasas
62 56-57 IR-PCI-MSI 1048604-edge megasas
63 58-59 IR-PCI-MSI 1048605-edge megasas
64 60-61 IR-PCI-MSI 1048606-edge megasas
65 62-63 IR-PCI-MSI 1048607-edge megasas
66 64-65 IR-PCI-MSI 1048608-edge megasas
67 66-67 IR-PCI-MSI 1048609-edge megasas
68 68-69 IR-PCI-MSI 1048610-edge megasas
69 70-71 IR-PCI-MSI 1048611-edge megasas
70 72-73 IR-PCI-MSI 1048612-edge megasas
71 74-75 IR-PCI-MSI 1048613-edge megasas
72 76-77 IR-PCI-MSI 1048614-edge megasas
73 78-79 IR-PCI-MSI 1048615-edge megasas
74 80-81 IR-PCI-MSI 1048616-edge megasas
75 82-83 IR-PCI-MSI 1048617-edge megasas
76 84-85 IR-PCI-MSI 1048618-edge megasas
77 86-87 IR-PCI-MSI 1048619-edge megasas
78 88-89 IR-PCI-MSI 1048620-edge megasas
79 90-91 IR-PCI-MSI 1048621-edge megasas
80 92-93 IR-PCI-MSI 1048622-edge megasas
81 94-95 IR-PCI-MSI 1048623-edge megasas
82 96-97 IR-PCI-MSI 1048624-edge megasas
83 98-99 IR-PCI-MSI 1048625-edge megasas
84 100-101 IR-PCI-MSI 1048626-edge megasas
85 102-103 IR-PCI-MSI 1048627-edge megasas
86 104-105 IR-PCI-MSI 1048628-edge megasas
87 106-107 IR-PCI-MSI 1048629-edge megasas
88 108-109 IR-PCI-MSI 1048630-edge megasas
89 110-111 IR-PCI-MSI 1048631-edge megasas
90 112-113 IR-PCI-MSI 1048632-edge megasas
91 114-115 IR-PCI-MSI 1048633-edge megasas
92 116-117 IR-PCI-MSI 1048634-edge megasas
93 118-119 IR-PCI-MSI 1048635-edge megasas
94 120-121 IR-PCI-MSI 1048636-edge megasas
95 122-123 IR-PCI-MSI 1048637-edge megasas
96 124-125 IR-PCI-MSI 1048638-edge megasas
97 126-127 IR-PCI-MSI 1048639-edge megasas
98 128-129 IR-PCI-MSI 1048640-edge megasas
99 130-131 IR-PCI-MSI 1048641-edge megasas
100 132-133 IR-PCI-MSI 1048642-edge megasas
101 134-135 IR-PCI-MSI 1048643-edge megasas
102 136-137 IR-PCI-MSI 1048644-edge megasas
103 138-139 IR-PCI-MSI 1048645-edge megasas
104 140-141 IR-PCI-MSI 1048646-edge megasas
105 142-143 IR-PCI-MSI 1048647-edge megasas
106 144-145 IR-PCI-MSI 1048648-edge megasas
107 146-147 IR-PCI-MSI 1048649-edge megasas
108 148-149 IR-PCI-MSI 1048650-edge megasas
109 150-151 IR-PCI-MSI 1048651-edge megasas
110 152-153 IR-PCI-MSI 1048652-edge megasas
111 154-155 IR-PCI-MSI 1048653-edge megasas
112 156-157 IR-PCI-MSI 1048654-edge megasas
113 158-159 IR-PCI-MSI 1048655-edge megasas
114 160-161 IR-PCI-MSI 1048656-edge megasas
115 162-163 IR-PCI-MSI 1048657-edge megasas
116 164-165 IR-PCI-MSI 1048658-edge megasas
117 166-167 IR-PCI-MSI 1048659-edge megasas
118 168-169 IR-PCI-MSI 1048660-edge megasas
119 170-171 IR-PCI-MSI 1048661-edge megasas
120 172-173 IR-PCI-MSI 1048662-edge megasas
121 174-175 IR-PCI-MSI 1048663-edge megasas
122 176-177 IR-PCI-MSI 1048664-edge megasas
123 178-179 IR-PCI-MSI 1048665-edge megasas
124 180-181 IR-PCI-MSI 1048666-edge megasas
125 182-183 IR-PCI-MSI 1048667-edge megasas
126 184-185 IR-PCI-MSI 1048668-edge megasas
127 186-187 IR-PCI-MSI 1048669-edge megasas
128 188-189 IR-PCI-MSI 1048670-edge megasas
129 190-191 IR-PCI-MSI 1048671-edge megasas
---
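For reference, CPUs are offlined through sysfs in this kind of test. A minimal Python sketch of the procedure follows; the sysfs path is the standard Linux one, the helper names are mine, and actually writing the files requires root on a real machine:

```python
import os

SYSFS_CPU = "/sys/devices/system/cpu"  # standard Linux sysfs location

def online_path(cpu):
    """Path of the per-CPU hotplug control file, e.g. .../cpu186/online."""
    return os.path.join(SYSFS_CPU, "cpu%d" % cpu, "online")

def set_cpu_online(cpu, online):
    """Offline (online=False) or online (online=True) a CPU via sysfs.
    Needs root; CPU0 typically has no 'online' file on x86."""
    with open(online_path(cpu), "w") as f:
        f.write("1" if online else "0")

# The sequence used in this report: offline CPUs 191 down to 186.
# (Left commented out so the sketch is safe to import.)
# for cpu in range(191, 185, -1):
#     set_cpu_online(cpu, False)
```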
Here is the trace log from offlining CPUs 186-191 in descending order.
The issue occurred when I offlined CPU 186.
---
# tracer: nop
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
systemd-1 [000] d... 0.427765: irq_do_set_affinity: irq: 24 ret 0 mask: 0-23 eff: 0
systemd-1 [029] d... 16.745803: irq_do_set_affinity: irq: 9 ret 2 mask: 0-23 eff: 0-5
systemd-1 [120] d... 21.850146: irq_do_set_affinity: irq: 25 ret 2 mask: 0-23 eff: 0-5
systemd-1 [120] d... 21.856549: irq_do_set_affinity: irq: 26 ret 2 mask: 0-23 eff: 0-5
systemd-1 [120] d... 21.862920: irq_do_set_affinity: irq: 27 ret 2 mask: 0-23 eff: 0-5
systemd-1 [120] d... 21.869300: irq_do_set_affinity: irq: 28 ret 2 mask: 0-23 eff: 0-5
systemd-1 [120] d... 21.875685: irq_do_set_affinity: irq: 29 ret 2 mask: 0-23 eff: 0-5
systemd-1 [120] d... 21.897267: irq_do_set_affinity: irq: 30 ret 2 mask: 0-23 eff: 0-5
systemd-1 [120] d... 23.983226: irq_do_set_affinity: irq: 31 ret 2 mask: 0-23 eff: 0-5
systemd-1 [120] d... 23.998459: irq_do_set_affinity: irq: 32 ret 2 mask: 0-23 eff: 0-5
systemd-1 [120] d... 26.095152: irq_do_set_affinity: irq: 33 ret 2 mask: 0-23 eff: 0-5
kworker/0:3-1458 [000] d... 28.497033: irq_do_set_affinity: irq: 16 ret 2 mask: 0-23 eff: 0-5
systemd-1 [120] d... 28.715688: irq_do_set_affinity: irq: 8 ret 2 mask: 0-23 eff: 0-5
systemd-1 [120] d... 29.163740: irq_do_set_affinity: irq: 4 ret 2 mask: 0-23 eff: 0-5
kworker/0:1-134 [000] d... 30.625367: irq_do_set_affinity: irq: 34 ret 2 mask: 0-1 eff: 0-1
kworker/0:1-134 [000] d... 30.625400: irq_do_set_affinity: irq: 35 ret 2 mask: 2-3 eff: 2-3
kworker/0:1-134 [000] d... 30.625442: irq_do_set_affinity: irq: 36 ret 2 mask: 4-5 eff: 4-5
kworker/0:1-134 [000] d... 30.625474: irq_do_set_affinity: irq: 37 ret 2 mask: 6-7 eff: 6-7
kworker/0:1-134 [000] d... 30.625513: irq_do_set_affinity: irq: 38 ret 2 mask: 8-9 eff: 8-9
kworker/0:1-134 [000] d... 30.625549: irq_do_set_affinity: irq: 39 ret 2 mask: 10-11 eff: 10-11
kworker/0:1-134 [000] d... 30.625585: irq_do_set_affinity: irq: 40 ret 2 mask: 12-13 eff: 12-13
kworker/0:1-134 [000] d... 30.625621: irq_do_set_affinity: irq: 41 ret 2 mask: 14-15 eff: 14-15
kworker/0:1-134 [000] d... 30.625656: irq_do_set_affinity: irq: 42 ret 2 mask: 16-17 eff: 16-17
kworker/0:1-134 [000] d... 30.625692: irq_do_set_affinity: irq: 43 ret 2 mask: 18-19 eff: 18-19
kworker/0:1-134 [000] d... 30.625732: irq_do_set_affinity: irq: 44 ret 2 mask: 20-21 eff: 20-21
kworker/0:1-134 [000] d... 30.625768: irq_do_set_affinity: irq: 45 ret 2 mask: 22-23 eff: 22-23
kworker/0:1-134 [000] d... 30.625801: irq_do_set_affinity: irq: 46 ret 2 mask: 24-25 eff: 24-25
kworker/0:1-134 [000] d... 30.625818: irq_do_set_affinity: irq: 47 ret 2 mask: 26-27 eff: 26-27
kworker/0:1-134 [000] d... 30.625843: irq_do_set_affinity: irq: 48 ret 2 mask: 28-29 eff: 28-29
kworker/0:1-134 [000] d... 30.625869: irq_do_set_affinity: irq: 49 ret 2 mask: 30-31 eff: 30-31
kworker/0:1-134 [000] d... 30.625897: irq_do_set_affinity: irq: 50 ret 2 mask: 32-33 eff: 32-33
kworker/0:1-134 [000] d... 30.625922: irq_do_set_affinity: irq: 51 ret 2 mask: 34-35 eff: 34-35
kworker/0:1-134 [000] d... 30.625947: irq_do_set_affinity: irq: 52 ret 2 mask: 36-37 eff: 36-37
kworker/0:1-134 [000] d... 30.625969: irq_do_set_affinity: irq: 53 ret 2 mask: 38-39 eff: 38-39
kworker/0:1-134 [000] d... 30.625992: irq_do_set_affinity: irq: 54 ret 2 mask: 40-41 eff: 40-41
kworker/0:1-134 [000] d... 30.626012: irq_do_set_affinity: irq: 55 ret 2 mask: 42-43 eff: 42-43
kworker/0:1-134 [000] d... 30.626032: irq_do_set_affinity: irq: 56 ret 2 mask: 44-45 eff: 44-45
kworker/0:1-134 [000] d... 30.626052: irq_do_set_affinity: irq: 57 ret 2 mask: 46-47 eff: 46-47
kworker/0:1-134 [000] d... 30.626088: irq_do_set_affinity: irq: 58 ret 2 mask: 48-49 eff: 48-49
kworker/0:1-134 [000] d... 30.626105: irq_do_set_affinity: irq: 59 ret 2 mask: 50-51 eff: 50-51
kworker/0:1-134 [000] d... 30.626118: irq_do_set_affinity: irq: 60 ret 2 mask: 52-53 eff: 52-53
kworker/0:1-134 [000] d... 30.626157: irq_do_set_affinity: irq: 61 ret 2 mask: 54-55 eff: 54-55
kworker/0:1-134 [000] d... 30.626185: irq_do_set_affinity: irq: 62 ret 2 mask: 56-57 eff: 56-57
kworker/0:1-134 [000] d... 30.626217: irq_do_set_affinity: irq: 63 ret 2 mask: 58-59 eff: 58-59
kworker/0:1-134 [000] d... 30.626243: irq_do_set_affinity: irq: 64 ret 2 mask: 60-61 eff: 60-61
kworker/0:1-134 [000] d... 30.626269: irq_do_set_affinity: irq: 65 ret 2 mask: 62-63 eff: 62-63
kworker/0:1-134 [000] d... 30.626299: irq_do_set_affinity: irq: 66 ret 2 mask: 64-65 eff: 64-65
kworker/0:1-134 [000] d... 30.626322: irq_do_set_affinity: irq: 67 ret 2 mask: 66-67 eff: 66-67
kworker/0:1-134 [000] d... 30.626346: irq_do_set_affinity: irq: 68 ret 2 mask: 68-69 eff: 68-69
kworker/0:1-134 [000] d... 30.626368: irq_do_set_affinity: irq: 69 ret 2 mask: 70-71 eff: 70-71
kworker/0:1-134 [000] d... 30.626390: irq_do_set_affinity: irq: 70 ret 2 mask: 72-73 eff: 72-73
kworker/0:1-134 [000] d... 30.626405: irq_do_set_affinity: irq: 71 ret 2 mask: 74-75 eff: 74-75
kworker/0:1-134 [000] d... 30.626417: irq_do_set_affinity: irq: 72 ret 2 mask: 76-77 eff: 76-77
kworker/0:1-134 [000] d... 30.626455: irq_do_set_affinity: irq: 73 ret 2 mask: 78-79 eff: 78-79
kworker/0:1-134 [000] d... 30.626483: irq_do_set_affinity: irq: 74 ret 2 mask: 80-81 eff: 80-81
kworker/0:1-134 [000] d... 30.626510: irq_do_set_affinity: irq: 75 ret 2 mask: 82-83 eff: 82-83
kworker/0:1-134 [000] d... 30.626535: irq_do_set_affinity: irq: 76 ret 2 mask: 84-85 eff: 84-85
kworker/0:1-134 [000] d... 30.626563: irq_do_set_affinity: irq: 77 ret 2 mask: 86-87 eff: 86-87
kworker/0:1-134 [000] d... 30.626585: irq_do_set_affinity: irq: 78 ret 2 mask: 88-89 eff: 88-89
kworker/0:1-134 [000] d... 30.626604: irq_do_set_affinity: irq: 79 ret 2 mask: 90-91 eff: 90-91
kworker/0:1-134 [000] d... 30.626624: irq_do_set_affinity: irq: 80 ret 2 mask: 92-93 eff: 92-93
kworker/0:1-134 [000] d... 30.626644: irq_do_set_affinity: irq: 81 ret 2 mask: 94-95 eff: 94-95
kworker/0:1-134 [000] d... 30.626665: irq_do_set_affinity: irq: 82 ret 2 mask: 96-97 eff: 96-97
kworker/0:1-134 [000] d... 30.626679: irq_do_set_affinity: irq: 83 ret 2 mask: 98-99 eff: 98-99
kworker/0:1-134 [000] d... 30.626693: irq_do_set_affinity: irq: 84 ret 2 mask: 100-101 eff: 100-101
kworker/0:1-134 [000] d... 30.626708: irq_do_set_affinity: irq: 85 ret 2 mask: 102-103 eff: 102-103
kworker/0:1-134 [000] d... 30.626750: irq_do_set_affinity: irq: 86 ret 2 mask: 104-105 eff: 104-105
kworker/0:1-134 [000] d... 30.626784: irq_do_set_affinity: irq: 87 ret 2 mask: 106-107 eff: 106-107
kworker/0:1-134 [000] d... 30.626814: irq_do_set_affinity: irq: 88 ret 2 mask: 108-109 eff: 108-109
kworker/0:1-134 [000] d... 30.626844: irq_do_set_affinity: irq: 89 ret 2 mask: 110-111 eff: 110-111
kworker/0:1-134 [000] d... 30.626872: irq_do_set_affinity: irq: 90 ret 2 mask: 112-113 eff: 112-113
kworker/0:1-134 [000] d... 30.626896: irq_do_set_affinity: irq: 91 ret 2 mask: 114-115 eff: 114-115
kworker/0:1-134 [000] d... 30.626928: irq_do_set_affinity: irq: 92 ret 2 mask: 116-117 eff: 116-117
kworker/0:1-134 [000] d... 30.626954: irq_do_set_affinity: irq: 93 ret 2 mask: 118-119 eff: 118-119
kworker/0:1-134 [000] d... 30.626975: irq_do_set_affinity: irq: 94 ret 2 mask: 120-121 eff: 120-121
kworker/0:1-134 [000] d... 30.626996: irq_do_set_affinity: irq: 95 ret 2 mask: 122-123 eff: 122-123
kworker/0:1-134 [000] d... 30.627022: irq_do_set_affinity: irq: 96 ret 2 mask: 124-125 eff: 124-125
kworker/0:1-134 [000] d... 30.627050: irq_do_set_affinity: irq: 97 ret 2 mask: 126-127 eff: 126-127
kworker/0:1-134 [000] d... 30.627081: irq_do_set_affinity: irq: 98 ret 2 mask: 128-129 eff: 128-129
kworker/0:1-134 [000] d... 30.627110: irq_do_set_affinity: irq: 99 ret 2 mask: 130-131 eff: 130-131
kworker/0:1-134 [000] d... 30.627137: irq_do_set_affinity: irq: 100 ret 2 mask: 132-133 eff: 132-133
kworker/0:1-134 [000] d... 30.627164: irq_do_set_affinity: irq: 101 ret 2 mask: 134-135 eff: 134-135
kworker/0:1-134 [000] d... 30.627191: irq_do_set_affinity: irq: 102 ret 2 mask: 136-137 eff: 136-137
kworker/0:1-134 [000] d... 30.627214: irq_do_set_affinity: irq: 103 ret 2 mask: 138-139 eff: 138-139
kworker/0:1-134 [000] d... 30.627238: irq_do_set_affinity: irq: 104 ret 2 mask: 140-141 eff: 140-141
kworker/0:1-134 [000] d... 30.627263: irq_do_set_affinity: irq: 105 ret 2 mask: 142-143 eff: 142-143
kworker/0:1-134 [000] d... 30.627283: irq_do_set_affinity: irq: 106 ret 2 mask: 144-145 eff: 144-145
kworker/0:1-134 [000] d... 30.627296: irq_do_set_affinity: irq: 107 ret 2 mask: 146-147 eff: 146-147
kworker/0:1-134 [000] d... 30.627311: irq_do_set_affinity: irq: 108 ret 2 mask: 148-149 eff: 148-149
kworker/0:1-134 [000] d... 30.627344: irq_do_set_affinity: irq: 109 ret 2 mask: 150-151 eff: 150-151
kworker/0:1-134 [000] d... 30.627377: irq_do_set_affinity: irq: 110 ret 2 mask: 152-153 eff: 152-153
kworker/0:1-134 [000] d... 30.627410: irq_do_set_affinity: irq: 111 ret 2 mask: 154-155 eff: 154-155
kworker/0:1-134 [000] d... 30.627437: irq_do_set_affinity: irq: 112 ret 2 mask: 156-157 eff: 156-157
kworker/0:1-134 [000] d... 30.627467: irq_do_set_affinity: irq: 113 ret 2 mask: 158-159 eff: 158-159
kworker/0:1-134 [000] d... 30.627494: irq_do_set_affinity: irq: 114 ret 2 mask: 160-161 eff: 160-161
kworker/0:1-134 [000] d... 30.627519: irq_do_set_affinity: irq: 115 ret 2 mask: 162-163 eff: 162-163
kworker/0:1-134 [000] d... 30.627545: irq_do_set_affinity: irq: 116 ret 2 mask: 164-165 eff: 164-165
kworker/0:1-134 [000] d... 30.627569: irq_do_set_affinity: irq: 117 ret 2 mask: 166-167 eff: 166-167
kworker/0:1-134 [000] d... 30.627589: irq_do_set_affinity: irq: 118 ret 2 mask: 168-169 eff: 168-169
kworker/0:1-134 [000] d... 30.627607: irq_do_set_affinity: irq: 119 ret 2 mask: 170-171 eff: 170-171
kworker/0:1-134 [000] d... 30.627639: irq_do_set_affinity: irq: 120 ret 2 mask: 172-173 eff: 172-173
kworker/0:1-134 [000] d... 30.627666: irq_do_set_affinity: irq: 121 ret 2 mask: 174-175 eff: 174-175
kworker/0:1-134 [000] d... 30.627691: irq_do_set_affinity: irq: 122 ret 2 mask: 176-177 eff: 176-177
kworker/0:1-134 [000] d... 30.627721: irq_do_set_affinity: irq: 123 ret 2 mask: 178-179 eff: 178-179
kworker/0:1-134 [000] d... 30.627748: irq_do_set_affinity: irq: 124 ret 2 mask: 180-181 eff: 180-181
kworker/0:1-134 [000] d... 30.627774: irq_do_set_affinity: irq: 125 ret 2 mask: 182-183 eff: 182-183
kworker/0:1-134 [000] d... 30.627799: irq_do_set_affinity: irq: 126 ret 2 mask: 184-185 eff: 184-185
kworker/0:1-134 [000] d... 30.627828: irq_do_set_affinity: irq: 127 ret 2 mask: 186-187 eff: 186
kworker/0:1-134 [000] d... 30.627850: irq_do_set_affinity: irq: 128 ret 2 mask: 188-189 eff: 188
kworker/0:1-134 [000] d... 30.627875: irq_do_set_affinity: irq: 129 ret 2 mask: 190-191 eff: 190
kworker/0:0-3 [000] d... 38.217213: irq_do_set_affinity: irq: 18 ret 2 mask: 0-23 eff: 0-5
systemd-udevd-2007 [129] d... 38.510108: irq_do_set_affinity: irq: 3 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 44.732162: irq_do_set_affinity: irq: 131 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 44.732195: irq_do_set_affinity: irq: 132 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 44.732214: irq_do_set_affinity: irq: 133 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 44.732229: irq_do_set_affinity: irq: 134 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 44.732246: irq_do_set_affinity: irq: 135 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 44.732261: irq_do_set_affinity: irq: 136 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 44.732276: irq_do_set_affinity: irq: 137 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 44.732292: irq_do_set_affinity: irq: 138 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 44.732308: irq_do_set_affinity: irq: 139 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 44.865529: irq_do_set_affinity: irq: 140 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 44.865557: irq_do_set_affinity: irq: 141 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 44.865575: irq_do_set_affinity: irq: 142 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 44.865591: irq_do_set_affinity: irq: 143 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 44.865607: irq_do_set_affinity: irq: 144 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 44.865621: irq_do_set_affinity: irq: 145 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 44.865635: irq_do_set_affinity: irq: 146 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 44.865650: irq_do_set_affinity: irq: 147 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 44.865664: irq_do_set_affinity: irq: 148 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 45.041598: irq_do_set_affinity: irq: 130 ret 2 mask: 0-23 eff: 6-9,11,16,18-19,21-23,26-28,31,34-35,38,40-43,47-63
NetworkManager-2628 [135] d... 45.042054: irq_do_set_affinity: irq: 130 ret 2 mask: 0-23 eff: 6-8,10-11,16,18-19,21-23,26-28,31,34-35,38,40-43,47-63
NetworkManager-2628 [135] d... 45.150285: irq_do_set_affinity: irq: 130 ret 2 mask: 0-23 eff: 0-5
(agetty)-3134 [049] d... 55.930794: irq_do_set_affinity: irq: 4 ret 2 mask: 0-23 eff: 0-5
<...>-1346 [191] d..1 100.473714: irq_do_set_affinity: irq: 129 ret 2 mask: 190-191 eff: 190
<...>-1346 [191] d..1 100.473722: <stack trace>
=> native_cpu_disable
=> take_cpu_down
=> multi_cpu_stop
=> cpu_stopper_thread
=> smpboot_thread_fn
=> kthread
=> ret_from_fork
<...>-1334 [189] d..1 700.567235: irq_do_set_affinity: irq: 128 ret 2 mask: 188-189 eff: 188
<...>-1334 [189] d..1 700.567243: <stack trace>
=> native_cpu_disable
=> take_cpu_down
=> multi_cpu_stop
=> cpu_stopper_thread
=> smpboot_thread_fn
=> kthread
=> ret_from_fork
<...>-1322 [187] d..1 1300.660985: irq_do_set_affinity: irq: 127 ret 2 mask: 186-187 eff: 186
<...>-1322 [187] d..1 1300.660993: <stack trace>
=> native_cpu_disable
=> take_cpu_down
=> multi_cpu_stop
=> cpu_stopper_thread
=> smpboot_thread_fn
=> kthread
=> ret_from_fork
---
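Picking the stale entries out of a trace this long is easier with a small parser. The sketch below is mine; it assumes the line format produced by the debug patch's trace output, as shown above:

```python
import re

# Matches e.g. "irq_do_set_affinity: irq: 129 ret 2 mask: 190-191 eff: 190"
LINE = re.compile(r"irq_do_set_affinity: irq: (\d+) ret (\d+) mask: (\S+) eff: (\S+)")

def parse_cpulist(s):
    """Expand a cpulist string such as '0-5' or '0-1,4' into a set of CPUs."""
    cpus = set()
    for part in s.split(","):
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus

def parse_trace(text):
    """Yield (irq, ret, mask, effective) tuples from trace output."""
    for m in LINE.finditer(text):
        yield (int(m.group(1)), int(m.group(2)),
               parse_cpulist(m.group(3)), parse_cpulist(m.group(4)))
```

For example, the last affinity entry above parses to irq 127 with mask {186, 187} and effective {186}: the effective affinity collapsing onto the CPU that is about to go offline.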
Thanks,
Yasuaki Ishimatsu
On 10/10/2017 12:30 PM, YASUAKI ISHIMATSU wrote:
> Hi Thomas,
>
> Sorry for the late reply.
>
> I'll apply the patches and retest in this week.
> Please wait a while.
>
> Thanks,
> Yasuaki Ishimatsu
>
> On 10/04/2017 05:04 PM, Thomas Gleixner wrote:
>> On Tue, 3 Oct 2017, Thomas Gleixner wrote:
>>> Can you please apply the debug patch below.
>>
>> I found an issue with managed interrupts when the affinity mask of a
>> managed interrupt spans multiple CPUs. Explanation in the changelog
>> below. I'm not sure that this cures the problems you have, but at least I
>> could prove that it's not doing what it should do. The failure I'm seeing is
>> fixed, but I can't test the megasas driver due to -ENOHARDWARE.
>>
>> Can you please apply the patch below on top of Linus tree and retest?
>>
>> Please send me the outputs I asked you to provide last time in any case
>> (success or fail).
>>
>> @block/scsi folks: Can you please run that through your tests as well?
>>
>> Thanks,
>>
>> tglx
>>
>> 8<-----------------------
>> Subject: genirq/cpuhotplug: Enforce affinity setting on startup of managed irqs
>> From: Thomas Gleixner <tglx@linutronix.de>
>> Date: Wed, 04 Oct 2017 21:07:38 +0200
>>
>> Managed interrupts can end up in a stale state on CPU hotplug. If the
>> interrupt is not targeting a single CPU, i.e. the affinity mask spans
>> multiple CPUs, then the following can happen:
>>
>> After boot:
>>
>> dstate: 0x01601200
>> IRQD_ACTIVATED
>> IRQD_IRQ_STARTED
>> IRQD_SINGLE_TARGET
>> IRQD_AFFINITY_SET
>> IRQD_AFFINITY_MANAGED
>> node: 0
>> affinity: 24-31
>> effectiv: 24
>> pending: 0
>>
>> After offlining CPU 31 - 24
>>
>> dstate: 0x01a31000
>> IRQD_IRQ_DISABLED
>> IRQD_IRQ_MASKED
>> IRQD_SINGLE_TARGET
>> IRQD_AFFINITY_SET
>> IRQD_AFFINITY_MANAGED
>> IRQD_MANAGED_SHUTDOWN
>> node: 0
>> affinity: 24-31
>> effectiv: 24
>> pending: 0
>>
>> Now CPU 25 gets onlined again, so it should get the effective interrupt
>> affinity for this interrupt, but due to the x86 interrupt affinity setter
>> restrictions this ends up after restarting the interrupt with:
>>
>> dstate: 0x01601300
>> IRQD_ACTIVATED
>> IRQD_IRQ_STARTED
>> IRQD_SINGLE_TARGET
>> IRQD_AFFINITY_SET
>> IRQD_SETAFFINITY_PENDING
>> IRQD_AFFINITY_MANAGED
>> node: 0
>> affinity: 24-31
>> effectiv: 24
>> pending: 24-31
>>
>> So the interrupt is still affine to CPU 24, which was the last CPU of that
>> affinity set to go offline, and the move to an online CPU within 24-31,
>> in this case 25, is pending. This mechanism is x86/ia64 specific, as those
>> architectures cannot move interrupts from thread context and instead do
>> this when an interrupt is actually handled. So the move is set to pending.
>>
>> What's worse is that offlining CPU 25 again results in:
>>
>> dstate: 0x01601300
>> IRQD_ACTIVATED
>> IRQD_IRQ_STARTED
>> IRQD_SINGLE_TARGET
>> IRQD_AFFINITY_SET
>> IRQD_SETAFFINITY_PENDING
>> IRQD_AFFINITY_MANAGED
>> node: 0
>> affinity: 24-31
>> effectiv: 24
>> pending: 24-31
>>
>> This means the interrupt has not been shut down, because the outgoing CPU
>> is not in the effective affinity mask, but of course nothing notices that
>> the effective affinity mask is pointing at an offline CPU.
>>
>> In the case of restarting a managed interrupt the move restriction does not
>> apply, so the affinity setting can be made unconditional. This needs to be
>> done _before_ the interrupt is started up, as otherwise the condition for
>> moving it from thread context would no longer be fulfilled.
>>
>> With that change applied, onlining CPU 25 after offlining CPUs 31-24 results in:
>>
>> dstate: 0x01600200
>> IRQD_ACTIVATED
>> IRQD_IRQ_STARTED
>> IRQD_SINGLE_TARGET
>> IRQD_AFFINITY_MANAGED
>> node: 0
>> affinity: 24-31
>> effectiv: 25
>> pending:
>>
>> And after offlining CPU 25:
>>
>> dstate: 0x01a30000
>> IRQD_IRQ_DISABLED
>> IRQD_IRQ_MASKED
>> IRQD_SINGLE_TARGET
>> IRQD_AFFINITY_MANAGED
>> IRQD_MANAGED_SHUTDOWN
>> node: 0
>> affinity: 24-31
>> effectiv: 25
>> pending:
>>
>> which is the correct and expected result.
>>
>> To complete that, add some debug code to catch this kind of situation in
>> the cpu offline code and warn about interrupt chips which allow affinity
>> setting and do not update the effective affinity mask if that feature is
>> enabled.
>>
>> Reported-by: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>
>> ---
>> kernel/irq/chip.c | 2 +-
>> kernel/irq/cpuhotplug.c | 28 +++++++++++++++++++++++++++-
>> kernel/irq/manage.c | 17 +++++++++++++++++
>> 3 files changed, 45 insertions(+), 2 deletions(-)
>>
>> --- a/kernel/irq/chip.c
>> +++ b/kernel/irq/chip.c
>> @@ -265,8 +265,8 @@ int irq_startup(struct irq_desc *desc, b
>> irq_setup_affinity(desc);
>> break;
>> case IRQ_STARTUP_MANAGED:
>> + irq_do_set_affinity(d, aff, false);
>> ret = __irq_startup(desc);
>> - irq_set_affinity_locked(d, aff, false);
>> break;
>> case IRQ_STARTUP_ABORT:
>> return 0;
>> --- a/kernel/irq/cpuhotplug.c
>> +++ b/kernel/irq/cpuhotplug.c
>> @@ -18,8 +18,34 @@
>> static inline bool irq_needs_fixup(struct irq_data *d)
>> {
>> const struct cpumask *m = irq_data_get_effective_affinity_mask(d);
>> + unsigned int cpu = smp_processor_id();
>>
>> - return cpumask_test_cpu(smp_processor_id(), m);
>> +#ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK
>> + /*
>> + * The cpumask_empty() check is a workaround for interrupt chips,
>> + * which do not implement effective affinity, but the architecture has
>> + * enabled the config switch. Use the general affinity mask instead.
>> + */
>> + if (cpumask_empty(m))
>> + m = irq_data_get_affinity_mask(d);
>> +
>> + /*
>> + * Sanity check. If the mask is not empty when excluding the outgoing
>> + * CPU then it must contain at least one online CPU. The outgoing CPU
>> + * has been removed from the online mask already.
>> + */
>> + if (cpumask_any_but(m, cpu) < nr_cpu_ids &&
>> + cpumask_any_and(m, cpu_online_mask) >= nr_cpu_ids) {
>> + /*
>> + * If this happens then there was a missed IRQ fixup at some
>> + * point. Warn about it and enforce fixup.
>> + */
>> + pr_warn("Eff. affinity %*pbl of IRQ %u contains only offline CPUs after offlining CPU %u\n",
>> + cpumask_pr_args(m), d->irq, cpu);
>> + return true;
>> + }
>> +#endif
>> + return cpumask_test_cpu(cpu, m);
>> }
>>
>> static bool migrate_one_irq(struct irq_desc *desc)
>> --- a/kernel/irq/manage.c
>> +++ b/kernel/irq/manage.c
>> @@ -168,6 +168,19 @@ void irq_set_thread_affinity(struct irq_
>> set_bit(IRQTF_AFFINITY, &action->thread_flags);
>> }
>>
>> +static void irq_validate_effective_affinity(struct irq_data *data)
>> +{
>> +#ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK
>> + const struct cpumask *m = irq_data_get_effective_affinity_mask(data);
>> + struct irq_chip *chip = irq_data_get_irq_chip(data);
>> +
>> + if (!cpumask_empty(m))
>> + return;
>> + pr_warn_once("irq_chip %s did not update eff. affinity mask of irq %u\n",
>> + chip->name, data->irq);
>> +#endif
>> +}
>> +
>> int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
>> bool force)
>> {
>> @@ -175,12 +188,16 @@ int irq_do_set_affinity(struct irq_data
>> struct irq_chip *chip = irq_data_get_irq_chip(data);
>> int ret;
>>
>> + if (!chip || !chip->irq_set_affinity)
>> + return -EINVAL;
>> +
>> ret = chip->irq_set_affinity(data, mask, force);
>> switch (ret) {
>> case IRQ_SET_MASK_OK:
>> case IRQ_SET_MASK_OK_DONE:
>> cpumask_copy(desc->irq_common_data.affinity, mask);
>> case IRQ_SET_MASK_OK_NOCOPY:
>> + irq_validate_effective_affinity(data);
>> irq_set_thread_affinity(desc);
>> ret = 0;
>> }
>>
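The sanity check added to irq_needs_fixup() in the patch above can be modeled in userspace. The sketch below is my own illustration, not kernel code: cpumasks are Python sets and the names mirror the patch. It reproduces the decision order: fall back to the general affinity mask when the effective mask is empty, force a fixup when the mask holds no online CPU other than the outgoing one, and otherwise fix up only if the outgoing CPU is in the mask:

```python
def irq_needs_fixup(effective, affinity, online, outgoing):
    """Model of the patched kernel check. All arguments are sets of CPU ids;
    'online' must already exclude the outgoing CPU, as in the kernel."""
    # Workaround for chips without effective affinity: use the general mask.
    m = effective if effective else affinity
    # Mask minus the outgoing CPU is non-empty, yet no CPU in it is online:
    # a fixup was missed earlier, so enforce one now (the kernel warns here).
    if (m - {outgoing}) and not (m & online):
        return True
    return outgoing in m  # cpumask_test_cpu(cpu, m)
```

With the changelog's scenario (affinity 24-31, effective affinity stuck at the already-offline CPU 24), offlining CPU 25 now returns True instead of silently leaving the interrupt pointed at an offline CPU.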
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: system hung up when offlining CPUs
@ 2017-10-16 18:59 ` YASUAKI ISHIMATSU
0 siblings, 0 replies; 43+ messages in thread
From: YASUAKI ISHIMATSU @ 2017-10-16 18:59 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Kashyap Desai, Hannes Reinecke, Marc Zyngier, Christoph Hellwig,
axboe, mpe, keith.busch, peterz, LKML, linux-scsi, Sumit Saxena,
Shivasharan Srikanteshwara, yasu.isimatu
Hi Thomas,
> Can you please apply the patch below on top of Linus tree and retest?
>
> Please send me the outputs I asked you to provide last time in any case
> (success or fail).
The issue still occurs even if I applied your patch to linux 4.14.0-rc4.
---
[ ...] INFO: task setroubleshootd:4972 blocked for more than 120 seconds.
[ ...] Not tainted 4.14.0-rc4.thomas.with.irqdebug+ #6
[ ...] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ ...] setroubleshootd D 0 4972 1 0x00000080
[ ...] Call Trace:
[ ...] __schedule+0x28d/0x890
[ ...] ? release_pages+0x16f/0x3f0
[ ...] schedule+0x36/0x80
[ ...] io_schedule+0x16/0x40
[ ...] wait_on_page_bit+0x107/0x150
[ ...] ? page_cache_tree_insert+0xb0/0xb0
[ ...] truncate_inode_pages_range+0x3dd/0x7d0
[ ...] ? schedule_hrtimeout_range_clock+0xad/0x140
[ ...] ? remove_wait_queue+0x59/0x60
[ ...] ? down_write+0x12/0x40
[ ...] ? unmap_mapping_range+0x75/0x130
[ ...] truncate_pagecache+0x47/0x60
[ ...] truncate_setsize+0x32/0x40
[ ...] xfs_setattr_size+0x100/0x300 [xfs]
[ ...] xfs_vn_setattr_size+0x40/0x90 [xfs]
[ ...] xfs_vn_setattr+0x87/0xa0 [xfs]
[ ...] notify_change+0x266/0x440
[ ...] do_truncate+0x75/0xc0
[ ...] path_openat+0xaba/0x13b0
[ ...] ? mem_cgroup_commit_charge+0x31/0x130
[ ...] do_filp_open+0x91/0x100
[ ...] ? __alloc_fd+0x46/0x170
[ ...] do_sys_open+0x124/0x210
[ ...] SyS_open+0x1e/0x20
[ ...] do_syscall_64+0x67/0x1b0
[ ...] entry_SYSCALL64_slow_path+0x25/0x25
[ ...] RIP: 0033:0x7f275e2365bd
[ ...] RSP: 002b:00007ffe29337da0 EFLAGS: 00000293 ORIG_RAX: 0000000000000002
[ ...] RAX: ffffffffffffffda RBX: 00000000040aea00 RCX: 00007f275e2365bd
[ ...] RDX: 00000000000001b6 RSI: 0000000000000241 RDI: 00000000040ae840
[ ...] RBP: 00007ffe29337e00 R08: 00000000040aea06 R09: 0000000000000240
[ ...] R10: 0000000000000024 R11: 0000000000000293 R12: 00000000040eb660
[ ...] R13: 0000000000000004 R14: 00000000040ae840 R15: 000000000186a0a0
[ ...] sd 0:2:0:0: [sda] tag#0 task abort called for scmd(ffff9b4bf2306160)
[ ...] sd 0:2:0:0: [sda] tag#0 CDB: Write(10) 2a 00 0b 3a 82 a0 00 00 20 00
[ ...] sd 0:2:0:0: task abort: FAILED scmd(ffff9b4bf2306160)
[ ...] sd 0:2:0:0: target reset called for scmd(ffff9b4bf2306160)
[ ...] sd 0:2:0:0: [sda] tag#0 megasas: target reset FAILED!!
[ ...] sd 0:2:0:0: [sda] tag#0 Controller reset is requested due to IO timeout
[ ...] SCSI command pointer: (ffff9b4bf2306160) SCSI host state: 5 SCSI
---
I could not prepare the same environment I reported. So I reproduced
the issue on the following megasas environment.
---
IRQ affinity_list IRQ_TYPE
34 0-1 IR-PCI-MSI 1048576-edge megasas
35 2-3 IR-PCI-MSI 1048577-edge megasas
36 4-5 IR-PCI-MSI 1048578-edge megasas
37 6-7 IR-PCI-MSI 1048579-edge megasas
38 8-9 IR-PCI-MSI 1048580-edge megasas
39 10-11 IR-PCI-MSI 1048581-edge megasas
40 12-13 IR-PCI-MSI 1048582-edge megasas
41 14-15 IR-PCI-MSI 1048583-edge megasas
42 16-17 IR-PCI-MSI 1048584-edge megasas
43 18-19 IR-PCI-MSI 1048585-edge megasas
44 20-21 IR-PCI-MSI 1048586-edge megasas
45 22-23 IR-PCI-MSI 1048587-edge megasas
46 24-25 IR-PCI-MSI 1048588-edge megasas
47 26-27 IR-PCI-MSI 1048589-edge megasas
48 28-29 IR-PCI-MSI 1048590-edge megasas
49 30-31 IR-PCI-MSI 1048591-edge megasas
50 32-33 IR-PCI-MSI 1048592-edge megasas
51 34-35 IR-PCI-MSI 1048593-edge megasas
52 36-37 IR-PCI-MSI 1048594-edge megasas
53 38-39 IR-PCI-MSI 1048595-edge megasas
54 40-41 IR-PCI-MSI 1048596-edge megasas
55 42-43 IR-PCI-MSI 1048597-edge megasas
56 44-45 IR-PCI-MSI 1048598-edge megasas
57 46-47 IR-PCI-MSI 1048599-edge megasas
58 48-49 IR-PCI-MSI 1048600-edge megasas
59 50-51 IR-PCI-MSI 1048601-edge megasas
60 52-53 IR-PCI-MSI 1048602-edge megasas
61 54-55 IR-PCI-MSI 1048603-edge megasas
62 56-57 IR-PCI-MSI 1048604-edge megasas
63 58-59 IR-PCI-MSI 1048605-edge megasas
64 60-61 IR-PCI-MSI 1048606-edge megasas
65 62-63 IR-PCI-MSI 1048607-edge megasas
66 64-65 IR-PCI-MSI 1048608-edge megasas
67 66-67 IR-PCI-MSI 1048609-edge megasas
68 68-69 IR-PCI-MSI 1048610-edge megasas
69 70-71 IR-PCI-MSI 1048611-edge megasas
70 72-73 IR-PCI-MSI 1048612-edge megasas
71 74-75 IR-PCI-MSI 1048613-edge megasas
72 76-77 IR-PCI-MSI 1048614-edge megasas
73 78-79 IR-PCI-MSI 1048615-edge megasas
74 80-81 IR-PCI-MSI 1048616-edge megasas
75 82-83 IR-PCI-MSI 1048617-edge megasas
76 84-85 IR-PCI-MSI 1048618-edge megasas
77 86-87 IR-PCI-MSI 1048619-edge megasas
78 88-89 IR-PCI-MSI 1048620-edge megasas
79 90-91 IR-PCI-MSI 1048621-edge megasas
80 92-93 IR-PCI-MSI 1048622-edge megasas
81 94-95 IR-PCI-MSI 1048623-edge megasas
82 96-97 IR-PCI-MSI 1048624-edge megasas
83 98-99 IR-PCI-MSI 1048625-edge megasas
84 100-101 IR-PCI-MSI 1048626-edge megasas
85 102-103 IR-PCI-MSI 1048627-edge megasas
86 104-105 IR-PCI-MSI 1048628-edge megasas
87 106-107 IR-PCI-MSI 1048629-edge megasas
88 108-109 IR-PCI-MSI 1048630-edge megasas
89 110-111 IR-PCI-MSI 1048631-edge megasas
90 112-113 IR-PCI-MSI 1048632-edge megasas
91 114-115 IR-PCI-MSI 1048633-edge megasas
92 116-117 IR-PCI-MSI 1048634-edge megasas
93 118-119 IR-PCI-MSI 1048635-edge megasas
94 120-121 IR-PCI-MSI 1048636-edge megasas
95 122-123 IR-PCI-MSI 1048637-edge megasas
96 124-125 IR-PCI-MSI 1048638-edge megasas
97 126-127 IR-PCI-MSI 1048639-edge megasas
98 128-129 IR-PCI-MSI 1048640-edge megasas
99 130-131 IR-PCI-MSI 1048641-edge megasas
100 132-133 IR-PCI-MSI 1048642-edge megasas
101 134-135 IR-PCI-MSI 1048643-edge megasas
102 136-137 IR-PCI-MSI 1048644-edge megasas
103 138-139 IR-PCI-MSI 1048645-edge megasas
104 140-141 IR-PCI-MSI 1048646-edge megasas
105 142-143 IR-PCI-MSI 1048647-edge megasas
106 144-145 IR-PCI-MSI 1048648-edge megasas
107 146-147 IR-PCI-MSI 1048649-edge megasas
108 148-149 IR-PCI-MSI 1048650-edge megasas
109 150-151 IR-PCI-MSI 1048651-edge megasas
110 152-153 IR-PCI-MSI 1048652-edge megasas
111 154-155 IR-PCI-MSI 1048653-edge megasas
112 156-157 IR-PCI-MSI 1048654-edge megasas
113 158-159 IR-PCI-MSI 1048655-edge megasas
114 160-161 IR-PCI-MSI 1048656-edge megasas
115 162-163 IR-PCI-MSI 1048657-edge megasas
116 164-165 IR-PCI-MSI 1048658-edge megasas
117 166-167 IR-PCI-MSI 1048659-edge megasas
118 168-169 IR-PCI-MSI 1048660-edge megasas
119 170-171 IR-PCI-MSI 1048661-edge megasas
120 172-173 IR-PCI-MSI 1048662-edge megasas
121 174-175 IR-PCI-MSI 1048663-edge megasas
122 176-177 IR-PCI-MSI 1048664-edge megasas
123 178-179 IR-PCI-MSI 1048665-edge megasas
124 180-181 IR-PCI-MSI 1048666-edge megasas
125 182-183 IR-PCI-MSI 1048667-edge megasas
126 184-185 IR-PCI-MSI 1048668-edge megasas
127 186-187 IR-PCI-MSI 1048669-edge megasas
128 188-189 IR-PCI-MSI 1048670-edge megasas
129 190-191 IR-PCI-MSI 1048671-edge megasas
---
Here is the trace log from when I offlined CPUs 186-191 in descending order.
The issue occurred when I offlined CPU 186.
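For reference, a sketch of the offline workflow (assumed setup, not a verbatim script: the sysfs/tracefs paths are the standard locations, and the irq_do_set_affinity trace output comes from the debug patch discussed in this thread). With DRY_RUN=1 it only prints the actions:

```shell
#!/bin/sh
# Sketch: offline CPUs 191..186 in descending order while tracing.
# Requires root; DRY_RUN=1 (the default here) only prints the actions.
: "${DRY_RUN:=1}"

actions=""
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
        actions="$actions $*"
    else
        sh -c "$*"
    fi
}

for cpu in 191 190 189 188 187 186; do
    run "echo 0 > /sys/devices/system/cpu/cpu$cpu/online"
done
run "cat /sys/kernel/debug/tracing/trace"
```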
---
# tracer: nop
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
systemd-1 [000] d... 0.427765: irq_do_set_affinity: irq: 24 ret 0 mask: 0-23 eff: 0
systemd-1 [029] d... 16.745803: irq_do_set_affinity: irq: 9 ret 2 mask: 0-23 eff: 0-5
systemd-1 [120] d... 21.850146: irq_do_set_affinity: irq: 25 ret 2 mask: 0-23 eff: 0-5
systemd-1 [120] d... 21.856549: irq_do_set_affinity: irq: 26 ret 2 mask: 0-23 eff: 0-5
systemd-1 [120] d... 21.862920: irq_do_set_affinity: irq: 27 ret 2 mask: 0-23 eff: 0-5
systemd-1 [120] d... 21.869300: irq_do_set_affinity: irq: 28 ret 2 mask: 0-23 eff: 0-5
systemd-1 [120] d... 21.875685: irq_do_set_affinity: irq: 29 ret 2 mask: 0-23 eff: 0-5
systemd-1 [120] d... 21.897267: irq_do_set_affinity: irq: 30 ret 2 mask: 0-23 eff: 0-5
systemd-1 [120] d... 23.983226: irq_do_set_affinity: irq: 31 ret 2 mask: 0-23 eff: 0-5
systemd-1 [120] d... 23.998459: irq_do_set_affinity: irq: 32 ret 2 mask: 0-23 eff: 0-5
systemd-1 [120] d... 26.095152: irq_do_set_affinity: irq: 33 ret 2 mask: 0-23 eff: 0-5
kworker/0:3-1458 [000] d... 28.497033: irq_do_set_affinity: irq: 16 ret 2 mask: 0-23 eff: 0-5
systemd-1 [120] d... 28.715688: irq_do_set_affinity: irq: 8 ret 2 mask: 0-23 eff: 0-5
systemd-1 [120] d... 29.163740: irq_do_set_affinity: irq: 4 ret 2 mask: 0-23 eff: 0-5
kworker/0:1-134 [000] d... 30.625367: irq_do_set_affinity: irq: 34 ret 2 mask: 0-1 eff: 0-1
kworker/0:1-134 [000] d... 30.625400: irq_do_set_affinity: irq: 35 ret 2 mask: 2-3 eff: 2-3
kworker/0:1-134 [000] d... 30.625442: irq_do_set_affinity: irq: 36 ret 2 mask: 4-5 eff: 4-5
kworker/0:1-134 [000] d... 30.625474: irq_do_set_affinity: irq: 37 ret 2 mask: 6-7 eff: 6-7
kworker/0:1-134 [000] d... 30.625513: irq_do_set_affinity: irq: 38 ret 2 mask: 8-9 eff: 8-9
kworker/0:1-134 [000] d... 30.625549: irq_do_set_affinity: irq: 39 ret 2 mask: 10-11 eff: 10-11
kworker/0:1-134 [000] d... 30.625585: irq_do_set_affinity: irq: 40 ret 2 mask: 12-13 eff: 12-13
kworker/0:1-134 [000] d... 30.625621: irq_do_set_affinity: irq: 41 ret 2 mask: 14-15 eff: 14-15
kworker/0:1-134 [000] d... 30.625656: irq_do_set_affinity: irq: 42 ret 2 mask: 16-17 eff: 16-17
kworker/0:1-134 [000] d... 30.625692: irq_do_set_affinity: irq: 43 ret 2 mask: 18-19 eff: 18-19
kworker/0:1-134 [000] d... 30.625732: irq_do_set_affinity: irq: 44 ret 2 mask: 20-21 eff: 20-21
kworker/0:1-134 [000] d... 30.625768: irq_do_set_affinity: irq: 45 ret 2 mask: 22-23 eff: 22-23
kworker/0:1-134 [000] d... 30.625801: irq_do_set_affinity: irq: 46 ret 2 mask: 24-25 eff: 24-25
kworker/0:1-134 [000] d... 30.625818: irq_do_set_affinity: irq: 47 ret 2 mask: 26-27 eff: 26-27
kworker/0:1-134 [000] d... 30.625843: irq_do_set_affinity: irq: 48 ret 2 mask: 28-29 eff: 28-29
kworker/0:1-134 [000] d... 30.625869: irq_do_set_affinity: irq: 49 ret 2 mask: 30-31 eff: 30-31
kworker/0:1-134 [000] d... 30.625897: irq_do_set_affinity: irq: 50 ret 2 mask: 32-33 eff: 32-33
kworker/0:1-134 [000] d... 30.625922: irq_do_set_affinity: irq: 51 ret 2 mask: 34-35 eff: 34-35
kworker/0:1-134 [000] d... 30.625947: irq_do_set_affinity: irq: 52 ret 2 mask: 36-37 eff: 36-37
kworker/0:1-134 [000] d... 30.625969: irq_do_set_affinity: irq: 53 ret 2 mask: 38-39 eff: 38-39
kworker/0:1-134 [000] d... 30.625992: irq_do_set_affinity: irq: 54 ret 2 mask: 40-41 eff: 40-41
kworker/0:1-134 [000] d... 30.626012: irq_do_set_affinity: irq: 55 ret 2 mask: 42-43 eff: 42-43
kworker/0:1-134 [000] d... 30.626032: irq_do_set_affinity: irq: 56 ret 2 mask: 44-45 eff: 44-45
kworker/0:1-134 [000] d... 30.626052: irq_do_set_affinity: irq: 57 ret 2 mask: 46-47 eff: 46-47
kworker/0:1-134 [000] d... 30.626088: irq_do_set_affinity: irq: 58 ret 2 mask: 48-49 eff: 48-49
kworker/0:1-134 [000] d... 30.626105: irq_do_set_affinity: irq: 59 ret 2 mask: 50-51 eff: 50-51
kworker/0:1-134 [000] d... 30.626118: irq_do_set_affinity: irq: 60 ret 2 mask: 52-53 eff: 52-53
kworker/0:1-134 [000] d... 30.626157: irq_do_set_affinity: irq: 61 ret 2 mask: 54-55 eff: 54-55
kworker/0:1-134 [000] d... 30.626185: irq_do_set_affinity: irq: 62 ret 2 mask: 56-57 eff: 56-57
kworker/0:1-134 [000] d... 30.626217: irq_do_set_affinity: irq: 63 ret 2 mask: 58-59 eff: 58-59
kworker/0:1-134 [000] d... 30.626243: irq_do_set_affinity: irq: 64 ret 2 mask: 60-61 eff: 60-61
kworker/0:1-134 [000] d... 30.626269: irq_do_set_affinity: irq: 65 ret 2 mask: 62-63 eff: 62-63
kworker/0:1-134 [000] d... 30.626299: irq_do_set_affinity: irq: 66 ret 2 mask: 64-65 eff: 64-65
kworker/0:1-134 [000] d... 30.626322: irq_do_set_affinity: irq: 67 ret 2 mask: 66-67 eff: 66-67
kworker/0:1-134 [000] d... 30.626346: irq_do_set_affinity: irq: 68 ret 2 mask: 68-69 eff: 68-69
kworker/0:1-134 [000] d... 30.626368: irq_do_set_affinity: irq: 69 ret 2 mask: 70-71 eff: 70-71
kworker/0:1-134 [000] d... 30.626390: irq_do_set_affinity: irq: 70 ret 2 mask: 72-73 eff: 72-73
kworker/0:1-134 [000] d... 30.626405: irq_do_set_affinity: irq: 71 ret 2 mask: 74-75 eff: 74-75
kworker/0:1-134 [000] d... 30.626417: irq_do_set_affinity: irq: 72 ret 2 mask: 76-77 eff: 76-77
kworker/0:1-134 [000] d... 30.626455: irq_do_set_affinity: irq: 73 ret 2 mask: 78-79 eff: 78-79
kworker/0:1-134 [000] d... 30.626483: irq_do_set_affinity: irq: 74 ret 2 mask: 80-81 eff: 80-81
kworker/0:1-134 [000] d... 30.626510: irq_do_set_affinity: irq: 75 ret 2 mask: 82-83 eff: 82-83
kworker/0:1-134 [000] d... 30.626535: irq_do_set_affinity: irq: 76 ret 2 mask: 84-85 eff: 84-85
kworker/0:1-134 [000] d... 30.626563: irq_do_set_affinity: irq: 77 ret 2 mask: 86-87 eff: 86-87
kworker/0:1-134 [000] d... 30.626585: irq_do_set_affinity: irq: 78 ret 2 mask: 88-89 eff: 88-89
kworker/0:1-134 [000] d... 30.626604: irq_do_set_affinity: irq: 79 ret 2 mask: 90-91 eff: 90-91
kworker/0:1-134 [000] d... 30.626624: irq_do_set_affinity: irq: 80 ret 2 mask: 92-93 eff: 92-93
kworker/0:1-134 [000] d... 30.626644: irq_do_set_affinity: irq: 81 ret 2 mask: 94-95 eff: 94-95
kworker/0:1-134 [000] d... 30.626665: irq_do_set_affinity: irq: 82 ret 2 mask: 96-97 eff: 96-97
kworker/0:1-134 [000] d... 30.626679: irq_do_set_affinity: irq: 83 ret 2 mask: 98-99 eff: 98-99
kworker/0:1-134 [000] d... 30.626693: irq_do_set_affinity: irq: 84 ret 2 mask: 100-101 eff: 100-101
kworker/0:1-134 [000] d... 30.626708: irq_do_set_affinity: irq: 85 ret 2 mask: 102-103 eff: 102-103
kworker/0:1-134 [000] d... 30.626750: irq_do_set_affinity: irq: 86 ret 2 mask: 104-105 eff: 104-105
kworker/0:1-134 [000] d... 30.626784: irq_do_set_affinity: irq: 87 ret 2 mask: 106-107 eff: 106-107
kworker/0:1-134 [000] d... 30.626814: irq_do_set_affinity: irq: 88 ret 2 mask: 108-109 eff: 108-109
kworker/0:1-134 [000] d... 30.626844: irq_do_set_affinity: irq: 89 ret 2 mask: 110-111 eff: 110-111
kworker/0:1-134 [000] d... 30.626872: irq_do_set_affinity: irq: 90 ret 2 mask: 112-113 eff: 112-113
kworker/0:1-134 [000] d... 30.626896: irq_do_set_affinity: irq: 91 ret 2 mask: 114-115 eff: 114-115
kworker/0:1-134 [000] d... 30.626928: irq_do_set_affinity: irq: 92 ret 2 mask: 116-117 eff: 116-117
kworker/0:1-134 [000] d... 30.626954: irq_do_set_affinity: irq: 93 ret 2 mask: 118-119 eff: 118-119
kworker/0:1-134 [000] d... 30.626975: irq_do_set_affinity: irq: 94 ret 2 mask: 120-121 eff: 120-121
kworker/0:1-134 [000] d... 30.626996: irq_do_set_affinity: irq: 95 ret 2 mask: 122-123 eff: 122-123
kworker/0:1-134 [000] d... 30.627022: irq_do_set_affinity: irq: 96 ret 2 mask: 124-125 eff: 124-125
kworker/0:1-134 [000] d... 30.627050: irq_do_set_affinity: irq: 97 ret 2 mask: 126-127 eff: 126-127
kworker/0:1-134 [000] d... 30.627081: irq_do_set_affinity: irq: 98 ret 2 mask: 128-129 eff: 128-129
kworker/0:1-134 [000] d... 30.627110: irq_do_set_affinity: irq: 99 ret 2 mask: 130-131 eff: 130-131
kworker/0:1-134 [000] d... 30.627137: irq_do_set_affinity: irq: 100 ret 2 mask: 132-133 eff: 132-133
kworker/0:1-134 [000] d... 30.627164: irq_do_set_affinity: irq: 101 ret 2 mask: 134-135 eff: 134-135
kworker/0:1-134 [000] d... 30.627191: irq_do_set_affinity: irq: 102 ret 2 mask: 136-137 eff: 136-137
kworker/0:1-134 [000] d... 30.627214: irq_do_set_affinity: irq: 103 ret 2 mask: 138-139 eff: 138-139
kworker/0:1-134 [000] d... 30.627238: irq_do_set_affinity: irq: 104 ret 2 mask: 140-141 eff: 140-141
kworker/0:1-134 [000] d... 30.627263: irq_do_set_affinity: irq: 105 ret 2 mask: 142-143 eff: 142-143
kworker/0:1-134 [000] d... 30.627283: irq_do_set_affinity: irq: 106 ret 2 mask: 144-145 eff: 144-145
kworker/0:1-134 [000] d... 30.627296: irq_do_set_affinity: irq: 107 ret 2 mask: 146-147 eff: 146-147
kworker/0:1-134 [000] d... 30.627311: irq_do_set_affinity: irq: 108 ret 2 mask: 148-149 eff: 148-149
kworker/0:1-134 [000] d... 30.627344: irq_do_set_affinity: irq: 109 ret 2 mask: 150-151 eff: 150-151
kworker/0:1-134 [000] d... 30.627377: irq_do_set_affinity: irq: 110 ret 2 mask: 152-153 eff: 152-153
kworker/0:1-134 [000] d... 30.627410: irq_do_set_affinity: irq: 111 ret 2 mask: 154-155 eff: 154-155
kworker/0:1-134 [000] d... 30.627437: irq_do_set_affinity: irq: 112 ret 2 mask: 156-157 eff: 156-157
kworker/0:1-134 [000] d... 30.627467: irq_do_set_affinity: irq: 113 ret 2 mask: 158-159 eff: 158-159
kworker/0:1-134 [000] d... 30.627494: irq_do_set_affinity: irq: 114 ret 2 mask: 160-161 eff: 160-161
kworker/0:1-134 [000] d... 30.627519: irq_do_set_affinity: irq: 115 ret 2 mask: 162-163 eff: 162-163
kworker/0:1-134 [000] d... 30.627545: irq_do_set_affinity: irq: 116 ret 2 mask: 164-165 eff: 164-165
kworker/0:1-134 [000] d... 30.627569: irq_do_set_affinity: irq: 117 ret 2 mask: 166-167 eff: 166-167
kworker/0:1-134 [000] d... 30.627589: irq_do_set_affinity: irq: 118 ret 2 mask: 168-169 eff: 168-169
kworker/0:1-134 [000] d... 30.627607: irq_do_set_affinity: irq: 119 ret 2 mask: 170-171 eff: 170-171
kworker/0:1-134 [000] d... 30.627639: irq_do_set_affinity: irq: 120 ret 2 mask: 172-173 eff: 172-173
kworker/0:1-134 [000] d... 30.627666: irq_do_set_affinity: irq: 121 ret 2 mask: 174-175 eff: 174-175
kworker/0:1-134 [000] d... 30.627691: irq_do_set_affinity: irq: 122 ret 2 mask: 176-177 eff: 176-177
kworker/0:1-134 [000] d... 30.627721: irq_do_set_affinity: irq: 123 ret 2 mask: 178-179 eff: 178-179
kworker/0:1-134 [000] d... 30.627748: irq_do_set_affinity: irq: 124 ret 2 mask: 180-181 eff: 180-181
kworker/0:1-134 [000] d... 30.627774: irq_do_set_affinity: irq: 125 ret 2 mask: 182-183 eff: 182-183
kworker/0:1-134 [000] d... 30.627799: irq_do_set_affinity: irq: 126 ret 2 mask: 184-185 eff: 184-185
kworker/0:1-134 [000] d... 30.627828: irq_do_set_affinity: irq: 127 ret 2 mask: 186-187 eff: 186
kworker/0:1-134 [000] d... 30.627850: irq_do_set_affinity: irq: 128 ret 2 mask: 188-189 eff: 188
kworker/0:1-134 [000] d... 30.627875: irq_do_set_affinity: irq: 129 ret 2 mask: 190-191 eff: 190
kworker/0:0-3 [000] d... 38.217213: irq_do_set_affinity: irq: 18 ret 2 mask: 0-23 eff: 0-5
systemd-udevd-2007 [129] d... 38.510108: irq_do_set_affinity: irq: 3 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 44.732162: irq_do_set_affinity: irq: 131 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 44.732195: irq_do_set_affinity: irq: 132 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 44.732214: irq_do_set_affinity: irq: 133 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 44.732229: irq_do_set_affinity: irq: 134 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 44.732246: irq_do_set_affinity: irq: 135 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 44.732261: irq_do_set_affinity: irq: 136 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 44.732276: irq_do_set_affinity: irq: 137 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 44.732292: irq_do_set_affinity: irq: 138 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 44.732308: irq_do_set_affinity: irq: 139 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 44.865529: irq_do_set_affinity: irq: 140 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 44.865557: irq_do_set_affinity: irq: 141 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 44.865575: irq_do_set_affinity: irq: 142 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 44.865591: irq_do_set_affinity: irq: 143 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 44.865607: irq_do_set_affinity: irq: 144 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 44.865621: irq_do_set_affinity: irq: 145 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 44.865635: irq_do_set_affinity: irq: 146 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 44.865650: irq_do_set_affinity: irq: 147 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 44.865664: irq_do_set_affinity: irq: 148 ret 2 mask: 0-23 eff: 0-5
NetworkManager-2628 [135] d... 45.041598: irq_do_set_affinity: irq: 130 ret 2 mask: 0-23 eff: 6-9,11,16,18-19,21-23,26-28,31,34-35,38,40-43,47-63
NetworkManager-2628 [135] d... 45.042054: irq_do_set_affinity: irq: 130 ret 2 mask: 0-23 eff: 6-8,10-11,16,18-19,21-23,26-28,31,34-35,38,40-43,47-63
NetworkManager-2628 [135] d... 45.150285: irq_do_set_affinity: irq: 130 ret 2 mask: 0-23 eff: 0-5
(agetty)-3134 [049] d... 55.930794: irq_do_set_affinity: irq: 4 ret 2 mask: 0-23 eff: 0-5
<...>-1346 [191] d..1 100.473714: irq_do_set_affinity: irq: 129 ret 2 mask: 190-191 eff: 190
<...>-1346 [191] d..1 100.473722: <stack trace>
=> native_cpu_disable
=> take_cpu_down
=> multi_cpu_stop
=> cpu_stopper_thread
=> smpboot_thread_fn
=> kthread
=> ret_from_fork
<...>-1334 [189] d..1 700.567235: irq_do_set_affinity: irq: 128 ret 2 mask: 188-189 eff: 188
<...>-1334 [189] d..1 700.567243: <stack trace>
=> native_cpu_disable
=> take_cpu_down
=> multi_cpu_stop
=> cpu_stopper_thread
=> smpboot_thread_fn
=> kthread
=> ret_from_fork
<...>-1322 [187] d..1 1300.660985: irq_do_set_affinity: irq: 127 ret 2 mask: 186-187 eff: 186
<...>-1322 [187] d..1 1300.660993: <stack trace>
=> native_cpu_disable
=> take_cpu_down
=> multi_cpu_stop
=> cpu_stopper_thread
=> smpboot_thread_fn
=> kthread
=> ret_from_fork
---
Thanks,
Yasuaki Ishimatsu
On 10/10/2017 12:30 PM, YASUAKI ISHIMATSU wrote:
> Hi Thomas,
>
> Sorry for the late reply.
>
> I'll apply the patches and retest in this week.
> Please wait a while.
>
> Thanks,
> Yasuaki Ishimatsu
>
> On 10/04/2017 05:04 PM, Thomas Gleixner wrote:
>> On Tue, 3 Oct 2017, Thomas Gleixner wrote:
>>> Can you please apply the debug patch below.
>>
>> I found an issue with managed interrupts when the affinity mask of a
>> managed interrupt spans multiple CPUs. Explanation in the changelog
>> below. I'm not sure that this cures the problems you have, but at least I
>> could prove that it's not doing what it should do. The failure I'm seeing is
>> fixed, but I can't test that megasas driver due to -ENOHARDWARE.
>>
>> Can you please apply the patch below on top of Linus tree and retest?
>>
>> Please send me the outputs I asked you to provide last time in any case
>> (success or fail).
>>
>> @block/scsi folks: Can you please run that through your tests as well?
>>
>> Thanks,
>>
>> tglx
>>
>> 8<-----------------------
>> Subject: genirq/cpuhotplug: Enforce affinity setting on startup of managed irqs
>> From: Thomas Gleixner <tglx@linutronix.de>
>> Date: Wed, 04 Oct 2017 21:07:38 +0200
>>
>> Managed interrupts can end up in a stale state on CPU hotplug. If the
>> interrupt is not targeting a single CPU, i.e. the affinity mask spans
>> multiple CPUs then the following can happen:
>>
>> After boot:
>>
>> dstate: 0x01601200
>> IRQD_ACTIVATED
>> IRQD_IRQ_STARTED
>> IRQD_SINGLE_TARGET
>> IRQD_AFFINITY_SET
>> IRQD_AFFINITY_MANAGED
>> node: 0
>> affinity: 24-31
>> effectiv: 24
>> pending: 0
>>
>> After offlining CPUs 31-24:
>>
>> dstate: 0x01a31000
>> IRQD_IRQ_DISABLED
>> IRQD_IRQ_MASKED
>> IRQD_SINGLE_TARGET
>> IRQD_AFFINITY_SET
>> IRQD_AFFINITY_MANAGED
>> IRQD_MANAGED_SHUTDOWN
>> node: 0
>> affinity: 24-31
>> effectiv: 24
>> pending: 0
>>
>> Now CPU 25 gets onlined again, so it should get the effective interrupt
>> affinity for this interrupt, but due to the x86 interrupt affinity setter
>> restrictions this ends up after restarting the interrupt with:
>>
>> dstate: 0x01601300
>> IRQD_ACTIVATED
>> IRQD_IRQ_STARTED
>> IRQD_SINGLE_TARGET
>> IRQD_AFFINITY_SET
>> IRQD_SETAFFINITY_PENDING
>> IRQD_AFFINITY_MANAGED
>> node: 0
>> affinity: 24-31
>> effectiv: 24
>> pending: 24-31
>>
>> So the interrupt is still affine to CPU 24, which was the last CPU of
>> that affinity set to go offline, and the move to an online CPU within
>> 24-31 (in this case 25) is pending. This mechanism is x86/ia64 specific as those
>> architectures cannot move interrupts from thread context and do this when
>> an interrupt is actually handled. So the move is set to pending.
>>
>> What's worse is that offlining CPU 25 again results in:
>>
>> dstate: 0x01601300
>> IRQD_ACTIVATED
>> IRQD_IRQ_STARTED
>> IRQD_SINGLE_TARGET
>> IRQD_AFFINITY_SET
>> IRQD_SETAFFINITY_PENDING
>> IRQD_AFFINITY_MANAGED
>> node: 0
>> affinity: 24-31
>> effectiv: 24
>> pending: 24-31
>>
>> This means the interrupt has not been shut down, because the outgoing CPU
>> is not in the effective affinity mask, but of course nothing notices that
>> the effective affinity mask is pointing at an offline CPU.
>>
>> In the case of restarting a managed interrupt the move restriction does not
>> apply, so the affinity setting can be made unconditional. This needs to be
>> done _before_ the interrupt is started up as otherwise the condition for
>> moving it from thread context would no longer be fulfilled.
>>
>> With that change applied, onlining CPU 25 after offlining 31-24 results in:
>>
>> dstate: 0x01600200
>> IRQD_ACTIVATED
>> IRQD_IRQ_STARTED
>> IRQD_SINGLE_TARGET
>> IRQD_AFFINITY_MANAGED
>> node: 0
>> affinity: 24-31
>> effectiv: 25
>> pending:
>>
>> And after offlining CPU 25:
>>
>> dstate: 0x01a30000
>> IRQD_IRQ_DISABLED
>> IRQD_IRQ_MASKED
>> IRQD_SINGLE_TARGET
>> IRQD_AFFINITY_MANAGED
>> IRQD_MANAGED_SHUTDOWN
>> node: 0
>> affinity: 24-31
>> effectiv: 25
>> pending:
>>
>> which is the correct and expected result.
>>
>> To complete that, add some debug code to catch this kind of situation in
>> the cpu offline code and warn about interrupt chips which allow affinity
>> setting and do not update the effective affinity mask if that feature is
>> enabled.
>>
>> Reported-by: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>
>> ---
>> kernel/irq/chip.c | 2 +-
>> kernel/irq/cpuhotplug.c | 28 +++++++++++++++++++++++++++-
>> kernel/irq/manage.c | 17 +++++++++++++++++
>> 3 files changed, 45 insertions(+), 2 deletions(-)
>>
>> --- a/kernel/irq/chip.c
>> +++ b/kernel/irq/chip.c
>> @@ -265,8 +265,8 @@ int irq_startup(struct irq_desc *desc, b
>> irq_setup_affinity(desc);
>> break;
>> case IRQ_STARTUP_MANAGED:
>> + irq_do_set_affinity(d, aff, false);
>> ret = __irq_startup(desc);
>> - irq_set_affinity_locked(d, aff, false);
>> break;
>> case IRQ_STARTUP_ABORT:
>> return 0;
>> --- a/kernel/irq/cpuhotplug.c
>> +++ b/kernel/irq/cpuhotplug.c
>> @@ -18,8 +18,34 @@
>> static inline bool irq_needs_fixup(struct irq_data *d)
>> {
>> const struct cpumask *m = irq_data_get_effective_affinity_mask(d);
>> + unsigned int cpu = smp_processor_id();
>>
>> - return cpumask_test_cpu(smp_processor_id(), m);
>> +#ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK
>> + /*
>> + * The cpumask_empty() check is a workaround for interrupt chips,
>> + * which do not implement effective affinity, but the architecture has
>> + * enabled the config switch. Use the general affinity mask instead.
>> + */
>> + if (cpumask_empty(m))
>> + m = irq_data_get_affinity_mask(d);
>> +
>> + /*
>> + * Sanity check. If the mask is not empty when excluding the outgoing
>> + * CPU then it must contain at least one online CPU. The outgoing CPU
>> + * has been removed from the online mask already.
>> + */
>> + if (cpumask_any_but(m, cpu) < nr_cpu_ids &&
>> + cpumask_any_and(m, cpu_online_mask) >= nr_cpu_ids) {
>> + /*
>> + * If this happens then there was a missed IRQ fixup at some
>> + * point. Warn about it and enforce fixup.
>> + */
>> + pr_warn("Eff. affinity %*pbl of IRQ %u contains only offline CPUs after offlining CPU %u\n",
>> + cpumask_pr_args(m), d->irq, cpu);
>> + return true;
>> + }
>> +#endif
>> + return cpumask_test_cpu(cpu, m);
>> }
>>
>> static bool migrate_one_irq(struct irq_desc *desc)
>> --- a/kernel/irq/manage.c
>> +++ b/kernel/irq/manage.c
>> @@ -168,6 +168,19 @@ void irq_set_thread_affinity(struct irq_
>> set_bit(IRQTF_AFFINITY, &action->thread_flags);
>> }
>>
>> +static void irq_validate_effective_affinity(struct irq_data *data)
>> +{
>> +#ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK
>> + const struct cpumask *m = irq_data_get_effective_affinity_mask(data);
>> + struct irq_chip *chip = irq_data_get_irq_chip(data);
>> +
>> + if (!cpumask_empty(m))
>> + return;
>> + pr_warn_once("irq_chip %s did not update eff. affinity mask of irq %u\n",
>> + chip->name, data->irq);
>> +#endif
>> +}
>> +
>> int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
>> bool force)
>> {
>> @@ -175,12 +188,16 @@ int irq_do_set_affinity(struct irq_data
>> struct irq_chip *chip = irq_data_get_irq_chip(data);
>> int ret;
>>
>> + if (!chip || !chip->irq_set_affinity)
>> + return -EINVAL;
>> +
>> ret = chip->irq_set_affinity(data, mask, force);
>> switch (ret) {
>> case IRQ_SET_MASK_OK:
>> case IRQ_SET_MASK_OK_DONE:
>> cpumask_copy(desc->irq_common_data.affinity, mask);
>> case IRQ_SET_MASK_OK_NOCOPY:
>> + irq_validate_effective_affinity(data);
>> irq_set_thread_affinity(desc);
>> ret = 0;
>> }
>>
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: system hung up when offlining CPUs
2017-10-16 18:59 ` YASUAKI ISHIMATSU
@ 2017-10-16 20:27 ` Thomas Gleixner
-1 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2017-10-16 20:27 UTC (permalink / raw)
To: YASUAKI ISHIMATSU
Cc: Kashyap Desai, Hannes Reinecke, Marc Zyngier, Christoph Hellwig,
axboe, mpe, keith.busch, peterz, LKML, linux-scsi, Sumit Saxena,
Shivasharan Srikanteshwara
Yasuaki,
On Mon, 16 Oct 2017, YASUAKI ISHIMATSU wrote:
> Hi Thomas,
>
> > Can you please apply the patch below on top of Linus tree and retest?
> >
> > Please send me the outputs I asked you to provide last time in any case
> > (success or fail).
>
> The issue still occurs even if I applied your patch to linux 4.14.0-rc4.
Thanks for testing.
> ---
> [ ...] INFO: task setroubleshootd:4972 blocked for more than 120 seconds.
> [ ...] Not tainted 4.14.0-rc4.thomas.with.irqdebug+ #6
> [ ...] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ ...] setroubleshootd D 0 4972 1 0x00000080
> [ ...] Call Trace:
> [ ...] __schedule+0x28d/0x890
> [ ...] ? release_pages+0x16f/0x3f0
> [ ...] schedule+0x36/0x80
> [ ...] io_schedule+0x16/0x40
> [ ...] wait_on_page_bit+0x107/0x150
> [ ...] ? page_cache_tree_insert+0xb0/0xb0
> [ ...] truncate_inode_pages_range+0x3dd/0x7d0
> [ ...] ? schedule_hrtimeout_range_clock+0xad/0x140
> [ ...] ? remove_wait_queue+0x59/0x60
> [ ...] ? down_write+0x12/0x40
> [ ...] ? unmap_mapping_range+0x75/0x130
> [ ...] truncate_pagecache+0x47/0x60
> [ ...] truncate_setsize+0x32/0x40
> [ ...] xfs_setattr_size+0x100/0x300 [xfs]
> [ ...] xfs_vn_setattr_size+0x40/0x90 [xfs]
> [ ...] xfs_vn_setattr+0x87/0xa0 [xfs]
> [ ...] notify_change+0x266/0x440
> [ ...] do_truncate+0x75/0xc0
> [ ...] path_openat+0xaba/0x13b0
> [ ...] ? mem_cgroup_commit_charge+0x31/0x130
> [ ...] do_filp_open+0x91/0x100
> [ ...] ? __alloc_fd+0x46/0x170
> [ ...] do_sys_open+0x124/0x210
> [ ...] SyS_open+0x1e/0x20
> [ ...] do_syscall_64+0x67/0x1b0
> [ ...] entry_SYSCALL64_slow_path+0x25/0x25
This is definitely a driver issue. The driver requests an affinity managed
interrupt. Affinity managed interrupts are different from non-managed
interrupts in several ways:
Non-Managed interrupts:
1) At setup time the default interrupt affinity is assigned to each
interrupt. The effective affinity is usually a subset of the online
CPUs.
2) User space can modify the affinity of the interrupt
3) If a CPU in the affinity mask goes offline and there are still online
CPUs in the affinity mask then the effective affinity is moved to a
subset of the online CPUs in the affinity mask.
If the last CPU in the affinity mask of an interrupt goes offline then
the hotplug code breaks the affinity and makes it affine to the online
CPUs. The effective affinity is a subset of the new affinity setting.
Managed interrupts:
1) At setup time the interrupts of a multiqueue device are evenly spread
over the possible CPUs. If all CPUs in the affinity mask of a given
interrupt are offline at request_irq() time, the interrupt stays shut
down. If the first CPU in the affinity mask comes online later the
interrupt is started up.
2) User space cannot modify the affinity of the interrupt
3) If a CPU in the affinity mask goes offline and there are still online
CPUs in the affinity mask then the effective affinity is moved to a subset
of the online CPUs in the affinity mask. I.e. the same as with
Non-Managed interrupts.
If the last CPU in the affinity mask of a managed interrupt goes
offline then the interrupt is shutdown. If the first CPU in the
affinity mask becomes online again then the interrupt is started up
again.
So this has consequences:
1) The device driver has to make sure that no requests are targeted at a
queue whose interrupt is affine to offline CPUs and therefore shut
down. If the driver ignores that then this queue will not deliver an
interrupt simply because that interrupt is shut down.
2) When the last CPU in the affinity mask of a queue interrupt goes
offline the device driver has to make sure that all outstanding
requests in the queue which have not yet delivered their interrupt are
completed. This is required because when the CPU is finally offline the
interrupt is shut down and won't deliver any more interrupts.
If that does not happen then a not yet completed request will try to
send its completion interrupt, which obviously is not delivered
because the interrupt is shut down.
It's hard to tell from the debug information which of the constraints (#1
or #2 or both) has been violated by the driver (or the device hardware /
firmware) but the effect that the task which submitted the I/O operation is
hung after an offline operation points clearly in that direction.
The irq core code is doing what is expected and I have no clue about that
megasas driver/hardware so I have to punt and redirect you to the SCSI and
megasas people.
Thanks,
tglx
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: system hung up when offlining CPUs
@ 2017-10-16 20:27 ` Thomas Gleixner
0 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2017-10-16 20:27 UTC (permalink / raw)
To: YASUAKI ISHIMATSU
Cc: Kashyap Desai, Hannes Reinecke, Marc Zyngier, Christoph Hellwig,
axboe, mpe, keith.busch, peterz, LKML, linux-scsi, Sumit Saxena,
Shivasharan Srikanteshwara
Yasuaki,
On Mon, 16 Oct 2017, YASUAKI ISHIMATSU wrote:
> Hi Thomas,
>
> > Can you please apply the patch below on top of Linus tree and retest?
> >
> > Please send me the outputs I asked you to provide last time in any case
> > (success or fail).
>
> The issue still occurs even if I applied your patch to linux 4.14.0-rc4.
Thanks for testing.
> ---
> [ ...] INFO: task setroubleshootd:4972 blocked for more than 120 seconds.
> [ ...] Not tainted 4.14.0-rc4.thomas.with.irqdebug+ #6
> [ ...] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ ...] setroubleshootd D 0 4972 1 0x00000080
> [ ...] Call Trace:
> [ ...] __schedule+0x28d/0x890
> [ ...] ? release_pages+0x16f/0x3f0
> [ ...] schedule+0x36/0x80
> [ ...] io_schedule+0x16/0x40
> [ ...] wait_on_page_bit+0x107/0x150
> [ ...] ? page_cache_tree_insert+0xb0/0xb0
> [ ...] truncate_inode_pages_range+0x3dd/0x7d0
> [ ...] ? schedule_hrtimeout_range_clock+0xad/0x140
> [ ...] ? remove_wait_queue+0x59/0x60
> [ ...] ? down_write+0x12/0x40
> [ ...] ? unmap_mapping_range+0x75/0x130
> [ ...] truncate_pagecache+0x47/0x60
> [ ...] truncate_setsize+0x32/0x40
> [ ...] xfs_setattr_size+0x100/0x300 [xfs]
> [ ...] xfs_vn_setattr_size+0x40/0x90 [xfs]
> [ ...] xfs_vn_setattr+0x87/0xa0 [xfs]
> [ ...] notify_change+0x266/0x440
> [ ...] do_truncate+0x75/0xc0
> [ ...] path_openat+0xaba/0x13b0
> [ ...] ? mem_cgroup_commit_charge+0x31/0x130
> [ ...] do_filp_open+0x91/0x100
> [ ...] ? __alloc_fd+0x46/0x170
> [ ...] do_sys_open+0x124/0x210
> [ ...] SyS_open+0x1e/0x20
> [ ...] do_syscall_64+0x67/0x1b0
> [ ...] entry_SYSCALL64_slow_path+0x25/0x25
This is definitely a driver issue. The driver requests an affinity managed
interrupt. Affinity managed interrupts are different from non managed
interrupts in several ways:
Non-Managed interrupts:
1) At setup time the default interrupt affinity is assigned to each
interrupt. The effective affinity is usually a subset of the online
CPUs.
2) User space can modify the affinity of the interrupt
3) If a CPU in the affinity mask goes offline and there are still online
CPUs in the affinity mask then the effective affinity is moved to a
subset of the online CPUs in the affinity mask.
If the last CPU in the affinity mask of an interrupt goes offline then
the hotplug code breaks the affinity and makes it affine to the online
CPUs. The effective affinity is a subset of the new affinity setting,
Managed interrupts:
1) At setup time the interrupts of a multiqueue device are evenly spread
over the possible CPUs. If all CPUs in the affinity mask of a given
interrupt are offline at request_irq() time, the interrupt stays shut
down. If the first CPU in the affinity mask comes online later the
interrupt is started up.
2) User space cannot modify the affinity of the interrupt
3) If a CPU in the affinity mask goes offline and there are still online
CPUs in the affinity mask then the effective affinity is moved to a
subset of the online CPUs in the affinity mask. I.e. the same as with
Non-Managed interrupts.
If the last CPU in the affinity mask of a managed interrupt goes
offline then the interrupt is shutdown. If the first CPU in the
affinity mask becomes online again then the interrupt is started up
again.
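The two last-CPU-offline behaviors described above can be condensed into a small model. This is an illustrative userspace sketch of the semantics, not kernel code: a cpumask is a plain bitmap, and the function decides whether an interrupt keeps firing, has its affinity broken (non-managed), or is shut down (managed).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a cpumask: bit N set = CPU N is in the mask. */
typedef uint32_t cpumask_t;

/*
 * Recompute an interrupt's state after CPUs go offline.  Returns true
 * if the interrupt can still fire.  For a managed interrupt whose mask
 * no longer intersects the online CPUs, *shutdown is set and the
 * affinity mask is left untouched; a non-managed interrupt instead has
 * its affinity broken to the online CPUs.
 */
static bool cpu_offline_update(cpumask_t *affinity, cpumask_t online,
                               bool managed, bool *shutdown)
{
    *shutdown = false;
    if (*affinity & online)
        return true;            /* effective affinity: an online subset */
    if (managed) {
        *shutdown = true;       /* mask kept, interrupt shut down */
        return false;
    }
    *affinity = online;         /* non-managed: affinity is broken */
    return true;
}
```

If the CPU comes back online later, the managed interrupt can be started up again with its original mask, which is exactly why the mask is not rewritten in the managed branch.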
So this has consequences:
1) The device driver has to make sure that no requests are targeted at a
queue whose interrupt is affine to offline CPUs and therefore shut
down. If the driver ignores that then this queue will not deliver an
interrupt simply because that interrupt is shut down.
2) When the last CPU in the affinity mask of a queue interrupt goes
offline the device driver has to make sure that all outstanding
requests in the queue which have not yet delivered their interrupt are
completed. This is required because when the CPU is finally offline the
interrupt is shut down and won't deliver any more interrupts.
If that does not happen then the not yet completed requests will try to
send their completion interrupt, which obviously is not delivered
because the interrupt is shut down.
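Constraint #2 can be sketched as follows. This is a hedged userspace model (the struct and function names are invented for illustration, not taken from any driver): when the last online CPU leaves a queue interrupt's affinity mask, the driver drains outstanding requests before the interrupt disappears.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t cpumask_t;

/* Hypothetical per-queue state for this sketch. */
struct hw_queue {
    cpumask_t irq_affinity;     /* CPUs the queue interrupt may fire on */
    unsigned int outstanding;   /* requests still awaiting completion   */
    int irq_enabled;
};

/*
 * Called while a CPU is going down, before the irq core shuts the
 * interrupt off.  If other CPUs in the mask remain online, the
 * interrupt keeps working and nothing needs to be done.  Otherwise the
 * driver must complete every outstanding request (e.g. by polling the
 * completion queue), because no completion interrupt will arrive later.
 */
static void queue_cpu_going_down(struct hw_queue *q, cpumask_t online_after)
{
    if (q->irq_affinity & online_after)
        return;                 /* interrupt survives, nothing to do */
    while (q->outstanding)
        q->outstanding--;       /* drain: reap completions by polling */
    q->irq_enabled = 0;         /* safe: the queue is empty now */
}
```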
It's hard to tell from the debug information which of the constraints (#1
or #2 or both) has been violated by the driver (or the device hardware /
firmware) but the effect that the task which submitted the I/O operation is
hung after an offline operation points clearly in that direction.
The irq core code is doing what is expected and I have no clue about that
megasas driver/hardware so I have to punt and redirect you to the SCSI and
megasas people.
Thanks,
tglx
^ permalink raw reply [flat|nested] 43+ messages in thread
* RE: system hung up when offlining CPUs
2017-10-16 20:27 ` Thomas Gleixner
@ 2017-10-30 9:08 ` Shivasharan Srikanteshwara
-1 siblings, 0 replies; 43+ messages in thread
From: Shivasharan Srikanteshwara @ 2017-10-30 9:08 UTC (permalink / raw)
To: Thomas Gleixner, YASUAKI ISHIMATSU
Cc: Kashyap Desai, Hannes Reinecke, Marc Zyngier, Christoph Hellwig,
axboe, mpe, keith.busch, peterz, LKML, linux-scsi, Sumit Saxena
> -----Original Message-----
> From: Thomas Gleixner [mailto:tglx@linutronix.de]
> Sent: Tuesday, October 17, 2017 1:57 AM
> To: YASUAKI ISHIMATSU
> Cc: Kashyap Desai; Hannes Reinecke; Marc Zyngier; Christoph Hellwig;
> axboe@kernel.dk; mpe@ellerman.id.au; keith.busch@intel.com;
> peterz@infradead.org; LKML; linux-scsi@vger.kernel.org; Sumit Saxena;
> Shivasharan Srikanteshwara
> Subject: Re: system hung up when offlining CPUs
>
> Yasuaki,
>
> On Mon, 16 Oct 2017, YASUAKI ISHIMATSU wrote:
>
> > Hi Thomas,
> >
> > > Can you please apply the patch below on top of Linus tree and retest?
> > >
> > > Please send me the outputs I asked you to provide last time in any
> > > case (success or fail).
> >
> > The issue still occurs even if I applied your patch to linux 4.14.0-rc4.
>
> Thanks for testing.
>
> > ---
> > [ ...] INFO: task setroubleshootd:4972 blocked for more than 120 seconds.
> > [ ...] Not tainted 4.14.0-rc4.thomas.with.irqdebug+ #6
> > [ ...] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [ ...] setroubleshootd D 0 4972 1 0x00000080
> > [ ...] Call Trace:
> > [ ...] __schedule+0x28d/0x890
> > [ ...] ? release_pages+0x16f/0x3f0
> > [ ...] schedule+0x36/0x80
> > [ ...] io_schedule+0x16/0x40
> > [ ...] wait_on_page_bit+0x107/0x150
> > [ ...] ? page_cache_tree_insert+0xb0/0xb0
> > [ ...] truncate_inode_pages_range+0x3dd/0x7d0
> > [ ...] ? schedule_hrtimeout_range_clock+0xad/0x140
> > [ ...] ? remove_wait_queue+0x59/0x60
> > [ ...] ? down_write+0x12/0x40
> > [ ...] ? unmap_mapping_range+0x75/0x130
> > [ ...] truncate_pagecache+0x47/0x60
> > [ ...] truncate_setsize+0x32/0x40
> > [ ...] xfs_setattr_size+0x100/0x300 [xfs]
> > [ ...] xfs_vn_setattr_size+0x40/0x90 [xfs]
> > [ ...] xfs_vn_setattr+0x87/0xa0 [xfs]
> > [ ...] notify_change+0x266/0x440
> > [ ...] do_truncate+0x75/0xc0
> > [ ...] path_openat+0xaba/0x13b0
> > [ ...] ? mem_cgroup_commit_charge+0x31/0x130
> > [ ...] do_filp_open+0x91/0x100
> > [ ...] ? __alloc_fd+0x46/0x170
> > [ ...] do_sys_open+0x124/0x210
> > [ ...] SyS_open+0x1e/0x20
> > [ ...] do_syscall_64+0x67/0x1b0
> > [ ...] entry_SYSCALL64_slow_path+0x25/0x25
>
> This is definitely a driver issue. The driver requests an affinity managed
> interrupt. Affinity managed interrupts are different from non managed
> interrupts in several ways:
>
> Non-Managed interrupts:
>
> 1) At setup time the default interrupt affinity is assigned to each
>    interrupt. The effective affinity is usually a subset of the online
>    CPUs.
>
> 2) User space can modify the affinity of the interrupt
>
> 3) If a CPU in the affinity mask goes offline and there are still online
>    CPUs in the affinity mask then the effective affinity is moved to a
>    subset of the online CPUs in the affinity mask.
>
>    If the last CPU in the affinity mask of an interrupt goes offline then
>    the hotplug code breaks the affinity and makes it affine to the online
>    CPUs. The effective affinity is a subset of the new affinity setting.
>
> Managed interrupts:
>
> 1) At setup time the interrupts of a multiqueue device are evenly spread
>    over the possible CPUs. If all CPUs in the affinity mask of a given
>    interrupt are offline at request_irq() time, the interrupt stays shut
>    down. If the first CPU in the affinity mask comes online later the
>    interrupt is started up.
>
> 2) User space cannot modify the affinity of the interrupt
>
> 3) If a CPU in the affinity mask goes offline and there are still online
>    CPUs in the affinity mask then the effective affinity is moved to a
>    subset of the online CPUs in the affinity mask. I.e. the same as with
>    Non-Managed interrupts.
>
>    If the last CPU in the affinity mask of a managed interrupt goes
>    offline then the interrupt is shutdown. If the first CPU in the
>    affinity mask becomes online again then the interrupt is started up
>    again.
>
Hi Thomas,
Thanks for the detailed explanation about the behavior of managed
interrupts. It helped me understand the issue better. This is the first
time I am looking at the CPU hotplug subsystem, so my input is very
preliminary. Please bear with my understanding and correct me where
required.
This issue is reproducible on our local setup as well, with managed
interrupts. I have a few queries on the requirements for device drivers
that you have mentioned.
In the managed-interrupts case, interrupts which were affine to the
offlined CPU are not migrated to another available CPU. But the
documentation at the link below says that "all interrupts" targeted to
that CPU are migrated to a new CPU. So not all interrupts are actually
migrated then.
https://www.kernel.org/doc/html/v4.11/core-api/cpu_hotplug.html#the-offline-case
"- All interrupts targeted to this CPU are migrated to a new CPU"
> So this has consequences:
>
> 1) The device driver has to make sure that no requests are targeted at a
>    queue whose interrupt is affine to offline CPUs and therefore shut
>    down. If the driver ignores that then this queue will not deliver an
>    interrupt simply because that interrupt is shut down.
>
> 2) When the last CPU in the affinity mask of a queue interrupt goes
>    offline the device driver has to make sure that all outstanding
>    requests in the queue which have not yet delivered their interrupt are
>    completed. This is required because when the CPU is finally offline the
>    interrupt is shut down and won't deliver any more interrupts.
>
>    If that does not happen then the not yet completed requests will try to
>    send their completion interrupt, which obviously is not delivered
>    because the interrupt is shut down.
>
Once the last CPU in the affinity mask is offlined and a particular IRQ is
shut down, is there currently a way for the device driver to get a callback
so it can complete all outstanding requests on that queue?
From the ftrace that I captured, below are the various functions called
once the irq shutdown was initiated. There were no callbacks into the
driver from the irq core that I could see.
<...>-16 [001] d..1 9915.744040: irq_shutdown <-irq_migrate_all_off_this_cpu
                                                ^^^^^^^^^^^
<...>-16 [001] d..1 9915.744040: __irq_disable <-irq_shutdown
<...>-16 [001] d..1 9915.744041: mask_irq.part.30 <-__irq_disable
<...>-16 [001] d..1 9915.744041: pci_msi_mask_irq <-mask_irq.part.30
<...>-16 [001] d..1 9915.744041: msi_set_mask_bit <-pci_msi_mask_irq
<...>-16 [001] d..1 9915.744042: irq_domain_deactivate_irq <-irq_shutdown
<...>-16 [001] d..1 9915.744043: __irq_domain_deactivate_irq <-irq_domain_deactivate_irq
<...>-16 [001] d..1 9915.744043: msi_domain_deactivate <-__irq_domain_deactivate_irq
<...>-16 [001] d..1 9915.744044: pci_msi_domain_write_msg <-msi_domain_deactivate
<...>-16 [001] d..1 9915.744044: __pci_write_msi_msg <-pci_msi_domain_write_msg
<...>-16 [001] d..1 9915.744044: __irq_domain_deactivate_irq <-__irq_domain_deactivate_irq
<...>-16 [001] d..1 9915.744045: intel_irq_remapping_deactivate <-__irq_domain_deactivate_irq
<...>-16 [001] d..1 9915.744045: modify_irte <-intel_irq_remapping_deactivate
<...>-16 [001] d..1 9915.744045: _raw_spin_lock_irqsave <-modify_irte
<...>-16 [001] d..1 9915.744045: qi_submit_sync <-modify_irte
<...>-16 [001] d..1 9915.744046: _raw_spin_lock_irqsave <-qi_submit_sync
<...>-16 [001] d..1 9915.744046: _raw_spin_lock <-qi_submit_sync
<...>-16 [001] d..1 9915.744047: _raw_spin_unlock_irqrestore <-qi_submit_sync
<...>-16 [001] d..1 9915.744047: _raw_spin_unlock_irqrestore <-modify_irte
<...>-16 [001] d..1 9915.744047: __irq_domain_deactivate_irq <-__irq_domain_deactivate_irq
To my knowledge, many device drivers in the kernel tree pass the
PCI_IRQ_AFFINITY flag to pci_alloc_irq_vectors() and it is a widely used
feature (not limited to the megaraid_sas driver). But I could not see any
of those drivers working as per the constraints mentioned. Can you please
point me to an existing driver to understand how the above constraints can
be implemented?
Below is a simple grep I ran for drivers passing the PCI_IRQ_AFFINITY flag
to pci_alloc_irq_vectors().
# grep -R "PCI_IRQ_AFFINITY" drivers/*
drivers/nvme/host/pci.c: PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY);
drivers/pci/host/vmd.c: PCI_IRQ_MSIX | PCI_IRQ_AFFINITY);
drivers/pci/msi.c: if (flags & PCI_IRQ_AFFINITY) {
drivers/scsi/aacraid/comminit.c: PCI_IRQ_MSIX | PCI_IRQ_AFFINITY);
drivers/scsi/be2iscsi/be_main.c: PCI_IRQ_MSIX | PCI_IRQ_AFFINITY, &desc) < 0) {
drivers/scsi/csiostor/csio_isr.c: PCI_IRQ_MSIX | PCI_IRQ_AFFINITY, &desc);
drivers/scsi/hpsa.c: PCI_IRQ_MSIX | PCI_IRQ_AFFINITY);
drivers/scsi/lpfc/lpfc_init.c: vectors, PCI_IRQ_MSIX | PCI_IRQ_AFFINITY);
drivers/scsi/megaraid/megaraid_sas_base.c: irq_flags |= PCI_IRQ_AFFINITY;
drivers/scsi/megaraid/megaraid_sas_base.c: irq_flags |= PCI_IRQ_AFFINITY;
drivers/scsi/mpt3sas/mpt3sas_base.c: irq_flags |= PCI_IRQ_AFFINITY;
drivers/scsi/qla2xxx/qla_isr.c: ha->msix_count, PCI_IRQ_MSIX | PCI_IRQ_AFFINITY,
drivers/scsi/smartpqi/smartpqi_init.c: PCI_IRQ_MSIX | PCI_IRQ_AFFINITY);
drivers/virtio/virtio_pci_common.c: (desc ? PCI_IRQ_AFFINITY : 0),
Thanks,
Shivasharan
> It's hard to tell from the debug information which of the constraints (#1
> or #2 or both) has been violated by the driver (or the device hardware /
> firmware) but the effect that the task which submitted the I/O operation is
> hung after an offline operation points clearly in that direction.
>
> The irq core code is doing what is expected and I have no clue about that
> megasas driver/hardware so I have to punt and redirect you to the SCSI and
> megasas people.
>
> Thanks,
>
> tglx
>
>
^ permalink raw reply [flat|nested] 43+ messages in thread
* RE: system hung up when offlining CPUs
2017-10-30 9:08 ` Shivasharan Srikanteshwara
@ 2017-11-01 0:47 ` Thomas Gleixner
-1 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2017-11-01 0:47 UTC (permalink / raw)
To: Shivasharan Srikanteshwara
Cc: YASUAKI ISHIMATSU, Kashyap Desai, Hannes Reinecke, Marc Zyngier,
Christoph Hellwig, axboe, mpe, keith.busch, peterz, LKML,
linux-scsi, Sumit Saxena
On Mon, 30 Oct 2017, Shivasharan Srikanteshwara wrote:
> In managed-interrupts case, interrupts which were affine to the offlined
> CPU is not getting migrated to another available CPU. But the
> documentation at below link says that "all interrupts" are migrated to a
> new CPU. So not all interrupts are getting migrated to a new CPU then.
Correct.
> https://www.kernel.org/doc/html/v4.11/core-api/cpu_hotplug.html#the-offline-case
> "- All interrupts targeted to this CPU are migrated to a new CPU"
Well, documentation is not always up to date :)
> Once the last CPU in the affinity mask is offlined and a particular IRQ
> is shutdown, is there a way currently for the device driver to get
> callback to complete all outstanding requests on that queue?
No and I have no idea how the other drivers deal with that.
The way you can do that is to have your own hotplug callback which is
invoked when the cpu goes down, but way before the interrupt is shut down,
which is one of the last steps. Ideally this would be a callback in the
generic block code which then calls out to all instances like it's done for
the cpu dead state.
Jens, Christoph?
Thanks,
tglx
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: system hung up when offlining CPUs
2017-11-01 0:47 ` Thomas Gleixner
@ 2017-11-01 11:01 ` Hannes Reinecke
-1 siblings, 0 replies; 43+ messages in thread
From: Hannes Reinecke @ 2017-11-01 11:01 UTC (permalink / raw)
To: Thomas Gleixner, Shivasharan Srikanteshwara
Cc: YASUAKI ISHIMATSU, Kashyap Desai, Marc Zyngier,
Christoph Hellwig, axboe, mpe, keith.busch, peterz, LKML,
linux-scsi, Sumit Saxena
On 11/01/2017 01:47 AM, Thomas Gleixner wrote:
> On Mon, 30 Oct 2017, Shivasharan Srikanteshwara wrote:
>
>> In managed-interrupts case, interrupts which were affine to the offlined
>> CPU is not getting migrated to another available CPU. But the
>> documentation at below link says that "all interrupts" are migrated to a
>> new CPU. So not all interrupts are getting migrated to a new CPU then.
>
> Correct.
>
>> https://www.kernel.org/doc/html/v4.11/core-api/cpu_hotplug.html#the-offline-case
>> "- All interrupts targeted to this CPU are migrated to a new CPU"
>
> Well, documentation is not always up to date :)
>
>> Once the last CPU in the affinity mask is offlined and a particular IRQ
>> is shutdown, is there a way currently for the device driver to get
>> callback to complete all outstanding requests on that queue?
>
> No and I have no idea how the other drivers deal with that.
>
> The way you can do that is to have your own hotplug callback which is
> invoked when the cpu goes down, but way before the interrupt is shut down,
> which is one of the last steps. Ideally this would be a callback in the
> generic block code which then calls out to all instances like its done for
> the cpu dead state.
>
In principle, yes, that would be (and, in fact, might already be) moved to
the block layer for blk-mq, as blk-mq has full control over the individual
queues and hence can ensure that queues with dead/removed CPUs are
properly handled.
Here, OTOH, we are dealing with the legacy sq implementation (or, to be
precise, a blk-mq implementation utilizing only a single queue), so
any of this handling needs to be implemented in the driver.
So what would need to be done here is to implement a hotplug callback in
the driver, which would remove the CPU from the list/bitmap of valid
CPUs. Then the driver could validate the CPU number against this bitmap
upon I/O submission (instead of just using raw_smp_processor_id()), and
could set the queue ID to '0' if an invalid CPU was found.
With that the driver should be able to ensure that no new I/O will be
submitted which will hit the dead CPU, so with a bit of luck this might
already solve the problem.
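A minimal sketch of that scheme, with invented names (this is not the actual megaraid_sas code): a hotplug callback maintains a bitmap of valid CPUs, and the submission path validates the submitting CPU against it, falling back to queue 0 when the CPU has been removed.

```c
#include <assert.h>
#include <stdint.h>

/* Bitmap of CPUs still valid for per-CPU queue mapping; bit N = CPU N.
 * Assumes at most 64 CPUs for this illustration. */
static uint64_t valid_cpus = ~0ULL;

/* Hypothetical driver hotplug callback: mark a dying CPU invalid. */
static void drv_cpu_down_callback(unsigned int cpu)
{
    valid_cpus &= ~(1ULL << cpu);
}

/* Submission-path check: use the submitting CPU's queue only if that
 * CPU is still valid, otherwise fall back to queue 0. */
static unsigned int drv_select_queue(unsigned int submitting_cpu)
{
    if (valid_cpus & (1ULL << submitting_cpu))
        return submitting_cpu;
    return 0;
}
```

In a real driver the bitmap update would need appropriate synchronization against concurrent submissions; that detail is elided here.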
Alternatively I could resurrect my patchset converting the driver to
blk-mq, which got vetoed the last time ...
Cheers,
Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
hare@suse.de +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
^ permalink raw reply [flat|nested] 43+ messages in thread
* system hung up when offlining CPUs
@ 2017-08-08 19:24 YASUAKI ISHIMATSU
0 siblings, 0 replies; 43+ messages in thread
From: YASUAKI ISHIMATSU @ 2017-08-08 19:24 UTC (permalink / raw)
To: tglx; +Cc: axboe, marc.zyngier, mpe, keith.busch, peterz, yasu.isimatu, LKML
Hi Thomas,
When offlining all CPUs except cpu0, system hung up with the following message.
[...] INFO: task kworker/u384:1:1234 blocked for more than 120 seconds.
[...] Not tainted 4.12.0-rc6+ #19
[...] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[...] kworker/u384:1 D 0 1234 2 0x00000000
[...] Workqueue: writeback wb_workfn (flush-253:0)
[...] Call Trace:
[...] __schedule+0x28a/0x880
[...] schedule+0x36/0x80
[...] schedule_timeout+0x249/0x300
[...] ? __schedule+0x292/0x880
[...] __down_common+0xfc/0x132
[...] ? _xfs_buf_find+0x2bb/0x510 [xfs]
[...] __down+0x1d/0x1f
[...] down+0x41/0x50
[...] xfs_buf_lock+0x3c/0xf0 [xfs]
[...] _xfs_buf_find+0x2bb/0x510 [xfs]
[...] xfs_buf_get_map+0x2a/0x280 [xfs]
[...] xfs_buf_read_map+0x2d/0x180 [xfs]
[...] xfs_trans_read_buf_map+0xf5/0x310 [xfs]
[...] xfs_btree_read_buf_block.constprop.35+0x78/0xc0 [xfs]
[...] xfs_btree_lookup_get_block+0x88/0x160 [xfs]
[...] xfs_btree_lookup+0xd0/0x3b0 [xfs]
[...] ? xfs_allocbt_init_cursor+0x41/0xe0 [xfs]
[...] xfs_alloc_ag_vextent_near+0xaf/0xaa0 [xfs]
[...] xfs_alloc_ag_vextent+0x13c/0x150 [xfs]
[...] xfs_alloc_vextent+0x425/0x590 [xfs]
[...] xfs_bmap_btalloc+0x448/0x770 [xfs]
[...] xfs_bmap_alloc+0xe/0x10 [xfs]
[...] xfs_bmapi_write+0x61d/0xc10 [xfs]
[...] ? kmem_zone_alloc+0x96/0x100 [xfs]
[...] xfs_iomap_write_allocate+0x199/0x3a0 [xfs]
[...] xfs_map_blocks+0x1e8/0x260 [xfs]
[...] xfs_do_writepage+0x1ca/0x680 [xfs]
[...] write_cache_pages+0x26f/0x510
[...] ? xfs_vm_set_page_dirty+0x1d0/0x1d0 [xfs]
[...] ? blk_mq_dispatch_rq_list+0x305/0x410
[...] ? deadline_remove_request+0x7d/0xc0
[...] xfs_vm_writepages+0xb6/0xd0 [xfs]
[...] do_writepages+0x1c/0x70
[...] __writeback_single_inode+0x45/0x320
[...] writeback_sb_inodes+0x280/0x570
[...] __writeback_inodes_wb+0x8c/0xc0
[...] wb_writeback+0x276/0x310
[...] ? get_nr_dirty_inodes+0x4d/0x80
[...] wb_workfn+0x2d4/0x3b0
[...] process_one_work+0x149/0x360
[...] worker_thread+0x4d/0x3c0
[...] kthread+0x109/0x140
[...] ? rescuer_thread+0x380/0x380
[...] ? kthread_park+0x60/0x60
[...] ret_from_fork+0x25/0x30
I bisected the upstream kernel and found that the following commit
introduced the issue:
commit c5cb83bb337c25caae995d992d1cdf9b317f83de
Author: Thomas Gleixner <tglx@linutronix.de>
Date: Tue Jun 20 01:37:51 2017 +0200
genirq/cpuhotplug: Handle managed IRQs on CPU hotplug
Thanks,
Yasuaki Ishimatsu
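The bisection reported above can be illustrated for readers unfamiliar with the workflow. The following is a minimal, self-contained sketch in a throwaway repository: the five commits and the `bug` marker file are illustrative assumptions, not the actual kernel tree, and the real test step (booting each candidate kernel and offlining CPUs) is replaced by a trivial file check so that `git bisect run` can drive the search automatically.

```shell
#!/bin/sh
# Sketch of a git-bisect session in a throwaway repository.
# Commit 3 of 5 introduces the "regression" (a marker file named 'bug'),
# and `git bisect run` locates the first bad commit automatically.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "you"
for i in 1 2 3 4 5; do
    if [ "$i" -ge 3 ]; then touch bug; fi
    git add -A
    git commit -q --allow-empty -m "commit $i"
done
# HEAD (commit 5) is known bad, HEAD~4 (commit 1) is known good.
git bisect start HEAD HEAD~4
# The test command exits 0 (good) when the marker is absent,
# non-zero (bad) when it is present.
git bisect run sh -c '! test -f bug' >/dev/null
# refs/bisect/bad now points at the first bad commit.
first_bad=$(git show -s --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1 || true
echo "first bad: $first_bad"
```

In the actual report each bisect step required rebooting into the candidate kernel and offlining CPUs by hand, so the search cannot be scripted this way; the sketch only shows the mechanics of narrowing a range of commits down to the first bad one.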
end of thread, other threads:[~2017-11-01 11:01 UTC | newest]
Thread overview: 43+ messages
2017-08-08 19:25 system hung up when offlining CPUs YASUAKI ISHIMATSU
2017-08-09 11:42 ` Marc Zyngier
2017-08-09 19:09 ` YASUAKI ISHIMATSU
2017-08-10 11:54 ` Marc Zyngier
2017-08-21 12:07 ` Christoph Hellwig
2017-08-21 13:18 ` Christoph Hellwig
2017-08-21 13:37 ` Marc Zyngier
2017-09-07 20:23 ` YASUAKI ISHIMATSU
2017-09-12 18:15 ` YASUAKI ISHIMATSU
2017-09-13 11:13 ` Hannes Reinecke
2017-09-13 11:35 ` Kashyap Desai
2017-09-13 13:33 ` Thomas Gleixner
2017-09-14 16:28 ` YASUAKI ISHIMATSU
2017-09-16 10:15 ` Thomas Gleixner
2017-09-16 15:02 ` Thomas Gleixner
2017-10-02 16:36 ` YASUAKI ISHIMATSU
2017-10-03 21:44 ` Thomas Gleixner
2017-10-04 21:04 ` Thomas Gleixner
2017-10-09 11:35 ` [tip:irq/urgent] genirq/cpuhotplug: Add sanity check for effective affinity mask tip-bot for Thomas Gleixner
2017-10-09 11:35 ` [tip:irq/urgent] genirq/cpuhotplug: Enforce affinity setting on startup of managed irqs tip-bot for Thomas Gleixner
2017-10-10 16:30 ` system hung up when offlining CPUs YASUAKI ISHIMATSU
2017-10-16 18:59 ` YASUAKI ISHIMATSU
2017-10-16 20:27 ` Thomas Gleixner
2017-10-30 9:08 ` Shivasharan Srikanteshwara
2017-11-01 0:47 ` Thomas Gleixner
2017-11-01 11:01 ` Hannes Reinecke
2017-10-04 21:10 ` Thomas Gleixner
-- strict thread matches above, loose matches on Subject: below --
2017-08-08 19:24 YASUAKI ISHIMATSU