Re: system hung up when offlining CPUs

From: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
To: Marc Zyngier <marc.zyngier@arm.com>
Cc: tglx@linutronix.de, axboe@kernel.dk, mpe@ellerman.id.au,
	keith.busch@intel.com, peterz@infradead.org,
	LKML <linux-kernel@vger.kernel.org>,
	yasu.isimatu@gmail.com
Subject: Re: system hung up when offlining CPUs
Date: Wed, 9 Aug 2017 15:09:33 -0400	[thread overview]
Message-ID: <cd524af7-1f20-1956-1e44-92a451053387@gmail.com> (raw)
In-Reply-To: <20170809124213.0d9518bb@why.wild-wind.fr.eu.org>

Hi Marc,

On 08/09/2017 07:42 AM, Marc Zyngier wrote:
> On Tue, 8 Aug 2017 15:25:35 -0400
> YASUAKI ISHIMATSU <yasu.isimatu@gmail.com> wrote:
> 
>> Hi Thomas,
>>
>> When offlining all CPUs except cpu0, system hung up with the following message.
>>
>> [...] INFO: task kworker/u384:1:1234 blocked for more than 120 seconds.
>> [...]       Not tainted 4.12.0-rc6+ #19
>> [...] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [...] kworker/u384:1  D    0  1234      2 0x00000000
>> [...] Workqueue: writeback wb_workfn (flush-253:0)
>> [...] Call Trace:
>> [...]  __schedule+0x28a/0x880
>> [...]  schedule+0x36/0x80
>> [...]  schedule_timeout+0x249/0x300
>> [...]  ? __schedule+0x292/0x880
>> [...]  __down_common+0xfc/0x132
>> [...]  ? _xfs_buf_find+0x2bb/0x510 [xfs]
>> [...]  __down+0x1d/0x1f
>> [...]  down+0x41/0x50
>> [...]  xfs_buf_lock+0x3c/0xf0 [xfs]
>> [...]  _xfs_buf_find+0x2bb/0x510 [xfs]
>> [...]  xfs_buf_get_map+0x2a/0x280 [xfs]
>> [...]  xfs_buf_read_map+0x2d/0x180 [xfs]
>> [...]  xfs_trans_read_buf_map+0xf5/0x310 [xfs]
>> [...]  xfs_btree_read_buf_block.constprop.35+0x78/0xc0 [xfs]
>> [...]  xfs_btree_lookup_get_block+0x88/0x160 [xfs]
>> [...]  xfs_btree_lookup+0xd0/0x3b0 [xfs]
>> [...]  ? xfs_allocbt_init_cursor+0x41/0xe0 [xfs]
>> [...]  xfs_alloc_ag_vextent_near+0xaf/0xaa0 [xfs]
>> [...]  xfs_alloc_ag_vextent+0x13c/0x150 [xfs]
>> [...]  xfs_alloc_vextent+0x425/0x590 [xfs]
>> [...]  xfs_bmap_btalloc+0x448/0x770 [xfs]
>> [...]  xfs_bmap_alloc+0xe/0x10 [xfs]
>> [...]  xfs_bmapi_write+0x61d/0xc10 [xfs]
>> [...]  ? kmem_zone_alloc+0x96/0x100 [xfs]
>> [...]  xfs_iomap_write_allocate+0x199/0x3a0 [xfs]
>> [...]  xfs_map_blocks+0x1e8/0x260 [xfs]
>> [...]  xfs_do_writepage+0x1ca/0x680 [xfs]
>> [...]  write_cache_pages+0x26f/0x510
>> [...]  ? xfs_vm_set_page_dirty+0x1d0/0x1d0 [xfs]
>> [...]  ? blk_mq_dispatch_rq_list+0x305/0x410
>> [...]  ? deadline_remove_request+0x7d/0xc0
>> [...]  xfs_vm_writepages+0xb6/0xd0 [xfs]
>> [...]  do_writepages+0x1c/0x70
>> [...]  __writeback_single_inode+0x45/0x320
>> [...]  writeback_sb_inodes+0x280/0x570
>> [...]  __writeback_inodes_wb+0x8c/0xc0
>> [...]  wb_writeback+0x276/0x310
>> [...]  ? get_nr_dirty_inodes+0x4d/0x80
>> [...]  wb_workfn+0x2d4/0x3b0
>> [...]  process_one_work+0x149/0x360
>> [...]  worker_thread+0x4d/0x3c0
>> [...]  kthread+0x109/0x140
>> [...]  ? rescuer_thread+0x380/0x380
>> [...]  ? kthread_park+0x60/0x60
>> [...]  ret_from_fork+0x25/0x30
>>
>>
>> I bisected upstream kernel. And I found that the following commit lead
>> the issue.
>>
>> commit c5cb83bb337c25caae995d992d1cdf9b317f83de
>> Author: Thomas Gleixner <tglx@linutronix.de>
>> Date:   Tue Jun 20 01:37:51 2017 +0200
>>
>>     genirq/cpuhotplug: Handle managed IRQs on CPU hotplug
> 
> Can you please post your /proc/interrupts and details of which
> interrupt you think goes wrong? This backtrace is not telling us much
> in terms of where to start looking...

Thank you for giving advise.

The issue is easily reproduced on physical/virtual machine by offling CPUs except cpu0.
Here are my /proc/interrupts on kvm guest before reproducing the issue. And when offlining
cpu1, the issue occurred. But when offling cpu0, the issue didn't occur.

           CPU0       CPU1
  0:        127          0   IO-APIC   2-edge      timer
  1:         10          0   IO-APIC   1-edge      i8042
  4:        227          0   IO-APIC   4-edge      ttyS0
  6:          3          0   IO-APIC   6-edge      floppy
  8:          0          0   IO-APIC   8-edge      rtc0
  9:          0          0   IO-APIC   9-fasteoi   acpi
 10:      10822          0   IO-APIC  10-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2, virtio3
 11:         23          0   IO-APIC  11-fasteoi   uhci_hcd:usb3, uhci_hcd:usb4, qxl
 12:         15          0   IO-APIC  12-edge      i8042
 14:        218          0   IO-APIC  14-edge      ata_piix
 15:          0          0   IO-APIC  15-edge      ata_piix
 24:          0          0   PCI-MSI 49152-edge      virtio0-config
 25:        359          0   PCI-MSI 49153-edge      virtio0-input.0
 26:          1          0   PCI-MSI 49154-edge      virtio0-output.0
 27:          0          0   PCI-MSI 114688-edge      virtio2-config
 28:          1       3639   PCI-MSI 114689-edge      virtio2-req.0
 29:          0          0   PCI-MSI 98304-edge      virtio1-config
 30:          4          0   PCI-MSI 98305-edge      virtio1-virtqueues
 31:        189          0   PCI-MSI 65536-edge      snd_hda_intel:card0
NMI:          0          0   Non-maskable interrupts
LOC:      16115      12845   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          0          0   Performance monitoring interrupts
IWI:          0          0   IRQ work interrupts
RTR:          0          0   APIC ICR read retries
RES:       3016       2135   Rescheduling interrupts
CAL:       3666        557   Function call interrupts
TLB:         65         12   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
DFR:          0          0   Deferred Error APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:          1          1   Machine check polls
ERR:          0
MIS:          0
PIN:          0          0   Posted-interrupt notification event
NPI:          0          0   Nested posted-interrupt event
PIW:          0          0   Posted-interrupt wakeup event

Thanks,
Yasuaki Ishimatsu

> 
> Thanks,
> 
> 	M.
>