From: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
To: Marc Zyngier <marc.zyngier@arm.com>
Cc: tglx@linutronix.de, axboe@kernel.dk, mpe@ellerman.id.au,
keith.busch@intel.com, peterz@infradead.org,
LKML <linux-kernel@vger.kernel.org>,
yasu.isimatu@gmail.com
Subject: Re: system hung up when offlining CPUs
Date: Wed, 9 Aug 2017 15:09:33 -0400
Message-ID: <cd524af7-1f20-1956-1e44-92a451053387@gmail.com>
In-Reply-To: <20170809124213.0d9518bb@why.wild-wind.fr.eu.org>
Hi Marc,
On 08/09/2017 07:42 AM, Marc Zyngier wrote:
> On Tue, 8 Aug 2017 15:25:35 -0400
> YASUAKI ISHIMATSU <yasu.isimatu@gmail.com> wrote:
>
>> Hi Thomas,
>>
>> When offlining all CPUs except cpu0, system hung up with the following message.
>>
>> [...] INFO: task kworker/u384:1:1234 blocked for more than 120 seconds.
>> [...] Not tainted 4.12.0-rc6+ #19
>> [...] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [...] kworker/u384:1 D 0 1234 2 0x00000000
>> [...] Workqueue: writeback wb_workfn (flush-253:0)
>> [...] Call Trace:
>> [...] __schedule+0x28a/0x880
>> [...] schedule+0x36/0x80
>> [...] schedule_timeout+0x249/0x300
>> [...] ? __schedule+0x292/0x880
>> [...] __down_common+0xfc/0x132
>> [...] ? _xfs_buf_find+0x2bb/0x510 [xfs]
>> [...] __down+0x1d/0x1f
>> [...] down+0x41/0x50
>> [...] xfs_buf_lock+0x3c/0xf0 [xfs]
>> [...] _xfs_buf_find+0x2bb/0x510 [xfs]
>> [...] xfs_buf_get_map+0x2a/0x280 [xfs]
>> [...] xfs_buf_read_map+0x2d/0x180 [xfs]
>> [...] xfs_trans_read_buf_map+0xf5/0x310 [xfs]
>> [...] xfs_btree_read_buf_block.constprop.35+0x78/0xc0 [xfs]
>> [...] xfs_btree_lookup_get_block+0x88/0x160 [xfs]
>> [...] xfs_btree_lookup+0xd0/0x3b0 [xfs]
>> [...] ? xfs_allocbt_init_cursor+0x41/0xe0 [xfs]
>> [...] xfs_alloc_ag_vextent_near+0xaf/0xaa0 [xfs]
>> [...] xfs_alloc_ag_vextent+0x13c/0x150 [xfs]
>> [...] xfs_alloc_vextent+0x425/0x590 [xfs]
>> [...] xfs_bmap_btalloc+0x448/0x770 [xfs]
>> [...] xfs_bmap_alloc+0xe/0x10 [xfs]
>> [...] xfs_bmapi_write+0x61d/0xc10 [xfs]
>> [...] ? kmem_zone_alloc+0x96/0x100 [xfs]
>> [...] xfs_iomap_write_allocate+0x199/0x3a0 [xfs]
>> [...] xfs_map_blocks+0x1e8/0x260 [xfs]
>> [...] xfs_do_writepage+0x1ca/0x680 [xfs]
>> [...] write_cache_pages+0x26f/0x510
>> [...] ? xfs_vm_set_page_dirty+0x1d0/0x1d0 [xfs]
>> [...] ? blk_mq_dispatch_rq_list+0x305/0x410
>> [...] ? deadline_remove_request+0x7d/0xc0
>> [...] xfs_vm_writepages+0xb6/0xd0 [xfs]
>> [...] do_writepages+0x1c/0x70
>> [...] __writeback_single_inode+0x45/0x320
>> [...] writeback_sb_inodes+0x280/0x570
>> [...] __writeback_inodes_wb+0x8c/0xc0
>> [...] wb_writeback+0x276/0x310
>> [...] ? get_nr_dirty_inodes+0x4d/0x80
>> [...] wb_workfn+0x2d4/0x3b0
>> [...] process_one_work+0x149/0x360
>> [...] worker_thread+0x4d/0x3c0
>> [...] kthread+0x109/0x140
>> [...] ? rescuer_thread+0x380/0x380
>> [...] ? kthread_park+0x60/0x60
>> [...] ret_from_fork+0x25/0x30
>>
>>
>> I bisected upstream kernel. And I found that the following commit lead
>> the issue.
>>
>> commit c5cb83bb337c25caae995d992d1cdf9b317f83de
>> Author: Thomas Gleixner <tglx@linutronix.de>
>> Date: Tue Jun 20 01:37:51 2017 +0200
>>
>> genirq/cpuhotplug: Handle managed IRQs on CPU hotplug
>
> Can you please post your /proc/interrupts and details of which
> interrupt you think goes wrong? This backtrace is not telling us much
> in terms of where to start looking...
Thank you for the advice.

The issue is easily reproduced on a physical or virtual machine by offlining all CPUs except cpu0.
Here is /proc/interrupts on my KVM guest before reproducing the issue. The issue occurred when
I offlined cpu1, but it did not occur when I offlined cpu0.
        CPU0    CPU1
  0:     127       0  IO-APIC  2-edge       timer
  1:      10       0  IO-APIC  1-edge       i8042
  4:     227       0  IO-APIC  4-edge       ttyS0
  6:       3       0  IO-APIC  6-edge       floppy
  8:       0       0  IO-APIC  8-edge       rtc0
  9:       0       0  IO-APIC  9-fasteoi    acpi
 10:   10822       0  IO-APIC  10-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2, virtio3
 11:      23       0  IO-APIC  11-fasteoi   uhci_hcd:usb3, uhci_hcd:usb4, qxl
 12:      15       0  IO-APIC  12-edge      i8042
 14:     218       0  IO-APIC  14-edge      ata_piix
 15:       0       0  IO-APIC  15-edge      ata_piix
 24:       0       0  PCI-MSI  49152-edge   virtio0-config
 25:     359       0  PCI-MSI  49153-edge   virtio0-input.0
 26:       1       0  PCI-MSI  49154-edge   virtio0-output.0
 27:       0       0  PCI-MSI  114688-edge  virtio2-config
 28:       1    3639  PCI-MSI  114689-edge  virtio2-req.0
 29:       0       0  PCI-MSI  98304-edge   virtio1-config
 30:       4       0  PCI-MSI  98305-edge   virtio1-virtqueues
 31:     189       0  PCI-MSI  65536-edge   snd_hda_intel:card0
NMI:       0       0  Non-maskable interrupts
LOC:   16115   12845  Local timer interrupts
SPU:       0       0  Spurious interrupts
PMI:       0       0  Performance monitoring interrupts
IWI:       0       0  IRQ work interrupts
RTR:       0       0  APIC ICR read retries
RES:    3016    2135  Rescheduling interrupts
CAL:    3666     557  Function call interrupts
TLB:      65      12  TLB shootdowns
TRM:       0       0  Thermal event interrupts
THR:       0       0  Threshold APIC interrupts
DFR:       0       0  Deferred Error APIC interrupts
MCE:       0       0  Machine check exceptions
MCP:       1       1  Machine check polls
ERR:       0
MIS:       0
PIN:       0       0  Posted-interrupt notification event
NPI:       0       0  Nested posted-interrupt event
PIW:       0       0  Posted-interrupt wakeup event
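
For reference, here is a minimal Python sketch of one way to check a /proc/interrupts dump like the one above for device interrupts that have been counted on the CPU about to be offlined. The names parse_interrupts, device_irqs_seen_on and offline_cpu, and the shortened SAMPLE excerpt, are illustrative assumptions and not part of the original report. offline_cpu assumes the standard sysfs hotplug file /sys/devices/system/cpu/cpuN/online and needs root, so it is defined but not called here.

```python
from typing import Dict, List

# Shortened excerpt of the dump above, used only to exercise the parser.
SAMPLE = """\
        CPU0    CPU1
  0:     127       0  IO-APIC  2-edge       timer
 10:   10822       0  IO-APIC  10-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2, virtio3
 27:       0       0  PCI-MSI  114688-edge  virtio2-config
 28:       1    3639  PCI-MSI  114689-edge  virtio2-req.0
LOC:   16115   12845  Local timer interrupts
ERR:       0
"""


def parse_interrupts(text: str) -> Dict[str, List[int]]:
    """Map each row label (e.g. "28", "LOC") to its per-CPU counts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table: Dict[str, List[int]] = {}
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")
        counts: List[int] = []
        # Rows such as ERR/MIS carry fewer counts than there are CPUs.
        for field in rest.split()[:ncpus]:
            if not field.isdigit():
                break
            counts.append(int(field))
        table[label.strip()] = counts
    return table


def device_irqs_seen_on(table: Dict[str, List[int]], cpu: int) -> List[str]:
    """Numbered (device) IRQs with a non-zero count on the given CPU."""
    return [
        label
        for label, counts in table.items()
        if label.isdigit() and cpu < len(counts) and counts[cpu] > 0
    ]


def offline_cpu(cpu: int) -> None:
    """Take a CPU offline via sysfs (assumed path; requires root)."""
    with open(f"/sys/devices/system/cpu/cpu{cpu}/online", "w") as f:
        f.write("0")


if __name__ == "__main__":
    table = parse_interrupts(SAMPLE)
    print(device_irqs_seen_on(table, 1))
```

On this excerpt the script lists IRQ 28 (virtio2-req.0) as the only numbered interrupt with a non-zero count on CPU1, which matches the CPU1 column of the full dump. This only shows where interrupts were counted so far; it does not by itself show which interrupt goes wrong.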
Thanks,
Yasuaki Ishimatsu
>
> Thanks,
>
> M.
>
Thread overview: 28+ messages
2017-08-08 19:25 system hung up when offlining CPUs YASUAKI ISHIMATSU
2017-08-09 11:42 ` Marc Zyngier
2017-08-09 19:09 ` YASUAKI ISHIMATSU [this message]
2017-08-10 11:54 ` Marc Zyngier
2017-08-21 12:07 ` Christoph Hellwig
2017-08-21 13:18 ` Christoph Hellwig
2017-08-21 13:37 ` Marc Zyngier
2017-09-07 20:23 ` YASUAKI ISHIMATSU
2017-09-12 18:15 ` YASUAKI ISHIMATSU
2017-09-13 11:13 ` Hannes Reinecke
2017-09-13 11:35 ` Kashyap Desai
2017-09-13 13:33 ` Thomas Gleixner
2017-09-14 16:28 ` YASUAKI ISHIMATSU
2017-09-16 10:15 ` Thomas Gleixner
2017-09-16 15:02 ` Thomas Gleixner
2017-10-02 16:36 ` YASUAKI ISHIMATSU
2017-10-03 21:44 ` Thomas Gleixner
2017-10-04 21:04 ` Thomas Gleixner
2017-10-09 11:35 ` [tip:irq/urgent] genirq/cpuhotplug: Add sanity check for effective affinity mask tip-bot for Thomas Gleixner
2017-10-09 11:35 ` [tip:irq/urgent] genirq/cpuhotplug: Enforce affinity setting on startup of managed irqs tip-bot for Thomas Gleixner
2017-10-10 16:30 ` system hung up when offlining CPUs YASUAKI ISHIMATSU
2017-10-16 18:59 ` YASUAKI ISHIMATSU
2017-10-16 20:27 ` Thomas Gleixner
2017-10-30 9:08 ` Shivasharan Srikanteshwara
2017-11-01 0:47 ` Thomas Gleixner
2017-11-01 11:01 ` Hannes Reinecke
2017-10-04 21:10 ` Thomas Gleixner
-- strict thread matches above, loose matches on Subject: below --
2017-08-08 19:24 YASUAKI ISHIMATSU