From: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
To: Marc Zyngier <marc.zyngier@arm.com>, Christoph Hellwig <hch@lst.de>
Cc: tglx@linutronix.de, axboe@kernel.dk, mpe@ellerman.id.au,
	keith.busch@intel.com, peterz@infradead.org,
	LKML <linux-kernel@vger.kernel.org>,
	yasu.isimatu@gmail.com
Subject: Re: system hung up when offlining CPUs
Date: Thu, 7 Sep 2017 16:23:41 -0400
Message-ID: <8e0d76cd-7cd4-3a98-12ba-815f00d4d772@gmail.com>
In-Reply-To: <fce0ad52-8739-09c8-ec9d-a23eb92cec5a@arm.com>

Hi Marc and Christoph,

Sorry for the late reply. I appreciate that you fixed the issue in the KVM
environment, but the issue still occurs on a physical server.

Here is the IRQ information for the megasas IRQs, summarized from
/proc/interrupts and /proc/irq/*/smp_affinity_list on my server:

---
IRQ affinity_list IRQ_TYPE
 42        0-5    IR-PCI-MSI 1048576-edge megasas
 43        0-5    IR-PCI-MSI 1048577-edge megasas
 44        0-5    IR-PCI-MSI 1048578-edge megasas
 45        0-5    IR-PCI-MSI 1048579-edge megasas
 46        0-5    IR-PCI-MSI 1048580-edge megasas
 47        0-5    IR-PCI-MSI 1048581-edge megasas
 48        0-5    IR-PCI-MSI 1048582-edge megasas
 49        0-5    IR-PCI-MSI 1048583-edge megasas
 50        0-5    IR-PCI-MSI 1048584-edge megasas
 51        0-5    IR-PCI-MSI 1048585-edge megasas
 52        0-5    IR-PCI-MSI 1048586-edge megasas
 53        0-5    IR-PCI-MSI 1048587-edge megasas
 54        0-5    IR-PCI-MSI 1048588-edge megasas
 55        0-5    IR-PCI-MSI 1048589-edge megasas
 56        0-5    IR-PCI-MSI 1048590-edge megasas
 57        0-5    IR-PCI-MSI 1048591-edge megasas
 58        0-5    IR-PCI-MSI 1048592-edge megasas
 59        0-5    IR-PCI-MSI 1048593-edge megasas
 60        0-5    IR-PCI-MSI 1048594-edge megasas
 61        0-5    IR-PCI-MSI 1048595-edge megasas
 62        0-5    IR-PCI-MSI 1048596-edge megasas
 63        0-5    IR-PCI-MSI 1048597-edge megasas
 64        0-5    IR-PCI-MSI 1048598-edge megasas
 65        0-5    IR-PCI-MSI 1048599-edge megasas
 66      24-29    IR-PCI-MSI 1048600-edge megasas
 67      24-29    IR-PCI-MSI 1048601-edge megasas
 68      24-29    IR-PCI-MSI 1048602-edge megasas
 69      24-29    IR-PCI-MSI 1048603-edge megasas
 70      24-29    IR-PCI-MSI 1048604-edge megasas
 71      24-29    IR-PCI-MSI 1048605-edge megasas
 72      24-29    IR-PCI-MSI 1048606-edge megasas
 73      24-29    IR-PCI-MSI 1048607-edge megasas
 74      24-29    IR-PCI-MSI 1048608-edge megasas
 75      24-29    IR-PCI-MSI 1048609-edge megasas
 76      24-29    IR-PCI-MSI 1048610-edge megasas
 77      24-29    IR-PCI-MSI 1048611-edge megasas
 78      24-29    IR-PCI-MSI 1048612-edge megasas
 79      24-29    IR-PCI-MSI 1048613-edge megasas
 80      24-29    IR-PCI-MSI 1048614-edge megasas
 81      24-29    IR-PCI-MSI 1048615-edge megasas
 82      24-29    IR-PCI-MSI 1048616-edge megasas
 83      24-29    IR-PCI-MSI 1048617-edge megasas
 84      24-29    IR-PCI-MSI 1048618-edge megasas
 85      24-29    IR-PCI-MSI 1048619-edge megasas
 86      24-29    IR-PCI-MSI 1048620-edge megasas
 87      24-29    IR-PCI-MSI 1048621-edge megasas
 88      24-29    IR-PCI-MSI 1048622-edge megasas
 89      24-29    IR-PCI-MSI 1048623-edge megasas
---
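
For reference, a summary like the one above could be produced with a short
script. This is only a minimal sketch, not necessarily how the table was
generated: the helper names parse_cpu_list and irqs_for_driver are
hypothetical, and it assumes each /proc/interrupts line has the layout
"IRQ: per-CPU counts, chip and hwirq description, action name", with the
action name being exactly "megasas".

```python
import os


def parse_cpu_list(text):
    """Expand a kernel CPU list such as "0-5,24-29" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def irqs_for_driver(interrupts_text, name):
    """Return {irq: type description} for lines whose last field equals `name`.

    Assumed layout (hypothetical helper, not a kernel interface):
    "<irq>: <count per CPU>... <chip> <hwirq>-<trigger> <action name>"
    """
    result = {}
    for line in interrupts_text.splitlines():
        fields = line.split()
        # Skip the CPU header, blank lines, and non-numeric rows such as "NMI:".
        if not fields or not fields[0].endswith(":") or not fields[0][:-1].isdigit():
            continue
        i = 1
        while i < len(fields) and fields[i].isdigit():
            i += 1  # skip the per-CPU interrupt counters
        rest = fields[i:]
        if rest and rest[-1] == name:
            result[int(fields[0][:-1])] = " ".join(rest[:-1])
    return result


if __name__ == "__main__":
    with open("/proc/interrupts") as f:
        irqs = irqs_for_driver(f.read(), "megasas")
    print("IRQ affinity_list IRQ_TYPE")
    for irq, irq_type in sorted(irqs.items()):
        path = os.path.join("/proc/irq", str(irq), "smp_affinity_list")
        with open(path) as f:
            affinity = f.read().strip()
        print(f"{irq:3d} {affinity:>10}    {irq_type} megasas")
```

parse_cpu_list is not needed to print the table, but it is handy when the
affinity lists have to be compared against a set of CPUs.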

On my server, IRQ#66-89 are routed to CPU#24-29. If I offline CPU#24-29,
I/O stops working and the following messages appear.

---
[...] sd 0:2:0:0: [sda] tag#1 task abort called for scmd(ffff8820574d7560)
[...] sd 0:2:0:0: [sda] tag#1 CDB: Read(10) 28 00 0d e8 cf 78 00 00 08 00
[...] sd 0:2:0:0: task abort: FAILED scmd(ffff8820574d7560)
[...] sd 0:2:0:0: [sda] tag#0 task abort called for scmd(ffff882057426560)
[...] sd 0:2:0:0: [sda] tag#0 CDB: Write(10) 2a 00 0d 58 37 00 00 00 08 00
[...] sd 0:2:0:0: task abort: FAILED scmd(ffff882057426560)
[...] sd 0:2:0:0: target reset called for scmd(ffff8820574d7560)
[...] sd 0:2:0:0: [sda] tag#1 megasas: target reset FAILED!!
[...] sd 0:2:0:0: [sda] tag#0 Controller reset is requested due to IO timeout
[...] SCSI command pointer: (ffff882057426560)   SCSI host state: 5      SCSI
[...] IO request frame:
[...]
<snip>
[...]
[...] megaraid_sas 0000:02:00.0: [ 0]waiting for 2 commands to complete for scsi0
[...] INFO: task auditd:1200 blocked for more than 120 seconds.
[...]       Not tainted 4.13.0+ #15
[...] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[...] auditd          D    0  1200      1 0x00000000
[...] Call Trace:
[...]  __schedule+0x28d/0x890
[...]  schedule+0x36/0x80
[...]  io_schedule+0x16/0x40
[...]  wait_on_page_bit_common+0x109/0x1c0
[...]  ? page_cache_tree_insert+0xf0/0xf0
[...]  __filemap_fdatawait_range+0x127/0x190
[...]  ? __filemap_fdatawrite_range+0xd1/0x100
[...]  file_write_and_wait_range+0x60/0xb0
[...]  xfs_file_fsync+0x67/0x1d0 [xfs]
[...]  vfs_fsync_range+0x3d/0xb0
[...]  do_fsync+0x3d/0x70
[...]  SyS_fsync+0x10/0x20
[...]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[...] RIP: 0033:0x7f0bd9633d2d
[...] RSP: 002b:00007f0bd751ed30 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
[...] RAX: ffffffffffffffda RBX: 00005590566d0080 RCX: 00007f0bd9633d2d
[...] RDX: 00005590566d1260 RSI: 0000000000000000 RDI: 0000000000000005
[...] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000017
[...] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
[...] R13: 00007f0bd751f9c0 R14: 00007f0bd751f700 R15: 0000000000000000
---
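
The relationship between the affinity table and the offlined CPUs can be
checked mechanically. The following is a minimal sketch under stated
assumptions: stranded_irqs is a hypothetical helper, and the affinity data is
typed in from the table in this mail rather than read from the system. It only
lists the IRQs whose whole affinity list falls inside the set of CPUs being
offlined; it does not model what the kernel does with those interrupts.

```python
def stranded_irqs(affinity, offline):
    """Return the IRQs whose entire affinity set lies inside `offline`.

    `affinity` maps an IRQ number to an iterable of CPU numbers;
    `offline` is an iterable of the CPU numbers being taken offline.
    """
    offline = set(offline)
    stranded = []
    for irq, cpus in affinity.items():
        cpu_set = set(cpus)
        # An IRQ is "stranded" when none of its listed CPUs stays online.
        if cpu_set and cpu_set <= offline:
            stranded.append(irq)
    return sorted(stranded)


# Affinity lists copied from the table: IRQ 42-65 -> CPU 0-5, IRQ 66-89 -> CPU 24-29.
affinity = {irq: range(0, 6) for irq in range(42, 66)}
affinity.update({irq: range(24, 30) for irq in range(66, 90)})

stranded = stranded_irqs(affinity, range(24, 30))
print(stranded[0], stranded[-1], len(stranded))  # prints "66 89 24"
```

With the table above, offlining CPU#24-29 leaves IRQ#66-89 without any online
CPU in their affinity lists, while IRQ#42-65 are unaffected.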

Thanks,
Yasuaki Ishimatsu

On 08/21/2017 09:37 AM, Marc Zyngier wrote:
> On 21/08/17 14:18, Christoph Hellwig wrote:
>> Can you try the patch below please?
>>
>> ---
>> From d5f59cb7a629de8439b318e1384660e6b56e7dd8 Mon Sep 17 00:00:00 2001
>> From: Christoph Hellwig <hch@lst.de>
>> Date: Mon, 21 Aug 2017 14:24:11 +0200
>> Subject: virtio_pci: fix cpu affinity support
>>
>> Commit 0b0f9dc5 ("Revert "virtio_pci: use shared interrupts for
>> virtqueues"") removed the adjustment of the pre_vectors for the virtio
>> MSI-X vector allocation which was added in commit fb5e31d9 ("virtio:
>> allow drivers to request IRQ affinity when creating VQs"). This will
>> lead to an incorrect assignment of MSI-X vectors, and potential
>> deadlocks when offlining cpus.
>>
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>> Fixes: 0b0f9dc5 ("Revert "virtio_pci: use shared interrupts for virtqueues")
>> Reported-by: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
> 
> Just gave it a go on an arm64 VM, and the behaviour seems much saner
> (the virtio queue affinity now spans the whole system).
> 
> Tested-by: Marc Zyngier <marc.zyngier@arm.com>
> 
> Thanks,
> 
> 	M.
> 

Thread overview: 28+ messages
2017-08-08 19:25 system hung up when offlining CPUs YASUAKI ISHIMATSU
2017-08-09 11:42 ` Marc Zyngier
2017-08-09 19:09   ` YASUAKI ISHIMATSU
2017-08-10 11:54     ` Marc Zyngier
2017-08-21 12:07       ` Christoph Hellwig
2017-08-21 13:18       ` Christoph Hellwig
2017-08-21 13:37         ` Marc Zyngier
2017-09-07 20:23           ` YASUAKI ISHIMATSU [this message]
2017-09-12 18:15             ` YASUAKI ISHIMATSU
2017-09-13 11:13               ` Hannes Reinecke
2017-09-13 11:35                 ` Kashyap Desai
2017-09-13 13:33                   ` Thomas Gleixner
2017-09-14 16:28                     ` YASUAKI ISHIMATSU
2017-09-16 10:15                       ` Thomas Gleixner
2017-09-16 15:02                         ` Thomas Gleixner
2017-10-02 16:36                           ` YASUAKI ISHIMATSU
2017-10-03 21:44                             ` Thomas Gleixner
2017-10-04 21:04                               ` Thomas Gleixner
2017-10-09 11:35                                 ` [tip:irq/urgent] genirq/cpuhotplug: Add sanity check for effective affinity mask tip-bot for Thomas Gleixner
2017-10-09 11:35                                 ` [tip:irq/urgent] genirq/cpuhotplug: Enforce affinity setting on startup of managed irqs tip-bot for Thomas Gleixner
2017-10-10 16:30                                 ` system hung up when offlining CPUs YASUAKI ISHIMATSU
2017-10-16 18:59                                   ` YASUAKI ISHIMATSU
2017-10-16 20:27                                     ` Thomas Gleixner
2017-10-30  9:08                                       ` Shivasharan Srikanteshwara
2017-11-01  0:47                                         ` Thomas Gleixner
2017-11-01 11:01                                           ` Hannes Reinecke
2017-10-04 21:10                             ` Thomas Gleixner
  -- strict thread matches above, loose matches on Subject: below --
2017-08-08 19:24 YASUAKI ISHIMATSU
