From: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
To: Marc Zyngier <marc.zyngier@arm.com>, Christoph Hellwig <hch@lst.de>
Cc: tglx@linutronix.de, axboe@kernel.dk, mpe@ellerman.id.au,
	keith.busch@intel.com, peterz@infradead.org,
	LKML <linux-kernel@vger.kernel.org>,
	linux-scsi@vger.kernel.org, kashyap.desai@broadcom.com,
	sumit.saxena@broadcom.com,
	shivasharan.srikanteshwara@broadcom.com
Subject: Re: system hung up when offlining CPUs
Date: Tue, 12 Sep 2017 14:15:53 -0400
Message-ID: <2f2ae1bc-4093-d083-6a18-96b9aaa090c9@gmail.com>
In-Reply-To: <8e0d76cd-7cd4-3a98-12ba-815f00d4d772@gmail.com>

+ linux-scsi and maintainers of megasas

I/O stops when I offline CPUs. Do you have any ideas?
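
For anyone trying to reproduce this, here is a minimal sketch of how the
CPUs could be taken offline. It assumes the standard sysfs hotplug file
/sys/devices/system/cpu/cpuN/online; the helper names and the dry_run
flag are made up for illustration, and the CPU range 24-29 is taken from
the report quoted below. The message itself does not say which method
was used to offline the CPUs.

```python
import os


def cpu_online_path(cpu, sysfs_root="/sys/devices/system/cpu"):
    """Return the sysfs hotplug control file for one CPU."""
    return os.path.join(sysfs_root, f"cpu{cpu}", "online")


def offline_cpus(cpus, sysfs_root="/sys/devices/system/cpu", dry_run=True):
    """Write "0" to each CPU's 'online' file, or only list the paths.

    With dry_run=True (the default) nothing is written, so the sketch is
    safe to run on any machine. Writing requires root.
    """
    paths = [cpu_online_path(cpu, sysfs_root) for cpu in cpus]
    if not dry_run:
        for path in paths:
            with open(path, "w") as f:
                f.write("0")
    return paths


# Hypothetical reproduction of the report: CPU#24-29, dry run only.
for path in offline_cpus(range(24, 30)):
    print(path)
```

Calling offline_cpus(range(24, 30), dry_run=False) as root would actually
offline the CPUs, so it should only be done on a test machine.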

On 09/07/2017 04:23 PM, YASUAKI ISHIMATSU wrote:
> Hi Marc and Christoph,
> 
> Sorry for the late reply. I appreciate that you fixed the issue in the KVM
> environment, but the issue still occurs on a physical server.
> 
> Here is the IRQ information for the megasas IRQs, summarized from
> /proc/interrupts and /proc/irq/*/smp_affinity_list on my server:
> 
> ---
> IRQ affinity_list IRQ_TYPE
>  42        0-5    IR-PCI-MSI 1048576-edge megasas
>  43        0-5    IR-PCI-MSI 1048577-edge megasas
>  44        0-5    IR-PCI-MSI 1048578-edge megasas
>  45        0-5    IR-PCI-MSI 1048579-edge megasas
>  46        0-5    IR-PCI-MSI 1048580-edge megasas
>  47        0-5    IR-PCI-MSI 1048581-edge megasas
>  48        0-5    IR-PCI-MSI 1048582-edge megasas
>  49        0-5    IR-PCI-MSI 1048583-edge megasas
>  50        0-5    IR-PCI-MSI 1048584-edge megasas
>  51        0-5    IR-PCI-MSI 1048585-edge megasas
>  52        0-5    IR-PCI-MSI 1048586-edge megasas
>  53        0-5    IR-PCI-MSI 1048587-edge megasas
>  54        0-5    IR-PCI-MSI 1048588-edge megasas
>  55        0-5    IR-PCI-MSI 1048589-edge megasas
>  56        0-5    IR-PCI-MSI 1048590-edge megasas
>  57        0-5    IR-PCI-MSI 1048591-edge megasas
>  58        0-5    IR-PCI-MSI 1048592-edge megasas
>  59        0-5    IR-PCI-MSI 1048593-edge megasas
>  60        0-5    IR-PCI-MSI 1048594-edge megasas
>  61        0-5    IR-PCI-MSI 1048595-edge megasas
>  62        0-5    IR-PCI-MSI 1048596-edge megasas
>  63        0-5    IR-PCI-MSI 1048597-edge megasas
>  64        0-5    IR-PCI-MSI 1048598-edge megasas
>  65        0-5    IR-PCI-MSI 1048599-edge megasas
>  66      24-29    IR-PCI-MSI 1048600-edge megasas
>  67      24-29    IR-PCI-MSI 1048601-edge megasas
>  68      24-29    IR-PCI-MSI 1048602-edge megasas
>  69      24-29    IR-PCI-MSI 1048603-edge megasas
>  70      24-29    IR-PCI-MSI 1048604-edge megasas
>  71      24-29    IR-PCI-MSI 1048605-edge megasas
>  72      24-29    IR-PCI-MSI 1048606-edge megasas
>  73      24-29    IR-PCI-MSI 1048607-edge megasas
>  74      24-29    IR-PCI-MSI 1048608-edge megasas
>  75      24-29    IR-PCI-MSI 1048609-edge megasas
>  76      24-29    IR-PCI-MSI 1048610-edge megasas
>  77      24-29    IR-PCI-MSI 1048611-edge megasas
>  78      24-29    IR-PCI-MSI 1048612-edge megasas
>  79      24-29    IR-PCI-MSI 1048613-edge megasas
>  80      24-29    IR-PCI-MSI 1048614-edge megasas
>  81      24-29    IR-PCI-MSI 1048615-edge megasas
>  82      24-29    IR-PCI-MSI 1048616-edge megasas
>  83      24-29    IR-PCI-MSI 1048617-edge megasas
>  84      24-29    IR-PCI-MSI 1048618-edge megasas
>  85      24-29    IR-PCI-MSI 1048619-edge megasas
>  86      24-29    IR-PCI-MSI 1048620-edge megasas
>  87      24-29    IR-PCI-MSI 1048621-edge megasas
>  88      24-29    IR-PCI-MSI 1048622-edge megasas
>  89      24-29    IR-PCI-MSI 1048623-edge megasas
> ---
> 
> On my server, IRQ#66-89 are delivered to CPU#24-29. If I offline CPU#24-29,
> I/O stops working and the following messages appear.
> 
> ---
> [...] sd 0:2:0:0: [sda] tag#1 task abort called for scmd(ffff8820574d7560)
> [...] sd 0:2:0:0: [sda] tag#1 CDB: Read(10) 28 00 0d e8 cf 78 00 00 08 00
> [...] sd 0:2:0:0: task abort: FAILED scmd(ffff8820574d7560)
> [...] sd 0:2:0:0: [sda] tag#0 task abort called for scmd(ffff882057426560)
> [...] sd 0:2:0:0: [sda] tag#0 CDB: Write(10) 2a 00 0d 58 37 00 00 00 08 00
> [...] sd 0:2:0:0: task abort: FAILED scmd(ffff882057426560)
> [...] sd 0:2:0:0: target reset called for scmd(ffff8820574d7560)
> [...] sd 0:2:0:0: [sda] tag#1 megasas: target reset FAILED!!
> [...] sd 0:2:0:0: [sda] tag#0 Controller reset is requested due to IO timeout
> [...] SCSI command pointer: (ffff882057426560)   SCSI host state: 5      SCSI
> [...] IO request frame:
> [...]
> <snip>
> [...]
> [...] megaraid_sas 0000:02:00.0: [ 0]waiting for 2 commands to complete for scsi0
> [...] INFO: task auditd:1200 blocked for more than 120 seconds.
> [...]       Not tainted 4.13.0+ #15
> [...] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [...] auditd          D    0  1200      1 0x00000000
> [...] Call Trace:
> [...]  __schedule+0x28d/0x890
> [...]  schedule+0x36/0x80
> [...]  io_schedule+0x16/0x40
> [...]  wait_on_page_bit_common+0x109/0x1c0
> [...]  ? page_cache_tree_insert+0xf0/0xf0
> [...]  __filemap_fdatawait_range+0x127/0x190
> [...]  ? __filemap_fdatawrite_range+0xd1/0x100
> [...]  file_write_and_wait_range+0x60/0xb0
> [...]  xfs_file_fsync+0x67/0x1d0 [xfs]
> [...]  vfs_fsync_range+0x3d/0xb0
> [...]  do_fsync+0x3d/0x70
> [...]  SyS_fsync+0x10/0x20
> [...]  entry_SYSCALL_64_fastpath+0x1a/0xa5
> [...] RIP: 0033:0x7f0bd9633d2d
> [...] RSP: 002b:00007f0bd751ed30 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
> [...] RAX: ffffffffffffffda RBX: 00005590566d0080 RCX: 00007f0bd9633d2d
> [...] RDX: 00005590566d1260 RSI: 0000000000000000 RDI: 0000000000000005
> [...] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000017
> [...] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
> [...] R13: 00007f0bd751f9c0 R14: 00007f0bd751f700 R15: 0000000000000000
> ---
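
The affinity table above can be checked mechanically. The following is a
minimal sketch, not part of the original report: it expands the
smp_affinity_list ranges and lists the IRQs whose whole affinity mask
falls inside the set of CPUs being offlined. The function names are made
up for illustration; the IRQ and CPU numbers are the ones from the table.

```python
def parse_cpu_list(text):
    """Expand a kernel CPU list such as "0-5,24-29" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        # A single CPU such as "3" has no upper bound, so reuse the lower one.
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def stranded_irqs(affinity, offline):
    """Return IRQs whose affinity mask is a subset of the offline CPUs."""
    return sorted(irq for irq, cpus in affinity.items()
                  if cpus and cpus <= offline)


# Data from the table: IRQ 42-65 -> CPU 0-5, IRQ 66-89 -> CPU 24-29.
affinity = {irq: parse_cpu_list("0-5") for irq in range(42, 66)}
affinity.update({irq: parse_cpu_list("24-29") for irq in range(66, 90)})

stranded = stranded_irqs(affinity, parse_cpu_list("24-29"))
print(stranded[0], stranded[-1], len(stranded))  # prints: 66 89 24
```

With CPU#24-29 offline, all 24 of IRQ#66-89 have no online CPU left in
their affinity mask, which matches the range named in the report.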
> 
> Thanks,
> Yasuaki Ishimatsu
> 
> On 08/21/2017 09:37 AM, Marc Zyngier wrote:
>> On 21/08/17 14:18, Christoph Hellwig wrote:
>>> Can you try the patch below please?
>>>
>>> ---
>>> From d5f59cb7a629de8439b318e1384660e6b56e7dd8 Mon Sep 17 00:00:00 2001
>>> From: Christoph Hellwig <hch@lst.de>
>>> Date: Mon, 21 Aug 2017 14:24:11 +0200
>>> Subject: virtio_pci: fix cpu affinity support
>>>
>>> Commit 0b0f9dc5 ("Revert "virtio_pci: use shared interrupts for
>>> virtqueues"") removed the adjustment of the pre_vectors for the virtio
>>> MSI-X vector allocation which was added in commit fb5e31d9 ("virtio:
>>> allow drivers to request IRQ affinity when creating VQs"). This will
>>> lead to an incorrect assignment of MSI-X vectors, and potential
>>> deadlocks when offlining cpus.
>>>
>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>> Fixes: 0b0f9dc5 ("Revert "virtio_pci: use shared interrupts for virtqueues")
>>> Reported-by: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
>>
>> Just gave it a go on an arm64 VM, and the behaviour seems much saner
>> (the virtio queue affinity now spans the whole system).
>>
>> Tested-by: Marc Zyngier <marc.zyngier@arm.com>
>>
>> Thanks,
>>
>> 	M.
>>

