linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Hannes Reinecke <hare@suse.de>
To: Balsundar.P@microchip.com, martin.petersen@oracle.com
Cc: hch@lst.de, james.bottomley@hansenpartnership.com,
	linux-scsi@vger.kernel.org, balsundar.p@microsemi.com,
	aacraid@microsemi.com
Subject: Re: [PATCH 08/11] aacraid: use scsi_host_quiesce() to wait for I/O to complete
Date: Wed, 4 Dec 2019 16:02:06 +0100	[thread overview]
Message-ID: <6696e47b-9f72-4034-a724-16e39e531409@suse.de> (raw)
In-Reply-To: <4ec87e61-2e2f-a3b5-00f6-1e6abf9cb261@suse.de>

On 11/28/19 1:09 PM, Hannes Reinecke wrote:
> On 11/28/19 12:45 PM, Balsundar.P@microchip.com wrote:
>> NAK
>>
>> After applying this patch, while IOs were running on physical drive, 
>> issued controller reset from management utility.
>> Observed below call trace. It is from scsi_device_quiesce().
>>
>> Nov 27 19:24:21 ubuntu kernel: [ 1330.799311] INFO: task arcconf:2386 blocked for more than 120 seconds.
>> Nov 27 19:24:21 ubuntu kernel: [ 1330.799841]       Not tainted 5.4.0-rc1+ #2
>> Nov 27 19:24:21 ubuntu kernel: [ 1330.800235] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Nov 27 19:24:21 ubuntu kernel: [ 1330.800678] arcconf         D    0  2386   2173 0x00004000
>> Nov 27 19:24:21 ubuntu kernel: [ 1330.800682] Call Trace:
>> Nov 27 19:24:21 ubuntu kernel: [ 1330.800699]  __schedule+0x291/0x6f0
>> Nov 27 19:24:21 ubuntu kernel: [ 1330.800703]  schedule+0x33/0xa0
>> Nov 27 19:24:21 ubuntu kernel: [ 1330.800710]  blk_mq_freeze_queue_wait+0x4b/0xb0
>> Nov 27 19:24:21 ubuntu kernel: [ 1330.800717]  ? wait_woken+0x80/0x80
>> Nov 27 19:24:21 ubuntu kernel: [ 1330.800721]  blk_mq_freeze_queue+0x1a/0x20
>> Nov 27 19:24:21 ubuntu kernel: [ 1330.800727]  scsi_device_quiesce+0x5d/0xb0
>> Nov 27 19:24:21 ubuntu kernel: [ 1330.800730]  scsi_host_quiesce+0x41/0x60
>> Nov 27 19:24:21 ubuntu kernel: [ 1330.800742]  aac_send_shutdown+0x7c/0x180 [aacraid]
>> Nov 27 19:24:21 ubuntu kernel: [ 1330.800749]  aac_reset_adapter+0x29f/0x760 [aacraid]
>> Nov 27 19:24:21 ubuntu kernel: [ 1330.800757]  ? security_capable+0x3f/0x60
>> Nov 27 19:24:21 ubuntu kernel: [ 1330.800762]  aac_store_reset_adapter+0x41/0x60 [aacraid]
>> Nov 27 19:24:21 ubuntu kernel: [ 1330.800770]  dev_attr_store+0x17/0x30
>> Nov 27 19:24:21 ubuntu kernel: [ 1330.800777]  sysfs_kf_write+0x3c/0x50
>> Nov 27 19:24:21 ubuntu kernel: [ 1330.800779]  kernfs_fop_write+0x125/0x1a0
>> Nov 27 19:24:21 ubuntu kernel: [ 1330.800785]  __vfs_write+0x1b/0x40
>> Nov 27 19:24:21 ubuntu kernel: [ 1330.800789]  vfs_write+0xb1/0x1a0
>> Nov 27 19:24:21 ubuntu kernel: [ 1330.800792]  ksys_write+0xa7/0xe0
>> Nov 27 19:24:21 ubuntu kernel: [ 1330.800795]  __x64_sys_write+0x1a/0x20
>> Nov 27 19:24:21 ubuntu kernel: [ 1330.800802]  do_syscall_64+0x57/0x190
>> Nov 27 19:24:21 ubuntu kernel: [ 1330.800806]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> Nov 27 19:24:21 ubuntu kernel: [ 1330.800810] RIP: 0033:0x7f67cfb7c2b7
>> Nov 27 19:24:21 ubuntu kernel: [ 1330.800819] Code: Bad RIP value.
>> Nov 27 19:24:21 ubuntu kernel: [ 1330.800821] RSP: 002b:00007ffeb23ca8c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
>> Nov 27 19:24:21 ubuntu kernel: [ 1330.800823] RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007f67cfb7c2b7
>> Nov 27 19:24:21 ubuntu kernel: [ 1330.800825] RDX: 0000000000000002 RSI: 00007ffeb23ca8f0 RDI: 0000000000000006
>> Nov 27 19:24:21 ubuntu kernel: [ 1330.800826] RBP: 00007ffeb23ca8f0 R08: 0000000000000000 R09: 0000000000000000
>> Nov 27 19:24:21 ubuntu kernel: [ 1330.800828] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000002
>> Nov 27 19:24:21 ubuntu kernel: [ 1330.800829] R13: 00007ffeb23cad20 R14: 00007ffeb23cadf0 R15: 00007ffeb23cac60
>>
> Thanks for testing.
> I'll have a look at the call trace and will come back to you with an
> updated version.
> 
After testing I've discovered that we can't use freeze_queue here.
Point is, when resetting the HBA there will be commands outstanding
(which will keep the q_usage_counter to non-zero), but we should _not_
terminate those commands as I/O processing will be resumed after reset.
Hence the blk_mq_freeze_queue_wait() will never complete.

So for the next iteration I've reverted back to use a busy iterator, as
we just need to complete the commands currently held by the firmware;
all other commands can (and should) be left alone.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      Teamlead Storage & Networking
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer

  reply	other threads:[~2019-12-04 15:02 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-20 10:31 [PATCHv5 00/11] scsi: remove legacy cmd_list implementation Hannes Reinecke
2019-11-20 10:31 ` [PATCH 01/11] dpt_i2o: rename adpt_i2o_to_scsi() to adpt_i2o_scsi_complete() Hannes Reinecke
2019-11-26 16:54   ` Christoph Hellwig
2019-11-20 10:31 ` [PATCH 02/11] scsi: add scsi_host_flush_commands() helper Hannes Reinecke
2019-11-20 16:19   ` Bart Van Assche
2019-11-21 15:46     ` Hannes Reinecke
2019-11-26 16:55   ` Christoph Hellwig
2019-11-26 18:25     ` Hannes Reinecke
2019-11-20 10:31 ` [PATCH 03/11] dpt_i2o: use scsi_host_flush_commands() to abort outstanding commands Hannes Reinecke
2019-11-26 16:55   ` Christoph Hellwig
2019-11-20 10:31 ` [PATCH 04/11] aacraid: Do not wait for outstanding write commands on synchronize_cache Hannes Reinecke
2019-11-26 17:02   ` Christoph Hellwig
2019-11-28 11:40   ` Balsundar.P
2019-11-20 10:31 ` [PATCH 05/11] aacraid: use midlayer helper to terminate outstanding commands Hannes Reinecke
2019-11-26 17:02   ` Christoph Hellwig
2019-11-28 11:41   ` Balsundar.P
2019-11-20 10:31 ` [PATCH 06/11] aacraid: replace aac_flush_ios() with midlayer helper Hannes Reinecke
2019-11-20 16:14   ` Bart Van Assche
2019-11-28 11:41   ` Balsundar.P
2019-11-20 10:31 ` [PATCH 07/11] scsi: add scsi_host_quiesce()/scsi_host_resume() helper Hannes Reinecke
2019-11-20 10:31 ` [PATCH 08/11] aacraid: use scsi_host_quiesce() to wait for I/O to complete Hannes Reinecke
2019-11-20 16:23   ` Bart Van Assche
2019-11-26  8:29   ` Balsundar.P
2019-11-28 11:45   ` Balsundar.P
2019-11-28 12:09     ` Hannes Reinecke
2019-12-04 15:02       ` Hannes Reinecke [this message]
2019-11-20 10:31 ` [PATCH 09/11] scsi: add scsi_host_busy_iter() Hannes Reinecke
2019-11-20 10:31 ` [PATCH 10/11] aacraid: use scsi_host_busy_iter() in get_num_of_incomplete_fibs() Hannes Reinecke
2019-11-28 11:42   ` Balsundar.P
2019-11-20 10:31 ` [PATCH 11/11] scsi: Remove cmd_list functionality Hannes Reinecke

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6696e47b-9f72-4034-a724-16e39e531409@suse.de \
    --to=hare@suse.de \
    --cc=Balsundar.P@microchip.com \
    --cc=aacraid@microsemi.com \
    --cc=balsundar.p@microsemi.com \
    --cc=hch@lst.de \
    --cc=james.bottomley@hansenpartnership.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).