* [PATCH V2 0/2] block: fix queue freeze and cleanup
@ 2017-11-23  4:47 Ming Lei
  2017-11-23  4:47 ` [PATCH V2 1/2] block: run queue before waiting for q_usage_counter becoming zero Ming Lei
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Ming Lei @ 2017-11-23  4:47 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Omar Sandoval, Bart Van Assche, Hannes Reinecke, Wen Xiong,
	Mauricio Faria de Oliveira, Ming Lei

Hi Jens,

The 1st patch runs the queue in blk_freeze_queue_start() to fix a
regression introduced by 055f6e18e08f ("block: Make q_usage_counter also
track legacy requests").

The 2nd patch drains the blkcg part of the request_queue for both blk-mq and
legacy, which also serves as a fix for blk-mq's queue cleanup.

V2:
	- follow Bart's suggestion to run the queue instead of draining it
	- drain the blkcg part of the request_queue for blk-mq

thanks,
Ming

Ming Lei (2):
  block: run queue before waiting for q_usage_counter becoming zero
  block: drain blkcg part of request_queue after queue is frozen

 block/blk-core.c | 3 +--
 block/blk-mq.c   | 2 ++
 2 files changed, 3 insertions(+), 2 deletions(-)

-- 
2.9.5


* [PATCH V2 1/2] block: run queue before waiting for q_usage_counter becoming zero
  2017-11-23  4:47 [PATCH V2 0/2] block: fix queue freeze and cleanup Ming Lei
@ 2017-11-23  4:47 ` Ming Lei
  2017-11-27 12:15   ` Mauricio Faria de Oliveira
  2017-11-23  4:48 ` [PATCH V2 2/2] block: drain blkcg part of request_queue in blk_cleanup_queue() Ming Lei
  2017-11-27 12:41 ` [PATCH V2 0/2] block: fix queue freeze and cleanup Ming Lei
  2 siblings, 1 reply; 21+ messages in thread
From: Ming Lei @ 2017-11-23  4:47 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Omar Sandoval, Bart Van Assche, Hannes Reinecke, Wen Xiong,
	Mauricio Faria de Oliveira, Ming Lei

Since commit 055f6e18e08f ("block: Make q_usage_counter also track legacy
requests") we track legacy requests with .q_usage_counter, but that commit
never runs the legacy queue before waiting for this counter to become zero,
so an IO hang is triggered when pulling a disk during IO.

This patch fixes the issue by running the queue in blk_freeze_queue_start(),
as blk-mq already does, before waiting for q_usage_counter to become zero.

Fixes: 055f6e18e08f ("block: Make q_usage_counter also track legacy requests")
Cc: Wen Xiong <wenxiong@us.ibm.com>
Cc: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Suggested-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 11097477eeab..e2b6a57b004d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -128,6 +128,8 @@ void blk_freeze_queue_start(struct request_queue *q)
 		percpu_ref_kill(&q->q_usage_counter);
 		if (q->mq_ops)
 			blk_mq_run_hw_queues(q, false);
+		else
+			blk_run_queue(q);
 	}
 }
 EXPORT_SYMBOL_GPL(blk_freeze_queue_start);
-- 
2.9.5


* [PATCH V2 2/2] block: drain blkcg part of request_queue in blk_cleanup_queue()
  2017-11-23  4:47 [PATCH V2 0/2] block: fix queue freeze and cleanup Ming Lei
  2017-11-23  4:47 ` [PATCH V2 1/2] block: run queue before waiting for q_usage_counter becoming zero Ming Lei
@ 2017-11-23  4:48 ` Ming Lei
  2017-11-27 12:15   ` Mauricio Faria de Oliveira
  2017-11-27 12:41 ` [PATCH V2 0/2] block: fix queue freeze and cleanup Ming Lei
  2 siblings, 1 reply; 21+ messages in thread
From: Ming Lei @ 2017-11-23  4:48 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Omar Sandoval, Bart Van Assche, Hannes Reinecke, Wen Xiong,
	Mauricio Faria de Oliveira, Ming Lei

Once blk_freeze_queue() returns, all requests (in-queue and pending) have
been drained, but we still need to drain the blkcg part of the request_queue
for both blk-mq and legacy, so this patch calls blkcg_drain_queue()
explicitly in blk_cleanup_queue() to do that.

The work done by __blk_drain_queue() in blk_cleanup_queue() is then covered
by blk_freeze_queue() and blkcg_drain_queue(), and tasks blocked in
get_request() are already woken up in blk_set_queue_dying(), so remove
__blk_drain_queue() from blk_cleanup_queue().

Cc: Wen Xiong <wenxiong@us.ibm.com>
Cc: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 1038706edd87..f3f6f11a5b31 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -689,8 +689,7 @@ void blk_cleanup_queue(struct request_queue *q)
 	 */
 	blk_freeze_queue(q);
 	spin_lock_irq(lock);
-	if (!q->mq_ops)
-		__blk_drain_queue(q, true);
+	blkcg_drain_queue(q);
 	queue_flag_set(QUEUE_FLAG_DEAD, q);
 	spin_unlock_irq(lock);
 
-- 
2.9.5


* Re: [PATCH V2 1/2] block: run queue before waiting for q_usage_counter becoming zero
  2017-11-23  4:47 ` [PATCH V2 1/2] block: run queue before waiting for q_usage_counter becoming zero Ming Lei
@ 2017-11-27 12:15   ` Mauricio Faria de Oliveira
  0 siblings, 0 replies; 21+ messages in thread
From: Mauricio Faria de Oliveira @ 2017-11-27 12:15 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block, Christoph Hellwig
  Cc: Omar Sandoval, Bart Van Assche, Hannes Reinecke, Wen Xiong

On 11/23/2017 02:47 AM, Ming Lei wrote:
> Now we track legacy requests with .q_usage_counter in commit 055f6e18e08f
> ("block: Make q_usage_counter also track legacy requests"), but that
> commit never runs legacy queue before waiting for this counter becoming zero,
> then IO hang is caused in the test of pulling disk during IO.
> 
> This patch fixes the issue by running queue in blk_freeze_queue_start() like
> blk-mq before waiting for q_usage_counter becoming zero.
> 
> Fixes: 055f6e18e08f("block: Make q_usage_counter also track legacy requests")
> Cc: Wen Xiong<wenxiong@us.ibm.com>
> Cc: Mauricio Faria de Oliveira<mauricfo@linux.vnet.ibm.com>
> Suggested-by: Bart Van Assche<bart.vanassche@wdc.com>
> Signed-off-by: Ming Lei<ming.lei@redhat.com>

Tested-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>

All disk pull tests completed successfully without I/O hangs (24 disks).


-- 
Mauricio Faria de Oliveira
IBM Linux Technology Center


* Re: [PATCH V2 2/2] block: drain blkcg part of request_queue in blk_cleanup_queue()
  2017-11-23  4:48 ` [PATCH V2 2/2] block: drain blkcg part of request_queue in blk_cleanup_queue() Ming Lei
@ 2017-11-27 12:15   ` Mauricio Faria de Oliveira
  0 siblings, 0 replies; 21+ messages in thread
From: Mauricio Faria de Oliveira @ 2017-11-27 12:15 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block, Christoph Hellwig
  Cc: Omar Sandoval, Bart Van Assche, Hannes Reinecke, Wen Xiong

On 11/23/2017 02:48 AM, Ming Lei wrote:
> Now once blk_freeze_queue() returns, all requests(in-queue and pending)
> can be drained, but we still need to drain blkcg part of request_queue
> for both blk-mq and legacy, so this patch calls blkcg_drain_queue()
> explicitely in blk_cleanup_queue() to do that.
> 
> Then the __blk_drain_queue() in blk_cleanup_queue() can be covered by both
> blk_freeze_queue() and blkcg_drain_queue(), and tasks blocked in get_request()
> are waken up in blk_set_queue_dying() too, so remove it from blk_cleanup_queue().
> 
> Cc: Wen Xiong<wenxiong@us.ibm.com>
> Cc: Mauricio Faria de Oliveira<mauricfo@linux.vnet.ibm.com>
> Signed-off-by: Ming Lei<ming.lei@redhat.com>

Tested-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>

All disk pull tests completed successfully without I/O hangs (24 disks).


-- 
Mauricio Faria de Oliveira
IBM Linux Technology Center


* Re: [PATCH V2 0/2] block: fix queue freeze and cleanup
  2017-11-23  4:47 [PATCH V2 0/2] block: fix queue freeze and cleanup Ming Lei
  2017-11-23  4:47 ` [PATCH V2 1/2] block: run queue before waiting for q_usage_counter becoming zero Ming Lei
  2017-11-23  4:48 ` [PATCH V2 2/2] block: drain blkcg part of request_queue in blk_cleanup_queue() Ming Lei
@ 2017-11-27 12:41 ` Ming Lei
  2017-11-29  2:57   ` chenxiang (M)
  2 siblings, 1 reply; 21+ messages in thread
From: Ming Lei @ 2017-11-27 12:41 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Omar Sandoval, Bart Van Assche, Hannes Reinecke, Wen Xiong,
	Mauricio Faria de Oliveira

On Thu, Nov 23, 2017 at 12:47:58PM +0800, Ming Lei wrote:
> Hi Jens,
> 
> The 1st patch runs queue in blk_freeze_queue_start() for fixing one
> regression by 055f6e18e08f("block: Make q_usage_counter also track legacy
> requests").
> 
> The 2nd patch drians blkcg part of request_queue for both blk-mq and
> legacy, which can be a fix on blk-mq's queue cleanup.
> 
> V2:
> 	- follow Bart's suggestion to use run queue instead of drain queue
> 	- drians blkcg part of request_queue for blk-mq
> 

Hi Jens,

Without this patchset, IO hang can be triggered in Mauricio's disk
pull test, and this IO hang won't happen any more after this patchset
is applied.

So could you make it in V4.15 if you are fine with the two patches?

Thanks,
Ming


* Re: [PATCH V2 0/2] block: fix queue freeze and cleanup
  2017-11-27 12:41 ` [PATCH V2 0/2] block: fix queue freeze and cleanup Ming Lei
@ 2017-11-29  2:57   ` chenxiang (M)
  2017-11-29  4:54     ` Ming Lei
                       ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: chenxiang (M) @ 2017-11-29  2:57 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block, Christoph Hellwig
  Cc: Omar Sandoval, Bart Van Assche, Hannes Reinecke, Wen Xiong,
	Mauricio Faria de Oliveira, Linuxarm

On 2017/11/27 20:41, Ming Lei wrote:
> On Thu, Nov 23, 2017 at 12:47:58PM +0800, Ming Lei wrote:
>> Hi Jens,
>>
>> The 1st patch runs queue in blk_freeze_queue_start() for fixing one
>> regression by 055f6e18e08f("block: Make q_usage_counter also track legacy
>> requests").
>>
>> The 2nd patch drians blkcg part of request_queue for both blk-mq and
>> legacy, which can be a fix on blk-mq's queue cleanup.
>>
>> V2:
>> 	- follow Bart's suggestion to use run queue instead of drain queue
>> 	- drians blkcg part of request_queue for blk-mq
>>
> Hi Jens,
>
> Without this patchset, IO hang can be triggered in Mauricio's disk
> pull test, and this IO hang won't happen any more after this patchset
> is applied.
>
> So could you make it in V4.15 if you are fine with the two patches?

Hi Lei Ming,

I applied this v2 patchset to kernel 4.15-rc1, ran fio on a SATA
disk, then disabled the disk with the sysfs interface
(echo 0 > /sys/class/sas_phy/phy-1:0:1/enable), and found the system hung.
But with the v1 patch it doesn't have this issue. Please have a check.

Log of the issue is as follows:

estuary:/$ fio -filename=/dev/sdb1 -direct=1 -iodepth 1 -thread -rw=re
ad -ioengine=psync -bs=4k -numjobs=64 -runtime=300 -group_reporting 
-name=mytest
mytest: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
...
fio-2.1.11
Starting 64 threads
[  112.362950] hisi_sas_v2_hw HISI0162:01: erroneous completion iptt=1 
task=ffff801fc3e5e580 CQ hdr: 0x1103 0x1 0x0 0x0 Error info: 0x0 0x200 
0x0 0x0
[  112.376108] sas: smp_execute_task_sg: task to dev 500e004aaaaaaa1f 
response: 0x0 status 0x2
[  112.384597] sas: broadcast received: 0
[  112.388357] sas: REVALIDATING DOMAIN on port 0, pid:2032
[  112.394136] sas: Expander phy change count has changed
[  112.399501] sas: ex 500e004aaaaaaa1f phy1 originated BROADCAST(CHANGE)
[  112.408321] sas: done REVALIDATING DOMAIN on port 0, pid:2032, res 0x0
[  112.415524] sd 0:0:1:0: [sdb] Synchronizing SCSI cache
fio: pid=2781, err=5/file:engines/[  112.420876] sd 0:0:1:0: [sdb] 
Synchronize Cache(10) failed: Result: hostbyte=0x04 driverbyte=0x00
sync.c:67, func=xfer, error=Input/[  112.432688] sd 0:0:1:0: [sdb] 
Stopping disk
output error
fio: pid=2784, err=[  112.439696] sd 0:0:1:0: [sdb] Start/Stop Unit 
failed: Result: hostbyte=0x04 driverbyte=0x00
5/file:engines/sync.c:67, func=xfer, error=Input/output error
fio: pid=2817, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2792, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2777, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2782, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2814, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2819, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2776, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2815, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2791, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2796, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2799, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2803, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2816, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2778, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2820, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2807, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2769, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2822, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2783, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2821, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2809, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2811, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2804, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2808, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2824, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2786, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2766, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2794, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2774, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2802, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2810, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2826, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2829, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2767, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
[  144.890733] sas: Enter sas_scsi_recover_host busy: 3 failed: 3
[  144.896570] sas: trying to find task 0xffff8017ca664300
[  144.901794] sas: sas_scsi_find_task: aborting task 0xffff8017ca664300
[  144.908261] hisi_sas_v2_hw HISI0162:01: internal task abort: task to 
dev 500e004aaaaaaa01 task=ffff8017cfb4db00 resp: 0x0 sts 0x8
[  144.954984] hisi_sas_v2_hw HISI0162:01: erroneous completion iptt=66 
task=ffff8017d0130a80 CQ hdr: 0x101b 0x20042 0x0 0x0 Error info: 0x8000 
0x0 0x0 0x0
[  144.968627] hisi_sas_v2_hw HISI0162:01: abort tmf: task to dev 
500e004aaaaaaa01 resp: 0x0 status 0x87
[  145.012914] hisi_sas_v2_hw HISI0162:01: erroneous completion iptt=66 
task=ffff8017d0130a80 CQ hdr: 0x101b 0x20042 0x0 0x0 Error info: 0x8000 
0x0 0x0 0x0
[  145.026568] hisi_sas_v2_hw HISI0162:01: abort tmf: task to dev 
500e004aaaaaaa01 resp: 0x0 status 0x87
[  145.070853] hisi_sas_v2_hw HISI0162:01: erroneous completion iptt=66 
task=ffff8017d0130a80 CQ hdr: 0x101b 0x20042 0x0 0x0 Error info: 0x8000 
0x0 0x0 0x0
[  145.084496] hisi_sas_v2_hw HISI0162:01: abort tmf: task to dev 
500e004aaaaaaa01 resp: 0x0 status 0x87
[  145.093712] hisi_sas_v2_hw HISI0162:01: abort tmf: executing internal 
task failed!
[  145.101279] hisi_sas_v2_hw HISI0162:01: ata disk reset failed
[  145.107021] hisi_sas_v2_hw HISI0162:01: abort task: rc=5
[  145.112330] sas: sas_scsi_find_task: task 0xffff8017ca664300 is done
[  145.118678] sas: sas_eh_handle_sas_errors: task 0xffff8017ca664300 is 
done
[  145.125550] sas: trying to find task 0xffff8017ce1bf900
[  145.130771] sas: sas_scsi_find_task: aborting task 0xffff8017ce1bf900
[  145.137208] sas: sas_scsi_find_task: task 0xffff8017ce1bf900 is done
[  145.143557] sas: sas_eh_handle_sas_errors: task 0xffff8017ce1bf900 is 
done
[  145.150429] sas: trying to find task 0xffff8017ca45cd80
[  145.155648] sas: sas_scsi_find_task: aborting task 0xffff8017ca45cd80
[  145.162101] hisi_sas_v2_hw HISI0162:01: internal task abort: task to 
dev 500e004aaaaaaa01 task=ffff8017d0130a80 resp: 0x0 sts 0x0
[  145.208816] hisi_sas_v2_hw HISI0162:01: erroneous completion iptt=66 
task=ffff8017d0130a80 CQ hdr: 0x101b 0x20042 0x0 0x0 Error info: 0x8000 
0x0 0x0 0x0
[  145.222458] hisi_sas_v2_hw HISI0162:01: abort tmf: task to dev 
500e004aaaaaaa01 resp: 0x0 status 0x87
[  145.266741] hisi_sas_v2_hw HISI0162:01: erroneous completion iptt=66 
task=ffff8017d0130a80 CQ hdr: 0x101b 0x20042 0x0 0x0 Error info: 0x8000 
0x0 0x0 0x0
[  145.280383] hisi_sas_v2_hw HISI0162:01: abort tmf: task to dev 
500e004aaaaaaa01 resp: 0x0 status 0x87
[  145.324666] hisi_sas_v2_hw HISI0162:01: erroneous completion iptt=66 
task=ffff8017d0130a80 CQ hdr: 0x101b 0x20042 0x0 0x0 Error info: 0x8000 
0x0 0x0 0x0
[  145.338309] hisi_sas_v2_hw HISI0162:01: abort tmf: task to dev 
500e004aaaaaaa01 resp: 0x0 status 0x87
[  145.347524] hisi_sas_v2_hw HISI0162:01: abort tmf: executing internal 
task failed!
[  145.355091] hisi_sas_v2_hw HISI0162:01: ata disk reset failed
[  145.360832] hisi_sas_v2_hw HISI0162:01: abort task: rc=5
[  145.366140] sas: sas_scsi_find_task: querying task 0xffff8017ca45cd80
[  145.372575] sas: sas_scsi_find_task: task 0xffff8017ca45cd80 failed 
to abort
[  145.379619] sas: task 0xffff8017ca45cd80 is not at LU: I_T recover
[  145.385794] sas: I_T nexus reset for dev 500e004aaaaaaa01
[  145.391205] hisi_sas_v2_hw HISI0162:01: internal task abort: task to 
dev 500e004aaaaaaa01 task=ffff8017d0130a80 resp: 0x0 sts 0x0
[  147.438724] sas: I_T 500e004aaaaaaa01 recovered
[  147.443255] sas: ata1: end_device-0:0:1: cmd error handler
[  147.448755] sas: ata1: end_device-0:0:1: dev error handler
[  147.448762] sas: ata2: end_device-0:0:4: dev error handler
[  147.448768] sas: ata3: end_device-0:0:7: dev error handler
[  147.448779] sas: ata4: end_device-0:0:9: dev error handler
[  147.448781] sas: ata5: end_device-0:0:10: dev error handler
[  147.476262] ata1.00: exception Emask 0x0 SAct 0xe0000 SErr 0x0 action 
0x6 frozen
[  147.483664] ata1.00: failed command: READ FPDMA QUEUED
[  147.488802] ata1.00: cmd 60/08:00:88:5a:01/00:00:00:00:00/40 tag 18 
ncq dma 4096 in
[  147.488802]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 
0x4 (timeout)
[  147.503834] ata1.00: status: { DRDY }
[  147.507492] ata1.00: failed command: READ FPDMA QUEUED
[  147.512628] ata1.00: cmd 60/08:00:78:5a:01/00:00:00:00:00/40 tag 19 
ncq dma 4096 in
[  147.512628]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 
0x4 (timeout)
[  147.527657] ata1.00: status: { DRDY }
[  147.531319] ata1: hard resetting link
[  147.534977] sas: ata1: end_device-0:0:1: Unable to reset ata device?
[  147.698866] sas: ata: ex 500e004aaaaaaa1f phy01:U:A attached: 
0000000000000000 (no device)
[  148.546838] sas: ata1: end_device-0:0:1: reset failed (errno=-19)
[  148.552933] ata1: reset failed (errno=-19), retrying in 9 secs
[  157.678707] ata1: hard resetting link
[  157.682358] sas: ata1: end_device-0:0:1: Unable to reset ata device?
[  158.686836] sas: ata1: end_device-0:0:1: reset failed (errno=-19)
[  158.692927] ata1: reset failed (errno=-19), retrying in 9 secs
[  167.918703] ata1: hard resetting link
[  167.922354] sas: ata1: end_device-0:0:1: Unable to reset ata device?
[  168.926838] sas: ata1: end_device-0:0:1: reset failed (errno=-19)
[  168.932929] ata1: reset failed (errno=-19), retrying in 34 secs
[  204.270708] ata1: hard resetting link
[  204.274359] sas: ata1: end_device-0:0:1: Unable to reset ata device?
[  205.278855] sas: ata1: end_device-0:0:1: reset failed (errno=-19)
[  205.284947] ata1: reset failed, giving up
[  205.288961] ata1.00: disabled
[  205.291946] WARNING: CPU: 0 PID: 2197 at drivers/ata/libata-eh.c:4039 
ata_eh_finish+0xb4/0xcc
[  205.300457] Modules linked in:
[  205.303504] CPU: 0 PID: 2197 Comm: kworker/u129:11 Not tainted 
4.15.0-rc1-g35c43a4-dirty #585
[  205.312016] Hardware name: Huawei D05/D05, BIOS Hisilicon D05 UEFI 
Nemo 1.8 RC0 08/31/2017
[  205.320269] Workqueue: events_unbound async_run_entry_fn
[  205.325571] task: ffff8017d53d3200 task.stack: ffff000019b68000
[  205.331479] pstate: 60000005 (nZCv daif -PAN -UAO)
[  205.336258] pc : ata_eh_finish+0xb4/0xcc
[  205.340169] lr : ata_eh_finish+0xb0/0xcc
[  205.344080] sp : ffff000019b6bbf0
[  205.347383] x29: ffff000019b6bbf0 x28: ffff8017d2aea520
[  205.352686] x27: ffff8017d2aea598 x26: ffff8017d2aec2e8
[  205.357990] x25: 0000000000000000 x24: ffff8017d2aec000
[  205.363293] x23: 0000000000000001 x22: ffff0000086ca090
[  205.368597] x21: ffff8017d2ae8000 x20: ffff8017d2ae9f80
[  205.373900] x19: ffff8017d2ae9f80 x18: 0000000000000007
[  205.379204] x17: 000000000000000e x16: 0000000000000001
[  205.384507] x15: 0000000000000007 x14: 0000000000000000
[  205.389811] x13: 0000000000000000 x12: ffffffffffffffff
[  205.395114] x11: 0000000000000000 x10: 0000000000000006
[  205.400418] x9 : 0000000000000006 x8 : 000000000000059c
[  205.405721] x7 : ffff0000086ca844 x6 : 0000000008000002
[  205.411025] x5 : ffff8017ca80f560 x4 : ffff8017d2aec1c0
[  205.416328] x3 : ffff8017ca80f5b0 x2 : ffff8017cf5761b0
[  205.421632] x1 : 0000000000000000 x0 : 0000000000000001
[  205.426936] Call trace:
[  205.429371]  ata_eh_finish+0xb4/0xcc
[  205.432935]  ata_do_eh+0xac/0xbc
[  205.436151]  ata_std_error_handler+0x3c/0x80
[  205.440410]  ata_scsi_port_error_handler+0x468/0x65c
[  205.445364]  async_sas_ata_eh+0x48/0x70
[  205.449189]  async_run_entry_fn+0x48/0x130
[  205.453274]  process_one_work+0x1a8/0x39c
[  205.457273]  worker_thread+0x14c/0x408
[  205.461010]  kthread+0x12c/0x158
[  205.464227]  ret_from_fork+0x10/0x18
[  205.467791] ---[ end trace cebfc3ab091dd523 ]---
[  205.472438] scsi 0:0:1:0: [sdb] tag#1 UNKNOWN(0x2003) Result: 
hostbyte=0x00 driverbyte=0x08
[  205.480786] scsi 0:0:1:0: [sdb] tag#1 Sense Key : 0x2 [current]
[  205.486794] scsi 0:0:1:0: [sdb] tag#1 ASC=0x4 ASCQ=0x21
[  205.492103] scsi 0:0:1:0: [sdb] tag#1 CDB: opcode=0x88 88 00 00 00 00 
00 00 01 5a 88 00 00 00 08 00 00
[  205.501409] print_req_error: I/O error, dev sdb, sector 88712
[  205.507167] scsi 0:0:1:0: [sdb] tag#2 UNKNOWN(0x2003) Result: 
hostbyte=0x00 driverbyte=0x08
fio: pid=2801, err=5/file:engines[  205.515521] scsi 0:0:1:0: [sdb] 
tag#2 Sense Key : 0x2 [current]
/sync.c:67, func=xfer, error=Inpu[  205.524405] scsi 0:0:1:0: [sdb] 
tag#2 ASC=0x4 ASCQ=0x21
t/output error
[  205.532619] scsi 0:0:1:0: [sdb] tag#2 CDB: opcode=0x88 88 00 00 00 00 
00 00 01 5a 78 00 00 00 08 00 00
[  205.543240] print_req_error: I/O error, dev sdb, sector 88696
[  205.549001] ata1: EH complete
fio: pid=2779, err=5/file:engines[  205.551988] scsi 0:0:1:0: [sdb] 
tag#0 UNKNOWN(0x2003) Result: hostbyte=0x05 driverbyte=0x00
/sync.c:67, func=xfer, error=Inpu[  205.563191] scsi 0:0:1:0: [sdb] 
tag#0 CDB: opcode=0x88 88 00 00 00 00 00 00 01 5a 88 00 00 00 08 00 00
t/output error
[  205.575340] print_req_error: I/O error, dev sdb, sector 88712
[  205.582484] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 3 
tries: 1
fio: pid=2805, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
[  249.326782] INFO: task kworker/u128:1:2032 blocked for more than 120 
seconds.
[  249.333925]       Tainted: G        W 4.15.0-rc1-g35c43a4-dirty #585
[  249.340885] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  249.348716] kworker/u128:1  D    0  2032      2 0x00000020
[  249.354212] Workqueue: HISI0162:01_disco_q sas_revalidate_domain
[  249.360217] Call trace:
[  249.362656]  __switch_to+0x98/0xb4
[  249.366059]  __schedule+0x22c/0x888
[  249.369543]  schedule+0x34/0x94
[  249.372685]  blk_mq_freeze_queue_wait+0x4c/0x9c
[  249.377211]  blk_freeze_queue+0x1c/0x28
[  249.381049]  blk_cleanup_queue+0xb8/0x234
[  249.385057]  __scsi_remove_device+0x60/0x120
[  249.389323]  scsi_remove_device+0x2c/0x40
[  249.393329]  scsi_remove_target+0x184/0x1c0
[  249.397508]  sas_rphy_remove+0x60/0x64
[  249.401253]  sas_rphy_delete+0x14/0x28
[  249.404998]  sas_destruct_devices+0x70/0xa4
[  249.409177]  sas_revalidate_domain+0x5c/0xe8
[  249.413445]  process_one_work+0x1a8/0x39c
[  249.417455]  worker_thread+0x14c/0x408
[  249.421200]  kthread+0x12c/0x158
[  249.424424]  ret_from_fork+0x10/0x18
[  249.427996] INFO: task kworker/u128:2:2033 blocked for more than 120 
seconds.
[  249.435125]       Tainted: G        W 4.15.0-rc1-g35c43a4-dirty #585
[  249.442081] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  249.449905] kworker/u128:2  D    0  2033      2 0x00000020
[  249.455395] Workqueue: HISI0162:01_event_q sas_port_event_worker
[  249.461400] Call trace:
[  249.463845]  __switch_to+0x98/0xb4
[  249.467245]  __schedule+0x22c/0x888
[  249.470730]  schedule+0x34/0x94
[  249.473862]  schedule_timeout+0x1dc/0x37c
[  249.477869]  wait_for_common+0x138/0x1f0
[  249.481789]  wait_for_completion+0x14/0x1c
[  249.485882]  flush_workqueue+0x118/0x444
[  249.489801]  sas_porte_broadcast_rcvd+0x5c/0x68
[  249.494331]  sas_port_event_worker+0x24/0x38
[  249.498598]  process_one_work+0x1a8/0x39c
[  249.502605]  worker_thread+0x14c/0x408
[  249.506352]  kthread+0x12c/0x158
[  249.509576]  ret_from_fork+0x10/0x18
[  249.513155] INFO: task fio:2768 blocked for more than 120 seconds.
[  249.519331]       Tainted: G        W 4.15.0-rc1-g35c43a4-dirty #585
[  249.526288] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  249.534113] fio             D    0  2768   2745 0x00000000
[  249.539596] Call trace:
[  249.542033]  __switch_to+0x98/0xb4
[  249.545433]  __schedule+0x22c/0x888
[  249.548917]  schedule+0x34/0x94
[  249.552061]  io_schedule+0x14/0x30
[  249.555463]  __blkdev_direct_IO_simple+0x158/0x290
[  249.560250]  blkdev_direct_IO+0x36c/0x378
[  249.564261]  generic_file_read_iter+0xa0/0x7f4
[  249.568703]  blkdev_read_iter+0x44/0x54
[  249.572537]  __vfs_read+0xc8/0x11c
[  249.575936]  vfs_read+0x80/0x134
[  249.579162]  SyS_pread64+0x74/0x8c
[  249.582552]  el0_svc_naked+0x20/0x24
[  249.586127] INFO: task fio:2770 blocked for more than 120 seconds.
[  249.592302]       Tainted: G        W 4.15.0-rc1-g35c43a4-dirty #585
[  249.599261] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  249.607086] fio             D    0  2770   2745 0x00000000
[  249.612570] Call trace:
[  249.615014]  __switch_to+0x98/0xb4
[  249.618406]  __schedule+0x22c/0x888
[  249.621892]  schedule+0x34/0x94
[  249.625031]  io_schedule+0x14/0x30
[  249.628429]  __blkdev_direct_IO_simple+0x158/0x290
[  249.633217]  blkdev_direct_IO+0x36c/0x378
[  249.637225]  generic_file_read_iter+0xa0/0x7f4
[  249.641665]  blkdev_read_iter+0x44/0x54
[  249.645499]  __vfs_read+0xc8/0x11c
[  249.648897]  vfs_read+0x80/0x134
[  249.652120]  SyS_pread64+0x74/0x8c
[  249.655520]  el0_svc_naked+0x20/0x24
[  249.659092] INFO: task fio:2771 blocked for more than 120 seconds.
[  249.665268]       Tainted: G        W 4.15.0-rc1-g35c43a4-dirty #585
[  249.672225] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  249.680050] fio             D    0  2771   2745 0x00000000
[  249.685535] Call trace:
[  249.687979]  __switch_to+0x98/0xb4
[  249.691378]  __schedule+0x22c/0x888
[  249.694864]  schedule+0x34/0x94
[  249.697995]  io_schedule+0x14/0x30
[  249.701394]  __blkdev_direct_IO_simple+0x158/0x290
[  249.706182]  blkdev_direct_IO+0x36c/0x378
[  249.710187]  generic_file_read_iter+0xa0/0x7f4
[  249.714627]  blkdev_read_iter+0x44/0x54
[  249.718460]  __vfs_read+0xc8/0x11c
[  249.721859]  vfs_read+0x80/0x134
[  249.725084]  SyS_pread64+0x74/0x8c
[  249.728482]  el0_svc_naked+0x20/0x24
[  249.732055] INFO: task fio:2772 blocked for more than 120 seconds.
[  249.738230]       Tainted: G        W 4.15.0-rc1-g35c43a4-dirty #585
[  249.745187] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  249.753016] fio             D    0  2772   2745 0x00000000
[  249.758502] Call trace:
[  249.760946]  __switch_to+0x98/0xb4
[  249.764343]  __schedule+0x22c/0x888
[  249.767829]  schedule+0x34/0x94
[  249.770968]  io_schedule+0x14/0x30
[  249.774358]  __blkdev_direct_IO_simple+0x158/0x290
[  249.779146]  blkdev_direct_IO+0x36c/0x378
[  249.783153]  generic_file_read_iter+0xa0/0x7f4
[  249.787593]  blkdev_read_iter+0x44/0x54
[  249.791426]  __vfs_read+0xc8/0x11c
[  249.794825]  vfs_read+0x80/0x134
[  249.798042]  SyS_pread64+0x74/0x8c
[  249.801441]  el0_svc_naked+0x20/0x24
[  249.805013] INFO: task fio:2773 blocked for more than 120 seconds.
[  249.811189]       Tainted: G        W 4.15.0-rc1-g35c43a4-dirty #585
[  249.818146] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  249.825972] fio             D    0  2773   2745 0x00000000
[  249.831456] Call trace:
[  249.833892]  __switch_to+0x98/0xb4
[  249.837291]  __schedule+0x22c/0x888
[  249.840780]  schedule+0x34/0x94
[  249.843919]  io_schedule+0x14/0x30
[  249.847318]  __blkdev_direct_IO_simple+0x158/0x290
[  249.852106]  blkdev_direct_IO+0x36c/0x378
[  249.856113]  generic_file_read_iter+0xa0/0x7f4
[  249.860553]  blkdev_read_iter+0x44/0x54
[  249.864386]  __vfs_read+0xc8/0x11c
[  249.867785]  vfs_read+0x80/0x134
[  249.871010]  SyS_pread64+0x74/0x8c
[  249.874401]  el0_svc_naked+0x20/0x24
[  249.877971] INFO: task fio:2775 blocked for more than 120 seconds.
[  249.884148]       Tainted: G        W 4.15.0-rc1-g35c43a4-dirty #585
[  249.891105] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  249.898930] fio             D    0  2775   2745 0x00000000
[  249.904413] Call trace:
[  249.906860]  __switch_to+0x98/0xb4
[  249.910251]  __schedule+0x22c/0x888
[  249.913737]  schedule+0x34/0x94
[  249.916873]  io_schedule+0x14/0x30
[  249.920273]  __blkdev_direct_IO_simple+0x158/0x290
[  249.925060]  blkdev_direct_IO+0x36c/0x378
[  249.929068]  generic_file_read_iter+0xa0/0x7f4
[  249.933506]  blkdev_read_iter+0x44/0x54
[  249.937339]  __vfs_read+0xc8/0x11c
[  249.940738]  vfs_read+0x80/0x134
[  249.943963]  SyS_pread64+0x74/0x8c
[  249.947362]  el0_svc_naked+0x20/0x24
Jobs: 25 (f=25): [X(2),R(1),X(1),[  249.950935] INFO: task fio:2780 
blocked for more than 120 seconds.
R(4),X(1),R(1),X(4),R(1),X(4),R(1[  249.959975]       Tainted: G        
W        4.15.0-rc1-g35c43a4-dirty #585
),X(1),R(4),X(2),R(1),X(1),R(1),[  249.969784] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
X(1),R(2),X(1),R(1),X(5),R(1),X([  249.980376] fio             D 0  
2780   2745 0x00000000
5),R(2),X(4),R(1),X(4),R(1),X(1)[  249.988623] Call trace:
,R(1),X(1),R(2),X(1)] [12.0% don[  249.993834] __switch_to+0x98/0xb4
e] [0KB/0KB/0KB /s] [0/0/0 iops][  249.999998] __schedule+0x22c/0x888
[  250.006249]  schedule+0x34/0x94
[  250.010677]  io_schedule+0x14/0x30
[  250.014074]  __blkdev_direct_IO_simple+0x158/0x290
[  250.018860]  blkdev_direct_IO+0x36c/0x378
[  250.022865]  generic_file_read_iter+0xa0/0x7f4
[  250.027303]  blkdev_read_iter+0x44/0x54
[  250.031134]  __vfs_read+0xc8/0x11c
[  250.034525]  vfs_read+0x80/0x134
[  250.037748]  SyS_pread64+0x74/0x8c
[  250.041145]  el0_svc_naked+0x20/0x24
[  250.044715] INFO: task fio:2785 blocked for more than 120 seconds.
[  250.050889]       Tainted: G        W 4.15.0-rc1-g35c43a4-dirty #585
[  250.057845] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  250.065668] fio             D    0  2785   2745 0x00000000
[  250.071151] Call trace:
[  250.073587]  __switch_to+0x98/0xb4
[  250.076985]  __schedule+0x22c/0x888
[  250.080468]  schedule+0x34/0x94
[  250.083604]  io_schedule+0x14/0x30
[  250.087001]  __blkdev_direct_IO_simple+0x158/0x290
[  250.091787]  blkdev_direct_IO+0x36c/0x378
[  250.095794]  generic_file_read_iter+0xa0/0x7f4
[  250.100233]  blkdev_read_iter+0x44/0x54
[  250.104064]  __vfs_read+0xc8/0x11c
[  250.107461]  vfs_read+0x80/0x134
[  250.110678]  SyS_pread64+0x74/0x8c
[  250.114075]  el0_svc_naked+0x20/0x24

>
> Thanks,
> Ming
>
> .
>


* Re: [PATCH V2 0/2] block: fix queue freeze and cleanup
  2017-11-29  2:57   ` chenxiang (M)
@ 2017-11-29  4:54     ` Ming Lei
  2017-11-29  5:40       ` chenxiang (M)
  2017-12-01 15:36     ` Mauricio Faria de Oliveira
  2017-12-13 21:53     ` Bart Van Assche
  2 siblings, 1 reply; 21+ messages in thread
From: Ming Lei @ 2017-11-29  4:54 UTC (permalink / raw)
  To: chenxiang (M)
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Omar Sandoval,
	Bart Van Assche, Hannes Reinecke, Wen Xiong,
	Mauricio Faria de Oliveira, Linuxarm

Hi Chenxiang,

On Wed, Nov 29, 2017 at 10:57:06AM +0800, chenxiang (M) wrote:
> On 2017/11/27 20:41, Ming Lei wrote:
> > On Thu, Nov 23, 2017 at 12:47:58PM +0800, Ming Lei wrote:
> > > Hi Jens,
> > > 
> > > The 1st patch runs queue in blk_freeze_queue_start() for fixing one
> > > regression by 055f6e18e08f("block: Make q_usage_counter also track legacy
> > > requests").
> > > 
> > > The 2nd patch drians blkcg part of request_queue for both blk-mq and
> > > legacy, which can be a fix on blk-mq's queue cleanup.
> > > 
> > > V2:
> > > 	- follow Bart's suggestion to use run queue instead of drain queue
> > > 	- drians blkcg part of request_queue for blk-mq
> > > 
> > Hi Jens,
> > 
> > Without this patchset, IO hang can be triggered in Mauricio's disk
> > pull test, and this IO hang won't happen any more after this patchset
> > is applied.
> > 
> > So could you make it in V4.15 if you are fine with the two patches?
> 
> Hi Lei Ming,
> 
> I applied this v2 patchset to kernel 4.15-rc1, running fio on a SATA disk,
> then disable the disk with sysfs interface

I guess it is a SAS disk?

> (echo 0 > /sys/class/sas_phy/phy-1:0:1/enable), and find system is hung. But
> with v1 patch, it doesn't
> has this issue. Please have a check.

Thanks for your test!

Then it looks like even though V2 can fix Mauricio's issue, it still can't fix
yours, which is a regression caused by 055f6e18e08f ("block: Make
q_usage_counter also track legacy requests").

Given V1 can fix both your issue and Mauricio's, I will post a V3 which falls
back to the original V1.


Thanks,
Ming


* Re: [PATCH V2 0/2] block: fix queue freeze and cleanup
  2017-11-29  4:54     ` Ming Lei
@ 2017-11-29  5:40       ` chenxiang (M)
  0 siblings, 0 replies; 21+ messages in thread
From: chenxiang (M) @ 2017-11-29  5:40 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Omar Sandoval,
	Bart Van Assche, Hannes Reinecke, Wen Xiong,
	Mauricio Faria de Oliveira, Linuxarm

On 2017/11/29 12:54, Ming Lei wrote:
> Hi Chenxiang,
>
> On Wed, Nov 29, 2017 at 10:57:06AM +0800, chenxiang (M) wrote:
>> On 2017/11/27 20:41, Ming Lei wrote:
>>> On Thu, Nov 23, 2017 at 12:47:58PM +0800, Ming Lei wrote:
>>>> Hi Jens,
>>>>
>>>> The 1st patch runs queue in blk_freeze_queue_start() for fixing one
>>>> regression by 055f6e18e08f("block: Make q_usage_counter also track legacy
>>>> requests").
>>>>
>>>> The 2nd patch drians blkcg part of request_queue for both blk-mq and
>>>> legacy, which can be a fix on blk-mq's queue cleanup.
>>>>
>>>> V2:
>>>> 	- follow Bart's suggestion to use run queue instead of drain queue
>>>> 	- drians blkcg part of request_queue for blk-mq
>>>>
>>> Hi Jens,
>>>
>>> Without this patchset, IO hang can be triggered in Mauricio's disk
>>> pull test, and this IO hang won't happen any more after this patchset
>>> is applied.
>>>
>>> So could you make it in V4.15 if you are fine with the two patches?
>> Hi Lei Ming,
>>
>> I applied this v2 patchset to kernel 4.15-rc1, running fio on a SATA disk,
>> then disable the disk with sysfs interface
> I guess it is a SAS disk?

My test is on a SATA disk. Our driver is a SAS driver, and it supports both
SATA and SAS disks.

>
>> (echo 0 > /sys/class/sas_phy/phy-1:0:1/enable), and find system is hung. But
>> with v1 patch, it doesn't
>> has this issue. Please have a check.
> Thanks for your test!
>
> Then looks even though V2 can fix Mauricio's issue, it still can't fix
> yours, which is a regression caused by 055f6e18e08f("block: Make q_usage_counter
> also track legacy requests").
>
> Given V1 can fix both your issue and Mauricio's, I will post V3 which falls
> back to original V1.
>
>
> Thanks,
> Ming
>
> .
>


* Re: [PATCH V2 0/2] block: fix queue freeze and cleanup
  2017-11-29  2:57   ` chenxiang (M)
  2017-11-29  4:54     ` Ming Lei
@ 2017-12-01 15:36     ` Mauricio Faria de Oliveira
  2017-12-01 15:42       ` Ming Lei
  2017-12-01 16:08       ` Bart Van Assche
  2017-12-13 21:53     ` Bart Van Assche
  2 siblings, 2 replies; 21+ messages in thread
From: Mauricio Faria de Oliveira @ 2017-12-01 15:36 UTC (permalink / raw)
  To: chenxiang (M), Ming Lei, Jens Axboe, linux-block, Christoph Hellwig
  Cc: Omar Sandoval, Bart Van Assche, Hannes Reinecke, Wen Xiong, Linuxarm

Hi Ming Lei,

On 11/29/2017 12:57 AM, chenxiang (M) wrote:
> I applied this v2 patchset to kernel 4.15-rc1, running fio on a SATA 
> disk, then disable the disk with sysfs interface
> (echo 0 > /sys/class/sas_phy/phy-1:0:1/enable), and find system is hung. 
> But with v1 patch, it doesn't
> has this issue. Please have a check.

Indeed, with this particular test-case (thanks, chenxiang) the problem
can be recreated with PATCH v2 but _not_ with v1.

For reference, I'm including the tests with v2 in this e-mail.
The same tests have also been performed with v1, without blocked tasks.

Interestingly, physical disk pulls did not hit the problem either
on v1 or v2 (but it does not matter anymore) -- so v1 is the one.

That said, I have to withdraw my Tested-By tag from v2:

On 11/27/2017 10:15 AM, Mauricio Faria de Oliveira wrote:
 > Tested-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>

Thanks.



# ls -ld 
/sys/class/sas_phy/phy-*/device/port/end_device*/target*/*/block/sd* 
 

...
drwxr-xr-x 8 root root 0 Dec  1 08:55 
/sys/class/sas_phy/phy-0:5/device/port/end_device-0:0/target0:0:0/0:0:0:0/block/sdg
...

# fio -name=test-sdg -filename=/dev/sdg -direct=1 -iodepth 1 -thread 
-rw=read -ioengine=psync -bs=4k -numjobs=64 -runtime=300 -group_reporting
test-sdg: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, 
iodepth=1 & 

... 

fio-2.16 

time     4360  cycles_start=233715732609 

Starting 64 threads
...

# echo 0 > /sys/class/sas_phy/phy-0:5/enable

...
fio: io_u error on file /dev/sdg: Input/output error: read 
offset=38236160, buflen=4096
fio: io_u error on file /dev/sdg: Input/output error: read 
offset=38236160, buflen=4096
fio: pid=4402, err=5/file:io_u.c:1712, func=io_u error, 
error=Input/output error
...
fio: pid=4383, err=5/file:io_u.c:1712, func=io_u error, 
error=Input/output error
fio: pid=4370, err=5/file:io_u.c:1712, func=io_u error, 
error=Input/output error
fio: pid=4394, err=5/file:io_u.c:1712, func=io_u error, 
error=Input/output error

test-sdg: (groupid=0, jobs=64): err= 5 (file:io_u.c:1712, func=io_u 
error, error=Input/output error): pid=4362: Fri Dec  1 09:02:38 2017 

...

# dmesg
...
[  434.880517] sd 0:0:0:0: [sdg] tag#1 FAILED Result: 
hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  434.880519] sd 0:0:0:0: [sdg] tag#3 FAILED Result: 
hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  434.880521] sd 0:0:0:0: [sdg] tag#4 FAILED Result: 
hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  434.880524] sd 0:0:0:0: [sdg] tag#0 FAILED Result: 
hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  434.880526] sd 0:0:0:0: [sdg] tag#2 FAILED Result: 
hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
...
[  434.880637] sd 0:0:0:0: [sdg] tag#11 CDB: Read(10) 28 00 00 02 7d c0 
00 00 18 00
[  434.880775] sd 0:0:0:0: [sdg] Synchronizing SCSI cache 

[  434.880900] sd 0:0:0:0: [sdg] Synchronize Cache(10) failed: Result: 
hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK 

[  617.732540] INFO: task kworker/u320:3:3914 blocked for more than 120 
seconds.
...
[  617.732815] Call Trace: 

[  617.732819] [c000000002eef270] [c00000000001c9a8] 
__switch_to+0x318/0x620 

[  617.732824] [c000000002eef2d0] [c000000000c55764] 
__schedule+0x354/0xaf0
[  617.732827] [c000000002eef3a0] [c000000000c56818] 
schedule_preempt_disabled+0x48/0xc0
[  617.732830] [c000000002eef3d0] [c000000000c58da4] 
__mutex_lock.isra.6+0x1b4/0x650
[  617.732834] [c000000002eef460] [c00000000061e9b0] 
blk_cleanup_queue+0x40/0x240
[  617.732839] [c000000002eef4a0] [d000000011930d94] 
sas_end_device_release+0x34/0x70 [scsi_transport_sas] 

[  617.732843] [c000000002eef4d0] [c0000000007f5640] 
device_release+0x60/0xf0
[  617.732846] [c000000002eef550] [c000000000c3eaec] 
kobject_put+0x1dc/0x360
[  617.732850] [c000000002eef5d0] [c0000000007f5bf4] 
put_device+0x34/0x50
[  617.732853] [c000000002eef600] [c000000000656f8c] 
bsg_request_fn+0x17c/0x210 

[  617.732856] [c000000002eef660] [c0000000006203ac] 
blk_run_queue+0x8c/0xf0
[  617.732859] [c000000002eef690] [c0000000006374a4] 
blk_freeze_queue_start+0xa4/0xb0
[  617.732862] [c000000002eef6c0] [c00000000061d190] 
blk_set_queue_dying+0x70/0x1b0
[  617.732865] [c000000002eef6f0] [c00000000061e9bc] 
blk_cleanup_queue+0x4c/0x240 

[  617.732869] [c000000002eef730] [d000000011930d94] 
sas_end_device_release+0x34/0x70 [scsi_transport_sas] 

[  617.732872] [c000000002eef760] [c0000000007f5640] 
device_release+0x60/0xf0 

[  617.732875] [c000000002eef7e0] [c000000000c3eaec] 
kobject_put+0x1dc/0x360 

[  617.732878] [c000000002eef860] [c0000000007f5bf4] 
put_device+0x34/0x50 

[  617.732882] [c000000002eef890] [d000000011935c60] 
sas_port_delete+0x180/0x310 [scsi_transport_sas] 

[  617.732889] [c000000002eef8f0] [d000000012d31918] 
mpt3sas_transport_port_remove+0x278/0x340 [mpt3sas] 

[  617.732895] [c000000002eef9a0] [d000000012d22470] 
_scsih_remove_device+0x200/0x440 [mpt3sas] 

[  617.732901] [c000000002eefa70] [d000000012d22a50] 
_scsih_device_remove_by_handle.part.23+0x110/0x290 [mpt3sas] 

[  617.732907] [c000000002eefac0] [d000000012d2abd4] 
_firmware_event_work+0x2334/0x2d90 [mpt3sas] 

[  617.732911] [c000000002eefc80] [c00000000015d5ec] 
process_one_work+0x1bc/0x5f0
[  617.732913] [c000000002eefd20] [c00000000016060c] 
worker_thread+0xac/0x6b0
[  617.732916] [c000000002eefdc0] [c00000000016a528] kthread+0x168/0x1b0
[  617.732920] [c000000002eefe30] [c00000000000b4e8] 
ret_from_kernel_thread+0x5c/0x74
...

# dmesg | grep blocked
[  617.732540] INFO: task kworker/u320:3:3914 blocked for more than 120 
seconds.
[  740.612135] INFO: task kworker/u320:3:3914 blocked for more than 120 
seconds.
[  863.482376] INFO: task kworker/u320:3:3914 blocked for more than 120 
seconds.
[  986.357365] INFO: task kworker/u320:3:3914 blocked for more than 120 
seconds.
[ 1109.234520] INFO: task kworker/u320:3:3914 blocked for more than 120 
seconds.
[ 1232.112572] INFO: task kworker/u320:3:3914 blocked for more than 120 
seconds.
[ 1354.985545] INFO: task kworker/u320:3:3914 blocked for more than 120 
seconds.
[ 1477.856508] INFO: task kworker/u320:3:3914 blocked for more than 120 
seconds.
[ 1600.729815] INFO: task kworker/u320:3:3914 blocked for more than 120 
seconds.
[ 1723.604766] INFO: task kworker/u320:3:3914 blocked for more than 120 
seconds.


-- 
Mauricio Faria de Oliveira
IBM Linux Technology Center


* Re: [PATCH V2 0/2] block: fix queue freeze and cleanup
  2017-12-01 15:36     ` Mauricio Faria de Oliveira
@ 2017-12-01 15:42       ` Ming Lei
  2017-12-01 16:08       ` Bart Van Assche
  1 sibling, 0 replies; 21+ messages in thread
From: Ming Lei @ 2017-12-01 15:42 UTC (permalink / raw)
  To: Mauricio Faria de Oliveira, Jens Axboe
  Cc: chenxiang (M),
	Jens Axboe, linux-block, Christoph Hellwig, Omar Sandoval,
	Bart Van Assche, Hannes Reinecke, Wen Xiong, Linuxarm

On Fri, Dec 01, 2017 at 01:36:13PM -0200, Mauricio Faria de Oliveira wrote:
> Hi Ming Lei,
> 
> On 11/29/2017 12:57 AM, chenxiang (M) wrote:
> > I applied this v2 patchset to kernel 4.15-rc1, running fio on a SATA
> > disk, then disable the disk with sysfs interface
> > (echo 0 > /sys/class/sas_phy/phy-1:0:1/enable), and find system is hung.
> > But with v1 patch, it doesn't
> > has this issue. Please have a check.
> 
> Indeed, with this particular test-case (thanks, chenxiang) the problem
> can be recreated with PATCH v2 but _not_ with v1.
> 
> For reference, I'm including the tests with v2 in this e-mail.
> The same tests have too been performed with v1, without blocked tasks.
> 
> Interestingly, physical disk pulls did not hit the problem either
> on v1 or v2 (but it does not matter anymore) -- so v1 is the one.

V1 has been resent out in the following link with your tested-by:

	https://marc.info/?l=linux-block&m=151200356024020&w=2

Jens, since it fixes the regression from two reports, could you take a look?

-- 
Ming


* Re: [PATCH V2 0/2] block: fix queue freeze and cleanup
  2017-12-01 15:36     ` Mauricio Faria de Oliveira
  2017-12-01 15:42       ` Ming Lei
@ 2017-12-01 16:08       ` Bart Van Assche
  2017-12-01 17:35         ` Ming Lei
  2017-12-01 18:49         ` Mauricio Faria de Oliveira
  1 sibling, 2 replies; 21+ messages in thread
From: Bart Van Assche @ 2017-12-01 16:08 UTC (permalink / raw)
  To: mauricfo, chenxiang66, hch, linux-block, axboe, ming.lei
  Cc: osandov, hare, wenxiong, linuxarm

On Fri, 2017-12-01 at 13:36 -0200, Mauricio Faria de Oliveira wrote:
> On 11/29/2017 12:57 AM, chenxiang (M) wrote:
> > I applied this v2 patchset to kernel 4.15-rc1, running fio on a SATA
> > disk, then disable the disk with sysfs interface
> > (echo 0 > /sys/class/sas_phy/phy-1:0:1/enable), and find system is hung.
> > But with v1 patch, it doesn't
> > has this issue. Please have a check.
> 
> Indeed, with this particular test-case (thanks, chenxiang) the problem
> can be recreated with PATCH v2 but _not_ with v1.
> 
> For reference, I'm including the tests with v2 in this e-mail.
> The same tests have too been performed with v1, without blocked tasks.
> 
> Interestingly, physical disk pulls did not hit the problem either
> on v1 or v2 (but it does not matter anymore) -- so v1 is the one.

The test chenxiang ran does not prove that there is anything wrong with v2.
Maybe chenxiang hit the issue described in https://lkml.org/lkml/2017/9/5/381?

Bart.


* Re: [PATCH V2 0/2] block: fix queue freeze and cleanup
  2017-12-01 16:08       ` Bart Van Assche
@ 2017-12-01 17:35         ` Ming Lei
  2017-12-01 18:49         ` Mauricio Faria de Oliveira
  1 sibling, 0 replies; 21+ messages in thread
From: Ming Lei @ 2017-12-01 17:35 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: mauricfo, chenxiang66, hch, linux-block, axboe, osandov, hare,
	wenxiong, linuxarm

On Fri, Dec 01, 2017 at 04:08:49PM +0000, Bart Van Assche wrote:
> On Fri, 2017-12-01 at 13:36 -0200, Mauricio Faria de Oliveira wrote:
> > On 11/29/2017 12:57 AM, chenxiang (M) wrote:
> > > I applied this v2 patchset to kernel 4.15-rc1, running fio on a SATA 
> > > disk, then disable the disk with sysfs interface
> > > (echo 0 > /sys/class/sas_phy/phy-1:0:1/enable), and find system is hung. 
> > > But with v1 patch, it doesn't
> > > has this issue. Please have a check.
> > 
> > Indeed, with this particular test-case (thanks, chenxiang) the problem
> > can be recreated with PATCH v2 but _not_ with v1.
> > 
> > For reference, I'm including the tests with v2 in this e-mail.
> > The same tests have too been performed with v1, without blocked tasks.
> > 
> > Interestingly, physical disk pulls did not hit the problem either
> > on v1 or v2 (but it does not matter anymore) -- so v1 is the one.
> 
> The test chenxiang ran does not prove that there is anything wrong with v2.
> Maybe chenxiang hit the issue described in https://lkml.org/lkml/2017/9/5/381?

No, if the issue were in SCSI EH, blk_drain_queue() shouldn't have been able to move on either.

Anyway, V1/V3 is the correct thing to do.

Thanks,
Ming


* Re: [PATCH V2 0/2] block: fix queue freeze and cleanup
  2017-12-01 16:08       ` Bart Van Assche
  2017-12-01 17:35         ` Ming Lei
@ 2017-12-01 18:49         ` Mauricio Faria de Oliveira
  2017-12-02  0:49           ` Ming Lei
  2017-12-13 21:49           ` Bart Van Assche
  1 sibling, 2 replies; 21+ messages in thread
From: Mauricio Faria de Oliveira @ 2017-12-01 18:49 UTC (permalink / raw)
  To: Bart Van Assche, chenxiang66, hch, linux-block, axboe, ming.lei
  Cc: osandov, hare, wenxiong, linuxarm

Hi Bart,

On 12/01/2017 02:08 PM, Bart Van Assche wrote:
> The test chenxiang ran does not prove that there is anything wrong with v2.
> Maybe chenxiang hit the issue described in https://lkml.org/lkml/2017/9/5/381?

Unfortunately v2 has been found to exhibit another problem, an Oops in
the systemd shutdown path for device-mapper LVM volumes at least, when
calling 'q->request_fn(q)' if 'q->request_fn' is NULL.

I couldn't find a more synthetic test-case yet (say, dmsetup commands),
but the default LVM scheme in RHEL triggers it consistently.
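
For illustration only: the crash comes from the hunk that v2 adds to
blk_freeze_queue_start(), and bio-based device-mapper queues have neither
->mq_ops nor ->request_fn set, so an untested sketch of a guarded variant
would look like the lines below (just to show where the NULL call comes
from, not necessarily the final fix):

	percpu_ref_kill(&q->q_usage_counter);
	if (q->mq_ops)
		blk_mq_run_hw_queues(q, false);
	else if (q->request_fn)	/* bio-based queues (e.g. dm) have no ->request_fn */
		blk_run_queue(q);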

Here are the tests with 4.15-rc1 +v2 (oops), +v1 (okay), clean (okay).

[root@guest ~]# dmsetup table
guest-swap: 0 8388608 linear 8:3 2048
guest-root: 0 104857600 linear 8:3 161458176
guest-home: 0 153067520 linear 8:3 8390656

[root@guest ~]# dmsetup ls -o blkdevname,ascii --tree
rhel_guest-swap <dm-1> (253:1)
  `- <sda3> (8:3)
rhel_guest-root <dm-0> (253:0)
  `- <sda3> (8:3)
rhel_guest-home <dm-2> (253:2)
  `- <sda3> (8:3)

[root@guest ~]# uname -r
4.15.0-rc1.mingleiv2

[root@guest ~]# reboot
<...>
systemd-shutdown[1]: Detaching DM devices.
systemd-shutdown[1]: Detaching DM 253:2.
Unable to handle kernel paging request for instruction fetch
Faulting instruction address: 0x00000000
Oops: Kernel access of bad area, sig: 11 [#1]
LE SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: vmx_crypto virtio_balloon ip_tables x_tables autofs4 
xfs libcrc32c virtio_net virtio_scsi crc32c_vpmsum virtio_pci 
virtio_ring virtio
CPU: 8 PID: 1 Comm: systemd-shutdow Not tainted 4.15.0-rc1.mingleiv2 #2
task: c0000001fb000000 task.stack: c0000001fb080000
NIP:  0000000000000000 LR: c00000000057c7fc CTR: 0000000000000000
REGS: c0000001fb0836f0 TRAP: 0400   Not tainted  (4.15.0-rc1.mingleiv2)
MSR:  8000000040009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 22042422  XER: 00000000
CFAR: c0000000000087c0 SOFTE: 0
GPR00: c00000000057ce0c c0000001fb083970 c000000000f91000 c0000001ee070000
GPR04: 0000000000000001 0000000000000000 0000000000000000 0000000000000003
GPR08: 0000000000000001 0000000000000000 0000000000000001 0000000000000000
GPR12: 0000000000000000 c00000000fd42c00 00000001184eea18 0000000000000000
GPR16: 00000001184ee4b8 00000001184ee468 00000001184ee348 00000001184ee3c8
GPR20: 0000000000000001 00007fffcc925140 00007fffcc925030 0000000000000003
GPR24: 0000000000000000 c0000000008af550 c0000001ed4b5000 0000000000000000
GPR28: 0000000000000000 c0000001ee070000 0000000000000001 c0000001ee070000
NIP [0000000000000000]           (null)
LR [c00000000057c7fc] __blk_run_queue+0x6c/0xb0
Call Trace:
[c0000001fb083970] [c0000001fb0839e0] 0xc0000001fb0839e0 (unreliable)
[c0000001fb0839a0] [c00000000057ce0c] blk_run_queue+0x4c/0x80
[c0000001fb0839d0] [c000000000591f54] blk_freeze_queue_start+0xa4/0xb0
[c0000001fb083a00] [c00000000057d5cc] blk_set_queue_dying+0x6c/0x190
[c0000001fb083a30] [c0000000008a3fbc] __dm_destroy+0xac/0x300
[c0000001fb083ad0] [c0000000008af6a4] dev_remove+0x154/0x1d0
[c0000001fb083b20] [c0000000008affd0] ctl_ioctl+0x360/0x4f0
[c0000001fb083d10] [c0000000008b0198] dm_ctl_ioctl+0x38/0x50
[c0000001fb083d50] [c0000000003863b8] do_vfs_ioctl+0xd8/0x8c0
[c0000001fb083df0] [c000000000386c08] SyS_ioctl+0x68/0x100
[c0000001fb083e30] [c00000000000b760] system_call+0x58/0x6c
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
---[ end trace e1710ec836e5526f ]---

Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

Rebooting in 10 seconds..

<...>

[root@guest ~]# uname -r
4.15.0-rc1.mingleiv1

[root@guest ~]# reboot
<...>
systemd-shutdown[1]: Detaching DM devices.
systemd-shutdown[1]: Detaching DM 253:2.
shutdown: 7 output lines suppressed due to ratelimiting
dracut Warning: Killing all remaining processes
dracut Warning: Killing all remaining processes
XFS (dm-0): Unmounting Filesystem
dracut Warning: Unmounted /oldroot.
dracut: Disassembling device-mapper devices
Rebooting.
sd 0:0:0:0: [sda] Synchronizing SCSI cache
reboot: Restarting system

<...>

[root@guest ~]# uname -r
4.15.0-rc1

[root@guest ~]# reboot
<...>
systemd-shutdown[1]: Detaching DM devices.
systemd-shutdown[1]: Detaching DM 253:2.
shutdown: 7 output lines suppressed due to ratelimiting
dracut Warning: Killing all remaining processes
dracut Warning: Killing all remaining processes
XFS (dm-0): Unmounting Filesystem
dracut Warning: Unmounted /oldroot.
dracut: Disassembling device-mapper devices
Rebooting.

-- 
Mauricio Faria de Oliveira
IBM Linux Technology Center


* Re: [PATCH V2 0/2] block: fix queue freeze and cleanup
  2017-12-01 18:49         ` Mauricio Faria de Oliveira
@ 2017-12-02  0:49           ` Ming Lei
  2017-12-04 12:31             ` Mauricio Faria de Oliveira
  2017-12-13 21:49           ` Bart Van Assche
  1 sibling, 1 reply; 21+ messages in thread
From: Ming Lei @ 2017-12-02  0:49 UTC (permalink / raw)
  To: Mauricio Faria de Oliveira
  Cc: Bart Van Assche, chenxiang66, hch, linux-block, axboe, osandov,
	hare, wenxiong, linuxarm

On Fri, Dec 01, 2017 at 04:49:31PM -0200, Mauricio Faria de Oliveira wrote:
> Hi Bart,
> 
> On 12/01/2017 02:08 PM, Bart Van Assche wrote:
> > The test chenxiang ran does not prove that there is anything wrong with v2.
> > Maybe chenxiang hit the issue described inhttps://lkml.org/lkml/2017/9/5/381?
> 
> Unfortunately v2 has been found to exhibit another problem, an Oops in
> the systemd shutdown path for device-mapper LVM volumes at least, when
> calling 'q->request_fn(q)' if 'q->request_fn' is NULL.

That issue can be fixed easily in V2 and we understand DM's case clearly.

But the most important thing is that V2 can't fix Chenxiang's issue, while
V3/V1 can.


-- 
Ming


* Re: [PATCH V2 0/2] block: fix queue freeze and cleanup
  2017-12-02  0:49           ` Ming Lei
@ 2017-12-04 12:31             ` Mauricio Faria de Oliveira
  0 siblings, 0 replies; 21+ messages in thread
From: Mauricio Faria de Oliveira @ 2017-12-04 12:31 UTC (permalink / raw)
  To: Ming Lei
  Cc: Bart Van Assche, chenxiang66, hch, linux-block, axboe, osandov,
	hare, wenxiong, linuxarm

On 12/01/2017 10:49 PM, Ming Lei wrote:
>> Unfortunately v2 has been found to exhibit another problem, an Oops in
>> the systemd shutdown path for device-mapper LVM volumes at least, when
>> calling 'q->request_fn(q)' if 'q->request_fn' is NULL.
> That issue can be fixed easily in V2 and we understand DM's case clearly.

Yes. It just had not been mentioned upstream yet.

> But the most important thing is that V2 can't fix Chenxiang's issue, but
> V3/V1 can.

Agree.


-- 
Mauricio Faria de Oliveira
IBM Linux Technology Center


* Re: [PATCH V2 0/2] block: fix queue freeze and cleanup
  2017-12-01 18:49         ` Mauricio Faria de Oliveira
  2017-12-02  0:49           ` Ming Lei
@ 2017-12-13 21:49           ` Bart Van Assche
  2017-12-20 14:34             ` Mauricio Faria de Oliveira
  1 sibling, 1 reply; 21+ messages in thread
From: Bart Van Assche @ 2017-12-13 21:49 UTC (permalink / raw)
  To: mauricfo, chenxiang66, hch, linux-block, axboe, ming.lei
  Cc: osandov, hare, wenxiong, linuxarm

On Fri, 2017-12-01 at 16:49 -0200, Mauricio Faria de Oliveira wrote:
> LR [c00000000057c7fc] __blk_run_queue+0x6c/0xb0
> Call Trace:
> [c0000001fb083970] [c0000001fb0839e0] 0xc0000001fb0839e0 (unreliable)
> [c0000001fb0839a0] [c00000000057ce0c] blk_run_queue+0x4c/0x80
> [c0000001fb0839d0] [c000000000591f54] blk_freeze_queue_start+0xa4/0xb0
> [c0000001fb083a00] [c00000000057d5cc] blk_set_queue_dying+0x6c/0x190
> [c0000001fb083a30] [c0000000008a3fbc] __dm_destroy+0xac/0x300
> [c0000001fb083ad0] [c0000000008af6a4] dev_remove+0x154/0x1d0
> [c0000001fb083b20] [c0000000008affd0] ctl_ioctl+0x360/0x4f0
> [c0000001fb083d10] [c0000000008b0198] dm_ctl_ioctl+0x38/0x50
> [c0000001fb083d50] [c0000000003863b8] do_vfs_ioctl+0xd8/0x8c0
> [c0000001fb083df0] [c000000000386c08] SyS_ioctl+0x68/0x100
> [c0000001fb083e30] [c00000000000b760] system_call+0x58/0x6c
> Instruction dump:
> XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
> XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
> ---[ end trace e1710ec836e5526f ]---
> 
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

Hello Mauricio,

Would it be possible to repeat your test with the patch below applied on your
kernel tree? This patch has just been posted on the dm-devel mailing list.

Thanks,

Bart.


From: Bart Van Assche <bart.vanassche@wdc.com>
Date: Wed, 13 Dec 2017 13:07:14 -0800
Subject: [PATCH] dm: Fix a recently introduced reference counting bug

This patch avoids that the following message occurs sporadically
in the system log (revealing that pgpath->path.dev->name became
a dangling pointer):

device-mapper: table: 254:2: device kkkkkkkkkkkkkkkkkkk?????????x0?a?????E??????????????E??????F?????2?????pF??????PF?????9[F??????]F???????#???????#??????'f????? not in table devices list

This patch also fixes the following kernel crash:

general protection fault: 0000 [#1] PREEMPT SMP
RIP: 0010:multipath_busy+0x77/0xd0 [dm_multipath]
Call Trace:
 dm_mq_queue_rq+0x44/0x110 [dm_mod]
 blk_mq_dispatch_rq_list+0x73/0x440
 blk_mq_do_dispatch_sched+0x60/0xe0
 blk_mq_sched_dispatch_requests+0x11a/0x1a0
 __blk_mq_run_hw_queue+0x11f/0x1c0
 __blk_mq_delay_run_hw_queue+0x95/0xe0
 blk_mq_run_hw_queue+0x25/0x80
 blk_mq_flush_plug_list+0x197/0x420
 blk_flush_plug_list+0xe4/0x270
 blk_finish_plug+0x27/0x40
 __do_page_cache_readahead+0x2b4/0x370
 force_page_cache_readahead+0xb4/0x110
 generic_file_read_iter+0x755/0x970
 __vfs_read+0xd2/0x140
 vfs_read+0x9b/0x140
 SyS_read+0x45/0xa0
 do_syscall_64+0x56/0x1a0
 entry_SYSCALL64_slow_path+0x25/0x25

From the disassembly of multipath_busy (0x77 = 119):

./include/linux/blkdev.h:
992             return bdev->bd_disk->queue;    /* this is never NULL */
   0x00000000000006b4 <+116>:   mov    (%rax),%rax
   0x00000000000006b7 <+119>:   mov    0xe0(%rax),%rax

Fixes: commit 2a0b4682e09d ("dm: convert dm_dev_internal.count from atomic_t to refcount_t")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: David Windsor <dwindsor@gmail.com>
Cc: Hans Liljestrand <ishkamiel@gmail.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: stable@vger.kernel.org # v4.15
---
 drivers/md/dm-table.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 88130b5d95f9..ee5c389e7256 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -459,6 +459,8 @@ int dm_get_device(struct dm_target *ti, const char *path, fmode_t mode,
 		if (r)
 			return r;
 		refcount_inc(&dd->count);
+	} else {
+		refcount_inc(&dd->count);
 	}
 
 	*result = dd->dm_dev;
-- 
2.15.1
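
To spell out what the hunk above restores (a simplified paraphrase, not the
actual dm-table.c code -- the find_device()/create_and_open_dev() helpers
below are stand-ins): every successful dm_get_device() call has to leave
dd->count one reference higher, whether the dm_dev_internal was just created
or was already on the table's devices list, so that dm_put_device() can drop
it symmetrically. The refcount_t conversion lost the increment on one of the
"already listed" paths, which the added else branch puts back:

	dd = find_device(&t->devices, dev);
	if (!dd) {
		dd = create_and_open_dev(t, dev, mode);	/* stand-in helper */
		if (IS_ERR(dd))
			return PTR_ERR(dd);
		refcount_set(&dd->count, 1);	/* first user holds the first reference */
		list_add(&dd->list, &t->devices);
	} else {
		refcount_inc(&dd->count);	/* every further user takes another one */
	}
	*result = dd->dm_dev;
	return 0;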

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH V2 0/2] block: fix queue freeze and cleanup
  2017-11-29  2:57   ` chenxiang (M)
  2017-11-29  4:54     ` Ming Lei
  2017-12-01 15:36     ` Mauricio Faria de Oliveira
@ 2017-12-13 21:53     ` Bart Van Assche
  2017-12-15  7:58       ` chenxiang (M)
  2 siblings, 1 reply; 21+ messages in thread
From: Bart Van Assche @ 2017-12-13 21:53 UTC (permalink / raw)
  To: chenxiang66, hch, linux-block, axboe, ming.lei
  Cc: mauricfo, osandov, hare, wenxiong, linuxarm

On Wed, 2017-11-29 at 10:57 +0800, chenxiang (M) wrote:
> I applied this v2 patchset to kernel 4.15-rc1, running fio on a SATA 
> disk, then disable the disk with sysfs interface
> (echo 0 > /sys/class/sas_phy/phy-1:0:1/enable), and find system is hung. 
> But with v1 patch, it doesn't has this issue.

Hello chenxiang,

Would it be possible to repeat your test with v2 of this series and Martin's
for-4.16 SCSI tree merged into your kernel test tree? Martin's tree is available
at https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git/log/?h=4.16/scsi-queue.
The following patch in that tree fixes a race condition that sometimes caused
the SCSI error handler not to be woken up:
https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git/commit/?h=4.16/scsi-queue&id=3bd6f43f5cb3714f70c591514f344389df593501

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH V2 0/2] block: fix queue freeze and cleanup
  2017-12-13 21:53     ` Bart Van Assche
@ 2017-12-15  7:58       ` chenxiang (M)
  2017-12-15 17:44         ` Bart Van Assche
  0 siblings, 1 reply; 21+ messages in thread
From: chenxiang (M) @ 2017-12-15  7:58 UTC (permalink / raw)
  To: Bart Van Assche, hch, linux-block, axboe, ming.lei
  Cc: mauricfo, osandov, hare, wenxiong, linuxarm, John Garry

On 2017/12/14 5:53, Bart Van Assche wrote:
> On Wed, 2017-11-29 at 10:57 +0800, chenxiang (M) wrote:
>> I applied this v2 patchset to kernel 4.15-rc1, running fio on a SATA
>> disk, then disable the disk with sysfs interface
>> (echo 0 > /sys/class/sas_phy/phy-1:0:1/enable), and find system is hung.
>> But with v1 patch, it doesn't has this issue.
> Hello chenxiang,
>
> Would it be possible to repeat your test with v2 of this series and Martin's
> for-4.16 SCSI tree merged into your kernel test tree? Martin's tree is available
> at https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git/log/?h=4.16/scsi-queue.
> The following patch in that tree fixes a race condition that sometimes caused
> the SCSI error handler not to be woken up:
> https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git/commit/?h=4.16/scsi-queue&id=3bd6f43f5cb3714f70c591514f344389df593501

Hi Bart, sorry for replying to this email late; I only noticed it after
John's reminder yesterday.
I tested v2 of this series based on Martin's for-4.16 SCSI tree, in which
the error handler fix "scsi: core: Ensure that the SCSI error handler gets
woken up" is merged. The system is still hung after repeating my testcase.
The log is as follows:

estuary:/$ fio -filename=/dev/sdb1 -direct=1 -iodepth 1 -thread -rw=read -ioengine=psync -bs=4k -numjobs=64 -runtime=300 -group_reporting -name=mytest
mytest: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
...
fio-2.1.11
Starting 64 threads
[   45.794460] random: crng init done
[   54.369720] hisi_sas_v2_hw HISI0162:01: erroneous completion iptt=1 
task=ffff8017cf471400 CQ hdr: 0x1103 0x1 0x0 0x0 Error info: 0x0 0x200 
0x0 0x0
[   54.382869] sas: smp_execute_task_sg: task to dev 500e004aaaaaaa1f 
response: 0x0 status 0x2
[   54.391339] sas: broadcast received: 0
[   54.395092] sas: REVALIDATING DOMAIN on port 0, pid:2022
[   54.400879] sas: Expander phy change count has changed
[   54.406191] sas: ex 500e004aaaaaaa1f phy1 originated BROADCAST(CHANGE)
[   54.415052] sas: done REVALIDATING DOMAIN on port 0, pid:2022, res 0x0
[   54.422057] sd 0:0:1:0: [sdb] Synchronizing SCSI cache
[   54.427248] sd 0:0:1:0: [sdb] Synchronize Cache(10) failed: Result: 
hostbyte=0x04 driverbyte=0x00
fio: pid=2786, err=5/file:engines[   54.436147] sd 0:0:1:0: [sdb] 
Stopping disk
/sync.c:67, func=xfer, error=Inpu[   54.443187] sd 0:0:1:0: [sdb] 
Start/Stop Unit failed: Result: hostbyte=0x04 driverbyte=0x00
t/output error
fio: pid=2772, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2729, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2727, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2787, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2740, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2775, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2774, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2768, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2754, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2732, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2756, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
fio: pid=2738, err=5/file:engines/sync.c:67, func=xfer, 
error=Input/output error
[  249.291697] INFO: task kworker/u128:1:2021 blocked for more than 120 
seconds.(1),R(11),X(1),R(3),X(1),R(1),X(2),R(10),X(2),R(3)] [50.5% done] 
[0KB/0KB/0KB /s] [0/0/0 iops] [eta 03m:20s]
[  249.298836]       Not tainted 4.15.0-rc1-00045-ga9c054d #1
[  249.304316] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  249.312138] kworker/u128:1  D    0  2021      2 0x00000020
[  249.317632] Workqueue: HISI0162:01_event_q sas_port_event_worker
[  249.323634] Call trace:
[  249.326077]  __switch_to+0xa0/0xb4
[  249.329475]  __schedule+0x1c4/0x704
[  249.332955]  schedule+0x34/0x94
[  249.336090]  schedule_timeout+0x1b8/0x358
[  249.340092]  wait_for_common+0xfc/0x1b0
[  249.343920]  wait_for_completion+0x14/0x1c
[  249.348011]  flush_workqueue+0xfc/0x424
[  249.351840]  sas_porte_broadcast_rcvd+0x5c/0x68
[  249.356363]  sas_port_event_worker+0x24/0x38
[  249.360626]  process_one_work+0x128/0x2cc
[  249.364627]  worker_thread+0x14c/0x408
[  249.368369]  kthread+0xfc/0x128
[  249.371499]  ret_from_fork+0x10/0x18
[  249.375068] INFO: task kworker/u128:2:2022 blocked for more than 120 
seconds.
[  249.382194]       Not tainted 4.15.0-rc1-00045-ga9c054d #1
[  249.387671] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  249.395492] kworker/u128:2  D    0  2022      2 0x00000020
[  249.400973] Workqueue: HISI0162:01_disco_q sas_revalidate_domain
[  249.406972] Call trace:
[  249.409412]  __switch_to+0xa0/0xb4
[  249.412806]  __schedule+0x1c4/0x704
[  249.416287]  schedule+0x34/0x94
[  249.419422]  blk_mq_freeze_queue_wait+0x4c/0x9c
[  249.423945]  blk_freeze_queue+0x1c/0x28
[  249.427779]  blk_cleanup_queue+0x68/0xf8
[  249.431694]  __scsi_remove_device+0x5c/0x118
[  249.435956]  scsi_remove_device+0x28/0x3c
[  249.439957]  scsi_remove_target+0x194/0x1d0
[  249.444133]  sas_rphy_remove+0x60/0x64
[  249.447875]  sas_rphy_delete+0x14/0x28
[  249.451612]  sas_destruct_devices+0x74/0xa8
[  249.455787]  sas_revalidate_domain+0x58/0xe4
[  249.460049]  process_one_work+0x128/0x2cc
[  249.464051]  worker_thread+0x14c/0x408
[  249.467792]  kthread+0xfc/0x128
[  249.470922]  ret_from_fork+0x10/0x18
[  249.474494] INFO: task fio:2728 blocked for more than 120 seconds.
[  249.480665]       Not tainted 4.15.0-rc1-00045-ga9c054d #1
[  249.486140] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  249.493960] fio             D    0  2728   2706 0x00000000
[  249.499439] Call trace:
[  249.501878]  __switch_to+0xa0/0xb4
[  249.505272]  __schedule+0x1c4/0x704
[  249.508753]  schedule+0x34/0x94
[  249.511887]  io_schedule+0x14/0x30
[  249.515280]  __blkdev_direct_IO_simple+0x158/0x290
[  249.520063]  blkdev_direct_IO+0x36c/0x378
[  249.524066]  generic_file_read_iter+0xa0/0x7f4
[  249.528501]  blkdev_read_iter+0x44/0x54
[  249.532331]  __vfs_read+0xc8/0x11c
[  249.535725]  vfs_read+0x80/0x134
[  249.538941]  SyS_pread64+0x74/0x8c
[  249.542335]  el0_svc_naked+0x20/0x24
[  249.545902] INFO: task fio:2730 blocked for more than 120 seconds.
[  249.552074]       Not tainted 4.15.0-rc1-00045-ga9c054d #1
[  249.557551] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  249.565372] fio             D    0  2730   2706 0x00000000
[  249.570851] Call trace:
[  249.573291]  __switch_to+0xa0/0xb4
[  249.576684]  __schedule+0x1c4/0x704
[  249.580165]  schedule+0x34/0x94
[  249.583295]  io_schedule+0x14/0x30
[  249.586689]  __blkdev_direct_IO_simple+0x158/0x290
[  249.591471]  blkdev_direct_IO+0x36c/0x378
[  249.595473]  generic_file_read_iter+0xa0/0x7f4
[  249.599909]  blkdev_read_iter+0x44/0x54
[  249.603737]  __vfs_read+0xc8/0x11c
[  249.607127]  vfs_read+0x80/0x134
[  249.610347]  SyS_pread64+0x74/0x8c
[  249.613741]  el0_svc_naked+0x20/0x24
[  249.617308] INFO: task fio:2731 blocked for more than 120 seconds.
[  249.623479]       Not tainted 4.15.0-rc1-00045-ga9c054d #1
[  249.628955] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  249.636775] fio             D    0  2731   2706 0x00000000
[  249.642253] Call trace:
[  249.644692]  __switch_to+0xa0/0xb4
[  249.648086]  __schedule+0x1c4/0x704
[  249.651562]  schedule+0x34/0x94
[  249.654695]  io_schedule+0x14/0x30
[  249.658090]  __blkdev_direct_IO_simple+0x158/0x290
[  249.662872]  blkdev_direct_IO+0x36c/0x378
[  249.666874]  generic_file_read_iter+0xa0/0x7f4
[  249.671310]  blkdev_read_iter+0x44/0x54
[  249.675139]  __vfs_read+0xc8/0x11c
[  249.678534]  vfs_read+0x80/0x134
[  249.681755]  SyS_pread64+0x74/0x8c
[  249.685149]  el0_svc_naked+0x20/0x24
[  249.688716] INFO: task fio:2733 blocked for more than 120 seconds.
[  249.694887]       Not tainted 4.15.0-rc1-00045-ga9c054d #1
[  249.700363] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  249.708184] fio             D    0  2733   2706 0x00000000
[  249.713662] Call trace:
[  249.716101]  __switch_to+0xa0/0xb4
[  249.719491]  __schedule+0x1c4/0x704
[  249.722971]  schedule+0x34/0x94
[  249.726105]  io_schedule+0x14/0x30
[  249.729498]  __blkdev_direct_IO_simple+0x158/0x290
[  249.734281]  blkdev_direct_IO+0x36c/0x378
[  249.738282]  generic_file_read_iter+0xa0/0x7f4
[  249.742718]  blkdev_read_iter+0x44/0x54
[  249.746546]  __vfs_read+0xc8/0x11c
[  249.749940]  vfs_read+0x80/0x134
[  249.753161]  SyS_pread64+0x74/0x8c
[  249.756555]  el0_svc_naked+0x20/0x24
[  249.760123] INFO: task fio:2734 blocked for more than 120 seconds.
[  249.766293]       Not tainted 4.15.0-rc1-00045-ga9c054d #1
[  249.771769] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  249.779590] fio             D    0  2734   2706 0x00000000
[  249.785070] Call trace:
[  249.787506]  __switch_to+0xa0/0xb4
[  249.790899]  __schedule+0x1c4/0x704
[  249.794380]  schedule+0x34/0x94
[  249.797513]  io_schedule+0x14/0x30
Jobs: 51 (f=51): [X(1),R(1),X(1),[  249.800909] 
__blkdev_direct_IO_simple+0x158/0x290
R(2),X(1),R(5),X(1),R(1),X(1),R(1[  249.808552] blkdev_direct_IO+0x36c/0x378
3),X(1),R(1),X(1),R(11),X(1),R(3[  249.815409] 
generic_file_read_iter+0xa0/0x7f4
),X(1),R(1),X(2),R(10),X(2),R(3)[  249.822615] blkdev_read_iter+0x44/0x54
] [50.7% done] [0KB/0KB/0KB /s] [  249.829213] __vfs_read+0xc8/0x11c
[  249.835377]  vfs_read+0x80/0x134
[  249.840933]  SyS_pread64+0x74/0x8c
[  249.844327]  el0_svc_naked+0x20/0x24
[  249.847894] INFO: task fio:2735 blocked for more than 120 seconds.
[  249.854065]       Not tainted 4.15.0-rc1-00045-ga9c054d #1
[  249.859541] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  249.867361] fio             D    0  2735   2706 0x00000000
[  249.872839] Call trace:
[  249.875275]  __switch_to+0xa0/0xb4
[  249.878670]  __schedule+0x1c4/0x704
[  249.882150]  schedule+0x34/0x94
[  249.885283]  io_schedule+0x14/0x30
[  249.888678]  __blkdev_direct_IO_simple+0x158/0x290
[  249.893460]  blkdev_direct_IO+0x36c/0x378
[  249.897462]  generic_file_read_iter+0xa0/0x7f4
[  249.901898]  blkdev_read_iter+0x44/0x54
[  249.905726]  __vfs_read+0xc8/0x11c
[  249.909119]  vfs_read+0x80/0x134
[  249.912339]  SyS_pread64+0x74/0x8c
[  249.915733]  el0_svc_naked+0x20/0x24
[  249.919297] INFO: task fio:2736 blocked for more than 120 seconds.
[  249.925468]       Not tainted 4.15.0-rc1-00045-ga9c054d #1
[  249.930944] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  249.938764] fio             D    0  2736   2706 0x00000000
[  249.944242] Call trace:
[  249.946677]  __switch_to+0xa0/0xb4
[  249.950071]  __schedule+0x1c4/0x704
[  249.953551]  schedule+0x34/0x94
[  249.956684]  io_schedule+0x14/0x30
[  249.960078]  __blkdev_direct_IO_simple+0x158/0x290
[  249.964861]  blkdev_direct_IO+0x36c/0x378
[  249.968863]  generic_file_read_iter+0xa0/0x7f4
[  249.973298]  blkdev_read_iter+0x44/0x54
[  249.977127]  __vfs_read+0xc8/0x11c
[  249.980521]  vfs_read+0x80/0x134
[  249.983741]  SyS_pread64+0x74/0x8c
[  249.987131]  el0_svc_naked+0x20/0x24
[  249.990698] INFO: task fio:2737 blocked for more than 120 seconds.
[  249.996869]       Not tainted 4.15.0-rc1-00045-ga9c054d #1
[  250.002345] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  250.010165] fio             D    0  2737   2706 0x00000000
[  250.015643] Call trace:
[  250.018078]  __switch_to+0xa0/0xb4
[  250.021472]  __schedule+0x1c4/0x704
[  250.024952]  schedule+0x34/0x94
[  250.028086]  io_schedule+0x14/0x30
[  250.031476]  __blkdev_direct_IO_simple+0x158/0x290
[  250.036258]  blkdev_direct_IO+0x36c/0x378
[  250.040260]  generic_file_read_iter+0xa0/0x7f4
[  250.044696]  blkdev_read_iter+0x44/0x54
[  250.048523]  __vfs_read+0xc8/0x11c
[  250.051917]  vfs_read+0x80/0x134
[  250.055133]  SyS_pread64+0x74/0x8c
[  250.058526]  el0_svc_naked+0x20/0x24
[  318.975633] WARNING: CPU: 9 PID: 0 at kernel/rcu/tree.c:2792 
rcu_process_callbacks+0x3e4/0x3f43),X(1),R(1),X(2),R(10),X(2),R(3)] 
[67.8% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 02m:10s]
[  318.984231] Modules linked in:
[  318.987276] CPU: 9 PID: 0 Comm: swapper/9 Not tainted 
4.15.0-rc1-00045-ga9c054d #1
[  318.994831] Hardware name: Huawei D05/D05, BIOS Hisilicon D05 IT17 
Nemo 2.0 RC0 11/29/2017
[  319.003080] task: ffff8017d63c4880 task.stack: ffff0000097a0000
[  319.008986] pstate: 20000085 (nzCv daIf -PAN -UAO)
[  319.013764] pc : rcu_process_callbacks+0x3e4/0x3f4
[  319.018543] lr : rcu_process_callbacks+0x204/0x3f4
[  319.023319] sp : ffff00000804be30
[  319.026621] x29: ffff00000804be30 x28: 0000000000000000
[  319.031922] x27: 0000000000000784 x26: ffff00000804be90
[  319.037222] x25: ffff8017fbf642b8 x24: ffff000009233000
[  319.042523] x23: ffff0000090919c0 x22: ffff00000920bd20
[  319.047822] x21: ffff000009252000 x20: ffff00000924ef80
[  319.053122] x19: ffff8017fbf64280 x18: 0000000000000007
[  319.058422] x17: 000000000000000e x16: 0000000000000001
[  319.063722] x15: 0000000000000019 x14: 0000000000000033
[  319.069022] x13: 000000000000004c x12: 0000000000000068
[  319.074322] x11: 0000000000000000 x10: 0000000000000a80
[  319.079622] x9 : 0000000000005e79 x8 : 0000000044042000
[  319.084921] x7 : 0000000000210d00 x6 : 0000000000000000
[  319.090221] x5 : 0000000000000029 x4 : 0000000000000004
[  319.095520] x3 : 0000000000000000 x2 : 0000000000002710
[  319.100820] x1 : 0000000000000001 x0 : 0000000000000000
[  319.106120] Call trace:
[  319.108555]  rcu_process_callbacks+0x3e4/0x3f4
[  319.112987]  __do_softirq+0x10c/0x208
[  319.116638]  irq_exit+0xa8/0xb4
[  319.119767]  __handle_domain_irq+0x8c/0xf0
[  319.123849]  gic_handle_irq+0xd4/0x184
[  319.127585]  el1_irq+0xb0/0x128
[  319.130714]  arch_cpu_idle+0x14/0x20
[  319.134280]  default_idle_call+0x18/0x2c
[  319.138190]  do_idle+0x168/0x1d0
[  319.141406]  cpu_startup_entry+0x1c/0x24
[  319.145316]  secondary_start_kernel+0x11c/0x128
[  319.149832] ---[ end trace 2158876af600f163 ]---
>
> Thanks,
>
> Bart.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH V2 0/2] block: fix queue freeze and cleanup
  2017-12-15  7:58       ` chenxiang (M)
@ 2017-12-15 17:44         ` Bart Van Assche
  0 siblings, 0 replies; 21+ messages in thread
From: Bart Van Assche @ 2017-12-15 17:44 UTC (permalink / raw)
  To: chenxiang66, hch, linux-block, axboe, ming.lei
  Cc: mauricfo, osandov, hare, john.garry, wenxiong, linuxarm

On Fri, 2017-12-15 at 15:58 +0800, chenxiang (M) wrote:
> I tested v2 of this series based on Martin's for-4.16 SCSI tree which 
> branch error handler issue "scsi: core: Ensure that the
> SCSI error handler gets woken up" is merged. And system is still hung 
> after repeat my testcase.

Hello chenxiang,

Thanks for having run this test and for having shared the results. I will see
whether I can reproduce this on my test setup.

Bart.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH V2 0/2] block: fix queue freeze and cleanup
  2017-12-13 21:49           ` Bart Van Assche
@ 2017-12-20 14:34             ` Mauricio Faria de Oliveira
  0 siblings, 0 replies; 21+ messages in thread
From: Mauricio Faria de Oliveira @ 2017-12-20 14:34 UTC (permalink / raw)
  To: Bart Van Assche, chenxiang66, hch, linux-block, axboe, ming.lei
  Cc: osandov, hare, wenxiong, linuxarm

Hi Bart,

On 12/13/2017 07:49 PM, Bart Van Assche wrote:
> Would it be possible to repeat your test with the patch below applied on your
> kernel tree? This patch has just been posted on the dm-devel mailing list.

Sorry for the delay. I missed this.

Unfortunately the oops problem still happens with PATCH v2 plus that patch
(actually, the version of it that ended up in v4.15-rc4 [1] via Mike Snitzer).

The problem does not happen with PATCH v3 (a.k.a. v1) -- i.e., v3 is OK.

Thanks!


Test-case w/ PATCH v3:
---

[root@guest ~]# uname -r
4.15.0-rc4.mingleiv3

[root@guest ~]# reboot
...
systemd-shutdown[1]: Detaching DM devices.
systemd-shutdown[1]: Detaching DM 253:2.
shutdown: 7 output lines suppressed due to ratelimiting
dracut Warning: Killing all remaining processes
dracut Warning: Killing all remaining processes
XFS (dm-0): Unmounting Filesystem
dracut Warning: Unmounted /oldroot.
dracut: Disassembling device-mapper devices
Rebooting.
sd 0:0:0:0: [sda] Synchronizing SCSI cache
reboot: Restarting system


Test-case w/ PATCH v2:
---

[root@guest ~]# uname -r
4.15.0-rc4.mingleiv2

[root@guest ~]# reboot
...
systemd-shutdown[1]: Detaching DM devices.
systemd-shutdown[1]: Detaching DM 253:2.
Unable to handle kernel paging request for instruction fetch
Faulting instruction address: 0x00000000
Oops: Kernel access of bad area, sig: 11 [#1]
LE SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: vmx_crypto virtio_balloon ip_tables x_tables xfs 
libcrc32c virtio_net virtio_scsi crc32c_vpmsum virtio_pci virtio_ring 
virtio autofs4
CPU: 3 PID: 1 Comm: systemd-shutdow Not tainted 4.15.0-rc4.mingleiv2 #4
<...>
NIP [0000000000000000]           (null)
LR [c00000000057d0dc] __blk_run_queue+0x6c/0xb0
Call Trace:
[c0000001fb083970] [c0000001fb0839e0] 0xc0000001fb0839e0 (unreliable)
[c0000001fb0839a0] [c00000000057d6ec] blk_run_queue+0x4c/0x80
[c0000001fb0839d0] [c000000000592834] blk_freeze_queue_start+0xa4/0xb0
[c0000001fb083a00] [c00000000057deac] blk_set_queue_dying+0x6c/0x190
[c0000001fb083a30] [c0000000008a4b7c] __dm_destroy+0xac/0x300
[c0000001fb083ad0] [c0000000008b0244] dev_remove+0x154/0x1d0
[c0000001fb083b20] [c0000000008b0b70] ctl_ioctl+0x360/0x4f0
[c0000001fb083d10] [c0000000008b0d38] dm_ctl_ioctl+0x38/0x50
[c0000001fb083d50] [c0000000003867d8] do_vfs_ioctl+0xd8/0x8c0
[c0000001fb083df0] [c000000000387028] SyS_ioctl+0x68/0x100
[c0000001fb083e30] [c00000000000b760] system_call+0x58/0x6c
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
---[ end trace 0fceefbe4fc1cd29 ]---

Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b



[1] https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.15&id=afc567a4977b2d798e05153dd131a3c8d4758c0c


-- 
Mauricio Faria de Oliveira
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2017-12-20 14:39 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-23  4:47 [PATCH V2 0/2] block: fix queue freeze and cleanup Ming Lei
2017-11-23  4:47 ` [PATCH V2 1/2] block: run queue before waiting for q_usage_counter becoming zero Ming Lei
2017-11-27 12:15   ` Mauricio Faria de Oliveira
2017-11-23  4:48 ` [PATCH V2 2/2] block: drain blkcg part of request_queue in blk_cleanup_queue() Ming Lei
2017-11-27 12:15   ` Mauricio Faria de Oliveira
2017-11-27 12:41 ` [PATCH V2 0/2] block: fix queue freeze and cleanup Ming Lei
2017-11-29  2:57   ` chenxiang (M)
2017-11-29  4:54     ` Ming Lei
2017-11-29  5:40       ` chenxiang (M)
2017-12-01 15:36     ` Mauricio Faria de Oliveira
2017-12-01 15:42       ` Ming Lei
2017-12-01 16:08       ` Bart Van Assche
2017-12-01 17:35         ` Ming Lei
2017-12-01 18:49         ` Mauricio Faria de Oliveira
2017-12-02  0:49           ` Ming Lei
2017-12-04 12:31             ` Mauricio Faria de Oliveira
2017-12-13 21:49           ` Bart Van Assche
2017-12-20 14:34             ` Mauricio Faria de Oliveira
2017-12-13 21:53     ` Bart Van Assche
2017-12-15  7:58       ` chenxiang (M)
2017-12-15 17:44         ` Bart Van Assche

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.