From: "chenxiang (M)" <chenxiang66@hisilicon.com>
To: Bart Van Assche <Bart.VanAssche@wdc.com>,
	"hch@infradead.org" <hch@infradead.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"axboe@fb.com" <axboe@fb.com>,
	"ming.lei@redhat.com" <ming.lei@redhat.com>
Cc: "mauricfo@linux.vnet.ibm.com" <mauricfo@linux.vnet.ibm.com>,
	"osandov@fb.com" <osandov@fb.com>,
	"hare@suse.com" <hare@suse.com>,
	"wenxiong@us.ibm.com" <wenxiong@us.ibm.com>,
	"linuxarm@huawei.com" <linuxarm@huawei.com>,
	John Garry <john.garry@huawei.com>
Subject: Re: [PATCH V2 0/2] block: fix queue freeze and cleanup
Date: Fri, 15 Dec 2017 15:58:51 +0800
Message-ID: <ceb3a7fc-83cb-dbc5-4bc6-c8f2c089640f@hisilicon.com>
In-Reply-To: <1513201983.2413.11.camel@sandisk.com>

On 2017/12/14 5:53, Bart Van Assche wrote:
> On Wed, 2017-11-29 at 10:57 +0800, chenxiang (M) wrote:
>> I applied this v2 patchset to kernel 4.15-rc1, running fio on a SATA
>> disk, then disable the disk with sysfs interface
>> (echo 0 > /sys/class/sas_phy/phy-1:0:1/enable), and find system is hung.
>> But with v1 patch, it doesn't has this issue.
> Hello chenxiang,
>
> Would it be possible to repeat your test with v2 of this series and Martin's
> for-4.16 SCSI tree merged into your kernel test tree? Martin's tree is available
> at https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git/log/?h=4.16/scsi-queue.
> The following patch in that tree fixes a race condition that sometimes caused
> the SCSI error handler not to be woken up:
> https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git/commit/?h=4.16/scsi-queue&id=3bd6f43f5cb3714f70c591514f344389df593501

Hi Bart, sorry for the late reply; I only noticed this email after
John reminded me yesterday.
I tested v2 of this series on top of Martin's for-4.16 SCSI tree, a
branch in which the error handler fix "scsi: core: Ensure that the
SCSI error handler gets woken up" is merged. The system still hangs
when I repeat my test case.
The log is as follows:

estuary:/$ fio -filename=/dev/sdb1 -direct=1 -iodepth 1 -thread -rw=read -ioengine=psync -bs=4k -numjobs=64 -runtime=300 -group_reporting -name=mytest
mytest: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
...
fio-2.1.11
Starting 64 threads
[   45.794460] random: crng init done
[   54.369720] hisi_sas_v2_hw HISI0162:01: erroneous completion iptt=1 task=ffff8017cf471400 CQ hdr: 0x1103 0x1 0x0 0x0 Error info: 0x0 0x200 0x0 0x0
[   54.382869] sas: smp_execute_task_sg: task to dev 500e004aaaaaaa1f response: 0x0 status 0x2
[   54.391339] sas: broadcast received: 0
[   54.395092] sas: REVALIDATING DOMAIN on port 0, pid:2022
[   54.400879] sas: Expander phy change count has changed
[   54.406191] sas: ex 500e004aaaaaaa1f phy1 originated BROADCAST(CHANGE)
[   54.415052] sas: done REVALIDATING DOMAIN on port 0, pid:2022, res 0x0
[   54.422057] sd 0:0:1:0: [sdb] Synchronizing SCSI cache
[   54.427248] sd 0:0:1:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=0x04 driverbyte=0x00
[   54.436147] sd 0:0:1:0: [sdb] Stopping disk
[   54.443187] sd 0:0:1:0: [sdb] Start/Stop Unit failed: Result: hostbyte=0x04 driverbyte=0x00
fio: pid=2786, err=5/file:engines/sync.c:67, func=xfer, error=Input/output error
fio: pid=2772, err=5/file:engines/sync.c:67, func=xfer, error=Input/output error
fio: pid=2729, err=5/file:engines/sync.c:67, func=xfer, error=Input/output error
fio: pid=2727, err=5/file:engines/sync.c:67, func=xfer, error=Input/output error
fio: pid=2787, err=5/file:engines/sync.c:67, func=xfer, error=Input/output error
fio: pid=2740, err=5/file:engines/sync.c:67, func=xfer, error=Input/output error
fio: pid=2775, err=5/file:engines/sync.c:67, func=xfer, error=Input/output error
fio: pid=2774, err=5/file:engines/sync.c:67, func=xfer, error=Input/output error
fio: pid=2768, err=5/file:engines/sync.c:67, func=xfer, error=Input/output error
fio: pid=2754, err=5/file:engines/sync.c:67, func=xfer, error=Input/output error
fio: pid=2732, err=5/file:engines/sync.c:67, func=xfer, error=Input/output error
fio: pid=2756, err=5/file:engines/sync.c:67, func=xfer, error=Input/output error
fio: pid=2738, err=5/file:engines/sync.c:67, func=xfer, error=Input/output error
[  249.291697] INFO: task kworker/u128:1:2021 blocked for more than 120 seconds.
(1),R(11),X(1),R(3),X(1),R(1),X(2),R(10),X(2),R(3)] [50.5% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 03m:20s]
[  249.298836]       Not tainted 4.15.0-rc1-00045-ga9c054d #1
[  249.304316] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  249.312138] kworker/u128:1  D    0  2021      2 0x00000020
[  249.317632] Workqueue: HISI0162:01_event_q sas_port_event_worker
[  249.323634] Call trace:
[  249.326077]  __switch_to+0xa0/0xb4
[  249.329475]  __schedule+0x1c4/0x704
[  249.332955]  schedule+0x34/0x94
[  249.336090]  schedule_timeout+0x1b8/0x358
[  249.340092]  wait_for_common+0xfc/0x1b0
[  249.343920]  wait_for_completion+0x14/0x1c
[  249.348011]  flush_workqueue+0xfc/0x424
[  249.351840]  sas_porte_broadcast_rcvd+0x5c/0x68
[  249.356363]  sas_port_event_worker+0x24/0x38
[  249.360626]  process_one_work+0x128/0x2cc
[  249.364627]  worker_thread+0x14c/0x408
[  249.368369]  kthread+0xfc/0x128
[  249.371499]  ret_from_fork+0x10/0x18
[  249.375068] INFO: task kworker/u128:2:2022 blocked for more than 120 seconds.
[  249.382194]       Not tainted 4.15.0-rc1-00045-ga9c054d #1
[  249.387671] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  249.395492] kworker/u128:2  D    0  2022      2 0x00000020
[  249.400973] Workqueue: HISI0162:01_disco_q sas_revalidate_domain
[  249.406972] Call trace:
[  249.409412]  __switch_to+0xa0/0xb4
[  249.412806]  __schedule+0x1c4/0x704
[  249.416287]  schedule+0x34/0x94
[  249.419422]  blk_mq_freeze_queue_wait+0x4c/0x9c
[  249.423945]  blk_freeze_queue+0x1c/0x28
[  249.427779]  blk_cleanup_queue+0x68/0xf8
[  249.431694]  __scsi_remove_device+0x5c/0x118
[  249.435956]  scsi_remove_device+0x28/0x3c
[  249.439957]  scsi_remove_target+0x194/0x1d0
[  249.444133]  sas_rphy_remove+0x60/0x64
[  249.447875]  sas_rphy_delete+0x14/0x28
[  249.451612]  sas_destruct_devices+0x74/0xa8
[  249.455787]  sas_revalidate_domain+0x58/0xe4
[  249.460049]  process_one_work+0x128/0x2cc
[  249.464051]  worker_thread+0x14c/0x408
[  249.467792]  kthread+0xfc/0x128
[  249.470922]  ret_from_fork+0x10/0x18
[  249.474494] INFO: task fio:2728 blocked for more than 120 seconds.
[  249.480665]       Not tainted 4.15.0-rc1-00045-ga9c054d #1
[  249.486140] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  249.493960] fio             D    0  2728   2706 0x00000000
[  249.499439] Call trace:
[  249.501878]  __switch_to+0xa0/0xb4
[  249.505272]  __schedule+0x1c4/0x704
[  249.508753]  schedule+0x34/0x94
[  249.511887]  io_schedule+0x14/0x30
[  249.515280]  __blkdev_direct_IO_simple+0x158/0x290
[  249.520063]  blkdev_direct_IO+0x36c/0x378
[  249.524066]  generic_file_read_iter+0xa0/0x7f4
[  249.528501]  blkdev_read_iter+0x44/0x54
[  249.532331]  __vfs_read+0xc8/0x11c
[  249.535725]  vfs_read+0x80/0x134
[  249.538941]  SyS_pread64+0x74/0x8c
[  249.542335]  el0_svc_naked+0x20/0x24
[  249.545902] INFO: task fio:2730 blocked for more than 120 seconds.
[  249.552074]       Not tainted 4.15.0-rc1-00045-ga9c054d #1
[  249.557551] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  249.565372] fio             D    0  2730   2706 0x00000000
[  249.570851] Call trace:
[  249.573291]  __switch_to+0xa0/0xb4
[  249.576684]  __schedule+0x1c4/0x704
[  249.580165]  schedule+0x34/0x94
[  249.583295]  io_schedule+0x14/0x30
[  249.586689]  __blkdev_direct_IO_simple+0x158/0x290
[  249.591471]  blkdev_direct_IO+0x36c/0x378
[  249.595473]  generic_file_read_iter+0xa0/0x7f4
[  249.599909]  blkdev_read_iter+0x44/0x54
[  249.603737]  __vfs_read+0xc8/0x11c
[  249.607127]  vfs_read+0x80/0x134
[  249.610347]  SyS_pread64+0x74/0x8c
[  249.613741]  el0_svc_naked+0x20/0x24
[  249.617308] INFO: task fio:2731 blocked for more than 120 seconds.
[  249.623479]       Not tainted 4.15.0-rc1-00045-ga9c054d #1
[  249.628955] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  249.636775] fio             D    0  2731   2706 0x00000000
[  249.642253] Call trace:
[  249.644692]  __switch_to+0xa0/0xb4
[  249.648086]  __schedule+0x1c4/0x704
[  249.651562]  schedule+0x34/0x94
[  249.654695]  io_schedule+0x14/0x30
[  249.658090]  __blkdev_direct_IO_simple+0x158/0x290
[  249.662872]  blkdev_direct_IO+0x36c/0x378
[  249.666874]  generic_file_read_iter+0xa0/0x7f4
[  249.671310]  blkdev_read_iter+0x44/0x54
[  249.675139]  __vfs_read+0xc8/0x11c
[  249.678534]  vfs_read+0x80/0x134
[  249.681755]  SyS_pread64+0x74/0x8c
[  249.685149]  el0_svc_naked+0x20/0x24
[  249.688716] INFO: task fio:2733 blocked for more than 120 seconds.
[  249.694887]       Not tainted 4.15.0-rc1-00045-ga9c054d #1
[  249.700363] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  249.708184] fio             D    0  2733   2706 0x00000000
[  249.713662] Call trace:
[  249.716101]  __switch_to+0xa0/0xb4
[  249.719491]  __schedule+0x1c4/0x704
[  249.722971]  schedule+0x34/0x94
[  249.726105]  io_schedule+0x14/0x30
[  249.729498]  __blkdev_direct_IO_simple+0x158/0x290
[  249.734281]  blkdev_direct_IO+0x36c/0x378
[  249.738282]  generic_file_read_iter+0xa0/0x7f4
[  249.742718]  blkdev_read_iter+0x44/0x54
[  249.746546]  __vfs_read+0xc8/0x11c
[  249.749940]  vfs_read+0x80/0x134
[  249.753161]  SyS_pread64+0x74/0x8c
[  249.756555]  el0_svc_naked+0x20/0x24
[  249.760123] INFO: task fio:2734 blocked for more than 120 seconds.
[  249.766293]       Not tainted 4.15.0-rc1-00045-ga9c054d #1
[  249.771769] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  249.779590] fio             D    0  2734   2706 0x00000000
[  249.785070] Call trace:
[  249.787506]  __switch_to+0xa0/0xb4
[  249.790899]  __schedule+0x1c4/0x704
[  249.794380]  schedule+0x34/0x94
[  249.797513]  io_schedule+0x14/0x30
Jobs: 51 (f=51): [X(1),R(1),X(1),R(2),X(1),R(5),X(1),R(1),X(1),R(13),X(1),R(1),X(1),R(11),X(1),R(3),X(1),R(1),X(2),R(10),X(2),R(3)] [50.7% done] [0KB/0KB/0KB /s]
[  249.800909]  __blkdev_direct_IO_simple+0x158/0x290
[  249.808552]  blkdev_direct_IO+0x36c/0x378
[  249.815409]  generic_file_read_iter+0xa0/0x7f4
[  249.822615]  blkdev_read_iter+0x44/0x54
[  249.829213]  __vfs_read+0xc8/0x11c
[  249.835377]  vfs_read+0x80/0x134
[  249.840933]  SyS_pread64+0x74/0x8c
[  249.844327]  el0_svc_naked+0x20/0x24
[  249.847894] INFO: task fio:2735 blocked for more than 120 seconds.
[  249.854065]       Not tainted 4.15.0-rc1-00045-ga9c054d #1
[  249.859541] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  249.867361] fio             D    0  2735   2706 0x00000000
[  249.872839] Call trace:
[  249.875275]  __switch_to+0xa0/0xb4
[  249.878670]  __schedule+0x1c4/0x704
[  249.882150]  schedule+0x34/0x94
[  249.885283]  io_schedule+0x14/0x30
[  249.888678]  __blkdev_direct_IO_simple+0x158/0x290
[  249.893460]  blkdev_direct_IO+0x36c/0x378
[  249.897462]  generic_file_read_iter+0xa0/0x7f4
[  249.901898]  blkdev_read_iter+0x44/0x54
[  249.905726]  __vfs_read+0xc8/0x11c
[  249.909119]  vfs_read+0x80/0x134
[  249.912339]  SyS_pread64+0x74/0x8c
[  249.915733]  el0_svc_naked+0x20/0x24
[  249.919297] INFO: task fio:2736 blocked for more than 120 seconds.
[  249.925468]       Not tainted 4.15.0-rc1-00045-ga9c054d #1
[  249.930944] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  249.938764] fio             D    0  2736   2706 0x00000000
[  249.944242] Call trace:
[  249.946677]  __switch_to+0xa0/0xb4
[  249.950071]  __schedule+0x1c4/0x704
[  249.953551]  schedule+0x34/0x94
[  249.956684]  io_schedule+0x14/0x30
[  249.960078]  __blkdev_direct_IO_simple+0x158/0x290
[  249.964861]  blkdev_direct_IO+0x36c/0x378
[  249.968863]  generic_file_read_iter+0xa0/0x7f4
[  249.973298]  blkdev_read_iter+0x44/0x54
[  249.977127]  __vfs_read+0xc8/0x11c
[  249.980521]  vfs_read+0x80/0x134
[  249.983741]  SyS_pread64+0x74/0x8c
[  249.987131]  el0_svc_naked+0x20/0x24
[  249.990698] INFO: task fio:2737 blocked for more than 120 seconds.
[  249.996869]       Not tainted 4.15.0-rc1-00045-ga9c054d #1
[  250.002345] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  250.010165] fio             D    0  2737   2706 0x00000000
[  250.015643] Call trace:
[  250.018078]  __switch_to+0xa0/0xb4
[  250.021472]  __schedule+0x1c4/0x704
[  250.024952]  schedule+0x34/0x94
[  250.028086]  io_schedule+0x14/0x30
[  250.031476]  __blkdev_direct_IO_simple+0x158/0x290
[  250.036258]  blkdev_direct_IO+0x36c/0x378
[  250.040260]  generic_file_read_iter+0xa0/0x7f4
[  250.044696]  blkdev_read_iter+0x44/0x54
[  250.048523]  __vfs_read+0xc8/0x11c
[  250.051917]  vfs_read+0x80/0x134
[  250.055133]  SyS_pread64+0x74/0x8c
[  250.058526]  el0_svc_naked+0x20/0x24
[  318.975633] WARNING: CPU: 9 PID: 0 at kernel/rcu/tree.c:2792 rcu_process_callbacks+0x3e4/0x3f4
3),X(1),R(1),X(2),R(10),X(2),R(3)] [67.8% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 02m:10s]
[  318.984231] Modules linked in:
[  318.987276] CPU: 9 PID: 0 Comm: swapper/9 Not tainted 4.15.0-rc1-00045-ga9c054d #1
[  318.994831] Hardware name: Huawei D05/D05, BIOS Hisilicon D05 IT17 Nemo 2.0 RC0 11/29/2017
[  319.003080] task: ffff8017d63c4880 task.stack: ffff0000097a0000
[  319.008986] pstate: 20000085 (nzCv daIf -PAN -UAO)
[  319.013764] pc : rcu_process_callbacks+0x3e4/0x3f4
[  319.018543] lr : rcu_process_callbacks+0x204/0x3f4
[  319.023319] sp : ffff00000804be30
[  319.026621] x29: ffff00000804be30 x28: 0000000000000000
[  319.031922] x27: 0000000000000784 x26: ffff00000804be90
[  319.037222] x25: ffff8017fbf642b8 x24: ffff000009233000
[  319.042523] x23: ffff0000090919c0 x22: ffff00000920bd20
[  319.047822] x21: ffff000009252000 x20: ffff00000924ef80
[  319.053122] x19: ffff8017fbf64280 x18: 0000000000000007
[  319.058422] x17: 000000000000000e x16: 0000000000000001
[  319.063722] x15: 0000000000000019 x14: 0000000000000033
[  319.069022] x13: 000000000000004c x12: 0000000000000068
[  319.074322] x11: 0000000000000000 x10: 0000000000000a80
[  319.079622] x9 : 0000000000005e79 x8 : 0000000044042000
[  319.084921] x7 : 0000000000210d00 x6 : 0000000000000000
[  319.090221] x5 : 0000000000000029 x4 : 0000000000000004
[  319.095520] x3 : 0000000000000000 x2 : 0000000000002710
[  319.100820] x1 : 0000000000000001 x0 : 0000000000000000
[  319.106120] Call trace:
[  319.108555]  rcu_process_callbacks+0x3e4/0x3f4
[  319.112987]  __do_softirq+0x10c/0x208
[  319.116638]  irq_exit+0xa8/0xb4
[  319.119767]  __handle_domain_irq+0x8c/0xf0
[  319.123849]  gic_handle_irq+0xd4/0x184
[  319.127585]  el1_irq+0xb0/0x128
[  319.130714]  arch_cpu_idle+0x14/0x20
[  319.134280]  default_idle_call+0x18/0x2c
[  319.138190]  do_idle+0x168/0x1d0
[  319.141406]  cpu_startup_entry+0x1c/0x24
[  319.145316]  secondary_start_kernel+0x11c/0x128
[  319.149832] ---[ end trace 2158876af600f163 ]---
>
> Thanks,
>
> Bart.
