* [bug report] blktests srp/013 lead kernel panic with latest block/for-next and 5.13.15
@ 2021-09-12 16:26 Yi Zhang
2021-09-12 21:25 ` Bart Van Assche
0 siblings, 1 reply; 5+ messages in thread
From: Yi Zhang @ 2021-09-12 16:26 UTC
To: linux-block; +Cc: CKI Project
Hello,
We've hit the issue below with the latest block/for-next and 5.13.15 on
s390x. Please help check it, and feel free to let me know if you need any
testing or debug info.
# use_siw=1 ./check srp/013
srp/013 (Direct I/O using a discontiguous buffer) [passed]
runtime ... 2.065s
[ 127.475787] run blktests srp/013 at 2021-09-12 12:02:09
[ 127.632487] alua: device handler registered
[ 127.635115] emc: device handler registered
[ 127.637998] rdac: device handler registered
[ 127.644060] null_blk: module loaded
[ 127.790639] SoftiWARP attached
[ 127.799681] enc1 speed is unknown, defaulting to 1000
[ 127.799685] enc1 speed is unknown, defaulting to 1000
[ 127.799699] enc1 speed is unknown, defaulting to 1000
[ 127.799722] enc1 speed is unknown, defaulting to 1000
[ 127.826812] scsi_debug:sdebug_add_store: dif_storep 524288 bytes @
000000001388e38f
[ 127.828344] scsi_debug:sdebug_driver_probe: scsi_debug: trim
poll_queues to 0. poll_q/nr_hw = (0/1)
[ 127.828349] scsi_debug:sdebug_driver_probe: host protection DIF3 DIX3
[ 127.828354] scsi host0: scsi_debug: version 0190 [20200710]
[ 127.828354] dev_size_mb=32, opts=0x0, submit_queues=1, statistics=0
[ 127.829241] scsi 0:0:0:0: Direct-Access Linux scsi_debug
0190 PQ: 0 ANSI: 7
[ 127.829447] sd 0:0:0:0: Power-on or device reset occurred
[ 127.829464] sd 0:0:0:0: [sda] Enabling DIF Type 3 protection
[ 127.829480] sd 0:0:0:0: [sda] 65536 512-byte logical blocks: (33.6
MB/32.0 MiB)
[ 127.829486] sd 0:0:0:0: [sda] Write Protect is off
[ 127.829495] sd 0:0:0:0: [sda] Write cache: enabled, read cache:
enabled, supports DPO and FUA
[ 127.829507] sd 0:0:0:0: [sda] Optimal transfer size 524288 bytes
[ 127.829515] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 127.901964] sd 0:0:0:0: [sda] Enabling DIX T10-DIF-TYPE3-CRC protection
[ 127.901975] sd 0:0:0:0: [sda] DIF application tag size 6
[ 127.971979] sd 0:0:0:0: [sda] Attached SCSI disk
[ 128.105497] enc1 speed is unknown, defaulting to 1000
[ 128.122601] Rounding down aligned max_sectors from 4294967295 to 4294967288
[ 128.131527] enc1 speed is unknown, defaulting to 1000
[ 128.389958] Rounding down aligned max_sectors from 255 to 248
[ 128.399984] Rounding down aligned max_sectors from 255 to 248
[ 128.422487] Rounding down aligned max_sectors from 4294967295 to 4294967288
[ 128.424359] iwpm_register_pid: Unable to send a nlmsg (client = 2)
[ 128.499155] enc1 speed is unknown, defaulting to 1000
[ 128.510288] scsi host1: REJ reason 0xffffff98
[ 128.510295] scsi host1: ib_srp: Connection 0/2 to 10.0.160.59 failed
[ 128.695427] ib_srpt Received SRP_LOGIN_REQ with i_port_id
0002:55a6:0677:0000:0000:0000:0000:0000, t_port_id
0202:55ff:fea6:0677:0202:55ff:fea6:0677 and it_iu_len 8260 on port 1
(guid=0002:55a6:0677:0000:0000:0000:0000:0000); pkey 0x00
[ 128.710958] ib_srpt Received SRP_LOGIN_REQ with i_port_id
0002:55a6:0677:0000:0000:0000:0000:0000, t_port_id
0202:55ff:fea6:0677:0202:55ff:fea6:0677 and it_iu_len 8260 on port 1
(guid=0002:55a6:0677:0000:0000:0000:0000:0000); pkey 0x00
[ 128.725662] scsi host1: SRP.T10:020255FFFEA60677
[ 128.727233] scsi 1:0:0:0: Direct-Access LIO-ORG IBLOCK
4.0 PQ: 0 ANSI: 6
[ 128.727450] scsi 1:0:0:0: alua: supports implicit and explicit TPGS
[ 128.727456] scsi 1:0:0:0: alua: device
naa.60014056e756c6c62300000000000000 port group 0 rel port 1
[ 128.727603] scsi 1:0:0:0: Attached scsi generic sg1 type 0
[ 128.728031] sd 1:0:0:0: Warning! Received an indication that the
LUN assignments on this target have changed. The Linux SCSI layer does
not automatical
[ 128.729735] scsi 1:0:0:2: Direct-Access LIO-ORG IBLOCK
4.0 PQ: 0 ANSI: 6
[ 128.731038] scsi 1:0:0:2: alua: supports implicit and explicit TPGS
[ 128.731042] scsi 1:0:0:2: alua: device
naa.60014057363736964626700000000000 port group 0 rel port 1
[ 128.731113] sd 1:0:0:2: Attached scsi generic sg2 type 0
[ 128.731373] scsi 1:0:0:1: Direct-Access LIO-ORG IBLOCK
4.0 PQ: 0 ANSI: 6
[ 128.731505] scsi 1:0:0:1: alua: supports implicit and explicit TPGS
[ 128.731509] scsi 1:0:0:1: alua: device
naa.60014056e756c6c62310000000000000 port group 0 rel port 1
[ 128.731608] sd 1:0:0:1: Attached scsi generic sg3 type 0
[ 128.731623] sd 1:0:0:1: Warning! Received an indication that the
LUN assignments on this target have changed. The Linux SCSI layer does
not automatical
[ 128.735376] sd 1:0:0:2: [sdc] 65536 512-byte logical blocks: (33.6
MB/32.0 MiB)
[ 128.736151] sd 1:0:0:2: [sdc] Write Protect is off
[ 128.736258] sd 1:0:0:2: [sdc] Write cache: enabled, read cache:
enabled, supports DPO and FUA
[ 128.736278] srpt/0x000255a6067700000000000000000000: Unsupported
SCSI Opcode 0xa3, sending CHECK_CONDITION.
[ 128.736339] sd 1:0:0:2: [sdc] Optimal transfer size 524288 bytes
[ 128.739390] scsi host2: ib_srp: Already connected to target port
with id_ext=020255fffea60677;ioc_guid=020255fffea60677;dest=fe80:0000:0000:0000:0202:55ff:fea6:0677
[ 128.751950] sd 1:0:0:0: alua: transition timeout set to 60 seconds
[ 128.751957] sd 1:0:0:0: alua: port group 00 state A non-preferred
supports TOlUSNA
[ 128.751963] sd 1:0:0:1: alua: transition timeout set to 60 seconds
[ 128.751966] sd 1:0:0:1: alua: port group 00 state A non-preferred
supports TOlUSNA
[ 128.752237] sd 1:0:0:1: [sdd] 65536 512-byte logical blocks: (33.6
MB/32.0 MiB)
[ 128.752282] sd 1:0:0:1: [sdd] Write Protect is off
[ 128.752307] sd 1:0:0:0: [sdb] 65536 512-byte logical blocks: (33.6
MB/32.0 MiB)
[ 128.752361] sd 1:0:0:0: [sdb] Write Protect is off
[ 128.752377] sd 1:0:0:1: [sdd] Write cache: disabled, read cache:
enabled, doesn't support DPO or FUA
[ 128.752394] srpt/0x000255a6067700000000000000000000: Unsupported
SCSI Opcode 0xa3, sending CHECK_CONDITION.
[ 128.752478] sd 1:0:0:1: [sdd] Optimal transfer size 126976 bytes
[ 128.752505] sd 1:0:0:0: [sdb] Write cache: disabled, read cache:
enabled, doesn't support DPO or FUA
[ 128.752517] srpt/0x000255a6067700000000000000000000: Unsupported
SCSI Opcode 0xa3, sending CHECK_CONDITION.
[ 128.752812] sd 1:0:0:0: [sdb] Optimal transfer size 126976 bytes
[ 128.773126] sd 1:0:0:2: alua: transition timeout set to 60 seconds
[ 128.773133] sd 1:0:0:2: alua: port group 00 state A non-preferred
supports TOlUSNA
[ 128.852620] sd 1:0:0:0: [sdb] Attached SCSI disk
[ 128.872622] sd 1:0:0:1: [sdd] Attached SCSI disk
[ 128.873284] sd 1:0:0:2: [sdc] Attached SCSI disk
[ 128.905378] device-mapper: multipath service-time: version 0.3.0 loaded
[ 129.682590] sd 1:0:0:2: [sdc] Synchronizing SCSI cache
[ 129.761679] scsi 1:0:0:0: alua: Detached
[ 129.851668] scsi 1:0:0:2: alua: Detached
[ 129.882680] ib_srpt receiving failed for ioctx 00000000009471f4 with status 5
[ 129.882691] ib_srpt receiving failed for ioctx 00000000a87fed08 with status 5
[ 129.882692] ib_srpt receiving failed for ioctx 00000000aad23569 with status 5
[ 129.882694] ib_srpt receiving failed for ioctx 0000000070d12b2e with status 5
[ 129.882696] ib_srpt receiving failed for ioctx 00000000b19e3451 with status 5
[ 129.882698] ib_srpt receiving failed for ioctx 00000000f3cf8cea with status 5
[ 129.882699] ib_srpt receiving failed for ioctx 00000000c9c6774e with status 5
[ 129.882701] ib_srpt receiving failed for ioctx 000000001eac69e6 with status 5
[ 129.882702] ib_srpt receiving failed for ioctx 0000000097705934 with status 5
[ 129.882704] ib_srpt receiving failed for ioctx 0000000053368827 with status 5
[ 130.016899] device-mapper: multipath: 253:2: Failing path 8:48.
[ 130.811836] scsi 1:0:0:1: alua: Detached
[ 130.851910] Unable to handle kernel pointer dereference in virtual
kernel address space
[ 130.851918] Failing address: 000003ff80815000 TEID: 000003ff80815803
[ 130.851920] Fault in home space mode while using kernel ASCE.
[ 130.851923] AS:000000002f200007 R3:0000000080280007
S:0000000095d93800 P:0000000000000400
[ 130.852020] Oops: 0011 ilc:3 [#1] SMP
[ 130.852024] Modules linked in: dm_service_time scsi_transport_srp
target_core_pscsi target_core_file ib_srpt target_core_iblock
target_core_mod rdma_cm iw_cm ib_cm ib_umad ib_uverbs scsi_debug siw
null_blk scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_multipath ib_core
sunrpc virtio_net net_failover failover vfio_ccw mdev s390_trng
vfio_iommu_type1 vfio drm drm_panel_orientation_quirks fb font fuse
backlight i2c_core zram ip_tables xfs crc32_vx_s390 ghash_s390 prng
aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390
sha256_s390 sha1_s390 sha_common virtio_blk pkey zcrypt [last
unloaded: ib_srp]
[ 130.852068] CPU: 1 PID: 950 Comm: multipathd Not tainted 5.14.0 #1
[ 130.852071] Hardware name: IBM 8561 LT1 400 (KVM/Linux)
[ 130.852073] Krnl PSW : 0704e00180000000 000000002e37e7cc
(scsi_mq_exit_request+0x2c/0x58)
[ 130.852085] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3
CC:2 PM:0 RI:0 EA:3
[ 130.852087] Krnl GPRS: 00000000000001d0 000003ff80815390
000000009a84c000 000000009dc20000
[ 130.852089] 0000000000000000 0000000000000002
0000000000000000 0000000000000000
[ 130.852091] 0000000000000000 000000009bfc1d80
0000000000000000 000000009dc20000
[ 130.852093] 000000008596e000 000002aa008aa520
0000038000643990 0000038000643960
[ 130.852101] Krnl Code: 000000002e37e7bc: b90400b3 lgr %r11,%r3
[ 130.852101] 000000002e37e7c0: e32020600004 lg %r2,96(%r2)
[ 130.852101] #000000002e37e7c6: e31020980004 lg %r1,152(%r2)
[ 130.852101] >000000002e37e7cc: e31010480002 ltg %r1,72(%r1)
[ 130.852101] 000000002e37e7d2: a7840007 brc 8,000000002e37e7e0
[ 130.852101] 000000002e37e7d6: 41303128 la %r3,296(%r3)
[ 130.852101] 000000002e37e7da: 0de1 basr %r14,%r1
[ 130.852101] 000000002e37e7dc: 47000700 bc 0,1792
[ 130.852113] Call Trace:
[ 130.852115] [<000000002e37e7cc>] scsi_mq_exit_request+0x2c/0x58
[ 130.852120] [<000000002e1c2608>] blk_mq_free_rqs+0x80/0x218
[ 130.852125] [<000000002e1c2f0a>] blk_mq_free_tag_set+0x5a/0x128
[ 130.852128] [<000000002e3774d0>] scsi_host_dev_release+0xb0/0x118
[ 130.852130] [<000000002e33fe10>] device_release+0x48/0xb0
[ 130.852136] [<000000002e28bf12>] kobject_put+0xca/0x1f0
[ 130.852140] [<000000002e33fe10>] device_release+0x48/0xb0
[ 130.852142] [<000000002e28bf12>] kobject_put+0xca/0x1f0
[ 130.852145] [<000000002dc1324a>] execute_in_process_context+0x4a/0xf0
[ 130.852149] [<000000002e33fe10>] device_release+0x48/0xb0
[ 130.852151] [<000000002e28bf12>] kobject_put+0xca/0x1f0
[ 130.852153] [<000000002e38e49e>] sd_release+0x6e/0xf8
[ 130.852158] [<000000002e1a86d0>] blkdev_put+0xe0/0x278
[ 130.852162] [<000000002e1a9946>] blkdev_close+0x3e/0x50
[ 130.852164] [<000000002de94728>] __fput+0xa0/0x280
[ 130.852168] [<000000002dc19190>] task_work_run+0x88/0xd0
[ 130.852170] [<000000002dc89b9e>] exit_to_user_mode_loop+0x1ce/0x1d8
[ 130.852175] [<000000002dc89c22>] exit_to_user_mode_prepare+0x7a/0x80
[ 130.852178] [<000000002e6e70be>] __do_syscall+0x106/0x1e8
[ 130.852181] [<000000002e6f5518>] system_call+0x78/0xa0
[ 130.852184] Last Breaking-Event-Address:
[ 130.852185] [<000000008596e808>] 0x8596e808
[ 130.852189] Kernel panic - not syncing: Fatal exception: panic_on_oops
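As a quick sanity check on the register dump above: the faulting instruction
(marked with ">") is "ltg %r1,72(%r1)", which reads memory at %r1 + 72. The
small Python sketch below recomputes that address and rounds it down to a page
boundary. It assumes that the "Krnl GPRS" lines list %r0 to %r15 in order (so
the second value is %r1) and that the page size is 4 KiB; the helper names are
made up for illustration and do not come from any existing tool.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages on this s390x kernel


def effective_address(base: int, displacement: int) -> int:
    """Address read by a base+displacement operand such as 72(%r1)."""
    return base + displacement


def page_base(address: int, page_size: int = PAGE_SIZE) -> int:
    """Round an address down to the start of its page."""
    return address & ~(page_size - 1)


# Values copied from the oops above.
r1 = 0x000003FF80815390               # second value on the "Krnl GPRS" line
displacement = 72                     # from "ltg %r1,72(%r1)"
failing_address = 0x000003FF80815000  # from the "Failing address:" line

accessed = effective_address(r1, displacement)
print(f"accessed address: {accessed:#018x}")
print(f"page base:        {page_base(accessed):#018x}")
print("matches Failing address:", page_base(accessed) == failing_address)
```

The page base computed this way equals the reported failing address, so the
pointer held in %r1 was not NULL; it pointed into a page that was no longer
mapped when the load was attempted.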
--
Best Regards,
Yi Zhang
* Re: [bug report] blktests srp/013 lead kernel panic with latest block/for-next and 5.13.15
2021-09-12 16:26 [bug report] blktests srp/013 lead kernel panic with latest block/for-next and 5.13.15 Yi Zhang
@ 2021-09-12 21:25 ` Bart Van Assche
2021-09-12 21:28 ` Laurence Oberman
0 siblings, 1 reply; 5+ messages in thread
From: Bart Van Assche @ 2021-09-12 21:25 UTC
To: Yi Zhang, linux-block; +Cc: CKI Project
On 9/12/21 09:26, Yi Zhang wrote:
> [ 130.851918] Failing address: 000003ff80815000 TEID: 000003ff80815803
> [ 130.852068] CPU: 1 PID: 950 Comm: multipathd Not tainted 5.14.0 #1
> [ 130.852071] Hardware name: IBM 8561 LT1 400 (KVM/Linux)
> [ 130.852073] Krnl PSW : 0704e00180000000 000000002e37e7cc
> (scsi_mq_exit_request+0x2c/0x58)
> [ 130.852113] Call Trace:
> [ 130.852115] [<000000002e37e7cc>] scsi_mq_exit_request+0x2c/0x58
> [ 130.852120] [<000000002e1c2608>] blk_mq_free_rqs+0x80/0x218
> [ 130.852125] [<000000002e1c2f0a>] blk_mq_free_tag_set+0x5a/0x128
> [ 130.852128] [<000000002e3774d0>] scsi_host_dev_release+0xb0/0x118
> [ 130.852130] [<000000002e33fe10>] device_release+0x48/0xb0
> [ 130.852136] [<000000002e28bf12>] kobject_put+0xca/0x1f0
> [ 130.852140] [<000000002e33fe10>] device_release+0x48/0xb0
> [ 130.852142] [<000000002e28bf12>] kobject_put+0xca/0x1f0
> [ 130.852145] [<000000002dc1324a>] execute_in_process_context+0x4a/0xf0
> [ 130.852149] [<000000002e33fe10>] device_release+0x48/0xb0
> [ 130.852151] [<000000002e28bf12>] kobject_put+0xca/0x1f0
> [ 130.852153] [<000000002e38e49e>] sd_release+0x6e/0xf8
> [ 130.852158] [<000000002e1a86d0>] blkdev_put+0xe0/0x278
> [ 130.852162] [<000000002e1a9946>] blkdev_close+0x3e/0x50
> [ 130.852164] [<000000002de94728>] __fput+0xa0/0x280
> [ 130.852168] [<000000002dc19190>] task_work_run+0x88/0xd0
> [ 130.852170] [<000000002dc89b9e>] exit_to_user_mode_loop+0x1ce/0x1d8
> [ 130.852175] [<000000002dc89c22>] exit_to_user_mode_prepare+0x7a/0x80
> [ 130.852178] [<000000002e6e70be>] __do_syscall+0x106/0x1e8
> [ 130.852181] [<000000002e6f5518>] system_call+0x78/0xa0
I haven't seen this yet. Is this crash reproducible? If so, please
bisect this crash.
Thanks,
Bart.
* Re: [bug report] blktests srp/013 lead kernel panic with latest block/for-next and 5.13.15
2021-09-12 21:25 ` Bart Van Assche
@ 2021-09-12 21:28 ` Laurence Oberman
[not found] ` <CAHj4cs_8KbMJ+HU22E4-e_zYuPj8TfGOzxNtzQqxqKig9S=gQg@mail.gmail.com>
0 siblings, 1 reply; 5+ messages in thread
From: Laurence Oberman @ 2021-09-12 21:28 UTC
To: Bart Van Assche, Yi Zhang, linux-block; +Cc: CKI Project
On Sun, 2021-09-12 at 14:25 -0700, Bart Van Assche wrote:
> On 9/12/21 09:26, Yi Zhang wrote:
> > [ 130.851918] Failing address: 000003ff80815000 TEID:
> > 000003ff80815803
> > [ 130.852068] CPU: 1 PID: 950 Comm: multipathd Not tainted 5.14.0
> > #1
> > [ 130.852071] Hardware name: IBM 8561 LT1 400 (KVM/Linux)
> > [ 130.852073] Krnl PSW : 0704e00180000000 000000002e37e7cc
> > (scsi_mq_exit_request+0x2c/0x58)
> > [ 130.852113] Call Trace:
> > [ 130.852115] [<000000002e37e7cc>] scsi_mq_exit_request+0x2c/0x58
> > [ 130.852120] [<000000002e1c2608>] blk_mq_free_rqs+0x80/0x218
> > [ 130.852125] [<000000002e1c2f0a>] blk_mq_free_tag_set+0x5a/0x128
> > [ 130.852128] [<000000002e3774d0>]
> > scsi_host_dev_release+0xb0/0x118
> > [ 130.852130] [<000000002e33fe10>] device_release+0x48/0xb0
> > [ 130.852136] [<000000002e28bf12>] kobject_put+0xca/0x1f0
> > [ 130.852140] [<000000002e33fe10>] device_release+0x48/0xb0
> > [ 130.852142] [<000000002e28bf12>] kobject_put+0xca/0x1f0
> > [ 130.852145] [<000000002dc1324a>]
> > execute_in_process_context+0x4a/0xf0
> > [ 130.852149] [<000000002e33fe10>] device_release+0x48/0xb0
> > [ 130.852151] [<000000002e28bf12>] kobject_put+0xca/0x1f0
> > [ 130.852153] [<000000002e38e49e>] sd_release+0x6e/0xf8
> > [ 130.852158] [<000000002e1a86d0>] blkdev_put+0xe0/0x278
> > [ 130.852162] [<000000002e1a9946>] blkdev_close+0x3e/0x50
> > [ 130.852164] [<000000002de94728>] __fput+0xa0/0x280
> > [ 130.852168] [<000000002dc19190>] task_work_run+0x88/0xd0
> > [ 130.852170] [<000000002dc89b9e>]
> > exit_to_user_mode_loop+0x1ce/0x1d8
> > [ 130.852175] [<000000002dc89c22>]
> > exit_to_user_mode_prepare+0x7a/0x80
> > [ 130.852178] [<000000002e6e70be>] __do_syscall+0x106/0x1e8
> > [ 130.852181] [<000000002e6f5518>] system_call+0x78/0xa0
>
> I haven't seen this yet. Is this crash reproducible? If so, please
> bisect this crash.
>
> Thanks,
>
> Bart.
>
I am looking to reproduce and bisect as well.
Regards
Laurence
* Re: [bug report] blktests srp/013 lead kernel panic with latest block/for-next and 5.13.15
[not found] ` <CAHj4cs_8KbMJ+HU22E4-e_zYuPj8TfGOzxNtzQqxqKig9S=gQg@mail.gmail.com>
@ 2021-09-28 18:07 ` Bart Van Assche
2021-10-05 3:21 ` Yi Zhang
0 siblings, 1 reply; 5+ messages in thread
From: Bart Van Assche @ 2021-09-28 18:07 UTC
To: Yi Zhang, Ming Lei; +Cc: Laurence Oberman, linux-block, CKI Project
On 9/27/21 10:10 PM, Yi Zhang wrote:
> Hi Bart
>
> Bisect shows that this issue was introduced by the commit below. By the way, it always reproduces in the s390x KVM environment:
>
> commit 65ca846a53149a1a72cd8d02e7b2e73dd545b834
> Author: Bart Van Assche <bvanassche@acm.org>
> Date: Wed Jan 22 19:56:34 2020 -0800
>
> scsi: core: Introduce {init,exit}_cmd_priv()
>
> The current behavior of the SCSI core is to clear driver-private data
> before preparing a request for submission to the SCSI LLD. Make it possible
> for SCSI LLDs to disable clearing of driver-private data.
>
> These hooks will be used by a later patch, namely "scsi: ufs: Let the SCSI
> core allocate per-command UFS data".
>
> (gdb) l *(scsi_mq_exit_request+0x2c)
> 0x8d7be4 is in scsi_mq_exit_request (drivers/scsi/scsi_lib.c:1780).
> 1775 unsigned int hctx_idx)
> 1776 {
> 1777 struct Scsi_Host *shost = set->driver_data;
> 1778 struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(rq);
> 1779
> 1780 if (shost->hostt->exit_cmd_priv)
> 1781 shost->hostt->exit_cmd_priv(shost, cmd);
> 1782 kmem_cache_free(scsi_sense_cache, cmd->sense_buffer);
> 1783 }
> 1784
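The "l *(symbol+offset)" lookup quoted above can be generated mechanically from
a call-trace entry. Below is a minimal Python sketch, assuming the trace
entries have the "symbol+0xOFFSET/0xSIZE" shape seen in this thread; the
function names are hypothetical and the regular expression is only meant to
cover that shape.

```python
import re

# Matches "symbol+0xOFFSET/0xSIZE" as printed in the call traces in this thread.
TRACE_RE = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)"
    r"\+(?P<offset>0x[0-9a-fA-F]+)"
    r"/(?P<size>0x[0-9a-fA-F]+)"
)


def parse_trace_entry(line):
    """Return (symbol, offset, size) for a call-trace line, or None."""
    match = TRACE_RE.search(line)
    if match is None:
        return None
    return (
        match.group("symbol"),
        int(match.group("offset"), 16),
        int(match.group("size"), 16),
    )


def gdb_list_command(line):
    """Build the gdb 'l *(symbol+offset)' command for a call-trace line."""
    parsed = parse_trace_entry(line)
    if parsed is None:
        raise ValueError("no symbol+offset/size found in: " + line)
    symbol, offset, _size = parsed
    return f"l *({symbol}+{offset:#x})"


entry = "[ 130.852115] [<000000002e37e7cc>] scsi_mq_exit_request+0x2c/0x58"
print(gdb_list_command(entry))
```

The printed command can be pasted into a gdb session that has loaded a vmlinux
with debug info matching the crashed kernel.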
Hi Yi,
Thank you for taking the time to run a bisect. However, I strongly doubt
that the bisection result is correct. If anything were wrong with the above
patch, it would already have been noticed on other architectures. I
recommend proceeding as follows:
* Verify whether the reported issue only occurs with the stable kernel series or
also with mainline kernels.
* Work with the soft-iWARP author to improve the reliability of the siw driver.
If I run blktests in an x86 VM then the following appears sporadically in
the kernel log:
------------[ cut here ]------------
WARNING: CPU: 18 PID: 5462 at drivers/infiniband/sw/siw/siw_cm.c:255 __siw_cep_dealloc+0x184/0x190 [siw]
CPU: 1 PID: 5462 Comm: kworker/u144:13 Tainted: G E 5.15.0-rc2-dbg+ #7
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
Workqueue: iw_cm_wq cm_work_handler [iw_cm]
RIP: 0010:__siw_cep_dealloc+0x184/0x190 [siw]
Call Trace:
siw_cep_put+0x5c/0x80 [siw]
siw_reject+0x13c/0x230 [siw]
iw_cm_reject+0xac/0x130 [iw_cm]
cm_conn_req_handler+0x4f1/0x7d0 [iw_cm]
cm_work_handler+0x885/0x9c0 [iw_cm]
process_one_work+0x535/0xad0
worker_thread+0x2e7/0x700
kthread+0x1f6/0x220
ret_from_fork+0x1f/0x30
irq event stamp: 11449266
hardirqs last enabled at (11449265): [<ffffffff81fc4248>] _raw_spin_unlock_irq+0x28/0x50
hardirqs last disabled at (11449266): [<ffffffff81fb7e44>] __schedule+0x5f4/0xbb0
softirqs last enabled at (11449176): [<ffffffffa06d142f>] p_fill_from_dev_buffer+0xff/0x140 [scsi_debug]
softirqs last disabled at (11449168): [<ffffffffa06d1400>] p_fill_from_dev_buffer+0xd0/0x140 [scsi_debug]
---[ end trace b23871487c995b72 ]---
* Use the rdma_rxe driver to run blktests since at least in my experience that
driver is more reliable than the soft-iWARP driver.
Thanks,
Bart.
* Re: [bug report] blktests srp/013 lead kernel panic with latest block/for-next and 5.13.15
2021-09-28 18:07 ` Bart Van Assche
@ 2021-10-05 3:21 ` Yi Zhang
0 siblings, 0 replies; 5+ messages in thread
From: Yi Zhang @ 2021-10-05 3:21 UTC
To: Bart Van Assche; +Cc: Ming Lei, Laurence Oberman, linux-block, CKI Project
On Wed, Sep 29, 2021 at 2:07 AM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 9/27/21 10:10 PM, Yi Zhang wrote:
> > Hi Bart
> >
> > Bisect shows that this issue was introduced by the commit below. By the way, it always reproduces in the s390x KVM environment:
> >
> > commit 65ca846a53149a1a72cd8d02e7b2e73dd545b834
> > Author: Bart Van Assche <bvanassche@acm.org>
> > Date: Wed Jan 22 19:56:34 2020 -0800
> >
> > scsi: core: Introduce {init,exit}_cmd_priv()
> >
> > The current behavior of the SCSI core is to clear driver-private data
> > before preparing a request for submission to the SCSI LLD. Make it possible
> > for SCSI LLDs to disable clearing of driver-private data.
> >
> > These hooks will be used by a later patch, namely "scsi: ufs: Let the SCSI
> > core allocate per-command UFS data".
> >
> > (gdb) l *(scsi_mq_exit_request+0x2c)
> > 0x8d7be4 is in scsi_mq_exit_request (drivers/scsi/scsi_lib.c:1780).
> > 1775 unsigned int hctx_idx)
> > 1776 {
> > 1777 struct Scsi_Host *shost = set->driver_data;
> > 1778 struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(rq);
> > 1779
> > 1780 if (shost->hostt->exit_cmd_priv)
> > 1781 shost->hostt->exit_cmd_priv(shost, cmd);
> > 1782 kmem_cache_free(scsi_sense_cache, cmd->sense_buffer);
> > 1783 }
> > 1784
>
> Hi Yi,
>
> Thank you for taking the time to run a bisect. However, I strongly doubt
> that the bisection result is correct. If anything were wrong with the above
> patch, it would already have been noticed on other architectures. I
> recommend proceeding as follows:
> * Verify whether the reported issue only occurs with the stable kernel series or
> also with mainline kernels.
This can be reproduced on both stable kernels and mainline kernels.
> * Work with the soft-iWARP author to improve the reliability of the siw driver.
> If I run blktests in an x86 VM then the following appears sporadically in
> the kernel log:
>
> ------------[ cut here ]------------
> WARNING: CPU: 18 PID: 5462 at drivers/infiniband/sw/siw/siw_cm.c:255 __siw_cep_dealloc+0x184/0x190 [siw]
> CPU: 1 PID: 5462 Comm: kworker/u144:13 Tainted: G E 5.15.0-rc2-dbg+ #7
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
> Workqueue: iw_cm_wq cm_work_handler [iw_cm]
> RIP: 0010:__siw_cep_dealloc+0x184/0x190 [siw]
> Call Trace:
> siw_cep_put+0x5c/0x80 [siw]
> siw_reject+0x13c/0x230 [siw]
> iw_cm_reject+0xac/0x130 [iw_cm]
> cm_conn_req_handler+0x4f1/0x7d0 [iw_cm]
> cm_work_handler+0x885/0x9c0 [iw_cm]
> process_one_work+0x535/0xad0
> worker_thread+0x2e7/0x700
> kthread+0x1f6/0x220
> ret_from_fork+0x1f/0x30
> irq event stamp: 11449266
> hardirqs last enabled at (11449265): [<ffffffff81fc4248>] _raw_spin_unlock_irq+0x28/0x50
> hardirqs last disabled at (11449266): [<ffffffff81fb7e44>] __schedule+0x5f4/0xbb0
> softirqs last enabled at (11449176): [<ffffffffa06d142f>] p_fill_from_dev_buffer+0xff/0x140 [scsi_debug]
> softirqs last disabled at (11449168): [<ffffffffa06d1400>] p_fill_from_dev_buffer+0xd0/0x140 [scsi_debug]
> ---[ end trace b23871487c995b72 ]---
>
> * Use the rdma_rxe driver to run blktests since at least in my experience that
> driver is more reliable than the soft-iWARP driver.
>
I would suggest reproducing it on the s390x platform, since in my testing
it reproduces easily there.
According to the CKI test history, it has also been reproduced on
ppc64le/aarch64 with rdma_rxe.
BTW, I've verified that Ming's patch fixes this issue on s390x. Thanks for
looking into this issue.
https://lore.kernel.org/linux-scsi/20210930124415.1160754-1-ming.lei@redhat.com/T/#u
> Thanks,
>
> Bart.
>
--
Best Regards,
Yi Zhang