* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 [not found] <CAHj4cs-NUKzGj5pgzRhDgdrGGbgPBqUoQ44+xgvk6njH9a_RYQ@mail.gmail.com> @ 2021-11-02 19:00 ` Steffen Maier 2021-11-02 19:02 ` Jens Axboe 0 siblings, 1 reply; 24+ messages in thread From: Steffen Maier @ 2021-11-02 19:00 UTC (permalink / raw) To: Yi Zhang, linux-block, Linux-Next Mailing List, linux-scsi On 11/2/21 07:42, Yi Zhang wrote: > Below WARNING was triggered with blktests srp/001 on the latest > linux-block/for-next, and it cannot be reproduced with v5.15, pls help > check it, thanks. > > 88d2c6ab15f7 (origin/for-next) Merge branch 'for-5.16/block' into for-next Same warning here with a slightly different stack trace. It breaks root-fs on zfcp-attached SCSI disks for us, because we run our CI intentionally with panic_on_warn. > [ 9.031740] ------------[ cut here ]------------ > [ 9.031743] WARNING: CPU: 13 PID: 196 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 > [ 9.031751] Modules linked in: nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) dm_service_time(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) ip_set(E) nf_tables(E) nfnetlink(E) sunrpc(E) zfcp(E) scsi_transport_fc(E) dm_multipath(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) s390_trng(E) vfio_ccw(E) mdev(E) vfio_iommu_type1(E) zcrypt_cex4(E) vfio(E) eadm_sch(E) sch_fq_codel(E) configfs(E) ip_tables(E) x_tables(E) ghash_s390(E) prng(E) aes_s390(E) des_s390(E) libdes(E) sha3_512_s390(E) sha3_256_s390(E) sha512_s390(E) sha256_s390(E) sha1_s390(E) sha_common(E) pkey(E) zcrypt(E) rng_core(E) autofs4(E) > [ 9.031785] CPU: 13 PID: 196 Comm: kworker/13:2 Tainted: G E 5.16.0-20211102.rc0.git0.9febf1194306.300.fc34.s390x+next #1 > [ 9.031789] Hardware name: IBM 3906 M04 704 (LPAR) > [ 9.031791] Workqueue: kaluad alua_rtpg_work [scsi_dh_alua] > [ 9.031795] Krnl PSW : 0704e00180000000 000000006558e948 
(blk_mq_sched_insert_request+0x58/0x178) > [ 9.031800] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3 > [ 9.031803] Krnl GPRS: 0000000000000080 00000000000004c6 00000000ade56000 0000000000000001 > [ 9.031806] 0000000000000001 0000000000000000 00000000a2d6a400 0000000084003c00 > [ 9.031808] 0000000000000000 0000000000000001 0000000000000001 00000000ade56000 > [ 9.031810] 000000008aef0000 000003ff7af59400 000003800e3d7b00 000003800e3d7a90 > [ 9.031817] Krnl Code: 000000006558e93c: a71effff chi %r1,-1 > 000000006558e940: a7840004 brc 8,000000006558e948 > #000000006558e944: af000000 mc 0,0 > >000000006558e948: 5810b01c l %r1,28(%r11) > 000000006558e94c: ec213bbb0055 risbg %r2,%r1,59,187,0 > 000000006558e952: a7740057 brc 7,000000006558ea00 > 000000006558e956: 5810b018 l %r1,24(%r11) > 000000006558e95a: c01b000000ff nilf %r1,255 > [ 9.031833] Call Trace: > [ 9.031835] [<000000006558e948>] blk_mq_sched_insert_request+0x58/0x178 > [ 9.031838] [<000000006557effe>] blk_execute_rq+0x56/0xd8 > [ 9.031841] [<0000000065768708>] __scsi_execute+0x118/0x240 > [ 9.031847] [<000003ff803c3788>] alua_rtpg+0x120/0x8f8 [scsi_dh_alua] > [ 9.031849] [<000003ff803c402c>] alua_rtpg_work+0xcc/0x648 [scsi_dh_alua] > [ 9.031852] [<0000000064f024d2>] process_one_work+0x1fa/0x470 > [ 9.031856] [<0000000064f02c74>] worker_thread+0x64/0x498 > [ 9.031859] [<0000000064f0a894>] kthread+0x17c/0x188 > [ 9.031861] [<0000000064e933c4>] __ret_from_fork+0x3c/0x58 > [ 9.031864] [<0000000065a71cea>] ret_from_fork+0xa/0x40 > [ 9.031868] Last Breaking-Event-Address: > [ 9.031869] [<000000006557ef72>] blk_execute_rq_nowait+0x82/0x98 > [ 9.031871] Kernel panic - not syncing: panic_on_warn set ... 
> [ 3881.829489] ------------[ cut here ]------------ > [ 3881.829493] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 > blk_mq_sched_insert_request+0x54/0x178 > [ 3881.829504] Modules linked in: ib_srp scsi_transport_srp > target_core_pscsi target_core_file ib_srpt target_core_iblock > target_core_mod rdma_cm iw_cm ib_cm ib_umad scsi_debug rdma_rxe > ib_uverbs ip6_udp_tunnel udp_tunnel null_blk scsi_dh_rdac scsi_dh_emc > scsi_dh_alua dm_multipath ib_core sunrpc qeth_l2 bridge stp llc qeth > qdio ccwgroup vfio_ccw mdev vfio_iommu_type1 vfio zcrypt_cex4 drm fb > fuse font drm_panel_orientation_quirks i2c_core backlight zram > ip_tables xfs crc32_vx_s390 ghash_s390 prng aes_s390 des_s390 libdes > sha512_s390 sha256_s390 sha1_s390 sha_common dasd_eckd_mod dasd_mod > pkey zcrypt > [ 3881.829553] CPU: 1 PID: 1386 Comm: kworker/u128:2 Not tainted 5.15.0+ #3 > [ 3881.829556] Hardware name: IBM 2964 N96 400 (z/VM 6.4.0) > [ 3881.829558] Workqueue: events_unbound async_run_entry_fn > [ 3881.829564] Krnl PSW : 0704e00180000000 000000001055afc0 > (blk_mq_sched_insert_request+0x58/0x178) > [ 3881.829569] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 > CC:2 PM:0 RI:0 EA:3 > [ 3881.829572] Krnl GPRS: 0000000000000004 000000000000003d > 00000000448bf400 0000000000000001 > [ 3881.829575] 0000000000000001 0000000000000000 > 000000004352bc00 000000002e0f3000 > [ 3881.829577] 0000000000000000 0000000000000001 > 0000000000000001 00000000448bf400 > [ 3881.829580] 000000001ee72100 000003ff7e82dd00 > 00000380022f7838 00000380022f77c8 > [ 3881.829587] Krnl Code: 000000001055afb4: a71effff chi %r1,-1 > 000000001055afb8: a7840004 brc 8,000000001055afc0 > #000000001055afbc: af000000 mc 0,0 > >000000001055afc0: 5810b01c l %r1,28(%r11) > 000000001055afc4: ec213bbb0055 risbg %r2,%r1,59,187,0 > 000000001055afca: a7740057 brc 7,000000001055b078 > 000000001055afce: 5810b018 l %r1,24(%r11) > 000000001055afd2: c01b000000ff nilf %r1,255 > [ 3881.829607] Call Trace: > [ 3881.829609] 
[<000000001055afc0>] blk_mq_sched_insert_request+0x58/0x178 > [ 3881.829616] [<000000001054b876>] blk_execute_rq+0x56/0xd8 > [ 3881.829620] [<000000001070e3a0>] __scsi_execute+0x110/0x230 > [ 3881.829625] [<000000001070e602>] scsi_mode_sense+0x142/0x340 > [ 3881.829627] [<000000001071f8ee>] sd_revalidate_disk.isra.0+0x74e/0x2240 > [ 3881.829632] [<0000000010721912>] sd_probe+0x312/0x4b0 > [ 3881.829634] [<00000000106d4c30>] really_probe+0xd0/0x4b0 > [ 3881.829639] [<00000000106d51c0>] driver_probe_device+0x40/0xf0 > [ 3881.829642] [<00000000106d58cc>] __device_attach_driver+0xa4/0x128 > [ 3881.829645] [<00000000106d1fd0>] bus_for_each_drv+0x88/0xc0 > [ 3881.829649] [<00000000106d4130>] __device_attach_async_helper+0x90/0xf0 > [ 3881.829652] [<000000000ffb0f46>] async_run_entry_fn+0x4e/0x1b0 > [ 3881.829655] [<000000000ffa384a>] process_one_work+0x21a/0x498 > [ 3881.829658] [<000000000ffa3ff4>] worker_thread+0x64/0x498 > [ 3881.829661] [<000000000ffac8e0>] kthread+0x150/0x160 > [ 3881.829665] [<000000000ff37468>] __ret_from_fork+0x40/0x58 > [ 3881.829670] [<0000000010a8550a>] ret_from_fork+0xa/0x30 > [ 3881.829675] Last Breaking-Event-Address: > [ 3881.829676] [<0000000000000007>] 0x7 > [ 3881.829679] ---[ end trace a501db666d088cc7 ]--- -- Mit freundlichen Gruessen / Kind regards Steffen Maier Linux on IBM Z and LinuxONE https://www.ibm.com/privacy/us/en/ IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Gregor Pillen Geschaeftsfuehrung: Dirk Wittkopp Sitz der Gesellschaft: Boeblingen Registergericht: Amtsgericht Stuttgart, HRB 243294 ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-02 19:00 ` [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 Steffen Maier @ 2021-11-02 19:02 ` Jens Axboe 2021-11-02 20:03 ` Jens Axboe 0 siblings, 1 reply; 24+ messages in thread From: Jens Axboe @ 2021-11-02 19:02 UTC (permalink / raw) To: Steffen Maier, Yi Zhang, linux-block, Linux-Next Mailing List, linux-scsi On 11/2/21 1:00 PM, Steffen Maier wrote: > On 11/2/21 07:42, Yi Zhang wrote: >> Below WARNING was triggered with blktests srp/001 on the latest >> linux-block/for-next, and it cannot be reproduced with v5.15, pls help >> check it, thanks. >> >> 88d2c6ab15f7 (origin/for-next) Merge branch 'for-5.16/block' into for-next > > Same warning here with a slightly different stack trace. > It breaks root-fs on zfcp-attached SCSI disks for us, because we run our CI > intentionally with panic_on_warn. > >> [ 9.031740] ------------[ cut here ]------------ >> [ 9.031743] WARNING: CPU: 13 PID: 196 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 >> [ 9.031751] Modules linked in: nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) dm_service_time(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) ip_set(E) nf_tables(E) nfnetlink(E) sunrpc(E) zfcp(E) scsi_transport_fc(E) dm_multipath(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) s390_trng(E) vfio_ccw(E) mdev(E) vfio_iommu_type1(E) zcrypt_cex4(E) vfio(E) eadm_sch(E) sch_fq_codel(E) configfs(E) ip_tables(E) x_tables(E) ghash_s390(E) prng(E) aes_s390(E) des_s390(E) libdes(E) sha3_512_s390(E) sha3_256_s390(E) sha512_s390(E) sha256_s390(E) sha1_s390(E) sha_common(E) pkey(E) zcrypt(E) rng_core(E) autofs4(E) >> [ 9.031785] CPU: 13 PID: 196 Comm: kworker/13:2 Tainted: G E 5.16.0-20211102.rc0.git0.9febf1194306.300.fc34.s390x+next #1 >> [ 9.031789] Hardware name: IBM 3906 M04 704 
(LPAR) >> [ 9.031791] Workqueue: kaluad alua_rtpg_work [scsi_dh_alua] >> [ 9.031795] Krnl PSW : 0704e00180000000 000000006558e948 (blk_mq_sched_insert_request+0x58/0x178) >> [ 9.031800] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3 >> [ 9.031803] Krnl GPRS: 0000000000000080 00000000000004c6 00000000ade56000 0000000000000001 >> [ 9.031806] 0000000000000001 0000000000000000 00000000a2d6a400 0000000084003c00 >> [ 9.031808] 0000000000000000 0000000000000001 0000000000000001 00000000ade56000 >> [ 9.031810] 000000008aef0000 000003ff7af59400 000003800e3d7b00 000003800e3d7a90 >> [ 9.031817] Krnl Code: 000000006558e93c: a71effff chi %r1,-1 >> 000000006558e940: a7840004 brc 8,000000006558e948 >> #000000006558e944: af000000 mc 0,0 >> >000000006558e948: 5810b01c l %r1,28(%r11) >> 000000006558e94c: ec213bbb0055 risbg %r2,%r1,59,187,0 >> 000000006558e952: a7740057 brc 7,000000006558ea00 >> 000000006558e956: 5810b018 l %r1,24(%r11) >> 000000006558e95a: c01b000000ff nilf %r1,255 >> [ 9.031833] Call Trace: >> [ 9.031835] [<000000006558e948>] blk_mq_sched_insert_request+0x58/0x178 >> [ 9.031838] [<000000006557effe>] blk_execute_rq+0x56/0xd8 >> [ 9.031841] [<0000000065768708>] __scsi_execute+0x118/0x240 >> [ 9.031847] [<000003ff803c3788>] alua_rtpg+0x120/0x8f8 [scsi_dh_alua] >> [ 9.031849] [<000003ff803c402c>] alua_rtpg_work+0xcc/0x648 [scsi_dh_alua] >> [ 9.031852] [<0000000064f024d2>] process_one_work+0x1fa/0x470 >> [ 9.031856] [<0000000064f02c74>] worker_thread+0x64/0x498 >> [ 9.031859] [<0000000064f0a894>] kthread+0x17c/0x188 >> [ 9.031861] [<0000000064e933c4>] __ret_from_fork+0x3c/0x58 >> [ 9.031864] [<0000000065a71cea>] ret_from_fork+0xa/0x40 >> [ 9.031868] Last Breaking-Event-Address: >> [ 9.031869] [<000000006557ef72>] blk_execute_rq_nowait+0x82/0x98 >> [ 9.031871] Kernel panic - not syncing: panic_on_warn set ... I'm looking into this one, it's a bit puzzling. 
The WARN is:

	WARN_ON(e && (rq->tag != BLK_MQ_NO_TAG));

which is "we have an elevator", yet the tag isn't initialized to
BLK_MQ_NO_TAG. That seems to hint at the initialization changes there,
but nothing sticks out there for me.

I'll keep looking.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-02 19:02 ` Jens Axboe @ 2021-11-02 20:03 ` Jens Axboe 2021-11-03 2:21 ` Yi Zhang 0 siblings, 1 reply; 24+ messages in thread From: Jens Axboe @ 2021-11-02 20:03 UTC (permalink / raw) To: Steffen Maier, Yi Zhang, linux-block, Linux-Next Mailing List, linux-scsi On 11/2/21 1:02 PM, Jens Axboe wrote: > On 11/2/21 1:00 PM, Steffen Maier wrote: >> On 11/2/21 07:42, Yi Zhang wrote: >>> Below WARNING was triggered with blktests srp/001 on the latest >>> linux-block/for-next, and it cannot be reproduced with v5.15, pls help >>> check it, thanks. >>> >>> 88d2c6ab15f7 (origin/for-next) Merge branch 'for-5.16/block' into for-next >> >> Same warning here with a slightly different stack trace. >> It breaks root-fs on zfcp-attached SCSI disks for us, because we run our CI >> intentionally with panic_on_warn. >> >>> [ 9.031740] ------------[ cut here ]------------ >>> [ 9.031743] WARNING: CPU: 13 PID: 196 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 >>> [ 9.031751] Modules linked in: nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) dm_service_time(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) ip_set(E) nf_tables(E) nfnetlink(E) sunrpc(E) zfcp(E) scsi_transport_fc(E) dm_multipath(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) s390_trng(E) vfio_ccw(E) mdev(E) vfio_iommu_type1(E) zcrypt_cex4(E) vfio(E) eadm_sch(E) sch_fq_codel(E) configfs(E) ip_tables(E) x_tables(E) ghash_s390(E) prng(E) aes_s390(E) des_s390(E) libdes(E) sha3_512_s390(E) sha3_256_s390(E) sha512_s390(E) sha256_s390(E) sha1_s390(E) sha_common(E) pkey(E) zcrypt(E) rng_core(E) autofs4(E) >>> [ 9.031785] CPU: 13 PID: 196 Comm: kworker/13:2 Tainted: G E 5.16.0-20211102.rc0.git0.9febf1194306.300.fc34.s390x+next #1 >>> [ 9.031789] Hardware name: IBM 3906 M04 704 (LPAR) >>> [ 9.031791] Workqueue: kaluad alua_rtpg_work 
[scsi_dh_alua] >>> [ 9.031795] Krnl PSW : 0704e00180000000 000000006558e948 (blk_mq_sched_insert_request+0x58/0x178) >>> [ 9.031800] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3 >>> [ 9.031803] Krnl GPRS: 0000000000000080 00000000000004c6 00000000ade56000 0000000000000001 >>> [ 9.031806] 0000000000000001 0000000000000000 00000000a2d6a400 0000000084003c00 >>> [ 9.031808] 0000000000000000 0000000000000001 0000000000000001 00000000ade56000 >>> [ 9.031810] 000000008aef0000 000003ff7af59400 000003800e3d7b00 000003800e3d7a90 >>> [ 9.031817] Krnl Code: 000000006558e93c: a71effff chi %r1,-1 >>> 000000006558e940: a7840004 brc 8,000000006558e948 >>> #000000006558e944: af000000 mc 0,0 >>> >000000006558e948: 5810b01c l %r1,28(%r11) >>> 000000006558e94c: ec213bbb0055 risbg %r2,%r1,59,187,0 >>> 000000006558e952: a7740057 brc 7,000000006558ea00 >>> 000000006558e956: 5810b018 l %r1,24(%r11) >>> 000000006558e95a: c01b000000ff nilf %r1,255 >>> [ 9.031833] Call Trace: >>> [ 9.031835] [<000000006558e948>] blk_mq_sched_insert_request+0x58/0x178 >>> [ 9.031838] [<000000006557effe>] blk_execute_rq+0x56/0xd8 >>> [ 9.031841] [<0000000065768708>] __scsi_execute+0x118/0x240 >>> [ 9.031847] [<000003ff803c3788>] alua_rtpg+0x120/0x8f8 [scsi_dh_alua] >>> [ 9.031849] [<000003ff803c402c>] alua_rtpg_work+0xcc/0x648 [scsi_dh_alua] >>> [ 9.031852] [<0000000064f024d2>] process_one_work+0x1fa/0x470 >>> [ 9.031856] [<0000000064f02c74>] worker_thread+0x64/0x498 >>> [ 9.031859] [<0000000064f0a894>] kthread+0x17c/0x188 >>> [ 9.031861] [<0000000064e933c4>] __ret_from_fork+0x3c/0x58 >>> [ 9.031864] [<0000000065a71cea>] ret_from_fork+0xa/0x40 >>> [ 9.031868] Last Breaking-Event-Address: >>> [ 9.031869] [<000000006557ef72>] blk_execute_rq_nowait+0x82/0x98 >>> [ 9.031871] Kernel panic - not syncing: panic_on_warn set ... > > I'm looking into this one, it's a bit puzzling. 
The WARN is: > > WARN_ON(e && (rq->tag != BLK_MQ_NO_TAG)); > > which is "we have an elevator", yet the tag isn't initialized to BLK_MQ_NO_TAG. > That seems to hint at the initialization changes there, but nothing sticks out > there for me. > > I'll keep looking. Can either one of you try with this patch? Won't fix anything, but it'll hopefully shine a bit of light on the issue. diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c index 4a6789e4398b..1b7647722ec0 100644 --- a/block/blk-mq-sched.c +++ b/block/blk-mq-sched.c @@ -429,7 +429,8 @@ void blk_mq_sched_insert_request(struct request *rq, bool at_head, struct blk_mq_ctx *ctx = rq->mq_ctx; struct blk_mq_hw_ctx *hctx = rq->mq_hctx; - WARN_ON(e && (rq->tag != BLK_MQ_NO_TAG)); + if (e && (rq->tag != BLK_MQ_NO_TAG)) + printk("tag=%d/%d, e=%lx, rq cmd_flags %x, rq_flags %x\n", rq->tag, rq->internal_tag, (long) e, rq->cmd_flags, rq->rq_flags); if (blk_mq_sched_bypass_insert(hctx, rq)) { /* -- Jens Axboe ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-02 20:03 ` Jens Axboe @ 2021-11-03 2:21 ` Yi Zhang 2021-11-03 3:21 ` Jens Axboe [not found] ` <CGME20211103032116epcas2p13b9f3fad0fe84f58c9b7f36320c71854@epcms2p2> 0 siblings, 2 replies; 24+ messages in thread From: Yi Zhang @ 2021-11-03 2:21 UTC (permalink / raw) To: Jens Axboe Cc: Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi > > Can either one of you try with this patch? Won't fix anything, but it'll > hopefully shine a bit of light on the issue. > Hi Jens Here is the full log: [ 566.964613] run blktests srp/001 at 2021-11-02 22:09:12 [ 567.372541] alua: device handler registered [ 567.375340] emc: device handler registered [ 567.388737] rdac: device handler registered [ 567.403792] null_blk: module loaded [ 567.624077] rdma_rxe: loaded [ 567.629083] infiniband enc8000_rxe: set active [ 567.629087] infiniband enc8000_rxe: added enc8000 [ 567.699017] scsi_debug:sdebug_add_store: dif_storep 524288 bytes @ 000000005c9bf0dc [ 567.699682] scsi_debug:sdebug_driver_probe: scsi_debug: trim poll_queues to 0. 
poll_q/nr_hw = (0/1) [ 567.699686] scsi_debug:sdebug_driver_probe: host protection DIF3 DIX3 [ 567.699691] scsi host0: scsi_debug: version 0190 [20200710] dev_size_mb=32, opts=0x0, submit_queues=1, statistics=0 [ 567.700433] scsi 0:0:0:0: Direct-Access Linux scsi_debug 0190 PQ: 0 ANSI: 7 [ 567.700588] sd 0:0:0:0: Power-on or device reset occurred [ 567.700610] sd 0:0:0:0: [sda] Enabling DIF Type 3 protection [ 567.700634] sd 0:0:0:0: [sda] 65536 512-byte logical blocks: (33.6 MB/32.0 MiB) [ 567.700643] sd 0:0:0:0: [sda] Write Protect is off [ 567.700648] sd 0:0:0:0: [sda] Mode Sense: 73 00 10 08 [ 567.700658] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA [ 567.700679] sd 0:0:0:0: [sda] Optimal transfer size 524288 bytes [ 567.700807] sd 0:0:0:0: Attached scsi generic sg0 type 0 [ 567.787563] sd 0:0:0:0: [sda] Enabling DIX T10-DIF-TYPE3-CRC protection [ 567.787568] sd 0:0:0:0: [sda] DIF application tag size 6 [ 567.887644] sd 0:0:0:0: [sda] Attached SCSI disk [ 568.453337] Rounding down aligned max_sectors from 4294967295 to 4294967288 [ 568.488877] ib_srpt:srpt_add_one: ib_srpt device = 00000000dd1cba21 [ 568.488883] ib_srpt:srpt_use_srq: ib_srpt srpt_use_srq(enc8000_rxe): use_srq = 0; ret = 0 [ 568.488885] ib_srpt:srpt_add_one: ib_srpt Target login info: id_ext=00debdfffebeef80,ioc_guid=00debdfffebeef80,pkey=ffff,service_id=00debdfffebeef80 [ 568.488924] ib_srpt:srpt_add_one: ib_srpt added enc8000_rxe. 
[ 568.933299] Rounding down aligned max_sectors from 255 to 248 [ 568.942144] Rounding down aligned max_sectors from 255 to 248 [ 568.951076] Rounding down aligned max_sectors from 4294967295 to 4294967288 [ 569.055204] ib_srp:srp_add_one: ib_srp: srp_add_one: 18446744073709551615 / 4096 = 4503599627370495 <> 512 [ 569.055208] ib_srp:srp_add_one: ib_srp: enc8000_rxe: mr_page_shift = 12, device->max_mr_size = 0xffffffffffffffff, device->max_fast_reg_page_list_len = 512, max_pages_per_mr = 512, mr_max_size = 0x200000 [ 569.072666] ib_srp:srp_parse_in: ib_srp: 10.16.69.39 -> 10.16.69.39:0 [ 569.072672] ib_srp:srp_parse_in: ib_srp: 10.16.69.39:5555 -> 10.16.69.39:5555 [ 569.072674] ib_srp:add_target_store: ib_srp: max_sectors = 1024; max_pages_per_mr = 512; mr_page_size = 4096; max_sectors_per_mr = 4096; mr_per_cmd = 2 [ 569.072676] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 569.073279] ib_srpt Received SRP_LOGIN_REQ with i_port_id fe80:0000:0000:0000:00de:bdff:febe:ef80, t_port_id 00de:bdff:febe:ef80:00de:bdff:febe:ef80 and it_iu_len 8260 on port 1 (guid=fe80:0000:0000:0000:00de:bdff:febe:ef80); pkey 0xffff [ 569.073366] ib_srpt:srpt_cm_req_recv: ib_srpt imm_data_offset = 68 [ 569.076164] ib_srpt:srpt_create_ch_ib: ib_srpt srpt_create_ch_ib: max_cqe= 8191 max_sge= 32 sq_size = 4096 ch= 0000000097d2923d [ 569.076177] ib_srpt:srpt_cm_req_recv: ib_srpt registering src addr 10.16.69.39 or i_port_id 0xfe8000000000000000debdfffebeef80 [ 569.076201] ib_srpt:srpt_cm_req_recv: ib_srpt Establish connection sess=00000000ffaa765e name=10.16.69.39 ch=0000000097d2923d [ 569.076233] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 569.076236] scsi host1: ib_srp: using immediate data [ 569.076606] ib_srpt:srpt_zerolength_write: ib_srpt 10.16.69.39-18: queued zerolength write [ 569.076620] ib_srpt:srpt_zerolength_write_done: ib_srpt 10.16.69.39-18 wc->status 0 [ 569.076907] ib_srpt Received SRP_LOGIN_REQ with i_port_id fe80:0000:0000:0000:00de:bdff:febe:ef80, t_port_id 
00de:bdff:febe:ef80:00de:bdff:febe:ef80 and it_iu_len 8260 on port 1 (guid=fe80:0000:0000:0000:00de:bdff:febe:ef80); pkey 0xffff [ 569.076989] ib_srpt:srpt_cm_req_recv: ib_srpt imm_data_offset = 68 [ 569.079227] ib_srpt:srpt_create_ch_ib: ib_srpt srpt_create_ch_ib: max_cqe= 8191 max_sge= 32 sq_size = 4096 ch= 0000000078d4dcf5 [ 569.079240] ib_srpt:srpt_cm_req_recv: ib_srpt registering src addr 10.16.69.39 or i_port_id 0xfe8000000000000000debdfffebeef80 [ 569.079255] ib_srpt:srpt_cm_req_recv: ib_srpt Establish connection sess=0000000015a0b0b5 name=10.16.69.39 ch=0000000078d4dcf5 [ 569.079311] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 569.079313] scsi host1: ib_srp: using immediate data [ 569.079669] scsi host1: SRP.T10:00DEBDFFFEBEEF80 [ 569.079675] ib_srpt:srpt_zerolength_write: ib_srpt 10.16.69.39-20: queued zerolength write [ 569.079686] ib_srpt:srpt_zerolength_write_done: ib_srpt 10.16.69.39-20 wc->status 0 [ 569.080892] scsi 1:0:0:0: Direct-Access LIO-ORG IBLOCK 4.0 PQ: 0 ANSI: 6 [ 569.081009] scsi 1:0:0:0: alua: supports implicit and explicit TPGS [ 569.081014] scsi 1:0:0:0: alua: device naa.60014056e756c6c62300000000000000 port group 0 rel port 1 [ 569.081143] sd 1:0:0:0: Warning! Received an indication that the LUN assignments on this target have changed. 
The Linux SCSI layer does not automatical [ 569.081849] sd 1:0:0:0: Attached scsi generic sg1 type 0 [ 569.097426] sd 1:0:0:0: alua: transition timeout set to 60 seconds [ 569.097431] sd 1:0:0:0: alua: port group 00 state A non-preferred supports TOlUSNA [ 569.097619] sd 1:0:0:0: [sdb] 65536 512-byte logical blocks: (33.6 MB/32.0 MiB) [ 569.097649] sd 1:0:0:0: [sdb] Write Protect is off [ 569.097652] sd 1:0:0:0: [sdb] Mode Sense: 43 00 00 08 [ 569.097680] scsi 1:0:0:2: Direct-Access LIO-ORG IBLOCK 4.0 PQ: 0 ANSI: 6 [ 569.097702] sd 1:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA [ 569.097714] srpt/10.16.69.39: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION. [ 569.097753] sd 1:0:0:0: [sdb] Optimal transfer size 126976 bytes [ 569.097794] scsi 1:0:0:2: alua: supports implicit and explicit TPGS [ 569.097798] scsi 1:0:0:2: alua: device naa.60014057363736964626700000000000 port group 0 rel port 1 [ 569.098042] sd 1:0:0:2: [sdc] 65536 512-byte logical blocks: (33.6 MB/32.0 MiB) [ 569.098068] sd 1:0:0:2: [sdc] Write Protect is off [ 569.098070] sd 1:0:0:2: [sdc] Mode Sense: 43 00 10 08 [ 569.098117] sd 1:0:0:2: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA [ 569.098129] srpt/10.16.69.39: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION. [ 569.098169] sd 1:0:0:2: [sdc] Optimal transfer size 524288 bytes [ 569.099162] sd 1:0:0:2: Attached scsi generic sg2 type 0 [ 569.099412] scsi 1:0:0:1: Direct-Access LIO-ORG IBLOCK 4.0 PQ: 0 ANSI: 6 [ 569.099531] scsi 1:0:0:1: alua: supports implicit and explicit TPGS [ 569.099535] scsi 1:0:0:1: alua: device naa.60014056e756c6c62310000000000000 port group 0 rel port 1 [ 569.102434] sd 1:0:0:1: Warning! Received an indication that the LUN assignments on this target have changed. 
The Linux SCSI layer does not automatical [ 569.102913] sd 1:0:0:1: Attached scsi generic sg3 type 0 [ 569.102941] ib_srp:srp_add_target: ib_srp: host1: SCSI scan succeeded - detected 3 LUNs [ 569.102943] scsi host1: ib_srp: new target: id_ext 00debdfffebeef80 ioc_guid 00debdfffebeef80 sgid fe80:0000:0000:0000:00de:bdff:febe:ef80 dest 10.16.69.39 [ 569.104669] ib_srp:srp_parse_in: ib_srp: 10.16.69.39 -> 10.16.69.39:0 [ 569.104673] ib_srp:srp_parse_in: ib_srp: 10.16.69.39:5555 -> 10.16.69.39:5555 [ 569.104679] ib_srp:srp_parse_in: ib_srp: [2620:52:0:1040:de:bdff:febe:ef80] -> [2620:52:0:1040:de:bdff:febe:ef80]:0/168838439%0 [ 569.104683] ib_srp:srp_parse_in: ib_srp: [2620:52:0:1040:de:bdff:febe:ef80]:5555 -> [2620:52:0:1040:de:bdff:febe:ef80]:5555/168838439%0 [ 569.104686] scsi host2: ib_srp: Already connected to target port with id_ext=00debdfffebeef80;ioc_guid=00debdfffebeef80;dest=2620:0052:0000:1040:00de:bdff:febe:ef80 [ 569.127417] sd 1:0:0:1: alua: transition timeout set to 60 seconds [ 569.127422] sd 1:0:0:1: alua: port group 00 state A non-preferred supports TOlUSNA [ 569.127518] sd 1:0:0:2: alua: transition timeout set to 60 seconds [ 569.127523] sd 1:0:0:2: alua: port group 00 state A non-preferred supports TOlUSNA [ 569.127803] sd 1:0:0:1: [sdd] 65536 512-byte logical blocks: (33.6 MB/32.0 MiB) [ 569.128220] sd 1:0:0:1: [sdd] Write Protect is off [ 569.128223] sd 1:0:0:1: [sdd] Mode Sense: 43 00 00 08 [ 569.128273] sd 1:0:0:1: [sdd] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA [ 569.128284] srpt/10.16.69.39: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION. 
[ 569.128320] sd 1:0:0:1: [sdd] Optimal transfer size 126976 bytes [ 569.140904] ib_srp:srp_parse_in: ib_srp: 10.16.69.39 -> 10.16.69.39:0 [ 569.140909] ib_srp:srp_parse_in: ib_srp: 10.16.69.39:5555 -> 10.16.69.39:5555 [ 569.140914] ib_srp:srp_parse_in: ib_srp: [2620:52:0:1040:de:bdff:febe:ef80] -> [2620:52:0:1040:de:bdff:febe:ef80]:0/168838439%0 [ 569.140919] ib_srp:srp_parse_in: ib_srp: [2620:52:0:1040:de:bdff:febe:ef80]:5555 -> [2620:52:0:1040:de:bdff:febe:ef80]:5555/168838439%0 [ 569.140926] ib_srp:srp_parse_in: ib_srp: [fe80::de:bdff:febe:ef80%2] -> [fe80::de:bdff:febe:ef80]:0/168838439%2 [ 569.140932] ib_srp:srp_parse_in: ib_srp: [fe80::de:bdff:febe:ef80%2]:5555 -> [fe80::de:bdff:febe:ef80]:5555/168838439%2 [ 569.140934] scsi host2: ib_srp: Already connected to target port with id_ext=00debdfffebeef80;ioc_guid=00debdfffebeef80;dest=fe80:0000:0000:0000:00de:bdff:febe:ef80 [ 569.197816] tag=31/-1, e=16ac2000, rq cmd_flags 22, rq_flags 2800 [ 569.247577] sd 1:0:0:2: [sdc] Attached SCSI disk [ 569.248420] tag=25/-1, e=1c08c400, rq cmd_flags 22, rq_flags 2800 [ 569.248569] sd 1:0:0:1: [sdd] Attached SCSI disk [ 569.248931] tag=29/-1, e=16ac0800, rq cmd_flags 22, rq_flags 2800 [ 569.249108] sd 1:0:0:0: [sdb] Attached SCSI disk [ 569.305302] rdma_rxe: rxe_invalidate_mr: rkey (0x84f6) doesn't match mr->ibmr.rkey (0x84f7) [ 569.305315] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE 00000000a216794f [ 569.367710] scsi 1:0:0:0: alua: Detached [ 576.367413] scsi host1: SRP abort called [ 579.907417] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 579.907624] ib_srpt receiving failed for ioctx 000000005e721910 with status 5 [ 579.907631] ib_srpt receiving failed for ioctx 0000000056beb878 with status 5 [ 579.907634] ib_srpt receiving failed for ioctx 000000001f2eb0be with status 5 [ 579.907636] ib_srpt receiving failed for ioctx 00000000338f74b5 with status 5 [ 579.907639] ib_srpt receiving failed for ioctx 00000000d57a4874 with status 5 [ 579.907642] 
ib_srpt receiving failed for ioctx 00000000806f7498 with status 5 [ 579.907646] ib_srpt receiving failed for ioctx 00000000330004a8 with status 5 [ 579.907649] ib_srpt receiving failed for ioctx 00000000833f52ea with status 5 [ 579.907652] ib_srpt receiving failed for ioctx 00000000a4a4cd8e with status 5 [ 579.907656] ib_srpt receiving failed for ioctx 000000006f7d8b81 with status 5 [ 580.127744] ib_srpt Received SRP_LOGIN_REQ with i_port_id fe80:0000:0000:0000:00de:bdff:febe:ef80, t_port_id 00de:bdff:febe:ef80:00de:bdff:febe:ef80 and it_iu_len 8260 on port 1 (guid=fe80:0000:0000:0000:00de:bdff:febe:ef80); pkey 0xffff [ 580.127890] ib_srpt:srpt_cm_req_recv: ib_srpt imm_data_offset = 68 [ 580.130681] ib_srpt:srpt_create_ch_ib: ib_srpt srpt_create_ch_ib: max_cqe= 8191 max_sge= 32 sq_size = 4096 ch= 000000003dffffde [ 580.130694] ib_srpt:srpt_cm_req_recv: ib_srpt registering src addr 10.16.69.39 or i_port_id 0xfe8000000000000000debdfffebeef80 [ 580.130714] ib_srpt:srpt_cm_req_recv: ib_srpt Establish connection sess=00000000d7a559cb name=10.16.69.39 ch=000000003dffffde [ 580.130737] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 580.130740] scsi host1: ib_srp: using immediate data [ 580.130785] ib_srpt:srpt_zerolength_write: ib_srpt 10.16.69.39-23: queued zerolength write [ 580.130801] ib_srpt:srpt_zerolength_write_done: ib_srpt 10.16.69.39-23 wc->status 0 [ 580.130825] ib_srpt Received SRP_LOGIN_REQ with i_port_id fe80:0000:0000:0000:00de:bdff:febe:ef80, t_port_id 00de:bdff:febe:ef80:00de:bdff:febe:ef80 and it_iu_len 8260 on port 1 (guid=fe80:0000:0000:0000:00de:bdff:febe:ef80); pkey 0xffff [ 580.130926] ib_srpt:srpt_cm_req_recv: ib_srpt imm_data_offset = 68 [ 580.132805] ib_srpt:srpt_create_ch_ib: ib_srpt srpt_create_ch_ib: max_cqe= 8191 max_sge= 32 sq_size = 4096 ch= 00000000518459a5 [ 580.132820] ib_srpt:srpt_cm_req_recv: ib_srpt registering src addr 10.16.69.39 or i_port_id 0xfe8000000000000000debdfffebeef80 [ 580.132832] ib_srpt:srpt_cm_req_recv: ib_srpt 
Establish connection sess=00000000292cad95 name=10.16.69.39 ch=00000000518459a5 [ 580.132851] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 580.132852] scsi host1: ib_srp: using immediate data [ 580.132875] scsi host1: ib_srp: reconnect succeeded [ 580.132883] ib_srpt:srpt_zerolength_write: ib_srpt 10.16.69.39-24: queued zerolength write [ 580.132894] ib_srpt:srpt_zerolength_write_done: ib_srpt 10.16.69.39-24 wc->status 0 [ 582.027396] ib_srpt:srpt_zerolength_write: ib_srpt 10.16.69.39-20: queued zerolength write [ 582.027406] ib_srpt:srpt_zerolength_write: ib_srpt 10.16.69.39-18: queued zerolength write [ 582.027425] ib_srpt:srpt_zerolength_write_done: ib_srpt 10.16.69.39-20 wc->status 5 [ 582.027429] ib_srpt:srpt_zerolength_write_done: ib_srpt 10.16.69.39-18 wc->status 5 [ 582.027432] ib_srpt:srpt_release_channel_work: ib_srpt 10.16.69.39-20 [ 582.027440] ib_srpt:srpt_release_channel_work: ib_srpt 10.16.69.39-18 [ 614.717672] ib_srpt Closing channel 10.16.69.39-23 because target enc8000_rxe_1 has been disabled [ 614.717687] ib_srpt:srpt_zerolength_write: ib_srpt 10.16.69.39-23: queued zerolength write [ 614.717693] ib_srpt Closing channel 10.16.69.39-24 because target enc8000_rxe_1 has been disabled [ 614.717696] ib_srpt:srpt_zerolength_write: ib_srpt 10.16.69.39-24: queued zerolength write [ 614.717802] srpt_recv_done: 246 callbacks suppressed [ 614.717803] ib_srpt receiving failed for ioctx 000000005b444422 with status 5 [ 614.717809] ib_srpt receiving failed for ioctx 00000000776294fd with status 5 [ 614.717812] ib_srpt receiving failed for ioctx 000000002feadcfd with status 5 [ 614.717815] ib_srpt receiving failed for ioctx 000000006a886bb2 with status 5 [ 614.717818] ib_srpt receiving failed for ioctx 0000000016397dad with status 5 [ 614.717822] ib_srpt receiving failed for ioctx 00000000740fece1 with status 5 [ 614.717825] ib_srpt receiving failed for ioctx 00000000e7518963 with status 5 [ 614.717829] ib_srpt receiving failed for ioctx 
00000000eb32ccfd with status 5 [ 614.717833] ib_srpt receiving failed for ioctx 000000009d275157 with status 5 [ 614.717837] ib_srpt receiving failed for ioctx 00000000549482a1 with status 5 [ 614.717865] ib_srpt:srpt_zerolength_write_done: ib_srpt 10.16.69.39-23 wc->status 5 [ 614.717880] ib_srpt:srpt_zerolength_write_done: ib_srpt 10.16.69.39-24 wc->status 5 [ 614.717894] ib_srpt:srpt_release_channel_work: ib_srpt 10.16.69.39-23 [ 614.717902] ib_srpt:srpt_release_channel_work: ib_srpt 10.16.69.39-24 [ 614.717915] scsi host1: ib_srp: received DREQ [ 614.717997] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE 000000005a1c1db0 [ 614.721889] scsi host1: ib_srp: received DREQ [ 614.721905] ib_srpt:srpt_close_ch: ib_srpt 10.16.69.39: already closed [ 614.721964] ib_srpt:srpt_close_ch: ib_srpt 10.16.69.39: already closed [ 616.827436] scsi host1: ib_srp: connection closed [ 616.827446] scsi host1: ib_srp: connection closed [ 625.347390] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 625.567932] scsi host1: ib_srp: REJ received [ 625.567935] scsi host1: REJ reason 0x8 [ 625.567953] scsi host1: reconnect attempt 1 failed (-104) [ 629.787421] fast_io_fail_tmo expired for SRP port-1:1 / host1. 
[ 636.187360] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 636.407959] scsi host1: ib_srp: REJ received [ 636.407963] scsi host1: REJ reason 0x8 [ 636.407982] scsi host1: reconnect attempt 2 failed (-104) [ 646.427427] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 646.647993] scsi host1: ib_srp: REJ received [ 646.647997] scsi host1: REJ reason 0x8 [ 646.648017] scsi host1: reconnect attempt 3 failed (-104) [ 656.667405] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 656.887953] scsi host1: ib_srp: REJ received [ 656.887958] scsi host1: REJ reason 0x8 [ 656.887979] scsi host1: reconnect attempt 4 failed (-104) [ 666.907388] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 667.127961] scsi host1: ib_srp: REJ received [ 667.127965] scsi host1: REJ reason 0x8 [ 667.127989] scsi host1: reconnect attempt 5 failed (-104) [ 677.147368] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 677.377871] scsi host1: ib_srp: REJ received [ 677.377874] scsi host1: REJ reason 0x8 [ 677.377892] scsi host1: reconnect attempt 6 failed (-104) [ 687.387428] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 687.607973] scsi host1: ib_srp: REJ received [ 687.607977] scsi host1: REJ reason 0x8 [ 687.608006] scsi host1: reconnect attempt 7 failed (-104) [ 687.608014] sd 1:0:0:1: rejecting I/O to offline device [ 697.627395] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 697.848018] scsi host1: ib_srp: REJ received [ 697.848022] scsi host1: REJ reason 0x8 [ 697.848070] ------------[ cut here ]------------ [ 697.848076] WARNING: CPU: 1 PID: 1973 at block/blk-mq.c:294 blk_mq_unquiesce_queue+0xb2/0xc8 [ 697.848087] Modules linked in: ib_srp scsi_transport_srp rdma_cm iw_cm ib_cm ib_umad scsi_debug rdma_rxe ib_uverbs ip6_udp_tunnel udp_tunnel null_blk scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_multipath ib_core sunrpc qeth_l2 bridge stp llc qeth qdio ccwgroup vfio_ccw mdev zcrypt_cex4 vfio_iommu_type1 vfio drm fb fuse font drm_panel_orientation_quirks 
i2c_core backlight zram ip_tables xfs crc32_vx_s390 ghash_s390 prng aes_s390 des_s390 libdes sha512_s390 sha256_s390 sha1_s390 sha_common dasd_eckd_mod dasd_mod pkey zcrypt [last unloaded: target_core_mod] [ 697.848149] CPU: 1 PID: 1973 Comm: kworker/1:5 Not tainted 5.15.0+ #4 [ 697.848154] Hardware name: IBM 2964 N96 400 (z/VM 6.4.0) [ 697.848158] Workqueue: events_long srp_reconnect_work [scsi_transport_srp] [ 697.848168] Krnl PSW : 0404c00180000000 000000003253c696 (blk_mq_unquiesce_queue+0xb6/0xc8) [ 697.848183] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 [ 697.848187] Krnl GPRS: 00000000000000d4 0000000000000000 00000000078b4f30 0000000000000002 [ 697.848190] 0000000000000004 0000000000000009 ffffffffffffff98 00000000326f7df0 [ 697.848192] 000003800729fcdc 0000000000000007 070003800729fb00 00000000078b4ec0 [ 697.848194] 000000001b9b2100 0000000000000000 000003800729fb00 000003800729fac0 [ 697.848203] Krnl Code: 000000003253c68a: a7180001 lhi %r1,1 000000003253c68e: a7f4ffd7 brc 15,000000003253c63c #000000003253c692: af000000 mc 0,0 >000000003253c696: a7180000 lhi %r1,0 000000003253c69a: a7f4ffd1 brc 15,000000003253c63c 000000003253c69e: c0e500071f89 brasl %r14,00000000326205b0 000000003253c6a4: a7f4ffbe brc 15,000000003253c620 000000003253c6a8: c004002ce124 brcl 0,0000000032ad88f0 [ 697.848221] Call Trace: [ 697.848224] [<000000003253c696>] blk_mq_unquiesce_queue+0xb6/0xc8 [ 697.848230] [<00000000326f7da0>] scsi_internal_device_unblock_nowait+0x50/0xa0 [ 697.848235] [<00000000326f7e30>] device_unblock+0x40/0x50 [ 697.848238] [<00000000326ef9c8>] starget_for_each_device+0xa8/0xd0 [ 697.848244] [<00000000326f862e>] target_unblock+0x56/0x68 [ 697.848247] [<00000000326b7018>] device_for_each_child+0x60/0xa0 [ 697.848251] [<00000000326f7ea6>] scsi_target_unblock+0x66/0x78 [ 697.848253] [<000003ff808e3872>] srp_reconnect_rport+0x202/0x238 [scsi_transport_srp] [ 697.848340] [<000003ff808e3902>] srp_reconnect_work+0x5a/0xf0 [scsi_transport_srp] [ 
697.848345] [<0000000031f8b62a>] process_one_work+0x21a/0x498 [ 697.848349] [<0000000031f8bdd4>] worker_thread+0x64/0x498 [ 697.848351] [<0000000031f9471c>] kthread+0x184/0x190 [ 697.848356] [<0000000031f1f468>] __ret_from_fork+0x40/0x58 [ 697.848359] [<0000000032a6d2da>] ret_from_fork+0xa/0x30 [ 697.848365] Last Breaking-Event-Address: [ 697.848367] [<0000000000000000>] 0x0 [ 697.848370] ---[ end trace 270726b44805023e ]--- [ 697.848374] scsi host1: reconnect attempt 8 failed (-104) [ 707.867368] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 708.087976] scsi host1: ib_srp: REJ received [ 708.087981] scsi host1: REJ reason 0x8 [ 708.088024] scsi host1: reconnect attempt 9 failed (-104) [ 718.107392] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 718.327884] scsi host1: ib_srp: REJ received [ 718.327888] scsi host1: REJ reason 0x8 [ 718.327929] scsi host1: reconnect attempt 10 failed (-104) [ 728.347398] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 728.567888] scsi host1: ib_srp: REJ received [ 728.567891] scsi host1: REJ reason 0x8 [ 728.567931] scsi host1: reconnect attempt 11 failed (-104) [ 735.447460] sd 0:0:0:0: [sda] Synchronizing SCSI cache [ 738.587419] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 738.807992] scsi host1: ib_srp: REJ received [ 738.807995] scsi host1: REJ reason 0x8 [ 738.808049] scsi host1: reconnect attempt 12 failed (-104) [ 759.067426] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 759.288047] scsi host1: ib_srp: REJ received [ 759.288051] scsi host1: REJ reason 0x8 [ 759.288103] scsi host1: reconnect attempt 13 failed (-104) [ 789.787399] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 790.007875] scsi host1: ib_srp: REJ received [ 790.007878] scsi host1: REJ reason 0x8 [ 790.007922] scsi host1: reconnect attempt 14 failed (-104) [ 830.107393] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 830.329148] scsi host1: ib_srp: REJ received [ 830.329152] scsi host1: REJ reason 0x8 [ 
830.329221] scsi host1: reconnect attempt 15 failed (-104) [ 883.867389] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 884.087954] scsi host1: ib_srp: REJ received [ 884.087958] scsi host1: REJ reason 0x8 [ 884.088004] scsi host1: reconnect attempt 16 failed (-104) [ 945.307398] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 945.527867] scsi host1: ib_srp: REJ received [ 945.527870] scsi host1: REJ reason 0x8 [ 945.527914] scsi host1: reconnect attempt 17 failed (-104) [ 1016.987360] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 1017.207882] scsi host1: ib_srp: REJ received [ 1017.207885] scsi host1: REJ reason 0x8 [ 1017.207937] scsi host1: reconnect attempt 18 failed (-104) [ 1098.907436] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 1099.127947] scsi host1: ib_srp: REJ received [ 1099.127951] scsi host1: REJ reason 0x8 [ 1099.128003] scsi host1: reconnect attempt 19 failed (-104) > > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c > index 4a6789e4398b..1b7647722ec0 100644 > --- a/block/blk-mq-sched.c > +++ b/block/blk-mq-sched.c > @@ -429,7 +429,8 @@ void blk_mq_sched_insert_request(struct request *rq, bool at_head, > struct blk_mq_ctx *ctx = rq->mq_ctx; > struct blk_mq_hw_ctx *hctx = rq->mq_hctx; > > - WARN_ON(e && (rq->tag != BLK_MQ_NO_TAG)); > + if (e && (rq->tag != BLK_MQ_NO_TAG)) > + printk("tag=%d/%d, e=%lx, rq cmd_flags %x, rq_flags %x\n", rq->tag, rq->internal_tag, (long) e, rq->cmd_flags, rq->rq_flags); > > if (blk_mq_sched_bypass_insert(hctx, rq)) { > /* > > -- > Jens Axboe > -- Best Regards, Yi Zhang ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 2:21 ` Yi Zhang @ 2021-11-03 3:21 ` Jens Axboe 2021-11-03 3:51 ` Ming Lei [not found] ` <CGME20211103032116epcas2p13b9f3fad0fe84f58c9b7f36320c71854@epcms2p2> 1 sibling, 1 reply; 24+ messages in thread From: Jens Axboe @ 2021-11-03 3:21 UTC (permalink / raw) To: Yi Zhang; +Cc: Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi On 11/2/21 8:21 PM, Yi Zhang wrote: >> >> Can either one of you try with this patch? Won't fix anything, but it'll >> hopefully shine a bit of light on the issue. >> > Hi Jens > > Here is the full log: Thanks! I think I see what it could be - can you try this one as well, would like to confirm that the condition I think is triggering is what is triggering. diff --git a/block/blk-mq.c b/block/blk-mq.c index 07eb1412760b..81dede885231 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -2515,6 +2515,8 @@ void blk_mq_submit_bio(struct bio *bio) if (plug && plug->cached_rq) { rq = rq_list_pop(&plug->cached_rq); INIT_LIST_HEAD(&rq->queuelist); + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); } else { struct blk_mq_alloc_data data = { .q = q, @@ -2535,6 +2537,8 @@ void blk_mq_submit_bio(struct bio *bio) bio_wouldblock_error(bio); goto queue_exit; } + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); } trace_block_getrq(bio); -- Jens Axboe ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 3:21 ` Jens Axboe @ 2021-11-03 3:51 ` Ming Lei 2021-11-03 3:54 ` Jens Axboe 0 siblings, 1 reply; 24+ messages in thread From: Ming Lei @ 2021-11-03 3:51 UTC (permalink / raw) To: Jens Axboe Cc: Yi Zhang, Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi On Tue, Nov 02, 2021 at 09:21:10PM -0600, Jens Axboe wrote: > On 11/2/21 8:21 PM, Yi Zhang wrote: > >> > >> Can either one of you try with this patch? Won't fix anything, but it'll > >> hopefully shine a bit of light on the issue. > >> > > Hi Jens > > > > Here is the full log: > > Thanks! I think I see what it could be - can you try this one as well, > would like to confirm that the condition I think is triggering is what > is triggering. > > diff --git a/block/blk-mq.c b/block/blk-mq.c > index 07eb1412760b..81dede885231 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -2515,6 +2515,8 @@ void blk_mq_submit_bio(struct bio *bio) > if (plug && plug->cached_rq) { > rq = rq_list_pop(&plug->cached_rq); > INIT_LIST_HEAD(&rq->queuelist); > + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); > + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); > } else { > struct blk_mq_alloc_data data = { > .q = q, > @@ -2535,6 +2537,8 @@ void blk_mq_submit_bio(struct bio *bio) > bio_wouldblock_error(bio); > goto queue_exit; > } > + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); > + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); Hello Jens, I guess the issue could be the following code run without grabbing ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx(). .rq_flags = q->elevator ? RQF_ELV : 0, then elevator is switched to real one from none, and check on q->elevator becomes not consistent. Thanks, Ming ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 3:51 ` Ming Lei @ 2021-11-03 3:54 ` Jens Axboe 2021-11-03 4:00 ` Yi Zhang 2021-11-03 11:59 ` Jens Axboe 0 siblings, 2 replies; 24+ messages in thread From: Jens Axboe @ 2021-11-03 3:54 UTC (permalink / raw) To: Ming Lei Cc: Yi Zhang, Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi On Nov 2, 2021, at 9:52 PM, Ming Lei <ming.lei@redhat.com> wrote: > > On Tue, Nov 02, 2021 at 09:21:10PM -0600, Jens Axboe wrote: >>> On 11/2/21 8:21 PM, Yi Zhang wrote: >>>>> >>>>> Can either one of you try with this patch? Won't fix anything, but it'll >>>>> hopefully shine a bit of light on the issue. >>>>> >>> Hi Jens >>> >>> Here is the full log: >> >> Thanks! I think I see what it could be - can you try this one as well, >> would like to confirm that the condition I think is triggering is what >> is triggering. >> >> diff --git a/block/blk-mq.c b/block/blk-mq.c >> index 07eb1412760b..81dede885231 100644 >> --- a/block/blk-mq.c >> +++ b/block/blk-mq.c >> @@ -2515,6 +2515,8 @@ void blk_mq_submit_bio(struct bio *bio) >> if (plug && plug->cached_rq) { >> rq = rq_list_pop(&plug->cached_rq); >> INIT_LIST_HEAD(&rq->queuelist); >> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); >> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); >> } else { >> struct blk_mq_alloc_data data = { >> .q = q, >> @@ -2535,6 +2537,8 @@ void blk_mq_submit_bio(struct bio *bio) >> bio_wouldblock_error(bio); >> goto queue_exit; >> } >> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); >> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); > > Hello Jens, > > I guess the issue could be the following code run without grabbing > ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx(). > > .rq_flags = q->elevator ? RQF_ELV : 0, > > then elevator is switched to real one from none, and check on q->elevator > becomes not consistent. 
Indeed, that’s where I was going with this. I have a patch, testing it locally but it’s getting late. Will send it out tomorrow. The nice benefit is that it allows dropping the weird ref get on plug flush, and batches getting the refs as well.
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 3:54 ` Jens Axboe @ 2021-11-03 4:00 ` Yi Zhang 2021-11-03 19:03 ` Jens Axboe 2021-11-03 11:59 ` Jens Axboe 1 sibling, 1 reply; 24+ messages in thread From: Yi Zhang @ 2021-11-03 4:00 UTC (permalink / raw) To: Jens Axboe Cc: Ming Lei, Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi > > > > Hello Jens, > > > > I guess the issue could be the following code run without grabbing > > ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx(). > > > > .rq_flags = q->elevator ? RQF_ELV : 0, > > > > then elevator is switched to real one from none, and check on q->elevator > > becomes not consistent. > > Indeed, that’s where I was going with this. I have a patch, testing it locally but it’s getting late. Will send it out tomorrow. The nice benefit is that it allows dropping the weird ref get on plug flush, and batches getting the refs as well. > Hi Jens Here is the log in case you still need it. :) [ 147.962222] run blktests srp/001 at 2021-11-02 23:57:02 [ 148.220309] alua: device handler registered [ 148.223332] emc: device handler registered [ 148.226203] rdac: device handler registered [ 148.231724] null_blk: module loaded [ 150.275727] rdma_rxe: loaded [ 150.281728] infiniband enc8000_rxe: set active [ 150.281732] infiniband enc8000_rxe: added enc8000 [ 150.381380] scsi_debug:sdebug_add_store: dif_storep 524288 bytes @ 0000000098 4c2b06 [ 150.382109] scsi_debug:sdebug_driver_probe: scsi_debug: trim poll_queues to 0 . 
poll_q/nr_hw = (0/1) [ 150.382112] scsi_debug:sdebug_driver_probe: host protection DIF3 DIX3 [ 150.382116] scsi host0: scsi_debug: version 0190 [20200710] [ 150.382116] dev_size_mb=32, opts=0x0, submit_queues=1, statistics=0 [ 150.382802] scsi 0:0:0:0: Direct-Access Linux scsi_debug 0190 PQ: 0 ANSI: 7 [ 150.383007] sd 0:0:0:0: Power-on or device reset occurred [ 150.383029] sd 0:0:0:0: [sda] Enabling DIF Type 3 protection [ 150.383053] sd 0:0:0:0: [sda] 65536 512-byte logical blocks: (33.6 MB/32.0 MiB) [ 150.383061] sd 0:0:0:0: [sda] Write Protect is off [ 150.383075] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA [ 150.383093] sd 0:0:0:0: [sda] Optimal transfer size 524288 bytes [ 150.383104] sd 0:0:0:0: Attached scsi generic sg0 type 0 [ 150.467327] sd 0:0:0:0: [sda] Enabling DIX T10-DIF-TYPE3-CRC protection [ 150.467332] sd 0:0:0:0: [sda] DIF application tag size 6 [ 150.547390] sd 0:0:0:0: [sda] Attached SCSI disk [ 150.655136] Rounding down aligned max_sectors from 4294967295 to 4294967288 [ 151.972911] Rounding down aligned max_sectors from 255 to 248 [ 151.982162] Rounding down aligned max_sectors from 255 to 248 [ 151.991500] Rounding down aligned max_sectors from 4294967295 to 4294967288 [ 153.254537] ib_srpt Received SRP_LOGIN_REQ with i_port_id fe80:0000:0000:0000:00de:bdff:febe:ef80, t_port_id 00de:bdff:febe:ef80:00de:bdff:febe:ef80 and it_iu_len 8260 on port 1 (guid=fe80:0000:0000:0000:00de:bdff:febe:ef80); pkey 0xffff [ 153.262242] ib_srpt Received SRP_LOGIN_REQ with i_port_id fe80:0000:0000:0000:00de:bdff:febe:ef80, t_port_id 00de:bdff:febe:ef80:00de:bdff:febe:ef80 and it_iu_len 8260 on port 1 (guid=fe80:0000:0000:0000:00de:bdff:febe:ef80); pkey 0xffff [ 153.264644] scsi host1: SRP.T10:00DEBDFFFEBEEF80 [ 153.265188] scsi 1:0:0:0: Direct-Access LIO-ORG IBLOCK 4.0 PQ: 0 ANSI: 6 [ 153.265611] scsi 1:0:0:0: alua: supports implicit and explicit TPGS [ 153.265618] scsi 1:0:0:0: alua: device
naa.60014056e756c6c62300000000000000 port group 0 rel port 1 [ 153.265765] sd 1:0:0:0: Warning! Received an indication that the LUN assignments on this target have changed. The Linux SCSI layer does not automatical [ 153.265782] sd 1:0:0:0: Attached scsi generic sg1 type 0 [ 153.287283] sd 1:0:0:0: alua: transition timeout set to 60 seconds [ 153.287299] sd 1:0:0:0: alua: port group 00 state A non-preferred supports TOlUSNA [ 153.287479] scsi 1:0:0:2: Direct-Access LIO-ORG IBLOCK 4.0 PQ: 0 ANSI: 6 [ 153.287656] sd 1:0:0:0: [sdb] 65536 512-byte logical blocks: (33.6 MB/32.0 MiB) [ 153.287687] scsi 1:0:0:2: alua: supports implicit and explicit TPGS [ 153.287691] scsi 1:0:0:2: alua: device naa.60014057363736964626700000000000 port group 0 rel port 1 [ 153.287824] sd 1:0:0:2: Attached scsi generic sg2 type 0 [ 153.288006] sd 1:0:0:0: [sdb] Write Protect is off [ 153.288133] scsi 1:0:0:1: Direct-Access LIO-ORG IBLOCK 4.0 PQ: 0 ANSI: 6 [ 153.288161] sd 1:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA [ 153.288171] srpt/10.16.69.39: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION. [ 153.288817] sd 1:0:0:0: [sdb] Optimal transfer size 126976 bytes [ 153.289825] sd 1:0:0:2: [sdc] 65536 512-byte logical blocks: (33.6 MB/32.0 MiB) [ 153.289853] scsi 1:0:0:1: alua: supports implicit and explicit TPGS [ 153.289857] scsi 1:0:0:1: alua: device naa.60014056e756c6c62310000000000000 port group 0 rel port 1 [ 153.289981] sd 1:0:0:1: Warning! Received an indication that the LUN assignments on this target have changed. The Linux SCSI layer does not automatical [ 153.289993] sd 1:0:0:1: Attached scsi generic sg3 type 0 [ 153.291415] sd 1:0:0:2: [sdc] Write Protect is off [ 153.291574] sd 1:0:0:2: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA [ 153.291584] srpt/10.16.69.39: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.
[ 153.291634] sd 1:0:0:2: [sdc] Optimal transfer size 524288 bytes [ 153.293587] scsi host2: ib_srp: Already connected to target port with id_ext=00debdfffebeef80;ioc_guid=00debdfffebeef80;dest=2620:0052:0000:1040:00de:bdff:febe:ef80 [ 153.327364] sd 1:0:0:2: alua: transition timeout set to 60 seconds [ 153.327371] sd 1:0:0:2: alua: port group 00 state A non-preferred supports TOlUSNA [ 153.329782] scsi host2: ib_srp: Already connected to target port with id_ext=00debdfffebeef80;ioc_guid=00debdfffebeef80;dest=fe80:0000:0000:0000:00de:bdff:febe:ef80 [ 153.347178] sd 1:0:0:1: alua: transition timeout set to 60 seconds [ 153.347183] sd 1:0:0:1: alua: port group 00 state A non-preferred supports TOlUSNA [ 153.347301] sd 1:0:0:1: [sdd] 65536 512-byte logical blocks: (33.6 MB/32.0 MiB) [ 153.347327] sd 1:0:0:1: [sdd] Write Protect is off [ 153.347376] sd 1:0:0:1: [sdd] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA [ 153.347386] srpt/10.16.69.39: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.
[ 153.347423] sd 1:0:0:1: [sdd] Optimal transfer size 126976 bytes [ 153.397703] ------------[ cut here ]------------ [ 153.397707] WARNING: CPU: 1 PID: 38 at block/blk-mq-sched.c:432 blk_mq_sched_ insert_request+0x54/0x178 [ 153.397719] Modules linked in: ib_srp scsi_transport_srp target_core_pscsi ta rget_core_file ib_srpt target_core_iblock target_core_mod rdma_cm iw_cm ib_cm ib _umad scsi_debug rdma_rxe ib_uverbs ip6_udp_tunnel udp_tunnel null_blk scsi_dh_r dac scsi_dh_emc scsi_dh_alua dm_multipath ib_core sunrpc qeth_l2 bridge stp llc qeth qdio ccwgroup vfio_ccw mdev vfio_iommu_type1 vfio zcrypt_cex4 drm fb font d rm_panel_orientation_quirks i2c_core fuse backlight zram ip_tables xfs crc32_vx_ s390 ghash_s390 prng aes_s390 des_s390 libdes sha512_s390 sha256_s390 sha1_s390 sha_common dasd_eckd_mod dasd_mod pkey zcrypt [ 153.397770] CPU: 1 PID: 38 Comm: kworker/u128:1 Not tainted 5.15.0.v2+ #5 [ 153.397774] Hardware name: IBM 2964 N96 400 (z/VM 6.4.0) [ 153.397776] Workqueue: events_unbound async_run_entry_fn [ 153.397783] Krnl PSW : 0704e00180000000 000000002bd6e0c0 (blk_mq_sched_insert _request+0x58/0x178) [ 153.397788] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI: 0 EA:3 [ 153.397792] Krnl GPRS: 0000000000000004 0000000000000022 000000001da68800 000 0000000000001 [ 153.397794] 0000000000000001 0000000000000000 000000000c99f800 000 00000168b1c00 [ 153.397796] 0000000000000000 0000000000000001 0000000000000001 000 000001da68800 [ 153.397799] 0000000000f5c200 000003ff7e82b000 0000038000353830 00 00380003537c0 [ 153.397807] Krnl Code: 000000002bd6e0b4: a71effff chi %r1,-1 [ 153.397807] 000000002bd6e0b8: a7840004 brc 8,000000 002bd6e0c0 [ 153.397807] #000000002bd6e0bc: af000000 mc 0,0 [ 153.397807] >000000002bd6e0c0: 5810b01c l %r1,28(% r11) [ 153.397807] 000000002bd6e0c4: ec213bbb0055 risbg %r2,%r1, 59,187,0 [ 153.397807] 000000002bd6e0ca: a7740057 brc 7,000000 002bd6e178 [ 153.397807] 000000002bd6e0ce: 5810b018 l %r1,24(% r11) [ 153.397807] 
000000002bd6e0d2: c01b000000ff nilf %r1,255 [ 153.397827] Call Trace: [ 153.397829] [<000000002bd6e0c0>] blk_mq_sched_insert_request+0x58/0x178 [ 153.397834] [<000000002bd5e926>] blk_execute_rq+0x56/0xd8 [ 153.397840] [<000000002bf20d50>] __scsi_execute+0x110/0x230 [ 153.397846] [<000000002bf20fb2>] scsi_mode_sense+0x142/0x340 [ 153.397849] [<000000002bf322c6>] sd_revalidate_disk.isra.0+0x74e/0x2240 [ 153.397853] [<000000002bf342ea>] sd_probe+0x312/0x4b0 [ 153.397856] [<000000002bee76e0>] really_probe+0xd0/0x4b0 [ 153.397862] [<000000002bee7c70>] driver_probe_device+0x40/0xf0 [ 153.397865] [<000000002bee837c>] __device_attach_driver+0xa4/0x128 [ 153.397869] [<000000002bee4a80>] bus_for_each_drv+0x88/0xc0 [ 153.397872] [<000000002bee6be0>] __device_attach_async_helper+0x90/0xf0 [ 153.397875] [<000000002b7c0d1e>] async_run_entry_fn+0x4e/0x1b0 [ 153.397878] [<000000002b7b362a>] process_one_work+0x21a/0x498 [ 153.397881] [<000000002b7b3dd4>] worker_thread+0x64/0x498 [ 153.397884] [<000000002b7bc71c>] kthread+0x184/0x190 [ 153.397889] [<000000002b747468>] __ret_from_fork+0x40/0x58 [ 153.397892] [<000000002c2952fa>] ret_from_fork+0xa/0x30 [ 153.397899] Last Breaking-Event-Address: [ 153.397901] [<0000000000000000>] 0x0 [ 153.397903] ---[ end trace e8c7933cbb1a7d90 ]--- [ 153.398050] sd 1:0:0:0: [sdb] Attached SCSI disk [ 153.428144] ------------[ cut here ]------------ [ 153.428148] WARNING: CPU: 0 PID: 7 at block/blk-mq-sched.c:432 blk_mq_sched_i nsert_request+0x54/0x178 [ 153.428154] Modules linked in: ib_srp scsi_transport_srp target_core_pscsi ta rget_core_file ib_srpt target_core_iblock target_core_mod rdma_cm iw_cm ib_cm i _umad scsi_debug rdma_rxe ib_uverbs ip6_udp_tunnel udp_tunnel null_blk scsi_dh_r dac scsi_dh_emc scsi_dh_alua dm_multipath ib_core sunrpc qeth_l2 bridge stp llc qeth qdio ccwgroup vfio_ccw mdev vfio_iommu_type1 vfio zcrypt_cex4 drm fb font d rm_panel_orientation_quirks i2c_core fuse backlight zram ip_tables xfs crc32_vx_ s390 ghash_s390 prng 
aes_s390 des_s390 libdes sha512_s390 sha256_s390 sha1_s390 sha_common dasd_eckd_mod dasd_mod pkey zcrypt [ 153.428200] CPU: 0 PID: 7 Comm: kworker/u128:0 Tainted: G W 5. 15.0.v2+ #5 [ 153.428203] Hardware name: IBM 2964 N96 400 (z/VM 6.4.0) [ 153.428205] Workqueue: events_unbound async_run_entry_fn [ 153.428208] Krnl PSW : 0704e00180000000 000000002bd6e0c0 (blk_mq_sched_insert _request+0x58/0x178) [ 153.428214] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI: 0 EA:3 [ 153.428218] Krnl GPRS: 0000000000000004 000000000000003a 000000001d86e800 000 0000000000001 [ 153.428220] 0000000000000001 0000000000000000 000000001d83d800 000 000001e74dc00 [ 153.428222] 0000000000000000 0000000000000001 0000000000000001 000 000001d86e800 [ 153.428225] 00000000006ba100 000003ff7e80b700 000003800003f830 000 003800003f7c0 [ 153.428231] Krnl Code: 000000002bd6e0b4: a71effff chi %r1,-1 [ 153.428231] 000000002bd6e0b8: a7840004 brc 8,000000 002bd6e0c0 [ 153.428231] #000000002bd6e0bc: af000000 mc 0,0 [ 153.428231] >000000002bd6e0c0: 5810b01c l %r1,28(% r11) [ 153.428231] 000000002bd6e0c4: ec213bbb0055 risbg %r2,%r1, 59,187,0 [ 153.428231] 000000002bd6e0ca: a7740057 brc 7,000000 002bd6e178 [ 153.428231] 000000002bd6e0ce: 5810b018 l %r1,24(% r11) [ 153.428231] 000000002bd6e0d2: c01b000000ff nilf %r1,255 [ 153.428251] Call Trace: [ 153.428253] [<000000002bd6e0c0>] blk_mq_sched_insert_request+0x58/0x178 [ 153.428257] [<000000002bd5e926>] blk_execute_rq+0x56/0xd8 [ 153.428261] [<000000002bf20d50>] __scsi_execute+0x110/0x230 [ 153.428264] [<000000002bf20fb2>] scsi_mode_sense+0x142/0x340 [ 153.428267] [<000000002bf322c6>] sd_revalidate_disk.isra.0+0x74e/0x2240 [ 153.428270] [<000000002bf342ea>] sd_probe+0x312/0x4b0 [ 153.428273] [<000000002bee76e0>] really_probe+0xd0/0x4b0 [ 153.428277] [<000000002bee7c70>] driver_probe_device+0x40/0xf0 [ 153.428280] [<000000002bee837c>] __device_attach_driver+0xa4/0x128 [ 153.428283] [<000000002bee4a80>] bus_for_each_drv+0x88/0xc0 [ 153.428286] 
[<000000002bee6be0>] __device_attach_async_helper+0x90/0xf0 [ 153.428289] [<000000002b7c0d1e>] async_run_entry_fn+0x4e/0x1b0 [ 153.428292] [<000000002b7b362a>] process_one_work+0x21a/0x498 [ 153.428295] [<000000002b7b3dd4>] worker_thread+0x64/0x498 [ 153.428298] [<000000002b7bc71c>] kthread+0x184/0x190 [ 153.428301] [<000000002b747468>] __ret_from_fork+0x40/0x58 [ 153.428304] [<000000002c2952fa>] ret_from_fork+0xa/0x30 [ 153.428308] Last Breaking-Event-Address: [ 153.428309] [<0000000000000000>] 0x0 [ 153.428311] ---[ end trace e8c7933cbb1a7d91 ]--- -- Best Regards, Yi Zhang ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 4:00 ` Yi Zhang @ 2021-11-03 19:03 ` Jens Axboe 2021-11-05 11:13 ` Yi Zhang 0 siblings, 1 reply; 24+ messages in thread From: Jens Axboe @ 2021-11-03 19:03 UTC (permalink / raw) To: Yi Zhang Cc: Ming Lei, Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi On 11/2/21 10:00 PM, Yi Zhang wrote: >>> >>> Hello Jens, >>> >>> I guess the issue could be the following code run without grabbing >>> ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx(). >>> >>> .rq_flags = q->elevator ? RQF_ELV : 0, >>> >>> then elevator is switched to real one from none, and check on q->elevator >>> becomes not consistent. >> >> Indeed, that’s where I was going with this. I have a patch, testing it locally but it’s getting late. Will send it out tomorrow. The nice benefit is that it allows dropping the weird ref get on plug flush, and batches getting the refs as well. >> > > Hi Jens > Here is the log in case you still need it. :) Can you retry with the updated for-next pulled into -git? -- Jens Axboe ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 19:03 ` Jens Axboe @ 2021-11-05 11:13 ` Yi Zhang 0 siblings, 0 replies; 24+ messages in thread From: Yi Zhang @ 2021-11-05 11:13 UTC (permalink / raw) To: Jens Axboe Cc: Ming Lei, Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi On Thu, Nov 4, 2021 at 3:03 AM Jens Axboe <axboe@kernel.dk> wrote: > > On 11/2/21 10:00 PM, Yi Zhang wrote: > >>> > >>> Hello Jens, > >>> > >>> I guess the issue could be the following code run without grabbing > >>> ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx(). > >>> > >>> .rq_flags = q->elevator ? RQF_ELV : 0, > >>> > >>> then elevator is switched to real one from none, and check on q->elevator > >>> becomes not consistent. > >> > >> Indeed, that’s where I was going with this. I have a patch, testing it locally but it’s getting late. Will send it out tomorrow. The nice benefit is that it allows dropping the weird ref get on plug flush, and batches getting the refs as well. > >> > > > > Hi Jens > > Here is the log in case you still need it. :) > > Can you retry with the updated for-next pulled into -git? Hi Jens Sorry for the delay, the issue cannot be reproduced now. > > -- > Jens Axboe > -- Best Regards, Yi Zhang ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 3:54 ` Jens Axboe 2021-11-03 4:00 ` Yi Zhang @ 2021-11-03 11:59 ` Jens Axboe 2021-11-03 13:59 ` Yi Zhang 1 sibling, 1 reply; 24+ messages in thread From: Jens Axboe @ 2021-11-03 11:59 UTC (permalink / raw) To: Ming Lei Cc: Yi Zhang, Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi On 11/2/21 9:54 PM, Jens Axboe wrote: > On Nov 2, 2021, at 9:52 PM, Ming Lei <ming.lei@redhat.com> wrote: >> >> On Tue, Nov 02, 2021 at 09:21:10PM -0600, Jens Axboe wrote: >>>> On 11/2/21 8:21 PM, Yi Zhang wrote: >>>>>> >>>>>> Can either one of you try with this patch? Won't fix anything, but it'll >>>>>> hopefully shine a bit of light on the issue. >>>>>> >>>> Hi Jens >>>> >>>> Here is the full log: >>> >>> Thanks! I think I see what it could be - can you try this one as well, >>> would like to confirm that the condition I think is triggering is what >>> is triggering. >>> >>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>> index 07eb1412760b..81dede885231 100644 >>> --- a/block/blk-mq.c >>> +++ b/block/blk-mq.c >>> @@ -2515,6 +2515,8 @@ void blk_mq_submit_bio(struct bio *bio) >>> if (plug && plug->cached_rq) { >>> rq = rq_list_pop(&plug->cached_rq); >>> INIT_LIST_HEAD(&rq->queuelist); >>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); >>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); >>> } else { >>> struct blk_mq_alloc_data data = { >>> .q = q, >>> @@ -2535,6 +2537,8 @@ void blk_mq_submit_bio(struct bio *bio) >>> bio_wouldblock_error(bio); >>> goto queue_exit; >>> } >>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); >>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); >> >> Hello Jens, >> >> I guess the issue could be the following code run without grabbing >> ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx(). >> >> .rq_flags = q->elevator ? 
RQF_ELV : 0, >> >> then elevator is switched to real one from none, and check on q->elevator >> becomes not consistent. > > Indeed, that’s where I was going with this. I have a patch, testing it > locally but it’s getting late. Will send it out tomorrow. The nice > benefit is that it allows dropping the weird ref get on plug flush, > and batches getting the refs as well. Yi/Steffen, can you try pulling this into your test kernel: git://git.kernel.dk/linux-block for-next and see if it fixes the issue for you. Thanks! -- Jens Axboe ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 11:59 ` Jens Axboe @ 2021-11-03 13:59 ` Yi Zhang 2021-11-03 14:26 ` Jens Axboe 2021-11-03 14:57 ` Ming Lei 0 siblings, 2 replies; 24+ messages in thread From: Yi Zhang @ 2021-11-03 13:59 UTC (permalink / raw) To: Jens Axboe Cc: Ming Lei, Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi On Wed, Nov 3, 2021 at 7:59 PM Jens Axboe <axboe@kernel.dk> wrote: > > On 11/2/21 9:54 PM, Jens Axboe wrote: > > On Nov 2, 2021, at 9:52 PM, Ming Lei <ming.lei@redhat.com> wrote: > >> > >> On Tue, Nov 02, 2021 at 09:21:10PM -0600, Jens Axboe wrote: > >>>> On 11/2/21 8:21 PM, Yi Zhang wrote: > >>>>>> > >>>>>> Can either one of you try with this patch? Won't fix anything, but it'll > >>>>>> hopefully shine a bit of light on the issue. > >>>>>> > >>>> Hi Jens > >>>> > >>>> Here is the full log: > >>> > >>> Thanks! I think I see what it could be - can you try this one as well, > >>> would like to confirm that the condition I think is triggering is what > >>> is triggering. 
> >>> > >>> diff --git a/block/blk-mq.c b/block/blk-mq.c > >>> index 07eb1412760b..81dede885231 100644 > >>> --- a/block/blk-mq.c > >>> +++ b/block/blk-mq.c > >>> @@ -2515,6 +2515,8 @@ void blk_mq_submit_bio(struct bio *bio) > >>> if (plug && plug->cached_rq) { > >>> rq = rq_list_pop(&plug->cached_rq); > >>> INIT_LIST_HEAD(&rq->queuelist); > >>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); > >>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); > >>> } else { > >>> struct blk_mq_alloc_data data = { > >>> .q = q, > >>> @@ -2535,6 +2537,8 @@ void blk_mq_submit_bio(struct bio *bio) > >>> bio_wouldblock_error(bio); > >>> goto queue_exit; > >>> } > >>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); > >>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); > >> > >> Hello Jens, > >> > >> I guess the issue could be the following code run without grabbing > >> ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx(). > >> > >> .rq_flags = q->elevator ? RQF_ELV : 0, > >> > >> then elevator is switched to real one from none, and check on q->elevator > >> becomes not consistent. > > > > Indeed, that’s where I was going with this. I have a patch, testing it > > locally but it’s getting late. Will send it out tomorrow. The nice > > benefit is that it allows dropping the weird ref get on plug flush, > > and batches getting the refs as well. > > Yi/Steffen, can you try pulling this into your test kernel: > > git://git.kernel.dk/linux-block for-next > > and see if it fixes the issue for you. Thanks! 
It still can be reproduced with the latest linux-block/for-next, here is the log fab2914e46eb (HEAD, new/for-next) Merge branch 'for-5.16/drivers' into for-next [ 965.892911] run blktests srp/001 at 2021-11-03 09:54:14 [ 966.069421] alua: device handler registered [ 966.072163] emc: device handler registered [ 966.074955] rdac: device handler registered [ 966.079931] null_blk: module loaded [ 966.207798] rdma_rxe: loaded [ 966.213462] infiniband enc8000_rxe: set active [ 966.213467] infiniband enc8000_rxe: added enc8000 [ 966.259104] scsi_debug:sdebug_add_store: dif_storep 524288 bytes @ 00000000340c6f55 [ 966.259306] scsi_debug:sdebug_driver_probe: scsi_debug: trim poll_queues to 0. poll_q/nr_hw = (0/1) [ 966.259309] scsi_debug:sdebug_driver_probe: host protection DIF3 DIX3 [ 966.259314] scsi host0: scsi_debug: version 0190 [20200710] dev_size_mb=32, opts=0x0, submit_queues=1, statistics=0 [ 966.259933] scsi 0:0:0:0: Direct-Access Linux scsi_debug 0190 PQ: 0 ANSI: 7 [ 966.260273] sd 0:0:0:0: Power-on or device reset occurred [ 966.260299] sd 0:0:0:0: [sda] Enabling DIF Type 3 protection [ 966.260327] sd 0:0:0:0: [sda] 65536 512-byte logical blocks: (33.6 MB/32.0 MiB) [ 966.260337] sd 0:0:0:0: [sda] Write Protect is off [ 966.260341] sd 0:0:0:0: [sda] Mode Sense: 73 00 10 08 [ 966.260352] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA [ 966.260372] sd 0:0:0:0: [sda] Optimal transfer size 524288 bytes [ 966.261748] sd 0:0:0:0: Attached scsi generic sg0 type 0 [ 966.350416] sd 0:0:0:0: [sda] Enabling DIX T10-DIF-TYPE3-CRC protection [ 966.350422] sd 0:0:0:0: [sda] DIF application tag size 6 [ 966.450576] sd 0:0:0:0: [sda] Attached SCSI disk [ 966.667676] Rounding down aligned max_sectors from 4294967295 to 4294967288 [ 966.703431] ib_srpt:srpt_add_one: ib_srpt device = 00000000d6a9642e [ 966.703441] ib_srpt:srpt_use_srq: ib_srpt srpt_use_srq(enc8000_rxe): use_srq = 0; ret = 0 [ 966.703443] ib_srpt:srpt_add_one: ib_srpt Target login 
info: id_ext=00debdfffebeef80,ioc_guid=00debdfffebeef80,pkey=ffff,service_id=00debdfffebeef80 [ 966.703499] ib_srpt:srpt_add_one: ib_srpt added enc8000_rxe. [ 967.002396] Rounding down aligned max_sectors from 255 to 248 [ 967.011605] Rounding down aligned max_sectors from 255 to 248 [ 967.037191] Rounding down aligned max_sectors from 4294967295 to 4294967288 [ 967.130091] ib_srp:srp_add_one: ib_srp: srp_add_one: 18446744073709551615 / 4096 = 4503599627370495 <> 512 [ 967.130097] ib_srp:srp_add_one: ib_srp: enc8000_rxe: mr_page_shift = 12, device->max_mr_size = 0xffffffffffffffff, device->max_fast_reg_page_list_len = 512, max_pages_per_mr = 512, mr_max_size = 0x200000 [ 967.148347] ib_srp:srp_parse_in: ib_srp: 10.16.69.39 -> 10.16.69.39:0 [ 967.148352] ib_srp:srp_parse_in: ib_srp: 10.16.69.39:5555 -> 10.16.69.39:5555 [ 967.148355] ib_srp:add_target_store: ib_srp: max_sectors = 1024; max_pages_per_mr = 512; mr_page_size = 4096; max_sectors_per_mr = 4096; mr_per_cmd = 2 [ 967.148358] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 967.148754] ib_srpt Received SRP_LOGIN_REQ with i_port_id fe80:0000:0000:0000:00de:bdff:febe:ef80, t_port_id 00de:bdff:febe:ef80:00de:bdff:febe:ef80 and it_iu_len 8260 on port 1 (guid=fe80:0000:0000:0000:00de:bdff:febe:ef80); pkey 0xffff [ 967.148856] ib_srpt:srpt_cm_req_recv: ib_srpt imm_data_offset = 68 [ 967.151639] ib_srpt:srpt_create_ch_ib: ib_srpt srpt_create_ch_ib: max_cqe= 8191 max_sge= 32 sq_size = 4096 ch= 00000000d562fdc1 [ 967.151654] ib_srpt:srpt_cm_req_recv: ib_srpt registering src addr 10.16.69.39 or i_port_id 0xfe8000000000000000debdfffebeef80 [ 967.151674] ib_srpt:srpt_cm_req_recv: ib_srpt Establish connection sess=0000000047d911b7 name=10.16.69.39 ch=00000000d562fdc1 [ 967.151699] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 967.151702] scsi host1: ib_srp: using immediate data [ 967.152088] ib_srpt:srpt_zerolength_write: ib_srpt 10.16.69.39-18: queued zerolength write [ 967.152101] 
ib_srpt:srpt_zerolength_write_done: ib_srpt 10.16.69.39-18 wc->status 0 [ 967.152444] ib_srpt Received SRP_LOGIN_REQ with i_port_id fe80:0000:0000:0000:00de:bdff:febe:ef80, t_port_id 00de:bdff:febe:ef80:00de:bdff:febe:ef80 and it_iu_len 8260 on port 1 (guid=fe80:0000:0000:0000:00de:bdff:febe:ef80); pkey 0xffff [ 967.152538] ib_srpt:srpt_cm_req_recv: ib_srpt imm_data_offset = 68 [ 967.154559] ib_srpt:srpt_create_ch_ib: ib_srpt srpt_create_ch_ib: max_cqe= 8191 max_sge= 32 sq_size = 4096 ch= 000000001d71a872 [ 967.154572] ib_srpt:srpt_cm_req_recv: ib_srpt registering src addr 10.16.69.39 or i_port_id 0xfe8000000000000000debdfffebeef80 [ 967.154586] ib_srpt:srpt_cm_req_recv: ib_srpt Establish connection sess=00000000af249bba name=10.16.69.39 ch=000000001d71a872 [ 967.154607] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 967.154609] scsi host1: ib_srp: using immediate data [ 967.155020] ib_srpt:srpt_zerolength_write: ib_srpt 10.16.69.39-20: queued zerolength write [ 967.155031] ib_srpt:srpt_zerolength_write_done: ib_srpt 10.16.69.39-20 wc->status 0 [ 967.155036] scsi host1: SRP.T10:00DEBDFFFEBEEF80 [ 967.155533] scsi 1:0:0:0: Direct-Access LIO-ORG IBLOCK 4.0 PQ: 0 ANSI: 6 [ 967.155909] scsi 1:0:0:0: alua: supports implicit and explicit TPGS [ 967.155914] scsi 1:0:0:0: alua: device naa.60014056e756c6c62300000000000000 port group 0 rel port 1 [ 967.156070] sd 1:0:0:0: Warning! Received an indication that the LUN assignments on this target have changed. 
The Linux SCSI layer does not automatical [ 967.156091] sd 1:0:0:0: Attached scsi generic sg1 type 0 [ 967.170537] sd 1:0:0:0: alua: transition timeout set to 60 seconds [ 967.170543] sd 1:0:0:0: alua: port group 00 state A non-preferred supports TOlUSNA [ 967.170806] sd 1:0:0:0: [sdb] 65536 512-byte logical blocks: (33.6 MB/32.0 MiB) [ 967.170855] sd 1:0:0:0: [sdb] Write Protect is off [ 967.170858] sd 1:0:0:0: [sdb] Mode Sense: 43 00 00 08 [ 967.170907] sd 1:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA [ 967.170918] srpt/10.16.69.39: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION. [ 967.170956] sd 1:0:0:0: [sdb] Optimal transfer size 126976 bytes [ 967.172013] scsi 1:0:0:2: Direct-Access LIO-ORG IBLOCK 4.0 PQ: 0 ANSI: 6 [ 967.172138] scsi 1:0:0:2: alua: supports implicit and explicit TPGS [ 967.172142] scsi 1:0:0:2: alua: device naa.60014057363736964626700000000000 port group 0 rel port 1 [ 967.172205] sd 1:0:0:2: Attached scsi generic sg2 type 0 [ 967.172404] sd 1:0:0:2: [sdc] 65536 512-byte logical blocks: (33.6 MB/32.0 MiB) [ 967.172425] scsi 1:0:0:1: Direct-Access LIO-ORG IBLOCK 4.0 PQ: 0 ANSI: 6 [ 967.172434] sd 1:0:0:2: [sdc] Write Protect is off [ 967.172437] sd 1:0:0:2: [sdc] Mode Sense: 43 00 10 08 [ 967.172488] sd 1:0:0:2: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA [ 967.172499] srpt/10.16.69.39: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION. 
[ 967.172538] sd 1:0:0:2: [sdc] Optimal transfer size 524288 bytes [ 967.172542] scsi 1:0:0:1: alua: supports implicit and explicit TPGS [ 967.172546] scsi 1:0:0:1: alua: device naa.60014056e756c6c62310000000000000 port group 0 rel port 1 [ 967.172624] sd 1:0:0:1: Attached scsi generic sg3 type 0 [ 967.172648] ib_srp:srp_add_target: ib_srp: host1: SCSI scan succeeded - detected 3 LUNs [ 967.172650] scsi host1: ib_srp: new target: id_ext 00debdfffebeef80 ioc_guid 00debdfffebeef80 sgid fe80:0000:0000:0000:00de:bdff:febe:ef80 dest 10.16.69.39 [ 967.173447] sd 1:0:0:1: Warning! Received an indication that the LUN assignments on this target have changed. The Linux SCSI layer does not automatical [ 967.175069] ib_srp:srp_parse_in: ib_srp: 10.16.69.39 -> 10.16.69.39:0 [ 967.175073] ib_srp:srp_parse_in: ib_srp: 10.16.69.39:5555 -> 10.16.69.39:5555 [ 967.175080] ib_srp:srp_parse_in: ib_srp: [2620:52:0:1040:de:bdff:febe:ef80] -> [2620:52:0:1040:de:bdff:febe:ef80]:0/168838439%0 [ 967.175085] ib_srp:srp_parse_in: ib_srp: [2620:52:0:1040:de:bdff:febe:ef80]:5555 -> [2620:52:0:1040:de:bdff:febe:ef80]:5555/168838439%0 [ 967.175087] scsi host2: ib_srp: Already connected to target port with id_ext=00debdfffebeef80;ioc_guid=00debdfffebeef80;dest=2620:0052:0000:1040:00de:bdff:febe:ef80 [ 967.190459] sd 1:0:0:1: alua: transition timeout set to 60 seconds [ 967.190464] sd 1:0:0:1: alua: port group 00 state A non-preferred supports TOlUSNA [ 967.190612] sd 1:0:0:1: [sdd] 65536 512-byte logical blocks: (33.6 MB/32.0 MiB) [ 967.190639] sd 1:0:0:1: [sdd] Write Protect is off [ 967.190642] sd 1:0:0:1: [sdd] Mode Sense: 43 00 00 08 [ 967.190688] sd 1:0:0:1: [sdd] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA [ 967.190698] srpt/10.16.69.39: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION. 
[ 967.190735] sd 1:0:0:1: [sdd] Optimal transfer size 126976 bytes [ 967.230346] sd 1:0:0:2: alua: transition timeout set to 60 seconds [ 967.230351] sd 1:0:0:2: alua: port group 00 state A non-preferred supports TOlUSNA [ 967.232123] ib_srp:srp_parse_in: ib_srp: 10.16.69.39 -> 10.16.69.39:0 [ 967.232127] ib_srp:srp_parse_in: ib_srp: 10.16.69.39:5555 -> 10.16.69.39:5555 [ 967.232133] ib_srp:srp_parse_in: ib_srp: [2620:52:0:1040:de:bdff:febe:ef80] -> [2620:52:0:1040:de:bdff:febe:ef80]:0/168838439%0 [ 967.232137] ib_srp:srp_parse_in: ib_srp: [2620:52:0:1040:de:bdff:febe:ef80]:5555 -> [2620:52:0:1040:de:bdff:febe:ef80]:5555/168838439%0 [ 967.232143] ib_srp:srp_parse_in: ib_srp: [fe80::de:bdff:febe:ef80%2] -> [fe80::de:bdff:febe:ef80]:0/168838439%2 [ 967.232147] ib_srp:srp_parse_in: ib_srp: [fe80::de:bdff:febe:ef80%2]:5555 -> [fe80::de:bdff:febe:ef80]:5555/168838439%2 [ 967.232150] scsi host2: ib_srp: Already connected to target port with id_ext=00debdfffebeef80;ioc_guid=00debdfffebeef80;dest=fe80:0000:0000:0000:00de:bdff:febe:ef80 [ 967.295512] ------------[ cut here ]------------ [ 967.295517] WARNING: CPU: 1 PID: 8 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 [ 967.295529] Modules linked in: ib_srp scsi_transport_srp target_core_pscsi target_core_file ib_srpt target_core_iblock target_core_mod rdma_cm iw_cm ib_cm ib_umad scsi_debug rdma_rxe ib_uverbs ip6_udp_tunnel udp_tunnel null_blk scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_multipath ib_core sunrpc qeth_l2 bridge stp llc qeth zcrypt_cex4 qdio ccwgroup vfio_ccw mdev vfio_iommu_type1 vfio drm fb font drm_panel_orientation_quirks i2c_core fuse backlight zram ip_tables xfs crc32_vx_s390 ghash_s390 prng aes_s390 des_s390 libdes sha512_s390 sha256_s390 sha1_s390 sha_common dasd_eckd_mod dasd_mod pkey zcrypt [ 967.295579] CPU: 1 PID: 8 Comm: kworker/u128:0 Not tainted 5.15.0.v3+ #6 [ 967.295582] Hardware name: IBM 2964 N96 400 (z/VM 6.4.0) [ 967.295584] Workqueue: events_unbound async_run_entry_fn [ 
967.295591] Krnl PSW : 0704e00180000000 000000004feaa208 (blk_mq_sched_insert_request+0x58/0x178) [ 967.295596] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3 [ 967.295600] Krnl GPRS: 0000000000000040 000000000000001d 000000001eb37400 0000000000000001 [ 967.295603] 0000000000000001 0000000000000000 000000001dbea800 000000001c739800 [ 967.295605] 0000000000000000 0000000000000001 0000000000000001 000000001eb37400 [ 967.295607] 00000000006ba100 000003ff7e82da00 0000038000047818 00000380000477a8 [ 967.295615] Krnl Code: 000000004feaa1fc: a71effff chi %r1,-1 000000004feaa200: a7840004 brc 8,000000004feaa208 #000000004feaa204: af000000 mc 0,0 >000000004feaa208: 5810b01c l %r1,28(%r11) 000000004feaa20c: ec213bbb0055 risbg %r2,%r1,59,187,0 000000004feaa212: a7740057 brc 7,000000004feaa2c0 000000004feaa216: 5810b018 l %r1,24(%r11) 000000004feaa21a: c01b000000ff nilf %r1,255 [ 967.295635] Call Trace: [ 967.295637] [<000000004feaa208>] blk_mq_sched_insert_request+0x58/0x178 [ 967.295643] [<000000004fe9aa2e>] blk_execute_rq+0x56/0xd8 [ 967.295649] [<000000005005cea0>] __scsi_execute+0x110/0x230 [ 967.295654] [<00000000500534bc>] scsi_vpd_inquiry+0x7c/0xc0 [ 967.295660] [<000000005005354a>] scsi_get_vpd_page+0x4a/0xf8 [ 967.295663] [<000000005006ebdc>] sd_revalidate_disk.isra.0+0xf14/0x2240 [ 967.295667] [<000000005007043a>] sd_probe+0x312/0x4b0 [ 967.295670] [<0000000050023830>] really_probe+0xd0/0x4b0 [ 967.295675] [<0000000050023dc0>] driver_probe_device+0x40/0xf0 [ 967.295679] [<00000000500244cc>] __device_attach_driver+0xa4/0x128 [ 967.295682] [<0000000050020bd0>] bus_for_each_drv+0x88/0xc0 [ 967.295685] [<0000000050022d30>] __device_attach_async_helper+0x90/0xf0 [ 967.295688] [<000000004f8fcd1e>] async_run_entry_fn+0x4e/0x1b0 [ 967.295691] [<000000004f8ef62a>] process_one_work+0x21a/0x498 [ 967.295695] [<000000004f8efdd4>] worker_thread+0x64/0x498 [ 967.295697] [<000000004f8f871c>] kthread+0x184/0x190 [ 967.295702] [<000000004f883468>] 
__ret_from_fork+0x40/0x58 [ 967.295706] [<00000000503d144a>] ret_from_fork+0xa/0x30 [ 967.295712] Last Breaking-Event-Address: [ 967.295713] [<0000000000000003>] 0x3 [ 967.295716] ---[ end trace faff7345b32090bf ]--- > > -- > Jens Axboe > -- Best Regards, Yi Zhang ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 13:59 ` Yi Zhang @ 2021-11-03 14:26 ` Jens Axboe 2021-11-03 14:57 ` Ming Lei 1 sibling, 0 replies; 24+ messages in thread From: Jens Axboe @ 2021-11-03 14:26 UTC (permalink / raw) To: Yi Zhang Cc: Ming Lei, Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi On 11/3/21 7:59 AM, Yi Zhang wrote: > On Wed, Nov 3, 2021 at 7:59 PM Jens Axboe <axboe@kernel.dk> wrote: >> >> On 11/2/21 9:54 PM, Jens Axboe wrote: >>> On Nov 2, 2021, at 9:52 PM, Ming Lei <ming.lei@redhat.com> wrote: >>>> >>>> On Tue, Nov 02, 2021 at 09:21:10PM -0600, Jens Axboe wrote: >>>>>> On 11/2/21 8:21 PM, Yi Zhang wrote: >>>>>>>> >>>>>>>> Can either one of you try with this patch? Won't fix anything, but it'll >>>>>>>> hopefully shine a bit of light on the issue. >>>>>>>> >>>>>> Hi Jens >>>>>> >>>>>> Here is the full log: >>>>> >>>>> Thanks! I think I see what it could be - can you try this one as well, >>>>> would like to confirm that the condition I think is triggering is what >>>>> is triggering. 
>>>>> >>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>>>> index 07eb1412760b..81dede885231 100644 >>>>> --- a/block/blk-mq.c >>>>> +++ b/block/blk-mq.c >>>>> @@ -2515,6 +2515,8 @@ void blk_mq_submit_bio(struct bio *bio) >>>>> if (plug && plug->cached_rq) { >>>>> rq = rq_list_pop(&plug->cached_rq); >>>>> INIT_LIST_HEAD(&rq->queuelist); >>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); >>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); >>>>> } else { >>>>> struct blk_mq_alloc_data data = { >>>>> .q = q, >>>>> @@ -2535,6 +2537,8 @@ void blk_mq_submit_bio(struct bio *bio) >>>>> bio_wouldblock_error(bio); >>>>> goto queue_exit; >>>>> } >>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); >>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); >>>> >>>> Hello Jens, >>>> >>>> I guess the issue could be the following code run without grabbing >>>> ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx(). >>>> >>>> .rq_flags = q->elevator ? RQF_ELV : 0, >>>> >>>> then elevator is switched to real one from none, and check on q->elevator >>>> becomes not consistent. >>> >>> Indeed, that’s where I was going with this. I have a patch, testing it >>> locally but it’s getting late. Will send it out tomorrow. The nice >>> benefit is that it allows dropping the weird ref get on plug flush, >>> and batches getting the refs as well. >> >> Yi/Steffen, can you try pulling this into your test kernel: >> >> git://git.kernel.dk/linux-block for-next >> >> and see if it fixes the issue for you. Thanks! > > It still can be reproduced with the latest linux-block/for-next, here > is the log > > fab2914e46eb (HEAD, new/for-next) Merge branch 'for-5.16/drivers' into > for-next Funky! Thanks for re-testing, I guess I need to think even harder about this. Can't seem to reproduce it here at all, which makes it a bit harder to poke at. -- Jens Axboe ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 13:59 ` Yi Zhang 2021-11-03 14:26 ` Jens Axboe @ 2021-11-03 14:57 ` Ming Lei 2021-11-03 15:03 ` Jens Axboe 1 sibling, 1 reply; 24+ messages in thread From: Ming Lei @ 2021-11-03 14:57 UTC (permalink / raw) To: Yi Zhang Cc: Jens Axboe, Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi, ming.lei On Wed, Nov 03, 2021 at 09:59:02PM +0800, Yi Zhang wrote: > On Wed, Nov 3, 2021 at 7:59 PM Jens Axboe <axboe@kernel.dk> wrote: > > > > On 11/2/21 9:54 PM, Jens Axboe wrote: > > > On Nov 2, 2021, at 9:52 PM, Ming Lei <ming.lei@redhat.com> wrote: > > >> > > >> On Tue, Nov 02, 2021 at 09:21:10PM -0600, Jens Axboe wrote: > > >>>> On 11/2/21 8:21 PM, Yi Zhang wrote: > > >>>>>> > > >>>>>> Can either one of you try with this patch? Won't fix anything, but it'll > > >>>>>> hopefully shine a bit of light on the issue. > > >>>>>> > > >>>> Hi Jens > > >>>> > > >>>> Here is the full log: > > >>> > > >>> Thanks! I think I see what it could be - can you try this one as well, > > >>> would like to confirm that the condition I think is triggering is what > > >>> is triggering. 
> > >>> > > >>> diff --git a/block/blk-mq.c b/block/blk-mq.c > > >>> index 07eb1412760b..81dede885231 100644 > > >>> --- a/block/blk-mq.c > > >>> +++ b/block/blk-mq.c > > >>> @@ -2515,6 +2515,8 @@ void blk_mq_submit_bio(struct bio *bio) > > >>> if (plug && plug->cached_rq) { > > >>> rq = rq_list_pop(&plug->cached_rq); > > >>> INIT_LIST_HEAD(&rq->queuelist); > > >>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); > > >>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); > > >>> } else { > > >>> struct blk_mq_alloc_data data = { > > >>> .q = q, > > >>> @@ -2535,6 +2537,8 @@ void blk_mq_submit_bio(struct bio *bio) > > >>> bio_wouldblock_error(bio); > > >>> goto queue_exit; > > >>> } > > >>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); > > >>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); > > >> > > >> Hello Jens, > > >> > > >> I guess the issue could be the following code run without grabbing > > >> ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx(). > > >> > > >> .rq_flags = q->elevator ? RQF_ELV : 0, > > >> > > >> then elevator is switched to real one from none, and check on q->elevator > > >> becomes not consistent. > > > > > > Indeed, that’s where I was going with this. I have a patch, testing it > > > locally but it’s getting late. Will send it out tomorrow. The nice > > > benefit is that it allows dropping the weird ref get on plug flush, > > > and batches getting the refs as well. > > > > Yi/Steffen, can you try pulling this into your test kernel: > > > > git://git.kernel.dk/linux-block for-next > > > > and see if it fixes the issue for you. Thanks! 
> > It still can be reproduced with the latest linux-block/for-next, here is the log > > fab2914e46eb (HEAD, new/for-next) Merge branch 'for-5.16/drivers' into for-next Hi Yi, Please try the following change: diff --git a/block/blk-mq.c b/block/blk-mq.c index e1e64964a31b..eb634a9c61ff 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -494,7 +494,6 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, .q = q, .flags = flags, .cmd_flags = op, - .rq_flags = q->elevator ? RQF_ELV : 0, .nr_tags = 1, }; struct request *rq; @@ -504,6 +503,7 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, if (ret) return ERR_PTR(ret); + data.rq_flags = q->elevator ? RQF_ELV : 0, rq = __blk_mq_alloc_requests(&data); if (!rq) goto out_queue_exit; @@ -524,7 +524,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, .q = q, .flags = flags, .cmd_flags = op, - .rq_flags = q->elevator ? RQF_ELV : 0, .nr_tags = 1, }; u64 alloc_time_ns = 0; @@ -551,6 +550,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, ret = blk_queue_enter(q, flags); if (ret) return ERR_PTR(ret); + data.rq_flags = q->elevator ? RQF_ELV : 0, /* * Check if the hardware context is actually mapped to anything. Thanks, Ming ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 14:57 ` Ming Lei @ 2021-11-03 15:03 ` Jens Axboe 2021-11-03 15:09 ` Ming Lei 2021-11-03 15:10 ` Jens Axboe 0 siblings, 2 replies; 24+ messages in thread From: Jens Axboe @ 2021-11-03 15:03 UTC (permalink / raw) To: Ming Lei, Yi Zhang Cc: Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi On 11/3/21 8:57 AM, Ming Lei wrote: > On Wed, Nov 03, 2021 at 09:59:02PM +0800, Yi Zhang wrote: >> On Wed, Nov 3, 2021 at 7:59 PM Jens Axboe <axboe@kernel.dk> wrote: >>> >>> On 11/2/21 9:54 PM, Jens Axboe wrote: >>>> On Nov 2, 2021, at 9:52 PM, Ming Lei <ming.lei@redhat.com> wrote: >>>>> >>>>> On Tue, Nov 02, 2021 at 09:21:10PM -0600, Jens Axboe wrote: >>>>>>> On 11/2/21 8:21 PM, Yi Zhang wrote: >>>>>>>>> >>>>>>>>> Can either one of you try with this patch? Won't fix anything, but it'll >>>>>>>>> hopefully shine a bit of light on the issue. >>>>>>>>> >>>>>>> Hi Jens >>>>>>> >>>>>>> Here is the full log: >>>>>> >>>>>> Thanks! I think I see what it could be - can you try this one as well, >>>>>> would like to confirm that the condition I think is triggering is what >>>>>> is triggering. 
>>>>>> >>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>>>>> index 07eb1412760b..81dede885231 100644 >>>>>> --- a/block/blk-mq.c >>>>>> +++ b/block/blk-mq.c >>>>>> @@ -2515,6 +2515,8 @@ void blk_mq_submit_bio(struct bio *bio) >>>>>> if (plug && plug->cached_rq) { >>>>>> rq = rq_list_pop(&plug->cached_rq); >>>>>> INIT_LIST_HEAD(&rq->queuelist); >>>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); >>>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); >>>>>> } else { >>>>>> struct blk_mq_alloc_data data = { >>>>>> .q = q, >>>>>> @@ -2535,6 +2537,8 @@ void blk_mq_submit_bio(struct bio *bio) >>>>>> bio_wouldblock_error(bio); >>>>>> goto queue_exit; >>>>>> } >>>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); >>>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); >>>>> >>>>> Hello Jens, >>>>> >>>>> I guess the issue could be the following code run without grabbing >>>>> ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx(). >>>>> >>>>> .rq_flags = q->elevator ? RQF_ELV : 0, >>>>> >>>>> then elevator is switched to real one from none, and check on q->elevator >>>>> becomes not consistent. >>>> >>>> Indeed, that’s where I was going with this. I have a patch, testing it >>>> locally but it’s getting late. Will send it out tomorrow. The nice >>>> benefit is that it allows dropping the weird ref get on plug flush, >>>> and batches getting the refs as well. >>> >>> Yi/Steffen, can you try pulling this into your test kernel: >>> >>> git://git.kernel.dk/linux-block for-next >>> >>> and see if it fixes the issue for you. Thanks! 
>> >> It still can be reproduced with the latest linux-block/for-next, here is the log >> >> fab2914e46eb (HEAD, new/for-next) Merge branch 'for-5.16/drivers' into for-next > > Hi Yi, > > Please try the following change: > > > diff --git a/block/blk-mq.c b/block/blk-mq.c > index e1e64964a31b..eb634a9c61ff 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -494,7 +494,6 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, > .q = q, > .flags = flags, > .cmd_flags = op, > - .rq_flags = q->elevator ? RQF_ELV : 0, > .nr_tags = 1, > }; > struct request *rq; > @@ -504,6 +503,7 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, > if (ret) > return ERR_PTR(ret); > > + data.rq_flags = q->elevator ? RQF_ELV : 0, > rq = __blk_mq_alloc_requests(&data); > if (!rq) > goto out_queue_exit; > @@ -524,7 +524,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, > .q = q, > .flags = flags, > .cmd_flags = op, > - .rq_flags = q->elevator ? RQF_ELV : 0, > .nr_tags = 1, > }; > u64 alloc_time_ns = 0; > @@ -551,6 +550,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, > ret = blk_queue_enter(q, flags); > if (ret) > return ERR_PTR(ret); > + data.rq_flags = q->elevator ? RQF_ELV : 0, Don't think that will compile, but I guess the point is that we can't do this assignment before queue enter, in case we're in the midst of switching schedulers. Which is indeed a valid concern. -- Jens Axboe ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 15:03 ` Jens Axboe @ 2021-11-03 15:09 ` Ming Lei 2021-11-03 15:12 ` Jens Axboe 2021-11-03 15:10 ` Jens Axboe 1 sibling, 1 reply; 24+ messages in thread From: Ming Lei @ 2021-11-03 15:09 UTC (permalink / raw) To: Jens Axboe Cc: Yi Zhang, Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi On Wed, Nov 03, 2021 at 09:03:02AM -0600, Jens Axboe wrote: > On 11/3/21 8:57 AM, Ming Lei wrote: > > On Wed, Nov 03, 2021 at 09:59:02PM +0800, Yi Zhang wrote: > >> On Wed, Nov 3, 2021 at 7:59 PM Jens Axboe <axboe@kernel.dk> wrote: > >>> > >>> On 11/2/21 9:54 PM, Jens Axboe wrote: > >>>> On Nov 2, 2021, at 9:52 PM, Ming Lei <ming.lei@redhat.com> wrote: > >>>>> > >>>>> On Tue, Nov 02, 2021 at 09:21:10PM -0600, Jens Axboe wrote: > >>>>>>> On 11/2/21 8:21 PM, Yi Zhang wrote: > >>>>>>>>> > >>>>>>>>> Can either one of you try with this patch? Won't fix anything, but it'll > >>>>>>>>> hopefully shine a bit of light on the issue. > >>>>>>>>> > >>>>>>> Hi Jens > >>>>>>> > >>>>>>> Here is the full log: > >>>>>> > >>>>>> Thanks! I think I see what it could be - can you try this one as well, > >>>>>> would like to confirm that the condition I think is triggering is what > >>>>>> is triggering. 
> >>>>>> > >>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c > >>>>>> index 07eb1412760b..81dede885231 100644 > >>>>>> --- a/block/blk-mq.c > >>>>>> +++ b/block/blk-mq.c > >>>>>> @@ -2515,6 +2515,8 @@ void blk_mq_submit_bio(struct bio *bio) > >>>>>> if (plug && plug->cached_rq) { > >>>>>> rq = rq_list_pop(&plug->cached_rq); > >>>>>> INIT_LIST_HEAD(&rq->queuelist); > >>>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); > >>>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); > >>>>>> } else { > >>>>>> struct blk_mq_alloc_data data = { > >>>>>> .q = q, > >>>>>> @@ -2535,6 +2537,8 @@ void blk_mq_submit_bio(struct bio *bio) > >>>>>> bio_wouldblock_error(bio); > >>>>>> goto queue_exit; > >>>>>> } > >>>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); > >>>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); > >>>>> > >>>>> Hello Jens, > >>>>> > >>>>> I guess the issue could be the following code run without grabbing > >>>>> ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx(). > >>>>> > >>>>> .rq_flags = q->elevator ? RQF_ELV : 0, > >>>>> > >>>>> then elevator is switched to real one from none, and check on q->elevator > >>>>> becomes not consistent. > >>>> > >>>> Indeed, that’s where I was going with this. I have a patch, testing it > >>>> locally but it’s getting late. Will send it out tomorrow. The nice > >>>> benefit is that it allows dropping the weird ref get on plug flush, > >>>> and batches getting the refs as well. > >>> > >>> Yi/Steffen, can you try pulling this into your test kernel: > >>> > >>> git://git.kernel.dk/linux-block for-next > >>> > >>> and see if it fixes the issue for you. Thanks! 
> >> > >> It still can be reproduced with the latest linux-block/for-next, here is the log > >> > >> fab2914e46eb (HEAD, new/for-next) Merge branch 'for-5.16/drivers' into for-next > > > > Hi Yi, > > > > Please try the following change: > > > > > > diff --git a/block/blk-mq.c b/block/blk-mq.c > > index e1e64964a31b..eb634a9c61ff 100644 > > --- a/block/blk-mq.c > > +++ b/block/blk-mq.c > > @@ -494,7 +494,6 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, > > .q = q, > > .flags = flags, > > .cmd_flags = op, > > - .rq_flags = q->elevator ? RQF_ELV : 0, > > .nr_tags = 1, > > }; > > struct request *rq; > > @@ -504,6 +503,7 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, > > if (ret) > > return ERR_PTR(ret); > > > > + data.rq_flags = q->elevator ? RQF_ELV : 0, > > rq = __blk_mq_alloc_requests(&data); > > if (!rq) > > goto out_queue_exit; > > @@ -524,7 +524,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, > > .q = q, > > .flags = flags, > > .cmd_flags = op, > > - .rq_flags = q->elevator ? RQF_ELV : 0, > > .nr_tags = 1, > > }; > > u64 alloc_time_ns = 0; > > @@ -551,6 +550,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, > > ret = blk_queue_enter(q, flags); > > if (ret) > > return ERR_PTR(ret); > > + data.rq_flags = q->elevator ? RQF_ELV : 0, > > Don't think that will compile, but I guess the point is that we can't do It can compile. > this assignment before queue enter, in case we're in the midst of > switching schedulers. Which is indeed a valid concern. Yeah, for scsi, real io sched is switched when adding disk, before that, the passthrough command need to see consistent q->elevator. Thanks, Ming ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 15:09 ` Ming Lei @ 2021-11-03 15:12 ` Jens Axboe 0 siblings, 0 replies; 24+ messages in thread From: Jens Axboe @ 2021-11-03 15:12 UTC (permalink / raw) To: Ming Lei Cc: Yi Zhang, Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi On 11/3/21 9:09 AM, Ming Lei wrote: > On Wed, Nov 03, 2021 at 09:03:02AM -0600, Jens Axboe wrote: >> On 11/3/21 8:57 AM, Ming Lei wrote: >>> On Wed, Nov 03, 2021 at 09:59:02PM +0800, Yi Zhang wrote: >>>> On Wed, Nov 3, 2021 at 7:59 PM Jens Axboe <axboe@kernel.dk> wrote: >>>>> >>>>> On 11/2/21 9:54 PM, Jens Axboe wrote: >>>>>> On Nov 2, 2021, at 9:52 PM, Ming Lei <ming.lei@redhat.com> wrote: >>>>>>> >>>>>>> On Tue, Nov 02, 2021 at 09:21:10PM -0600, Jens Axboe wrote: >>>>>>>>> On 11/2/21 8:21 PM, Yi Zhang wrote: >>>>>>>>>>> >>>>>>>>>>> Can either one of you try with this patch? Won't fix anything, but it'll >>>>>>>>>>> hopefully shine a bit of light on the issue. >>>>>>>>>>> >>>>>>>>> Hi Jens >>>>>>>>> >>>>>>>>> Here is the full log: >>>>>>>> >>>>>>>> Thanks! I think I see what it could be - can you try this one as well, >>>>>>>> would like to confirm that the condition I think is triggering is what >>>>>>>> is triggering. 
>>>>>>>> >>>>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>>>>>>> index 07eb1412760b..81dede885231 100644 >>>>>>>> --- a/block/blk-mq.c >>>>>>>> +++ b/block/blk-mq.c >>>>>>>> @@ -2515,6 +2515,8 @@ void blk_mq_submit_bio(struct bio *bio) >>>>>>>> if (plug && plug->cached_rq) { >>>>>>>> rq = rq_list_pop(&plug->cached_rq); >>>>>>>> INIT_LIST_HEAD(&rq->queuelist); >>>>>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); >>>>>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); >>>>>>>> } else { >>>>>>>> struct blk_mq_alloc_data data = { >>>>>>>> .q = q, >>>>>>>> @@ -2535,6 +2537,8 @@ void blk_mq_submit_bio(struct bio *bio) >>>>>>>> bio_wouldblock_error(bio); >>>>>>>> goto queue_exit; >>>>>>>> } >>>>>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); >>>>>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); >>>>>>> >>>>>>> Hello Jens, >>>>>>> >>>>>>> I guess the issue could be the following code run without grabbing >>>>>>> ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx(). >>>>>>> >>>>>>> .rq_flags = q->elevator ? RQF_ELV : 0, >>>>>>> >>>>>>> then elevator is switched to real one from none, and check on q->elevator >>>>>>> becomes not consistent. >>>>>> >>>>>> Indeed, that’s where I was going with this. I have a patch, testing it >>>>>> locally but it’s getting late. Will send it out tomorrow. The nice >>>>>> benefit is that it allows dropping the weird ref get on plug flush, >>>>>> and batches getting the refs as well. >>>>> >>>>> Yi/Steffen, can you try pulling this into your test kernel: >>>>> >>>>> git://git.kernel.dk/linux-block for-next >>>>> >>>>> and see if it fixes the issue for you. Thanks! 
>>>> >>>> It still can be reproduced with the latest linux-block/for-next, here is the log >>>> >>>> fab2914e46eb (HEAD, new/for-next) Merge branch 'for-5.16/drivers' into for-next >>> >>> Hi Yi, >>> >>> Please try the following change: >>> >>> >>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>> index e1e64964a31b..eb634a9c61ff 100644 >>> --- a/block/blk-mq.c >>> +++ b/block/blk-mq.c >>> @@ -494,7 +494,6 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, >>> .q = q, >>> .flags = flags, >>> .cmd_flags = op, >>> - .rq_flags = q->elevator ? RQF_ELV : 0, >>> .nr_tags = 1, >>> }; >>> struct request *rq; >>> @@ -504,6 +503,7 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, >>> if (ret) >>> return ERR_PTR(ret); >>> >>> + data.rq_flags = q->elevator ? RQF_ELV : 0, >>> rq = __blk_mq_alloc_requests(&data); >>> if (!rq) >>> goto out_queue_exit; >>> @@ -524,7 +524,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, >>> .q = q, >>> .flags = flags, >>> .cmd_flags = op, >>> - .rq_flags = q->elevator ? RQF_ELV : 0, >>> .nr_tags = 1, >>> }; >>> u64 alloc_time_ns = 0; >>> @@ -551,6 +550,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, >>> ret = blk_queue_enter(q, flags); >>> if (ret) >>> return ERR_PTR(ret); >>> + data.rq_flags = q->elevator ? RQF_ELV : 0, >> >> Don't think that will compile, but I guess the point is that we can't do > > It can compile. s/,/; for the new assignments. >> this assignment before queue enter, in case we're in the midst of >> switching schedulers. Which is indeed a valid concern. > > Yeah, for scsi, real io sched is switched when adding disk, before > that, the passthrough command need to see consistent q->elevator. Yeah, I agree that the problem is most certainly there. Guess I'm just surprised the timing works out reliably, but it sure looks like it. -- Jens Axboe ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 15:03 ` Jens Axboe 2021-11-03 15:09 ` Ming Lei @ 2021-11-03 15:10 ` Jens Axboe 2021-11-03 15:16 ` Ming Lei 1 sibling, 1 reply; 24+ messages in thread From: Jens Axboe @ 2021-11-03 15:10 UTC (permalink / raw) To: Ming Lei, Yi Zhang Cc: Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi On 11/3/21 9:03 AM, Jens Axboe wrote: > On 11/3/21 8:57 AM, Ming Lei wrote: >> On Wed, Nov 03, 2021 at 09:59:02PM +0800, Yi Zhang wrote: >>> On Wed, Nov 3, 2021 at 7:59 PM Jens Axboe <axboe@kernel.dk> wrote: >>>> >>>> On 11/2/21 9:54 PM, Jens Axboe wrote: >>>>> On Nov 2, 2021, at 9:52 PM, Ming Lei <ming.lei@redhat.com> wrote: >>>>>> >>>>>> On Tue, Nov 02, 2021 at 09:21:10PM -0600, Jens Axboe wrote: >>>>>>>> On 11/2/21 8:21 PM, Yi Zhang wrote: >>>>>>>>>> >>>>>>>>>> Can either one of you try with this patch? Won't fix anything, but it'll >>>>>>>>>> hopefully shine a bit of light on the issue. >>>>>>>>>> >>>>>>>> Hi Jens >>>>>>>> >>>>>>>> Here is the full log: >>>>>>> >>>>>>> Thanks! I think I see what it could be - can you try this one as well, >>>>>>> would like to confirm that the condition I think is triggering is what >>>>>>> is triggering. 
>>>>>>> >>>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>>>>>> index 07eb1412760b..81dede885231 100644 >>>>>>> --- a/block/blk-mq.c >>>>>>> +++ b/block/blk-mq.c >>>>>>> @@ -2515,6 +2515,8 @@ void blk_mq_submit_bio(struct bio *bio) >>>>>>> if (plug && plug->cached_rq) { >>>>>>> rq = rq_list_pop(&plug->cached_rq); >>>>>>> INIT_LIST_HEAD(&rq->queuelist); >>>>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); >>>>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); >>>>>>> } else { >>>>>>> struct blk_mq_alloc_data data = { >>>>>>> .q = q, >>>>>>> @@ -2535,6 +2537,8 @@ void blk_mq_submit_bio(struct bio *bio) >>>>>>> bio_wouldblock_error(bio); >>>>>>> goto queue_exit; >>>>>>> } >>>>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); >>>>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); >>>>>> >>>>>> Hello Jens, >>>>>> >>>>>> I guess the issue could be the following code run without grabbing >>>>>> ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx(). >>>>>> >>>>>> .rq_flags = q->elevator ? RQF_ELV : 0, >>>>>> >>>>>> then elevator is switched to real one from none, and check on q->elevator >>>>>> becomes not consistent. >>>>> >>>>> Indeed, that’s where I was going with this. I have a patch, testing it >>>>> locally but it’s getting late. Will send it out tomorrow. The nice >>>>> benefit is that it allows dropping the weird ref get on plug flush, >>>>> and batches getting the refs as well. >>>> >>>> Yi/Steffen, can you try pulling this into your test kernel: >>>> >>>> git://git.kernel.dk/linux-block for-next >>>> >>>> and see if it fixes the issue for you. Thanks! 
>>> >>> It still can be reproduced with the latest linux-block/for-next, here is the log >>> >>> fab2914e46eb (HEAD, new/for-next) Merge branch 'for-5.16/drivers' into for-next >> >> Hi Yi, >> >> Please try the following change: >> >> >> diff --git a/block/blk-mq.c b/block/blk-mq.c >> index e1e64964a31b..eb634a9c61ff 100644 >> --- a/block/blk-mq.c >> +++ b/block/blk-mq.c >> @@ -494,7 +494,6 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, >> .q = q, >> .flags = flags, >> .cmd_flags = op, >> - .rq_flags = q->elevator ? RQF_ELV : 0, >> .nr_tags = 1, >> }; >> struct request *rq; >> @@ -504,6 +503,7 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, >> if (ret) >> return ERR_PTR(ret); >> >> + data.rq_flags = q->elevator ? RQF_ELV : 0, >> rq = __blk_mq_alloc_requests(&data); >> if (!rq) >> goto out_queue_exit; >> @@ -524,7 +524,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, >> .q = q, >> .flags = flags, >> .cmd_flags = op, >> - .rq_flags = q->elevator ? RQF_ELV : 0, >> .nr_tags = 1, >> }; >> u64 alloc_time_ns = 0; >> @@ -551,6 +550,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, >> ret = blk_queue_enter(q, flags); >> if (ret) >> return ERR_PTR(ret); >> + data.rq_flags = q->elevator ? RQF_ELV : 0, > > Don't think that will compile, but I guess the point is that we can't do > this assignment before queue enter, in case we're in the midst of > switching schedulers. Which is indeed a valid concern. Something like the below. Maybe? On top of the for-next that was already pulled in. 
diff --git a/block/blk-mq.c b/block/blk-mq.c index b01e05e02277..121f1898d529 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -433,9 +433,11 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data) if (data->cmd_flags & REQ_NOWAIT) data->flags |= BLK_MQ_REQ_NOWAIT; - if (data->rq_flags & RQF_ELV) { + if (q->elevator) { struct elevator_queue *e = q->elevator; + data->rq_flags |= RQF_ELV; + /* * Flush/passthrough requests are special and go directly to the * dispatch list. Don't include reserved tags in the @@ -494,7 +496,6 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, .q = q, .flags = flags, .cmd_flags = op, - .rq_flags = q->elevator ? RQF_ELV : 0, .nr_tags = 1, }; struct request *rq; @@ -524,7 +525,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, .q = q, .flags = flags, .cmd_flags = op, - .rq_flags = q->elevator ? RQF_ELV : 0, .nr_tags = 1, }; u64 alloc_time_ns = 0; @@ -565,6 +565,8 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, if (!q->elevator) blk_mq_tag_busy(data.hctx); + else + data.rq_flags |= RQF_ELV; ret = -EWOULDBLOCK; tag = blk_mq_get_tag(&data); @@ -2560,7 +2562,6 @@ void blk_mq_submit_bio(struct bio *bio) .q = q, .nr_tags = 1, .cmd_flags = bio->bi_opf, - .rq_flags = q->elevator ? RQF_ELV : 0, }; if (unlikely(!blk_try_enter_queue(q, false) && -- Jens Axboe ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 15:10 ` Jens Axboe @ 2021-11-03 15:16 ` Ming Lei 2021-11-03 15:41 ` Jens Axboe 0 siblings, 1 reply; 24+ messages in thread From: Ming Lei @ 2021-11-03 15:16 UTC (permalink / raw) To: Jens Axboe Cc: Yi Zhang, Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi On Wed, Nov 03, 2021 at 09:10:17AM -0600, Jens Axboe wrote: > On 11/3/21 9:03 AM, Jens Axboe wrote: > > On 11/3/21 8:57 AM, Ming Lei wrote: > >> On Wed, Nov 03, 2021 at 09:59:02PM +0800, Yi Zhang wrote: > >>> On Wed, Nov 3, 2021 at 7:59 PM Jens Axboe <axboe@kernel.dk> wrote: > >>>> > >>>> On 11/2/21 9:54 PM, Jens Axboe wrote: > >>>>> On Nov 2, 2021, at 9:52 PM, Ming Lei <ming.lei@redhat.com> wrote: > >>>>>> > >>>>>> On Tue, Nov 02, 2021 at 09:21:10PM -0600, Jens Axboe wrote: > >>>>>>>> On 11/2/21 8:21 PM, Yi Zhang wrote: > >>>>>>>>>> > >>>>>>>>>> Can either one of you try with this patch? Won't fix anything, but it'll > >>>>>>>>>> hopefully shine a bit of light on the issue. > >>>>>>>>>> > >>>>>>>> Hi Jens > >>>>>>>> > >>>>>>>> Here is the full log: > >>>>>>> > >>>>>>> Thanks! I think I see what it could be - can you try this one as well, > >>>>>>> would like to confirm that the condition I think is triggering is what > >>>>>>> is triggering. 
> >>>>>>> > >>>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c > >>>>>>> index 07eb1412760b..81dede885231 100644 > >>>>>>> --- a/block/blk-mq.c > >>>>>>> +++ b/block/blk-mq.c > >>>>>>> @@ -2515,6 +2515,8 @@ void blk_mq_submit_bio(struct bio *bio) > >>>>>>> if (plug && plug->cached_rq) { > >>>>>>> rq = rq_list_pop(&plug->cached_rq); > >>>>>>> INIT_LIST_HEAD(&rq->queuelist); > >>>>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); > >>>>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); > >>>>>>> } else { > >>>>>>> struct blk_mq_alloc_data data = { > >>>>>>> .q = q, > >>>>>>> @@ -2535,6 +2537,8 @@ void blk_mq_submit_bio(struct bio *bio) > >>>>>>> bio_wouldblock_error(bio); > >>>>>>> goto queue_exit; > >>>>>>> } > >>>>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); > >>>>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); > >>>>>> > >>>>>> Hello Jens, > >>>>>> > >>>>>> I guess the issue could be the following code run without grabbing > >>>>>> ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx(). > >>>>>> > >>>>>> .rq_flags = q->elevator ? RQF_ELV : 0, > >>>>>> > >>>>>> then elevator is switched to real one from none, and check on q->elevator > >>>>>> becomes not consistent. > >>>>> > >>>>> Indeed, that’s where I was going with this. I have a patch, testing it > >>>>> locally but it’s getting late. Will send it out tomorrow. The nice > >>>>> benefit is that it allows dropping the weird ref get on plug flush, > >>>>> and batches getting the refs as well. > >>>> > >>>> Yi/Steffen, can you try pulling this into your test kernel: > >>>> > >>>> git://git.kernel.dk/linux-block for-next > >>>> > >>>> and see if it fixes the issue for you. Thanks! 
> >>> > >>> It still can be reproduced with the latest linux-block/for-next, here is the log > >>> > >>> fab2914e46eb (HEAD, new/for-next) Merge branch 'for-5.16/drivers' into for-next > >> > >> Hi Yi, > >> > >> Please try the following change: > >> > >> > >> diff --git a/block/blk-mq.c b/block/blk-mq.c > >> index e1e64964a31b..eb634a9c61ff 100644 > >> --- a/block/blk-mq.c > >> +++ b/block/blk-mq.c > >> @@ -494,7 +494,6 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, > >> .q = q, > >> .flags = flags, > >> .cmd_flags = op, > >> - .rq_flags = q->elevator ? RQF_ELV : 0, > >> .nr_tags = 1, > >> }; > >> struct request *rq; > >> @@ -504,6 +503,7 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, > >> if (ret) > >> return ERR_PTR(ret); > >> > >> + data.rq_flags = q->elevator ? RQF_ELV : 0, > >> rq = __blk_mq_alloc_requests(&data); > >> if (!rq) > >> goto out_queue_exit; > >> @@ -524,7 +524,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, > >> .q = q, > >> .flags = flags, > >> .cmd_flags = op, > >> - .rq_flags = q->elevator ? RQF_ELV : 0, > >> .nr_tags = 1, > >> }; > >> u64 alloc_time_ns = 0; > >> @@ -551,6 +550,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, > >> ret = blk_queue_enter(q, flags); > >> if (ret) > >> return ERR_PTR(ret); > >> + data.rq_flags = q->elevator ? RQF_ELV : 0, > > > > Don't think that will compile, but I guess the point is that we can't do > > this assignment before queue enter, in case we're in the midst of > > switching schedulers. Which is indeed a valid concern. > > Something like the below. Maybe? On top of the for-next that was already > pulled in. 
> > > diff --git a/block/blk-mq.c b/block/blk-mq.c > index b01e05e02277..121f1898d529 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -433,9 +433,11 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data) > if (data->cmd_flags & REQ_NOWAIT) > data->flags |= BLK_MQ_REQ_NOWAIT; > > - if (data->rq_flags & RQF_ELV) { > + if (q->elevator) { > struct elevator_queue *e = q->elevator; > > + data->rq_flags |= RQF_ELV; > + > /* > * Flush/passthrough requests are special and go directly to the > * dispatch list. Don't include reserved tags in the > @@ -494,7 +496,6 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, > .q = q, > .flags = flags, > .cmd_flags = op, > - .rq_flags = q->elevator ? RQF_ELV : 0, > .nr_tags = 1, > }; > struct request *rq; > @@ -524,7 +525,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, > .q = q, > .flags = flags, > .cmd_flags = op, > - .rq_flags = q->elevator ? RQF_ELV : 0, > .nr_tags = 1, > }; > u64 alloc_time_ns = 0; > @@ -565,6 +565,8 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, > > if (!q->elevator) > blk_mq_tag_busy(data.hctx); > + else > + data.rq_flags |= RQF_ELV; > > ret = -EWOULDBLOCK; > tag = blk_mq_get_tag(&data); > @@ -2560,7 +2562,6 @@ void blk_mq_submit_bio(struct bio *bio) > .q = q, > .nr_tags = 1, > .cmd_flags = bio->bi_opf, > - .rq_flags = q->elevator ? RQF_ELV : 0, > }; The above patch looks fine. BTW, 9ede85cb670c ("block: move queue enter logic into blk_mq_submit_bio()") moves the queue enter into blk_mq_submit_bio(), which seems dangerous, especially blk_mq_sched_bio_merge() needs hctx/ctx which requires q_usage_counter to be grabbed. Thanks, Ming ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 15:16 ` Ming Lei @ 2021-11-03 15:41 ` Jens Axboe 2021-11-03 15:49 ` Jens Axboe 0 siblings, 1 reply; 24+ messages in thread From: Jens Axboe @ 2021-11-03 15:41 UTC (permalink / raw) To: Ming Lei Cc: Yi Zhang, Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi On 11/3/21 9:16 AM, Ming Lei wrote: > On Wed, Nov 03, 2021 at 09:10:17AM -0600, Jens Axboe wrote: >> On 11/3/21 9:03 AM, Jens Axboe wrote: >>> On 11/3/21 8:57 AM, Ming Lei wrote: >>>> On Wed, Nov 03, 2021 at 09:59:02PM +0800, Yi Zhang wrote: >>>>> On Wed, Nov 3, 2021 at 7:59 PM Jens Axboe <axboe@kernel.dk> wrote: >>>>>> >>>>>> On 11/2/21 9:54 PM, Jens Axboe wrote: >>>>>>> On Nov 2, 2021, at 9:52 PM, Ming Lei <ming.lei@redhat.com> wrote: >>>>>>>> >>>>>>>> On Tue, Nov 02, 2021 at 09:21:10PM -0600, Jens Axboe wrote: >>>>>>>>>> On 11/2/21 8:21 PM, Yi Zhang wrote: >>>>>>>>>>>> >>>>>>>>>>>> Can either one of you try with this patch? Won't fix anything, but it'll >>>>>>>>>>>> hopefully shine a bit of light on the issue. >>>>>>>>>>>> >>>>>>>>>> Hi Jens >>>>>>>>>> >>>>>>>>>> Here is the full log: >>>>>>>>> >>>>>>>>> Thanks! I think I see what it could be - can you try this one as well, >>>>>>>>> would like to confirm that the condition I think is triggering is what >>>>>>>>> is triggering. 
>>>>>>>>> >>>>>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>>>>>>>> index 07eb1412760b..81dede885231 100644 >>>>>>>>> --- a/block/blk-mq.c >>>>>>>>> +++ b/block/blk-mq.c >>>>>>>>> @@ -2515,6 +2515,8 @@ void blk_mq_submit_bio(struct bio *bio) >>>>>>>>> if (plug && plug->cached_rq) { >>>>>>>>> rq = rq_list_pop(&plug->cached_rq); >>>>>>>>> INIT_LIST_HEAD(&rq->queuelist); >>>>>>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); >>>>>>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); >>>>>>>>> } else { >>>>>>>>> struct blk_mq_alloc_data data = { >>>>>>>>> .q = q, >>>>>>>>> @@ -2535,6 +2537,8 @@ void blk_mq_submit_bio(struct bio *bio) >>>>>>>>> bio_wouldblock_error(bio); >>>>>>>>> goto queue_exit; >>>>>>>>> } >>>>>>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); >>>>>>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); >>>>>>>> >>>>>>>> Hello Jens, >>>>>>>> >>>>>>>> I guess the issue could be the following code run without grabbing >>>>>>>> ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx(). >>>>>>>> >>>>>>>> .rq_flags = q->elevator ? RQF_ELV : 0, >>>>>>>> >>>>>>>> then elevator is switched to real one from none, and check on q->elevator >>>>>>>> becomes not consistent. >>>>>>> >>>>>>> Indeed, that’s where I was going with this. I have a patch, testing it >>>>>>> locally but it’s getting late. Will send it out tomorrow. The nice >>>>>>> benefit is that it allows dropping the weird ref get on plug flush, >>>>>>> and batches getting the refs as well. >>>>>> >>>>>> Yi/Steffen, can you try pulling this into your test kernel: >>>>>> >>>>>> git://git.kernel.dk/linux-block for-next >>>>>> >>>>>> and see if it fixes the issue for you. Thanks! 
>>>>> >>>>> It still can be reproduced with the latest linux-block/for-next, here is the log >>>>> >>>>> fab2914e46eb (HEAD, new/for-next) Merge branch 'for-5.16/drivers' into for-next >>>> >>>> Hi Yi, >>>> >>>> Please try the following change: >>>> >>>> >>>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>>> index e1e64964a31b..eb634a9c61ff 100644 >>>> --- a/block/blk-mq.c >>>> +++ b/block/blk-mq.c >>>> @@ -494,7 +494,6 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, >>>> .q = q, >>>> .flags = flags, >>>> .cmd_flags = op, >>>> - .rq_flags = q->elevator ? RQF_ELV : 0, >>>> .nr_tags = 1, >>>> }; >>>> struct request *rq; >>>> @@ -504,6 +503,7 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, >>>> if (ret) >>>> return ERR_PTR(ret); >>>> >>>> + data.rq_flags = q->elevator ? RQF_ELV : 0, >>>> rq = __blk_mq_alloc_requests(&data); >>>> if (!rq) >>>> goto out_queue_exit; >>>> @@ -524,7 +524,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, >>>> .q = q, >>>> .flags = flags, >>>> .cmd_flags = op, >>>> - .rq_flags = q->elevator ? RQF_ELV : 0, >>>> .nr_tags = 1, >>>> }; >>>> u64 alloc_time_ns = 0; >>>> @@ -551,6 +550,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, >>>> ret = blk_queue_enter(q, flags); >>>> if (ret) >>>> return ERR_PTR(ret); >>>> + data.rq_flags = q->elevator ? RQF_ELV : 0, >>> >>> Don't think that will compile, but I guess the point is that we can't do >>> this assignment before queue enter, in case we're in the midst of >>> switching schedulers. Which is indeed a valid concern. >> >> Something like the below. Maybe? On top of the for-next that was already >> pulled in. 
>> >> >> diff --git a/block/blk-mq.c b/block/blk-mq.c >> index b01e05e02277..121f1898d529 100644 >> --- a/block/blk-mq.c >> +++ b/block/blk-mq.c >> @@ -433,9 +433,11 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data) >> if (data->cmd_flags & REQ_NOWAIT) >> data->flags |= BLK_MQ_REQ_NOWAIT; >> >> - if (data->rq_flags & RQF_ELV) { >> + if (q->elevator) { >> struct elevator_queue *e = q->elevator; >> >> + data->rq_flags |= RQF_ELV; >> + >> /* >> * Flush/passthrough requests are special and go directly to the >> * dispatch list. Don't include reserved tags in the >> @@ -494,7 +496,6 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, >> .q = q, >> .flags = flags, >> .cmd_flags = op, >> - .rq_flags = q->elevator ? RQF_ELV : 0, >> .nr_tags = 1, >> }; >> struct request *rq; >> @@ -524,7 +525,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, >> .q = q, >> .flags = flags, >> .cmd_flags = op, >> - .rq_flags = q->elevator ? RQF_ELV : 0, >> .nr_tags = 1, >> }; >> u64 alloc_time_ns = 0; >> @@ -565,6 +565,8 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, >> >> if (!q->elevator) >> blk_mq_tag_busy(data.hctx); >> + else >> + data.rq_flags |= RQF_ELV; >> >> ret = -EWOULDBLOCK; >> tag = blk_mq_get_tag(&data); >> @@ -2560,7 +2562,6 @@ void blk_mq_submit_bio(struct bio *bio) >> .q = q, >> .nr_tags = 1, >> .cmd_flags = bio->bi_opf, >> - .rq_flags = q->elevator ? RQF_ELV : 0, >> }; > > The above patch looks fine. > > BTW, 9ede85cb670c ("block: move queue enter logic into > blk_mq_submit_bio()") moves the queue enter into blk_mq_submit_bio(), > which seems dangerous, especially blk_mq_sched_bio_merge() needs > hctx/ctx which requires q_usage_counter to be grabbed. I think the best solution is to enter just for that as well, and just retain that enter state. I'll update the patch, there's some real fixes in there too for the batched alloc. Will post them later today. 
-- Jens Axboe ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 15:41 ` Jens Axboe @ 2021-11-03 15:49 ` Jens Axboe 2021-11-03 16:09 ` Ming Lei 0 siblings, 1 reply; 24+ messages in thread From: Jens Axboe @ 2021-11-03 15:49 UTC (permalink / raw) To: Ming Lei Cc: Yi Zhang, Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi On 11/3/21 9:41 AM, Jens Axboe wrote: > On 11/3/21 9:16 AM, Ming Lei wrote: >> On Wed, Nov 03, 2021 at 09:10:17AM -0600, Jens Axboe wrote: >>> On 11/3/21 9:03 AM, Jens Axboe wrote: >>>> On 11/3/21 8:57 AM, Ming Lei wrote: >>>>> On Wed, Nov 03, 2021 at 09:59:02PM +0800, Yi Zhang wrote: >>>>>> On Wed, Nov 3, 2021 at 7:59 PM Jens Axboe <axboe@kernel.dk> wrote: >>>>>>> >>>>>>> On 11/2/21 9:54 PM, Jens Axboe wrote: >>>>>>>> On Nov 2, 2021, at 9:52 PM, Ming Lei <ming.lei@redhat.com> wrote: >>>>>>>>> >>>>>>>>> On Tue, Nov 02, 2021 at 09:21:10PM -0600, Jens Axboe wrote: >>>>>>>>>>> On 11/2/21 8:21 PM, Yi Zhang wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Can either one of you try with this patch? Won't fix anything, but it'll >>>>>>>>>>>>> hopefully shine a bit of light on the issue. >>>>>>>>>>>>> >>>>>>>>>>> Hi Jens >>>>>>>>>>> >>>>>>>>>>> Here is the full log: >>>>>>>>>> >>>>>>>>>> Thanks! I think I see what it could be - can you try this one as well, >>>>>>>>>> would like to confirm that the condition I think is triggering is what >>>>>>>>>> is triggering. 
>>>>>>>>>> >>>>>>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>>>>>>>>> index 07eb1412760b..81dede885231 100644 >>>>>>>>>> --- a/block/blk-mq.c >>>>>>>>>> +++ b/block/blk-mq.c >>>>>>>>>> @@ -2515,6 +2515,8 @@ void blk_mq_submit_bio(struct bio *bio) >>>>>>>>>> if (plug && plug->cached_rq) { >>>>>>>>>> rq = rq_list_pop(&plug->cached_rq); >>>>>>>>>> INIT_LIST_HEAD(&rq->queuelist); >>>>>>>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); >>>>>>>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); >>>>>>>>>> } else { >>>>>>>>>> struct blk_mq_alloc_data data = { >>>>>>>>>> .q = q, >>>>>>>>>> @@ -2535,6 +2537,8 @@ void blk_mq_submit_bio(struct bio *bio) >>>>>>>>>> bio_wouldblock_error(bio); >>>>>>>>>> goto queue_exit; >>>>>>>>>> } >>>>>>>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); >>>>>>>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); >>>>>>>>> >>>>>>>>> Hello Jens, >>>>>>>>> >>>>>>>>> I guess the issue could be the following code run without grabbing >>>>>>>>> ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx(). >>>>>>>>> >>>>>>>>> .rq_flags = q->elevator ? RQF_ELV : 0, >>>>>>>>> >>>>>>>>> then elevator is switched to real one from none, and check on q->elevator >>>>>>>>> becomes not consistent. >>>>>>>> >>>>>>>> Indeed, that’s where I was going with this. I have a patch, testing it >>>>>>>> locally but it’s getting late. Will send it out tomorrow. The nice >>>>>>>> benefit is that it allows dropping the weird ref get on plug flush, >>>>>>>> and batches getting the refs as well. >>>>>>> >>>>>>> Yi/Steffen, can you try pulling this into your test kernel: >>>>>>> >>>>>>> git://git.kernel.dk/linux-block for-next >>>>>>> >>>>>>> and see if it fixes the issue for you. Thanks! 
>>>>>> >>>>>> It still can be reproduced with the latest linux-block/for-next, here is the log >>>>>> >>>>>> fab2914e46eb (HEAD, new/for-next) Merge branch 'for-5.16/drivers' into for-next >>>>> >>>>> Hi Yi, >>>>> >>>>> Please try the following change: >>>>> >>>>> >>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>>>> index e1e64964a31b..eb634a9c61ff 100644 >>>>> --- a/block/blk-mq.c >>>>> +++ b/block/blk-mq.c >>>>> @@ -494,7 +494,6 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, >>>>> .q = q, >>>>> .flags = flags, >>>>> .cmd_flags = op, >>>>> - .rq_flags = q->elevator ? RQF_ELV : 0, >>>>> .nr_tags = 1, >>>>> }; >>>>> struct request *rq; >>>>> @@ -504,6 +503,7 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, >>>>> if (ret) >>>>> return ERR_PTR(ret); >>>>> >>>>> + data.rq_flags = q->elevator ? RQF_ELV : 0, >>>>> rq = __blk_mq_alloc_requests(&data); >>>>> if (!rq) >>>>> goto out_queue_exit; >>>>> @@ -524,7 +524,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, >>>>> .q = q, >>>>> .flags = flags, >>>>> .cmd_flags = op, >>>>> - .rq_flags = q->elevator ? RQF_ELV : 0, >>>>> .nr_tags = 1, >>>>> }; >>>>> u64 alloc_time_ns = 0; >>>>> @@ -551,6 +550,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, >>>>> ret = blk_queue_enter(q, flags); >>>>> if (ret) >>>>> return ERR_PTR(ret); >>>>> + data.rq_flags = q->elevator ? RQF_ELV : 0, >>>> >>>> Don't think that will compile, but I guess the point is that we can't do >>>> this assignment before queue enter, in case we're in the midst of >>>> switching schedulers. Which is indeed a valid concern. >>> >>> Something like the below. Maybe? On top of the for-next that was already >>> pulled in. 
>>> >>> >>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>> index b01e05e02277..121f1898d529 100644 >>> --- a/block/blk-mq.c >>> +++ b/block/blk-mq.c >>> @@ -433,9 +433,11 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data) >>> if (data->cmd_flags & REQ_NOWAIT) >>> data->flags |= BLK_MQ_REQ_NOWAIT; >>> >>> - if (data->rq_flags & RQF_ELV) { >>> + if (q->elevator) { >>> struct elevator_queue *e = q->elevator; >>> >>> + data->rq_flags |= RQF_ELV; >>> + >>> /* >>> * Flush/passthrough requests are special and go directly to the >>> * dispatch list. Don't include reserved tags in the >>> @@ -494,7 +496,6 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, >>> .q = q, >>> .flags = flags, >>> .cmd_flags = op, >>> - .rq_flags = q->elevator ? RQF_ELV : 0, >>> .nr_tags = 1, >>> }; >>> struct request *rq; >>> @@ -524,7 +525,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, >>> .q = q, >>> .flags = flags, >>> .cmd_flags = op, >>> - .rq_flags = q->elevator ? RQF_ELV : 0, >>> .nr_tags = 1, >>> }; >>> u64 alloc_time_ns = 0; >>> @@ -565,6 +565,8 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, >>> >>> if (!q->elevator) >>> blk_mq_tag_busy(data.hctx); >>> + else >>> + data.rq_flags |= RQF_ELV; >>> >>> ret = -EWOULDBLOCK; >>> tag = blk_mq_get_tag(&data); >>> @@ -2560,7 +2562,6 @@ void blk_mq_submit_bio(struct bio *bio) >>> .q = q, >>> .nr_tags = 1, >>> .cmd_flags = bio->bi_opf, >>> - .rq_flags = q->elevator ? RQF_ELV : 0, >>> }; >> >> The above patch looks fine. >> >> BTW, 9ede85cb670c ("block: move queue enter logic into >> blk_mq_submit_bio()") moves the queue enter into blk_mq_submit_bio(), >> which seems dangerous, especially blk_mq_sched_bio_merge() needs >> hctx/ctx which requires q_usage_counter to be grabbed. > > I think the best solution is to enter just for that as well, and just > retain that enter state. 
I'll update the patch, there's some real fixes > in there too for the batched alloc. Will post them later today. Is it needed, though? As far as I can tell, it's only needed persistently for having the IO inflight, otherwise if the premise is that the queue can just go away, we're in trouble before that too. And I don't think that's the case. -- Jens Axboe ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 15:49 ` Jens Axboe @ 2021-11-03 16:09 ` Ming Lei 2021-11-03 16:36 ` Jens Axboe 0 siblings, 1 reply; 24+ messages in thread From: Ming Lei @ 2021-11-03 16:09 UTC (permalink / raw) To: Jens Axboe Cc: Yi Zhang, Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi On Wed, Nov 03, 2021 at 09:49:20AM -0600, Jens Axboe wrote: > On 11/3/21 9:41 AM, Jens Axboe wrote: > > On 11/3/21 9:16 AM, Ming Lei wrote: > >> On Wed, Nov 03, 2021 at 09:10:17AM -0600, Jens Axboe wrote: > >>> On 11/3/21 9:03 AM, Jens Axboe wrote: > >>>> On 11/3/21 8:57 AM, Ming Lei wrote: > >>>>> On Wed, Nov 03, 2021 at 09:59:02PM +0800, Yi Zhang wrote: > >>>>>> On Wed, Nov 3, 2021 at 7:59 PM Jens Axboe <axboe@kernel.dk> wrote: > >>>>>>> > >>>>>>> On 11/2/21 9:54 PM, Jens Axboe wrote: > >>>>>>>> On Nov 2, 2021, at 9:52 PM, Ming Lei <ming.lei@redhat.com> wrote: > >>>>>>>>> > >>>>>>>>> On Tue, Nov 02, 2021 at 09:21:10PM -0600, Jens Axboe wrote: > >>>>>>>>>>> On 11/2/21 8:21 PM, Yi Zhang wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>> Can either one of you try with this patch? Won't fix anything, but it'll > >>>>>>>>>>>>> hopefully shine a bit of light on the issue. > >>>>>>>>>>>>> > >>>>>>>>>>> Hi Jens > >>>>>>>>>>> > >>>>>>>>>>> Here is the full log: > >>>>>>>>>> > >>>>>>>>>> Thanks! I think I see what it could be - can you try this one as well, > >>>>>>>>>> would like to confirm that the condition I think is triggering is what > >>>>>>>>>> is triggering. 
> >>>>>>>>>> > >>>>>>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c > >>>>>>>>>> index 07eb1412760b..81dede885231 100644 > >>>>>>>>>> --- a/block/blk-mq.c > >>>>>>>>>> +++ b/block/blk-mq.c > >>>>>>>>>> @@ -2515,6 +2515,8 @@ void blk_mq_submit_bio(struct bio *bio) > >>>>>>>>>> if (plug && plug->cached_rq) { > >>>>>>>>>> rq = rq_list_pop(&plug->cached_rq); > >>>>>>>>>> INIT_LIST_HEAD(&rq->queuelist); > >>>>>>>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); > >>>>>>>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); > >>>>>>>>>> } else { > >>>>>>>>>> struct blk_mq_alloc_data data = { > >>>>>>>>>> .q = q, > >>>>>>>>>> @@ -2535,6 +2537,8 @@ void blk_mq_submit_bio(struct bio *bio) > >>>>>>>>>> bio_wouldblock_error(bio); > >>>>>>>>>> goto queue_exit; > >>>>>>>>>> } > >>>>>>>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); > >>>>>>>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); > >>>>>>>>> > >>>>>>>>> Hello Jens, > >>>>>>>>> > >>>>>>>>> I guess the issue could be the following code run without grabbing > >>>>>>>>> ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx(). > >>>>>>>>> > >>>>>>>>> .rq_flags = q->elevator ? RQF_ELV : 0, > >>>>>>>>> > >>>>>>>>> then elevator is switched to real one from none, and check on q->elevator > >>>>>>>>> becomes not consistent. > >>>>>>>> > >>>>>>>> Indeed, that’s where I was going with this. I have a patch, testing it > >>>>>>>> locally but it’s getting late. Will send it out tomorrow. The nice > >>>>>>>> benefit is that it allows dropping the weird ref get on plug flush, > >>>>>>>> and batches getting the refs as well. > >>>>>>> > >>>>>>> Yi/Steffen, can you try pulling this into your test kernel: > >>>>>>> > >>>>>>> git://git.kernel.dk/linux-block for-next > >>>>>>> > >>>>>>> and see if it fixes the issue for you. Thanks! 
> >>>>>> > >>>>>> It still can be reproduced with the latest linux-block/for-next, here is the log > >>>>>> > >>>>>> fab2914e46eb (HEAD, new/for-next) Merge branch 'for-5.16/drivers' into for-next > >>>>> > >>>>> Hi Yi, > >>>>> > >>>>> Please try the following change: > >>>>> > >>>>> > >>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c > >>>>> index e1e64964a31b..eb634a9c61ff 100644 > >>>>> --- a/block/blk-mq.c > >>>>> +++ b/block/blk-mq.c > >>>>> @@ -494,7 +494,6 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, > >>>>> .q = q, > >>>>> .flags = flags, > >>>>> .cmd_flags = op, > >>>>> - .rq_flags = q->elevator ? RQF_ELV : 0, > >>>>> .nr_tags = 1, > >>>>> }; > >>>>> struct request *rq; > >>>>> @@ -504,6 +503,7 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, > >>>>> if (ret) > >>>>> return ERR_PTR(ret); > >>>>> > >>>>> + data.rq_flags = q->elevator ? RQF_ELV : 0, > >>>>> rq = __blk_mq_alloc_requests(&data); > >>>>> if (!rq) > >>>>> goto out_queue_exit; > >>>>> @@ -524,7 +524,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, > >>>>> .q = q, > >>>>> .flags = flags, > >>>>> .cmd_flags = op, > >>>>> - .rq_flags = q->elevator ? RQF_ELV : 0, > >>>>> .nr_tags = 1, > >>>>> }; > >>>>> u64 alloc_time_ns = 0; > >>>>> @@ -551,6 +550,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, > >>>>> ret = blk_queue_enter(q, flags); > >>>>> if (ret) > >>>>> return ERR_PTR(ret); > >>>>> + data.rq_flags = q->elevator ? RQF_ELV : 0, > >>>> > >>>> Don't think that will compile, but I guess the point is that we can't do > >>>> this assignment before queue enter, in case we're in the midst of > >>>> switching schedulers. Which is indeed a valid concern. > >>> > >>> Something like the below. Maybe? On top of the for-next that was already > >>> pulled in. 
> >>> > >>> > >>> diff --git a/block/blk-mq.c b/block/blk-mq.c > >>> index b01e05e02277..121f1898d529 100644 > >>> --- a/block/blk-mq.c > >>> +++ b/block/blk-mq.c > >>> @@ -433,9 +433,11 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data) > >>> if (data->cmd_flags & REQ_NOWAIT) > >>> data->flags |= BLK_MQ_REQ_NOWAIT; > >>> > >>> - if (data->rq_flags & RQF_ELV) { > >>> + if (q->elevator) { > >>> struct elevator_queue *e = q->elevator; > >>> > >>> + data->rq_flags |= RQF_ELV; > >>> + > >>> /* > >>> * Flush/passthrough requests are special and go directly to the > >>> * dispatch list. Don't include reserved tags in the > >>> @@ -494,7 +496,6 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, > >>> .q = q, > >>> .flags = flags, > >>> .cmd_flags = op, > >>> - .rq_flags = q->elevator ? RQF_ELV : 0, > >>> .nr_tags = 1, > >>> }; > >>> struct request *rq; > >>> @@ -524,7 +525,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, > >>> .q = q, > >>> .flags = flags, > >>> .cmd_flags = op, > >>> - .rq_flags = q->elevator ? RQF_ELV : 0, > >>> .nr_tags = 1, > >>> }; > >>> u64 alloc_time_ns = 0; > >>> @@ -565,6 +565,8 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, > >>> > >>> if (!q->elevator) > >>> blk_mq_tag_busy(data.hctx); > >>> + else > >>> + data.rq_flags |= RQF_ELV; > >>> > >>> ret = -EWOULDBLOCK; > >>> tag = blk_mq_get_tag(&data); > >>> @@ -2560,7 +2562,6 @@ void blk_mq_submit_bio(struct bio *bio) > >>> .q = q, > >>> .nr_tags = 1, > >>> .cmd_flags = bio->bi_opf, > >>> - .rq_flags = q->elevator ? RQF_ELV : 0, > >>> }; > >> > >> The above patch looks fine. > >> > >> BTW, 9ede85cb670c ("block: move queue enter logic into > >> blk_mq_submit_bio()") moves the queue enter into blk_mq_submit_bio(), > >> which seems dangerous, especially blk_mq_sched_bio_merge() needs > >> hctx/ctx which requires q_usage_counter to be grabbed. 
>
> > I think the best solution is to enter just for that as well, and just
> > retain that enter state. I'll update the patch, there's some real fixes
> > in there too for the batched alloc. Will post them later today.
>
> Is it needed, though? As far as I can tell, it's only needed
> persistently for having the IO inflight, otherwise if the premise is
> that the queue can just go away, we're in trouble before that too. And I
> don't think that's the case.

An inflight bio only means that the bdev is open, so the request queue
won't go away.

But a lot of things can still happen: an elevator switch, an
nr_hw_queues update, request queue cleanup. So it looks like
blk_mq_sched_bio_merge() is not safe without grabbing ->q_usage_counter?

Thanks,
Ming

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 16:09 ` Ming Lei @ 2021-11-03 16:36 ` Jens Axboe 0 siblings, 0 replies; 24+ messages in thread From: Jens Axboe @ 2021-11-03 16:36 UTC (permalink / raw) To: Ming Lei Cc: Yi Zhang, Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi On 11/3/21 10:09 AM, Ming Lei wrote: > On Wed, Nov 03, 2021 at 09:49:20AM -0600, Jens Axboe wrote: >> On 11/3/21 9:41 AM, Jens Axboe wrote: >>> On 11/3/21 9:16 AM, Ming Lei wrote: >>>> On Wed, Nov 03, 2021 at 09:10:17AM -0600, Jens Axboe wrote: >>>>> On 11/3/21 9:03 AM, Jens Axboe wrote: >>>>>> On 11/3/21 8:57 AM, Ming Lei wrote: >>>>>>> On Wed, Nov 03, 2021 at 09:59:02PM +0800, Yi Zhang wrote: >>>>>>>> On Wed, Nov 3, 2021 at 7:59 PM Jens Axboe <axboe@kernel.dk> wrote: >>>>>>>>> >>>>>>>>> On 11/2/21 9:54 PM, Jens Axboe wrote: >>>>>>>>>> On Nov 2, 2021, at 9:52 PM, Ming Lei <ming.lei@redhat.com> wrote: >>>>>>>>>>> >>>>>>>>>>> On Tue, Nov 02, 2021 at 09:21:10PM -0600, Jens Axboe wrote: >>>>>>>>>>>>> On 11/2/21 8:21 PM, Yi Zhang wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Can either one of you try with this patch? Won't fix anything, but it'll >>>>>>>>>>>>>>> hopefully shine a bit of light on the issue. >>>>>>>>>>>>>>> >>>>>>>>>>>>> Hi Jens >>>>>>>>>>>>> >>>>>>>>>>>>> Here is the full log: >>>>>>>>>>>> >>>>>>>>>>>> Thanks! I think I see what it could be - can you try this one as well, >>>>>>>>>>>> would like to confirm that the condition I think is triggering is what >>>>>>>>>>>> is triggering. 
>>>>>>>>>>>> >>>>>>>>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>>>>>>>>>>> index 07eb1412760b..81dede885231 100644 >>>>>>>>>>>> --- a/block/blk-mq.c >>>>>>>>>>>> +++ b/block/blk-mq.c >>>>>>>>>>>> @@ -2515,6 +2515,8 @@ void blk_mq_submit_bio(struct bio *bio) >>>>>>>>>>>> if (plug && plug->cached_rq) { >>>>>>>>>>>> rq = rq_list_pop(&plug->cached_rq); >>>>>>>>>>>> INIT_LIST_HEAD(&rq->queuelist); >>>>>>>>>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); >>>>>>>>>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); >>>>>>>>>>>> } else { >>>>>>>>>>>> struct blk_mq_alloc_data data = { >>>>>>>>>>>> .q = q, >>>>>>>>>>>> @@ -2535,6 +2537,8 @@ void blk_mq_submit_bio(struct bio *bio) >>>>>>>>>>>> bio_wouldblock_error(bio); >>>>>>>>>>>> goto queue_exit; >>>>>>>>>>>> } >>>>>>>>>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); >>>>>>>>>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); >>>>>>>>>>> >>>>>>>>>>> Hello Jens, >>>>>>>>>>> >>>>>>>>>>> I guess the issue could be the following code run without grabbing >>>>>>>>>>> ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx(). >>>>>>>>>>> >>>>>>>>>>> .rq_flags = q->elevator ? RQF_ELV : 0, >>>>>>>>>>> >>>>>>>>>>> then elevator is switched to real one from none, and check on q->elevator >>>>>>>>>>> becomes not consistent. >>>>>>>>>> >>>>>>>>>> Indeed, that’s where I was going with this. I have a patch, testing it >>>>>>>>>> locally but it’s getting late. Will send it out tomorrow. The nice >>>>>>>>>> benefit is that it allows dropping the weird ref get on plug flush, >>>>>>>>>> and batches getting the refs as well. >>>>>>>>> >>>>>>>>> Yi/Steffen, can you try pulling this into your test kernel: >>>>>>>>> >>>>>>>>> git://git.kernel.dk/linux-block for-next >>>>>>>>> >>>>>>>>> and see if it fixes the issue for you. Thanks! 
>>>>>>>> >>>>>>>> It still can be reproduced with the latest linux-block/for-next, here is the log >>>>>>>> >>>>>>>> fab2914e46eb (HEAD, new/for-next) Merge branch 'for-5.16/drivers' into for-next >>>>>>> >>>>>>> Hi Yi, >>>>>>> >>>>>>> Please try the following change: >>>>>>> >>>>>>> >>>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>>>>>> index e1e64964a31b..eb634a9c61ff 100644 >>>>>>> --- a/block/blk-mq.c >>>>>>> +++ b/block/blk-mq.c >>>>>>> @@ -494,7 +494,6 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, >>>>>>> .q = q, >>>>>>> .flags = flags, >>>>>>> .cmd_flags = op, >>>>>>> - .rq_flags = q->elevator ? RQF_ELV : 0, >>>>>>> .nr_tags = 1, >>>>>>> }; >>>>>>> struct request *rq; >>>>>>> @@ -504,6 +503,7 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, >>>>>>> if (ret) >>>>>>> return ERR_PTR(ret); >>>>>>> >>>>>>> + data.rq_flags = q->elevator ? RQF_ELV : 0, >>>>>>> rq = __blk_mq_alloc_requests(&data); >>>>>>> if (!rq) >>>>>>> goto out_queue_exit; >>>>>>> @@ -524,7 +524,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, >>>>>>> .q = q, >>>>>>> .flags = flags, >>>>>>> .cmd_flags = op, >>>>>>> - .rq_flags = q->elevator ? RQF_ELV : 0, >>>>>>> .nr_tags = 1, >>>>>>> }; >>>>>>> u64 alloc_time_ns = 0; >>>>>>> @@ -551,6 +550,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, >>>>>>> ret = blk_queue_enter(q, flags); >>>>>>> if (ret) >>>>>>> return ERR_PTR(ret); >>>>>>> + data.rq_flags = q->elevator ? RQF_ELV : 0, >>>>>> >>>>>> Don't think that will compile, but I guess the point is that we can't do >>>>>> this assignment before queue enter, in case we're in the midst of >>>>>> switching schedulers. Which is indeed a valid concern. >>>>> >>>>> Something like the below. Maybe? On top of the for-next that was already >>>>> pulled in. 
>>>>> >>>>> >>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>>>> index b01e05e02277..121f1898d529 100644 >>>>> --- a/block/blk-mq.c >>>>> +++ b/block/blk-mq.c >>>>> @@ -433,9 +433,11 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data) >>>>> if (data->cmd_flags & REQ_NOWAIT) >>>>> data->flags |= BLK_MQ_REQ_NOWAIT; >>>>> >>>>> - if (data->rq_flags & RQF_ELV) { >>>>> + if (q->elevator) { >>>>> struct elevator_queue *e = q->elevator; >>>>> >>>>> + data->rq_flags |= RQF_ELV; >>>>> + >>>>> /* >>>>> * Flush/passthrough requests are special and go directly to the >>>>> * dispatch list. Don't include reserved tags in the >>>>> @@ -494,7 +496,6 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, >>>>> .q = q, >>>>> .flags = flags, >>>>> .cmd_flags = op, >>>>> - .rq_flags = q->elevator ? RQF_ELV : 0, >>>>> .nr_tags = 1, >>>>> }; >>>>> struct request *rq; >>>>> @@ -524,7 +525,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, >>>>> .q = q, >>>>> .flags = flags, >>>>> .cmd_flags = op, >>>>> - .rq_flags = q->elevator ? RQF_ELV : 0, >>>>> .nr_tags = 1, >>>>> }; >>>>> u64 alloc_time_ns = 0; >>>>> @@ -565,6 +565,8 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, >>>>> >>>>> if (!q->elevator) >>>>> blk_mq_tag_busy(data.hctx); >>>>> + else >>>>> + data.rq_flags |= RQF_ELV; >>>>> >>>>> ret = -EWOULDBLOCK; >>>>> tag = blk_mq_get_tag(&data); >>>>> @@ -2560,7 +2562,6 @@ void blk_mq_submit_bio(struct bio *bio) >>>>> .q = q, >>>>> .nr_tags = 1, >>>>> .cmd_flags = bio->bi_opf, >>>>> - .rq_flags = q->elevator ? RQF_ELV : 0, >>>>> }; >>>> >>>> The above patch looks fine. >>>> >>>> BTW, 9ede85cb670c ("block: move queue enter logic into >>>> blk_mq_submit_bio()") moves the queue enter into blk_mq_submit_bio(), >>>> which seems dangerous, especially blk_mq_sched_bio_merge() needs >>>> hctx/ctx which requires q_usage_counter to be grabbed. 
>>>
>>> I think the best solution is to enter just for that as well, and just
>>> retain that enter state. I'll update the patch, there's some real fixes
>>> in there too for the batched alloc. Will post them later today.
>>
>> Is it needed, though? As far as I can tell, it's only needed
>> persistently for having the IO inflight, otherwise if the premise is
>> that the queue can just go away, we're in trouble before that too. And I
>> don't think that's the case.
>
> An inflight bio only means that the bdev is open, so the request queue
> won't go away.
>
> But a lot of things can still happen: an elevator switch, an
> nr_hw_queues update, request queue cleanup. So it looks like
> blk_mq_sched_bio_merge() is not safe without grabbing ->q_usage_counter?

Yes, good point - we need a consistent sched/queue view at that point.
I'll update it.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 24+ messages in thread
[parent not found: <CGME20211103032116epcas2p13b9f3fad0fe84f58c9b7f36320c71854@epcms2p2>]
* RE:(2) [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178
  [not found] ` <CGME20211103032116epcas2p13b9f3fad0fe84f58c9b7f36320c71854@epcms2p2>
@ 2021-11-03  3:28   ` Daejun Park
  0 siblings, 0 replies; 24+ messages in thread
From: Daejun Park @ 2021-11-03 3:28 UTC (permalink / raw)
  To: Jens Axboe, Yi Zhang, Daejun Park
  Cc: Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi

Hi Jens,

>>> Can either one of you try with this patch? Won't fix anything, but it'll
>>> hopefully shine a bit of light on the issue.
>>>
>> Hi Jens
>>
>> Here is the full log:
>
> Thanks! I think I see what it could be - can you try this one as well,
> would like to confirm that the condition I think is triggering is what
> is triggering.
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 07eb1412760b..81dede885231 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2515,6 +2515,8 @@ void blk_mq_submit_bio(struct bio *bio)
>  	if (plug && plug->cached_rq) {
>  		rq = rq_list_pop(&plug->cached_rq);
>  		INIT_LIST_HEAD(&rq->queuelist);
> +		WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV));
> +		WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV));
>  	} else {
>  		struct blk_mq_alloc_data data = {
>  			.q		= q,
> @@ -2535,6 +2537,8 @@ void blk_mq_submit_bio(struct bio *bio)
>  			bio_wouldblock_error(bio);
>  			goto queue_exit;
>  		}
> +		WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV));
> +		WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV));
>  	}
>
>  	trace_block_getrq(bio);
>
> --
> Jens Axboe

The first reported warning started from a call to scsi_execute(), so how
about adding the checking code in __scsi_execute()?

Thanks,
Daejun

^ permalink raw reply	[flat|nested] 24+ messages in thread
end of thread, other threads: [~2021-11-05 11:14 UTC | newest]

Thread overview: 24+ messages
     [not found] <CAHj4cs-NUKzGj5pgzRhDgdrGGbgPBqUoQ44+xgvk6njH9a_RYQ@mail.gmail.com>
2021-11-02 19:00 ` [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 Steffen Maier
2021-11-02 19:02   ` Jens Axboe
2021-11-02 20:03     ` Jens Axboe
2021-11-03  2:21       ` Yi Zhang
2021-11-03  3:21         ` Jens Axboe
2021-11-03  3:51           ` Ming Lei
2021-11-03  3:54             ` Jens Axboe
2021-11-03  4:00               ` Yi Zhang
2021-11-03 19:03                 ` Jens Axboe
2021-11-05 11:13                   ` Yi Zhang
2021-11-03 11:59           ` Jens Axboe
2021-11-03 13:59             ` Yi Zhang
2021-11-03 14:26               ` Jens Axboe
2021-11-03 14:57                 ` Ming Lei
2021-11-03 15:03                   ` Jens Axboe
2021-11-03 15:09                     ` Ming Lei
2021-11-03 15:12                       ` Jens Axboe
2021-11-03 15:10                     ` Jens Axboe
2021-11-03 15:16                       ` Ming Lei
2021-11-03 15:41                         ` Jens Axboe
2021-11-03 15:49                           ` Jens Axboe
2021-11-03 16:09                             ` Ming Lei
2021-11-03 16:36                               ` Jens Axboe
     [not found] ` <CGME20211103032116epcas2p13b9f3fad0fe84f58c9b7f36320c71854@epcms2p2>
2021-11-03  3:28   ` Daejun Park