* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 [not found] <CAHj4cs-NUKzGj5pgzRhDgdrGGbgPBqUoQ44+xgvk6njH9a_RYQ@mail.gmail.com> @ 2021-11-02 19:00 ` Steffen Maier 2021-11-02 19:02 ` Jens Axboe 0 siblings, 1 reply; 24+ messages in thread From: Steffen Maier @ 2021-11-02 19:00 UTC (permalink / raw) To: Yi Zhang, linux-block, Linux-Next Mailing List, linux-scsi On 11/2/21 07:42, Yi Zhang wrote: > Below WARNING was triggered with blktests srp/001 on the latest > linux-block/for-next, and it cannot be reproduced with v5.15, pls help > check it, thanks. > > 88d2c6ab15f7 (origin/for-next) Merge branch 'for-5.16/block' into for-next Same warning here with a slightly different stack trace. It breaks root-fs on zfcp-attached SCSI disks for us, because we run our CI intentionally with panic_on_warn. > [ 9.031740] ------------[ cut here ]------------ > [ 9.031743] WARNING: CPU: 13 PID: 196 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 > [ 9.031751] Modules linked in: nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) dm_service_time(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) ip_set(E) nf_tables(E) nfnetlink(E) sunrpc(E) zfcp(E) scsi_transport_fc(E) dm_multipath(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) s390_trng(E) vfio_ccw(E) mdev(E) vfio_iommu_type1(E) zcrypt_cex4(E) vfio(E) eadm_sch(E) sch_fq_codel(E) configfs(E) ip_tables(E) x_tables(E) ghash_s390(E) prng(E) aes_s390(E) des_s390(E) libdes(E) sha3_512_s390(E) sha3_256_s390(E) sha512_s390(E) sha256_s390(E) sha1_s390(E) sha_common(E) pkey(E) zcrypt(E) rng_core(E) autofs4(E) > [ 9.031785] CPU: 13 PID: 196 Comm: kworker/13:2 Tainted: G E 5.16.0-20211102.rc0.git0.9febf1194306.300.fc34.s390x+next #1 > [ 9.031789] Hardware name: IBM 3906 M04 704 (LPAR) > [ 9.031791] Workqueue: kaluad alua_rtpg_work [scsi_dh_alua] > [ 9.031795] Krnl PSW : 0704e00180000000 000000006558e948 
(blk_mq_sched_insert_request+0x58/0x178) > [ 9.031800] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3 > [ 9.031803] Krnl GPRS: 0000000000000080 00000000000004c6 00000000ade56000 0000000000000001 > [ 9.031806] 0000000000000001 0000000000000000 00000000a2d6a400 0000000084003c00 > [ 9.031808] 0000000000000000 0000000000000001 0000000000000001 00000000ade56000 > [ 9.031810] 000000008aef0000 000003ff7af59400 000003800e3d7b00 000003800e3d7a90 > [ 9.031817] Krnl Code: 000000006558e93c: a71effff chi %r1,-1 > 000000006558e940: a7840004 brc 8,000000006558e948 > #000000006558e944: af000000 mc 0,0 > >000000006558e948: 5810b01c l %r1,28(%r11) > 000000006558e94c: ec213bbb0055 risbg %r2,%r1,59,187,0 > 000000006558e952: a7740057 brc 7,000000006558ea00 > 000000006558e956: 5810b018 l %r1,24(%r11) > 000000006558e95a: c01b000000ff nilf %r1,255 > [ 9.031833] Call Trace: > [ 9.031835] [<000000006558e948>] blk_mq_sched_insert_request+0x58/0x178 > [ 9.031838] [<000000006557effe>] blk_execute_rq+0x56/0xd8 > [ 9.031841] [<0000000065768708>] __scsi_execute+0x118/0x240 > [ 9.031847] [<000003ff803c3788>] alua_rtpg+0x120/0x8f8 [scsi_dh_alua] > [ 9.031849] [<000003ff803c402c>] alua_rtpg_work+0xcc/0x648 [scsi_dh_alua] > [ 9.031852] [<0000000064f024d2>] process_one_work+0x1fa/0x470 > [ 9.031856] [<0000000064f02c74>] worker_thread+0x64/0x498 > [ 9.031859] [<0000000064f0a894>] kthread+0x17c/0x188 > [ 9.031861] [<0000000064e933c4>] __ret_from_fork+0x3c/0x58 > [ 9.031864] [<0000000065a71cea>] ret_from_fork+0xa/0x40 > [ 9.031868] Last Breaking-Event-Address: > [ 9.031869] [<000000006557ef72>] blk_execute_rq_nowait+0x82/0x98 > [ 9.031871] Kernel panic - not syncing: panic_on_warn set ... 
> [ 3881.829489] ------------[ cut here ]------------ > [ 3881.829493] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 > blk_mq_sched_insert_request+0x54/0x178 > [ 3881.829504] Modules linked in: ib_srp scsi_transport_srp > target_core_pscsi target_core_file ib_srpt target_core_iblock > target_core_mod rdma_cm iw_cm ib_cm ib_umad scsi_debug rdma_rxe > ib_uverbs ip6_udp_tunnel udp_tunnel null_blk scsi_dh_rdac scsi_dh_emc > scsi_dh_alua dm_multipath ib_core sunrpc qeth_l2 bridge stp llc qeth > qdio ccwgroup vfio_ccw mdev vfio_iommu_type1 vfio zcrypt_cex4 drm fb > fuse font drm_panel_orientation_quirks i2c_core backlight zram > ip_tables xfs crc32_vx_s390 ghash_s390 prng aes_s390 des_s390 libdes > sha512_s390 sha256_s390 sha1_s390 sha_common dasd_eckd_mod dasd_mod > pkey zcrypt > [ 3881.829553] CPU: 1 PID: 1386 Comm: kworker/u128:2 Not tainted 5.15.0+ #3 > [ 3881.829556] Hardware name: IBM 2964 N96 400 (z/VM 6.4.0) > [ 3881.829558] Workqueue: events_unbound async_run_entry_fn > [ 3881.829564] Krnl PSW : 0704e00180000000 000000001055afc0 > (blk_mq_sched_insert_request+0x58/0x178) > [ 3881.829569] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 > CC:2 PM:0 RI:0 EA:3 > [ 3881.829572] Krnl GPRS: 0000000000000004 000000000000003d > 00000000448bf400 0000000000000001 > [ 3881.829575] 0000000000000001 0000000000000000 > 000000004352bc00 000000002e0f3000 > [ 3881.829577] 0000000000000000 0000000000000001 > 0000000000000001 00000000448bf400 > [ 3881.829580] 000000001ee72100 000003ff7e82dd00 > 00000380022f7838 00000380022f77c8 > [ 3881.829587] Krnl Code: 000000001055afb4: a71effff chi %r1,-1 > 000000001055afb8: a7840004 brc 8,000000001055afc0 > #000000001055afbc: af000000 mc 0,0 > >000000001055afc0: 5810b01c l %r1,28(%r11) > 000000001055afc4: ec213bbb0055 risbg %r2,%r1,59,187,0 > 000000001055afca: a7740057 brc 7,000000001055b078 > 000000001055afce: 5810b018 l %r1,24(%r11) > 000000001055afd2: c01b000000ff nilf %r1,255 > [ 3881.829607] Call Trace: > [ 3881.829609] 
[<000000001055afc0>] blk_mq_sched_insert_request+0x58/0x178 > [ 3881.829616] [<000000001054b876>] blk_execute_rq+0x56/0xd8 > [ 3881.829620] [<000000001070e3a0>] __scsi_execute+0x110/0x230 > [ 3881.829625] [<000000001070e602>] scsi_mode_sense+0x142/0x340 > [ 3881.829627] [<000000001071f8ee>] sd_revalidate_disk.isra.0+0x74e/0x2240 > [ 3881.829632] [<0000000010721912>] sd_probe+0x312/0x4b0 > [ 3881.829634] [<00000000106d4c30>] really_probe+0xd0/0x4b0 > [ 3881.829639] [<00000000106d51c0>] driver_probe_device+0x40/0xf0 > [ 3881.829642] [<00000000106d58cc>] __device_attach_driver+0xa4/0x128 > [ 3881.829645] [<00000000106d1fd0>] bus_for_each_drv+0x88/0xc0 > [ 3881.829649] [<00000000106d4130>] __device_attach_async_helper+0x90/0xf0 > [ 3881.829652] [<000000000ffb0f46>] async_run_entry_fn+0x4e/0x1b0 > [ 3881.829655] [<000000000ffa384a>] process_one_work+0x21a/0x498 > [ 3881.829658] [<000000000ffa3ff4>] worker_thread+0x64/0x498 > [ 3881.829661] [<000000000ffac8e0>] kthread+0x150/0x160 > [ 3881.829665] [<000000000ff37468>] __ret_from_fork+0x40/0x58 > [ 3881.829670] [<0000000010a8550a>] ret_from_fork+0xa/0x30 > [ 3881.829675] Last Breaking-Event-Address: > [ 3881.829676] [<0000000000000007>] 0x7 > [ 3881.829679] ---[ end trace a501db666d088cc7 ]--- -- Mit freundlichen Gruessen / Kind regards Steffen Maier Linux on IBM Z and LinuxONE https://www.ibm.com/privacy/us/en/ IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Gregor Pillen Geschaeftsfuehrung: Dirk Wittkopp Sitz der Gesellschaft: Boeblingen Registergericht: Amtsgericht Stuttgart, HRB 243294 ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-02 19:00 ` [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 Steffen Maier @ 2021-11-02 19:02 ` Jens Axboe 2021-11-02 20:03 ` Jens Axboe 0 siblings, 1 reply; 24+ messages in thread From: Jens Axboe @ 2021-11-02 19:02 UTC (permalink / raw) To: Steffen Maier, Yi Zhang, linux-block, Linux-Next Mailing List, linux-scsi On 11/2/21 1:00 PM, Steffen Maier wrote: > On 11/2/21 07:42, Yi Zhang wrote: >> Below WARNING was triggered with blktests srp/001 on the latest >> linux-block/for-next, and it cannot be reproduced with v5.15, pls help >> check it, thanks. >> >> 88d2c6ab15f7 (origin/for-next) Merge branch 'for-5.16/block' into for-next > > Same warning here with a slightly different stack trace. > It breaks root-fs on zfcp-attached SCSI disks for us, because we run our CI > intentionally with panic_on_warn. > >> [ 9.031740] ------------[ cut here ]------------ >> [ 9.031743] WARNING: CPU: 13 PID: 196 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 >> [ 9.031751] Modules linked in: nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) dm_service_time(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) ip_set(E) nf_tables(E) nfnetlink(E) sunrpc(E) zfcp(E) scsi_transport_fc(E) dm_multipath(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) s390_trng(E) vfio_ccw(E) mdev(E) vfio_iommu_type1(E) zcrypt_cex4(E) vfio(E) eadm_sch(E) sch_fq_codel(E) configfs(E) ip_tables(E) x_tables(E) ghash_s390(E) prng(E) aes_s390(E) des_s390(E) libdes(E) sha3_512_s390(E) sha3_256_s390(E) sha512_s390(E) sha256_s390(E) sha1_s390(E) sha_common(E) pkey(E) zcrypt(E) rng_core(E) autofs4(E) >> [ 9.031785] CPU: 13 PID: 196 Comm: kworker/13:2 Tainted: G E 5.16.0-20211102.rc0.git0.9febf1194306.300.fc34.s390x+next #1 >> [ 9.031789] Hardware name: IBM 3906 M04 704 
(LPAR) >> [ 9.031791] Workqueue: kaluad alua_rtpg_work [scsi_dh_alua] >> [ 9.031795] Krnl PSW : 0704e00180000000 000000006558e948 (blk_mq_sched_insert_request+0x58/0x178) >> [ 9.031800] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3 >> [ 9.031803] Krnl GPRS: 0000000000000080 00000000000004c6 00000000ade56000 0000000000000001 >> [ 9.031806] 0000000000000001 0000000000000000 00000000a2d6a400 0000000084003c00 >> [ 9.031808] 0000000000000000 0000000000000001 0000000000000001 00000000ade56000 >> [ 9.031810] 000000008aef0000 000003ff7af59400 000003800e3d7b00 000003800e3d7a90 >> [ 9.031817] Krnl Code: 000000006558e93c: a71effff chi %r1,-1 >> 000000006558e940: a7840004 brc 8,000000006558e948 >> #000000006558e944: af000000 mc 0,0 >> >000000006558e948: 5810b01c l %r1,28(%r11) >> 000000006558e94c: ec213bbb0055 risbg %r2,%r1,59,187,0 >> 000000006558e952: a7740057 brc 7,000000006558ea00 >> 000000006558e956: 5810b018 l %r1,24(%r11) >> 000000006558e95a: c01b000000ff nilf %r1,255 >> [ 9.031833] Call Trace: >> [ 9.031835] [<000000006558e948>] blk_mq_sched_insert_request+0x58/0x178 >> [ 9.031838] [<000000006557effe>] blk_execute_rq+0x56/0xd8 >> [ 9.031841] [<0000000065768708>] __scsi_execute+0x118/0x240 >> [ 9.031847] [<000003ff803c3788>] alua_rtpg+0x120/0x8f8 [scsi_dh_alua] >> [ 9.031849] [<000003ff803c402c>] alua_rtpg_work+0xcc/0x648 [scsi_dh_alua] >> [ 9.031852] [<0000000064f024d2>] process_one_work+0x1fa/0x470 >> [ 9.031856] [<0000000064f02c74>] worker_thread+0x64/0x498 >> [ 9.031859] [<0000000064f0a894>] kthread+0x17c/0x188 >> [ 9.031861] [<0000000064e933c4>] __ret_from_fork+0x3c/0x58 >> [ 9.031864] [<0000000065a71cea>] ret_from_fork+0xa/0x40 >> [ 9.031868] Last Breaking-Event-Address: >> [ 9.031869] [<000000006557ef72>] blk_execute_rq_nowait+0x82/0x98 >> [ 9.031871] Kernel panic - not syncing: panic_on_warn set ... I'm looking into this one, it's a bit puzzling. 
The WARN is:

	WARN_ON(e && (rq->tag != BLK_MQ_NO_TAG));

which is "we have an elevator", yet the tag isn't initialized to
BLK_MQ_NO_TAG. That seems to hint at the initialization changes there,
but nothing sticks out there for me.

I'll keep looking.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-02 19:02 ` Jens Axboe @ 2021-11-02 20:03 ` Jens Axboe 2021-11-03 2:21 ` Yi Zhang 0 siblings, 1 reply; 24+ messages in thread From: Jens Axboe @ 2021-11-02 20:03 UTC (permalink / raw) To: Steffen Maier, Yi Zhang, linux-block, Linux-Next Mailing List, linux-scsi On 11/2/21 1:02 PM, Jens Axboe wrote: > On 11/2/21 1:00 PM, Steffen Maier wrote: >> On 11/2/21 07:42, Yi Zhang wrote: >>> Below WARNING was triggered with blktests srp/001 on the latest >>> linux-block/for-next, and it cannot be reproduced with v5.15, pls help >>> check it, thanks. >>> >>> 88d2c6ab15f7 (origin/for-next) Merge branch 'for-5.16/block' into for-next >> >> Same warning here with a slightly different stack trace. >> It breaks root-fs on zfcp-attached SCSI disks for us, because we run our CI >> intentionally with panic_on_warn. >> >>> [ 9.031740] ------------[ cut here ]------------ >>> [ 9.031743] WARNING: CPU: 13 PID: 196 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 >>> [ 9.031751] Modules linked in: nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) dm_service_time(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) ip_set(E) nf_tables(E) nfnetlink(E) sunrpc(E) zfcp(E) scsi_transport_fc(E) dm_multipath(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) s390_trng(E) vfio_ccw(E) mdev(E) vfio_iommu_type1(E) zcrypt_cex4(E) vfio(E) eadm_sch(E) sch_fq_codel(E) configfs(E) ip_tables(E) x_tables(E) ghash_s390(E) prng(E) aes_s390(E) des_s390(E) libdes(E) sha3_512_s390(E) sha3_256_s390(E) sha512_s390(E) sha256_s390(E) sha1_s390(E) sha_common(E) pkey(E) zcrypt(E) rng_core(E) autofs4(E) >>> [ 9.031785] CPU: 13 PID: 196 Comm: kworker/13:2 Tainted: G E 5.16.0-20211102.rc0.git0.9febf1194306.300.fc34.s390x+next #1 >>> [ 9.031789] Hardware name: IBM 3906 M04 704 (LPAR) >>> [ 9.031791] Workqueue: kaluad alua_rtpg_work 
[scsi_dh_alua] >>> [ 9.031795] Krnl PSW : 0704e00180000000 000000006558e948 (blk_mq_sched_insert_request+0x58/0x178) >>> [ 9.031800] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3 >>> [ 9.031803] Krnl GPRS: 0000000000000080 00000000000004c6 00000000ade56000 0000000000000001 >>> [ 9.031806] 0000000000000001 0000000000000000 00000000a2d6a400 0000000084003c00 >>> [ 9.031808] 0000000000000000 0000000000000001 0000000000000001 00000000ade56000 >>> [ 9.031810] 000000008aef0000 000003ff7af59400 000003800e3d7b00 000003800e3d7a90 >>> [ 9.031817] Krnl Code: 000000006558e93c: a71effff chi %r1,-1 >>> 000000006558e940: a7840004 brc 8,000000006558e948 >>> #000000006558e944: af000000 mc 0,0 >>> >000000006558e948: 5810b01c l %r1,28(%r11) >>> 000000006558e94c: ec213bbb0055 risbg %r2,%r1,59,187,0 >>> 000000006558e952: a7740057 brc 7,000000006558ea00 >>> 000000006558e956: 5810b018 l %r1,24(%r11) >>> 000000006558e95a: c01b000000ff nilf %r1,255 >>> [ 9.031833] Call Trace: >>> [ 9.031835] [<000000006558e948>] blk_mq_sched_insert_request+0x58/0x178 >>> [ 9.031838] [<000000006557effe>] blk_execute_rq+0x56/0xd8 >>> [ 9.031841] [<0000000065768708>] __scsi_execute+0x118/0x240 >>> [ 9.031847] [<000003ff803c3788>] alua_rtpg+0x120/0x8f8 [scsi_dh_alua] >>> [ 9.031849] [<000003ff803c402c>] alua_rtpg_work+0xcc/0x648 [scsi_dh_alua] >>> [ 9.031852] [<0000000064f024d2>] process_one_work+0x1fa/0x470 >>> [ 9.031856] [<0000000064f02c74>] worker_thread+0x64/0x498 >>> [ 9.031859] [<0000000064f0a894>] kthread+0x17c/0x188 >>> [ 9.031861] [<0000000064e933c4>] __ret_from_fork+0x3c/0x58 >>> [ 9.031864] [<0000000065a71cea>] ret_from_fork+0xa/0x40 >>> [ 9.031868] Last Breaking-Event-Address: >>> [ 9.031869] [<000000006557ef72>] blk_execute_rq_nowait+0x82/0x98 >>> [ 9.031871] Kernel panic - not syncing: panic_on_warn set ... > > I'm looking into this one, it's a bit puzzling. 
The WARN is: > > WARN_ON(e && (rq->tag != BLK_MQ_NO_TAG)); > > which is "we have an elevator", yet the tag isn't initialized to BLK_MQ_NO_TAG. > That seems to hint at the initialization changes there, but nothing sticks out > there for me. > > I'll keep looking. Can either one of you try with this patch? Won't fix anything, but it'll hopefully shine a bit of light on the issue. diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c index 4a6789e4398b..1b7647722ec0 100644 --- a/block/blk-mq-sched.c +++ b/block/blk-mq-sched.c @@ -429,7 +429,8 @@ void blk_mq_sched_insert_request(struct request *rq, bool at_head, struct blk_mq_ctx *ctx = rq->mq_ctx; struct blk_mq_hw_ctx *hctx = rq->mq_hctx; - WARN_ON(e && (rq->tag != BLK_MQ_NO_TAG)); + if (e && (rq->tag != BLK_MQ_NO_TAG)) + printk("tag=%d/%d, e=%lx, rq cmd_flags %x, rq_flags %x\n", rq->tag, rq->internal_tag, (long) e, rq->cmd_flags, rq->rq_flags); if (blk_mq_sched_bypass_insert(hctx, rq)) { /* -- Jens Axboe ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-02 20:03 ` Jens Axboe @ 2021-11-03 2:21 ` Yi Zhang 2021-11-03 3:21 ` Jens Axboe [not found] ` <CGME20211103032116epcas2p13b9f3fad0fe84f58c9b7f36320c71854@epcms2p2> 0 siblings, 2 replies; 24+ messages in thread From: Yi Zhang @ 2021-11-03 2:21 UTC (permalink / raw) To: Jens Axboe Cc: Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi > > Can either one of you try with this patch? Won't fix anything, but it'll > hopefully shine a bit of light on the issue. > Hi Jens Here is the full log: [ 566.964613] run blktests srp/001 at 2021-11-02 22:09:12 [ 567.372541] alua: device handler registered [ 567.375340] emc: device handler registered [ 567.388737] rdac: device handler registered [ 567.403792] null_blk: module loaded [ 567.624077] rdma_rxe: loaded [ 567.629083] infiniband enc8000_rxe: set active [ 567.629087] infiniband enc8000_rxe: added enc8000 [ 567.699017] scsi_debug:sdebug_add_store: dif_storep 524288 bytes @ 000000005c9bf0dc [ 567.699682] scsi_debug:sdebug_driver_probe: scsi_debug: trim poll_queues to 0. 
poll_q/nr_hw = (0/1) [ 567.699686] scsi_debug:sdebug_driver_probe: host protection DIF3 DIX3 [ 567.699691] scsi host0: scsi_debug: version 0190 [20200710] dev_size_mb=32, opts=0x0, submit_queues=1, statistics=0 [ 567.700433] scsi 0:0:0:0: Direct-Access Linux scsi_debug 0190 PQ: 0 ANSI: 7 [ 567.700588] sd 0:0:0:0: Power-on or device reset occurred [ 567.700610] sd 0:0:0:0: [sda] Enabling DIF Type 3 protection [ 567.700634] sd 0:0:0:0: [sda] 65536 512-byte logical blocks: (33.6 MB/32.0 MiB) [ 567.700643] sd 0:0:0:0: [sda] Write Protect is off [ 567.700648] sd 0:0:0:0: [sda] Mode Sense: 73 00 10 08 [ 567.700658] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA [ 567.700679] sd 0:0:0:0: [sda] Optimal transfer size 524288 bytes [ 567.700807] sd 0:0:0:0: Attached scsi generic sg0 type 0 [ 567.787563] sd 0:0:0:0: [sda] Enabling DIX T10-DIF-TYPE3-CRC protection [ 567.787568] sd 0:0:0:0: [sda] DIF application tag size 6 [ 567.887644] sd 0:0:0:0: [sda] Attached SCSI disk [ 568.453337] Rounding down aligned max_sectors from 4294967295 to 4294967288 [ 568.488877] ib_srpt:srpt_add_one: ib_srpt device = 00000000dd1cba21 [ 568.488883] ib_srpt:srpt_use_srq: ib_srpt srpt_use_srq(enc8000_rxe): use_srq = 0; ret = 0 [ 568.488885] ib_srpt:srpt_add_one: ib_srpt Target login info: id_ext=00debdfffebeef80,ioc_guid=00debdfffebeef80,pkey=ffff,service_id=00debdfffebeef80 [ 568.488924] ib_srpt:srpt_add_one: ib_srpt added enc8000_rxe. 
[ 568.933299] Rounding down aligned max_sectors from 255 to 248 [ 568.942144] Rounding down aligned max_sectors from 255 to 248 [ 568.951076] Rounding down aligned max_sectors from 4294967295 to 4294967288 [ 569.055204] ib_srp:srp_add_one: ib_srp: srp_add_one: 18446744073709551615 / 4096 = 4503599627370495 <> 512 [ 569.055208] ib_srp:srp_add_one: ib_srp: enc8000_rxe: mr_page_shift = 12, device->max_mr_size = 0xffffffffffffffff, device->max_fast_reg_page_list_len = 512, max_pages_per_mr = 512, mr_max_size = 0x200000 [ 569.072666] ib_srp:srp_parse_in: ib_srp: 10.16.69.39 -> 10.16.69.39:0 [ 569.072672] ib_srp:srp_parse_in: ib_srp: 10.16.69.39:5555 -> 10.16.69.39:5555 [ 569.072674] ib_srp:add_target_store: ib_srp: max_sectors = 1024; max_pages_per_mr = 512; mr_page_size = 4096; max_sectors_per_mr = 4096; mr_per_cmd = 2 [ 569.072676] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 569.073279] ib_srpt Received SRP_LOGIN_REQ with i_port_id fe80:0000:0000:0000:00de:bdff:febe:ef80, t_port_id 00de:bdff:febe:ef80:00de:bdff:febe:ef80 and it_iu_len 8260 on port 1 (guid=fe80:0000:0000:0000:00de:bdff:febe:ef80); pkey 0xffff [ 569.073366] ib_srpt:srpt_cm_req_recv: ib_srpt imm_data_offset = 68 [ 569.076164] ib_srpt:srpt_create_ch_ib: ib_srpt srpt_create_ch_ib: max_cqe= 8191 max_sge= 32 sq_size = 4096 ch= 0000000097d2923d [ 569.076177] ib_srpt:srpt_cm_req_recv: ib_srpt registering src addr 10.16.69.39 or i_port_id 0xfe8000000000000000debdfffebeef80 [ 569.076201] ib_srpt:srpt_cm_req_recv: ib_srpt Establish connection sess=00000000ffaa765e name=10.16.69.39 ch=0000000097d2923d [ 569.076233] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 569.076236] scsi host1: ib_srp: using immediate data [ 569.076606] ib_srpt:srpt_zerolength_write: ib_srpt 10.16.69.39-18: queued zerolength write [ 569.076620] ib_srpt:srpt_zerolength_write_done: ib_srpt 10.16.69.39-18 wc->status 0 [ 569.076907] ib_srpt Received SRP_LOGIN_REQ with i_port_id fe80:0000:0000:0000:00de:bdff:febe:ef80, t_port_id 
00de:bdff:febe:ef80:00de:bdff:febe:ef80 and it_iu_len 8260 on port 1 (guid=fe80:0000:0000:0000:00de:bdff:febe:ef80); pkey 0xffff [ 569.076989] ib_srpt:srpt_cm_req_recv: ib_srpt imm_data_offset = 68 [ 569.079227] ib_srpt:srpt_create_ch_ib: ib_srpt srpt_create_ch_ib: max_cqe= 8191 max_sge= 32 sq_size = 4096 ch= 0000000078d4dcf5 [ 569.079240] ib_srpt:srpt_cm_req_recv: ib_srpt registering src addr 10.16.69.39 or i_port_id 0xfe8000000000000000debdfffebeef80 [ 569.079255] ib_srpt:srpt_cm_req_recv: ib_srpt Establish connection sess=0000000015a0b0b5 name=10.16.69.39 ch=0000000078d4dcf5 [ 569.079311] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 569.079313] scsi host1: ib_srp: using immediate data [ 569.079669] scsi host1: SRP.T10:00DEBDFFFEBEEF80 [ 569.079675] ib_srpt:srpt_zerolength_write: ib_srpt 10.16.69.39-20: queued zerolength write [ 569.079686] ib_srpt:srpt_zerolength_write_done: ib_srpt 10.16.69.39-20 wc->status 0 [ 569.080892] scsi 1:0:0:0: Direct-Access LIO-ORG IBLOCK 4.0 PQ: 0 ANSI: 6 [ 569.081009] scsi 1:0:0:0: alua: supports implicit and explicit TPGS [ 569.081014] scsi 1:0:0:0: alua: device naa.60014056e756c6c62300000000000000 port group 0 rel port 1 [ 569.081143] sd 1:0:0:0: Warning! Received an indication that the LUN assignments on this target have changed. 
The Linux SCSI layer does not automatical [ 569.081849] sd 1:0:0:0: Attached scsi generic sg1 type 0 [ 569.097426] sd 1:0:0:0: alua: transition timeout set to 60 seconds [ 569.097431] sd 1:0:0:0: alua: port group 00 state A non-preferred supports TOlUSNA [ 569.097619] sd 1:0:0:0: [sdb] 65536 512-byte logical blocks: (33.6 MB/32.0 MiB) [ 569.097649] sd 1:0:0:0: [sdb] Write Protect is off [ 569.097652] sd 1:0:0:0: [sdb] Mode Sense: 43 00 00 08 [ 569.097680] scsi 1:0:0:2: Direct-Access LIO-ORG IBLOCK 4.0 PQ: 0 ANSI: 6 [ 569.097702] sd 1:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA [ 569.097714] srpt/10.16.69.39: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION. [ 569.097753] sd 1:0:0:0: [sdb] Optimal transfer size 126976 bytes [ 569.097794] scsi 1:0:0:2: alua: supports implicit and explicit TPGS [ 569.097798] scsi 1:0:0:2: alua: device naa.60014057363736964626700000000000 port group 0 rel port 1 [ 569.098042] sd 1:0:0:2: [sdc] 65536 512-byte logical blocks: (33.6 MB/32.0 MiB) [ 569.098068] sd 1:0:0:2: [sdc] Write Protect is off [ 569.098070] sd 1:0:0:2: [sdc] Mode Sense: 43 00 10 08 [ 569.098117] sd 1:0:0:2: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA [ 569.098129] srpt/10.16.69.39: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION. [ 569.098169] sd 1:0:0:2: [sdc] Optimal transfer size 524288 bytes [ 569.099162] sd 1:0:0:2: Attached scsi generic sg2 type 0 [ 569.099412] scsi 1:0:0:1: Direct-Access LIO-ORG IBLOCK 4.0 PQ: 0 ANSI: 6 [ 569.099531] scsi 1:0:0:1: alua: supports implicit and explicit TPGS [ 569.099535] scsi 1:0:0:1: alua: device naa.60014056e756c6c62310000000000000 port group 0 rel port 1 [ 569.102434] sd 1:0:0:1: Warning! Received an indication that the LUN assignments on this target have changed. 
The Linux SCSI layer does not automatical [ 569.102913] sd 1:0:0:1: Attached scsi generic sg3 type 0 [ 569.102941] ib_srp:srp_add_target: ib_srp: host1: SCSI scan succeeded - detected 3 LUNs [ 569.102943] scsi host1: ib_srp: new target: id_ext 00debdfffebeef80 ioc_guid 00debdfffebeef80 sgid fe80:0000:0000:0000:00de:bdff:febe:ef80 dest 10.16.69.39 [ 569.104669] ib_srp:srp_parse_in: ib_srp: 10.16.69.39 -> 10.16.69.39:0 [ 569.104673] ib_srp:srp_parse_in: ib_srp: 10.16.69.39:5555 -> 10.16.69.39:5555 [ 569.104679] ib_srp:srp_parse_in: ib_srp: [2620:52:0:1040:de:bdff:febe:ef80] -> [2620:52:0:1040:de:bdff:febe:ef80]:0/168838439%0 [ 569.104683] ib_srp:srp_parse_in: ib_srp: [2620:52:0:1040:de:bdff:febe:ef80]:5555 -> [2620:52:0:1040:de:bdff:febe:ef80]:5555/168838439%0 [ 569.104686] scsi host2: ib_srp: Already connected to target port with id_ext=00debdfffebeef80;ioc_guid=00debdfffebeef80;dest=2620:0052:0000:1040:00de:bdff:febe:ef80 [ 569.127417] sd 1:0:0:1: alua: transition timeout set to 60 seconds [ 569.127422] sd 1:0:0:1: alua: port group 00 state A non-preferred supports TOlUSNA [ 569.127518] sd 1:0:0:2: alua: transition timeout set to 60 seconds [ 569.127523] sd 1:0:0:2: alua: port group 00 state A non-preferred supports TOlUSNA [ 569.127803] sd 1:0:0:1: [sdd] 65536 512-byte logical blocks: (33.6 MB/32.0 MiB) [ 569.128220] sd 1:0:0:1: [sdd] Write Protect is off [ 569.128223] sd 1:0:0:1: [sdd] Mode Sense: 43 00 00 08 [ 569.128273] sd 1:0:0:1: [sdd] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA [ 569.128284] srpt/10.16.69.39: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION. 
[ 569.128320] sd 1:0:0:1: [sdd] Optimal transfer size 126976 bytes [ 569.140904] ib_srp:srp_parse_in: ib_srp: 10.16.69.39 -> 10.16.69.39:0 [ 569.140909] ib_srp:srp_parse_in: ib_srp: 10.16.69.39:5555 -> 10.16.69.39:5555 [ 569.140914] ib_srp:srp_parse_in: ib_srp: [2620:52:0:1040:de:bdff:febe:ef80] -> [2620:52:0:1040:de:bdff:febe:ef80]:0/168838439%0 [ 569.140919] ib_srp:srp_parse_in: ib_srp: [2620:52:0:1040:de:bdff:febe:ef80]:5555 -> [2620:52:0:1040:de:bdff:febe:ef80]:5555/168838439%0 [ 569.140926] ib_srp:srp_parse_in: ib_srp: [fe80::de:bdff:febe:ef80%2] -> [fe80::de:bdff:febe:ef80]:0/168838439%2 [ 569.140932] ib_srp:srp_parse_in: ib_srp: [fe80::de:bdff:febe:ef80%2]:5555 -> [fe80::de:bdff:febe:ef80]:5555/168838439%2 [ 569.140934] scsi host2: ib_srp: Already connected to target port with id_ext=00debdfffebeef80;ioc_guid=00debdfffebeef80;dest=fe80:0000:0000:0000:00de:bdff:febe:ef80 [ 569.197816] tag=31/-1, e=16ac2000, rq cmd_flags 22, rq_flags 2800 [ 569.247577] sd 1:0:0:2: [sdc] Attached SCSI disk [ 569.248420] tag=25/-1, e=1c08c400, rq cmd_flags 22, rq_flags 2800 [ 569.248569] sd 1:0:0:1: [sdd] Attached SCSI disk [ 569.248931] tag=29/-1, e=16ac0800, rq cmd_flags 22, rq_flags 2800 [ 569.249108] sd 1:0:0:0: [sdb] Attached SCSI disk [ 569.305302] rdma_rxe: rxe_invalidate_mr: rkey (0x84f6) doesn't match mr->ibmr.rkey (0x84f7) [ 569.305315] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE 00000000a216794f [ 569.367710] scsi 1:0:0:0: alua: Detached [ 576.367413] scsi host1: SRP abort called [ 579.907417] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 579.907624] ib_srpt receiving failed for ioctx 000000005e721910 with status 5 [ 579.907631] ib_srpt receiving failed for ioctx 0000000056beb878 with status 5 [ 579.907634] ib_srpt receiving failed for ioctx 000000001f2eb0be with status 5 [ 579.907636] ib_srpt receiving failed for ioctx 00000000338f74b5 with status 5 [ 579.907639] ib_srpt receiving failed for ioctx 00000000d57a4874 with status 5 [ 579.907642] 
ib_srpt receiving failed for ioctx 00000000806f7498 with status 5 [ 579.907646] ib_srpt receiving failed for ioctx 00000000330004a8 with status 5 [ 579.907649] ib_srpt receiving failed for ioctx 00000000833f52ea with status 5 [ 579.907652] ib_srpt receiving failed for ioctx 00000000a4a4cd8e with status 5 [ 579.907656] ib_srpt receiving failed for ioctx 000000006f7d8b81 with status 5 [ 580.127744] ib_srpt Received SRP_LOGIN_REQ with i_port_id fe80:0000:0000:0000:00de:bdff:febe:ef80, t_port_id 00de:bdff:febe:ef80:00de:bdff:febe:ef80 and it_iu_len 8260 on port 1 (guid=fe80:0000:0000:0000:00de:bdff:febe:ef80); pkey 0xffff [ 580.127890] ib_srpt:srpt_cm_req_recv: ib_srpt imm_data_offset = 68 [ 580.130681] ib_srpt:srpt_create_ch_ib: ib_srpt srpt_create_ch_ib: max_cqe= 8191 max_sge= 32 sq_size = 4096 ch= 000000003dffffde [ 580.130694] ib_srpt:srpt_cm_req_recv: ib_srpt registering src addr 10.16.69.39 or i_port_id 0xfe8000000000000000debdfffebeef80 [ 580.130714] ib_srpt:srpt_cm_req_recv: ib_srpt Establish connection sess=00000000d7a559cb name=10.16.69.39 ch=000000003dffffde [ 580.130737] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 580.130740] scsi host1: ib_srp: using immediate data [ 580.130785] ib_srpt:srpt_zerolength_write: ib_srpt 10.16.69.39-23: queued zerolength write [ 580.130801] ib_srpt:srpt_zerolength_write_done: ib_srpt 10.16.69.39-23 wc->status 0 [ 580.130825] ib_srpt Received SRP_LOGIN_REQ with i_port_id fe80:0000:0000:0000:00de:bdff:febe:ef80, t_port_id 00de:bdff:febe:ef80:00de:bdff:febe:ef80 and it_iu_len 8260 on port 1 (guid=fe80:0000:0000:0000:00de:bdff:febe:ef80); pkey 0xffff [ 580.130926] ib_srpt:srpt_cm_req_recv: ib_srpt imm_data_offset = 68 [ 580.132805] ib_srpt:srpt_create_ch_ib: ib_srpt srpt_create_ch_ib: max_cqe= 8191 max_sge= 32 sq_size = 4096 ch= 00000000518459a5 [ 580.132820] ib_srpt:srpt_cm_req_recv: ib_srpt registering src addr 10.16.69.39 or i_port_id 0xfe8000000000000000debdfffebeef80 [ 580.132832] ib_srpt:srpt_cm_req_recv: ib_srpt 
Establish connection sess=00000000292cad95 name=10.16.69.39 ch=00000000518459a5 [ 580.132851] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 580.132852] scsi host1: ib_srp: using immediate data [ 580.132875] scsi host1: ib_srp: reconnect succeeded [ 580.132883] ib_srpt:srpt_zerolength_write: ib_srpt 10.16.69.39-24: queued zerolength write [ 580.132894] ib_srpt:srpt_zerolength_write_done: ib_srpt 10.16.69.39-24 wc->status 0 [ 582.027396] ib_srpt:srpt_zerolength_write: ib_srpt 10.16.69.39-20: queued zerolength write [ 582.027406] ib_srpt:srpt_zerolength_write: ib_srpt 10.16.69.39-18: queued zerolength write [ 582.027425] ib_srpt:srpt_zerolength_write_done: ib_srpt 10.16.69.39-20 wc->status 5 [ 582.027429] ib_srpt:srpt_zerolength_write_done: ib_srpt 10.16.69.39-18 wc->status 5 [ 582.027432] ib_srpt:srpt_release_channel_work: ib_srpt 10.16.69.39-20 [ 582.027440] ib_srpt:srpt_release_channel_work: ib_srpt 10.16.69.39-18 [ 614.717672] ib_srpt Closing channel 10.16.69.39-23 because target enc8000_rxe_1 has been disabled [ 614.717687] ib_srpt:srpt_zerolength_write: ib_srpt 10.16.69.39-23: queued zerolength write [ 614.717693] ib_srpt Closing channel 10.16.69.39-24 because target enc8000_rxe_1 has been disabled [ 614.717696] ib_srpt:srpt_zerolength_write: ib_srpt 10.16.69.39-24: queued zerolength write [ 614.717802] srpt_recv_done: 246 callbacks suppressed [ 614.717803] ib_srpt receiving failed for ioctx 000000005b444422 with status 5 [ 614.717809] ib_srpt receiving failed for ioctx 00000000776294fd with status 5 [ 614.717812] ib_srpt receiving failed for ioctx 000000002feadcfd with status 5 [ 614.717815] ib_srpt receiving failed for ioctx 000000006a886bb2 with status 5 [ 614.717818] ib_srpt receiving failed for ioctx 0000000016397dad with status 5 [ 614.717822] ib_srpt receiving failed for ioctx 00000000740fece1 with status 5 [ 614.717825] ib_srpt receiving failed for ioctx 00000000e7518963 with status 5 [ 614.717829] ib_srpt receiving failed for ioctx 
00000000eb32ccfd with status 5 [ 614.717833] ib_srpt receiving failed for ioctx 000000009d275157 with status 5 [ 614.717837] ib_srpt receiving failed for ioctx 00000000549482a1 with status 5 [ 614.717865] ib_srpt:srpt_zerolength_write_done: ib_srpt 10.16.69.39-23 wc->status 5 [ 614.717880] ib_srpt:srpt_zerolength_write_done: ib_srpt 10.16.69.39-24 wc->status 5 [ 614.717894] ib_srpt:srpt_release_channel_work: ib_srpt 10.16.69.39-23 [ 614.717902] ib_srpt:srpt_release_channel_work: ib_srpt 10.16.69.39-24 [ 614.717915] scsi host1: ib_srp: received DREQ [ 614.717997] scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE 000000005a1c1db0 [ 614.721889] scsi host1: ib_srp: received DREQ [ 614.721905] ib_srpt:srpt_close_ch: ib_srpt 10.16.69.39: already closed [ 614.721964] ib_srpt:srpt_close_ch: ib_srpt 10.16.69.39: already closed [ 616.827436] scsi host1: ib_srp: connection closed [ 616.827446] scsi host1: ib_srp: connection closed [ 625.347390] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 625.567932] scsi host1: ib_srp: REJ received [ 625.567935] scsi host1: REJ reason 0x8 [ 625.567953] scsi host1: reconnect attempt 1 failed (-104) [ 629.787421] fast_io_fail_tmo expired for SRP port-1:1 / host1. 
[ 636.187360] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 636.407959] scsi host1: ib_srp: REJ received [ 636.407963] scsi host1: REJ reason 0x8 [ 636.407982] scsi host1: reconnect attempt 2 failed (-104) [ 646.427427] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 646.647993] scsi host1: ib_srp: REJ received [ 646.647997] scsi host1: REJ reason 0x8 [ 646.648017] scsi host1: reconnect attempt 3 failed (-104) [ 656.667405] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 656.887953] scsi host1: ib_srp: REJ received [ 656.887958] scsi host1: REJ reason 0x8 [ 656.887979] scsi host1: reconnect attempt 4 failed (-104) [ 666.907388] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 667.127961] scsi host1: ib_srp: REJ received [ 667.127965] scsi host1: REJ reason 0x8 [ 667.127989] scsi host1: reconnect attempt 5 failed (-104) [ 677.147368] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 677.377871] scsi host1: ib_srp: REJ received [ 677.377874] scsi host1: REJ reason 0x8 [ 677.377892] scsi host1: reconnect attempt 6 failed (-104) [ 687.387428] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 687.607973] scsi host1: ib_srp: REJ received [ 687.607977] scsi host1: REJ reason 0x8 [ 687.608006] scsi host1: reconnect attempt 7 failed (-104) [ 687.608014] sd 1:0:0:1: rejecting I/O to offline device [ 697.627395] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 697.848018] scsi host1: ib_srp: REJ received [ 697.848022] scsi host1: REJ reason 0x8 [ 697.848070] ------------[ cut here ]------------ [ 697.848076] WARNING: CPU: 1 PID: 1973 at block/blk-mq.c:294 blk_mq_unquiesce_queue+0xb2/0xc8 [ 697.848087] Modules linked in: ib_srp scsi_transport_srp rdma_cm iw_cm ib_cm ib_umad scsi_debug rdma_rxe ib_uverbs ip6_udp_tunnel udp_tunnel null_blk scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_multipath ib_core sunrpc qeth_l2 bridge stp llc qeth qdio ccwgroup vfio_ccw mdev zcrypt_cex4 vfio_iommu_type1 vfio drm fb fuse font drm_panel_orientation_quirks 
i2c_core backlight zram ip_tables xfs crc32_vx_s390 ghash_s390 prng aes_s390 des_s390 libdes sha512_s390 sha256_s390 sha1_s390 sha_common dasd_eckd_mod dasd_mod pkey zcrypt [last unloaded: target_core_mod] [ 697.848149] CPU: 1 PID: 1973 Comm: kworker/1:5 Not tainted 5.15.0+ #4 [ 697.848154] Hardware name: IBM 2964 N96 400 (z/VM 6.4.0) [ 697.848158] Workqueue: events_long srp_reconnect_work [scsi_transport_srp] [ 697.848168] Krnl PSW : 0404c00180000000 000000003253c696 (blk_mq_unquiesce_queue+0xb6/0xc8) [ 697.848183] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 [ 697.848187] Krnl GPRS: 00000000000000d4 0000000000000000 00000000078b4f30 0000000000000002 [ 697.848190] 0000000000000004 0000000000000009 ffffffffffffff98 00000000326f7df0 [ 697.848192] 000003800729fcdc 0000000000000007 070003800729fb00 00000000078b4ec0 [ 697.848194] 000000001b9b2100 0000000000000000 000003800729fb00 000003800729fac0 [ 697.848203] Krnl Code: 000000003253c68a: a7180001 lhi %r1,1 000000003253c68e: a7f4ffd7 brc 15,000000003253c63c #000000003253c692: af000000 mc 0,0 >000000003253c696: a7180000 lhi %r1,0 000000003253c69a: a7f4ffd1 brc 15,000000003253c63c 000000003253c69e: c0e500071f89 brasl %r14,00000000326205b0 000000003253c6a4: a7f4ffbe brc 15,000000003253c620 000000003253c6a8: c004002ce124 brcl 0,0000000032ad88f0 [ 697.848221] Call Trace: [ 697.848224] [<000000003253c696>] blk_mq_unquiesce_queue+0xb6/0xc8 [ 697.848230] [<00000000326f7da0>] scsi_internal_device_unblock_nowait+0x50/0xa0 [ 697.848235] [<00000000326f7e30>] device_unblock+0x40/0x50 [ 697.848238] [<00000000326ef9c8>] starget_for_each_device+0xa8/0xd0 [ 697.848244] [<00000000326f862e>] target_unblock+0x56/0x68 [ 697.848247] [<00000000326b7018>] device_for_each_child+0x60/0xa0 [ 697.848251] [<00000000326f7ea6>] scsi_target_unblock+0x66/0x78 [ 697.848253] [<000003ff808e3872>] srp_reconnect_rport+0x202/0x238 [scsi_transport_srp] [ 697.848340] [<000003ff808e3902>] srp_reconnect_work+0x5a/0xf0 [scsi_transport_srp] [ 
697.848345] [<0000000031f8b62a>] process_one_work+0x21a/0x498 [ 697.848349] [<0000000031f8bdd4>] worker_thread+0x64/0x498 [ 697.848351] [<0000000031f9471c>] kthread+0x184/0x190 [ 697.848356] [<0000000031f1f468>] __ret_from_fork+0x40/0x58 [ 697.848359] [<0000000032a6d2da>] ret_from_fork+0xa/0x30 [ 697.848365] Last Breaking-Event-Address: [ 697.848367] [<0000000000000000>] 0x0 [ 697.848370] ---[ end trace 270726b44805023e ]--- [ 697.848374] scsi host1: reconnect attempt 8 failed (-104) [ 707.867368] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 708.087976] scsi host1: ib_srp: REJ received [ 708.087981] scsi host1: REJ reason 0x8 [ 708.088024] scsi host1: reconnect attempt 9 failed (-104) [ 718.107392] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 718.327884] scsi host1: ib_srp: REJ received [ 718.327888] scsi host1: REJ reason 0x8 [ 718.327929] scsi host1: reconnect attempt 10 failed (-104) [ 728.347398] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 728.567888] scsi host1: ib_srp: REJ received [ 728.567891] scsi host1: REJ reason 0x8 [ 728.567931] scsi host1: reconnect attempt 11 failed (-104) [ 735.447460] sd 0:0:0:0: [sda] Synchronizing SCSI cache [ 738.587419] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 738.807992] scsi host1: ib_srp: REJ received [ 738.807995] scsi host1: REJ reason 0x8 [ 738.808049] scsi host1: reconnect attempt 12 failed (-104) [ 759.067426] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 759.288047] scsi host1: ib_srp: REJ received [ 759.288051] scsi host1: REJ reason 0x8 [ 759.288103] scsi host1: reconnect attempt 13 failed (-104) [ 789.787399] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 790.007875] scsi host1: ib_srp: REJ received [ 790.007878] scsi host1: REJ reason 0x8 [ 790.007922] scsi host1: reconnect attempt 14 failed (-104) [ 830.107393] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 830.329148] scsi host1: ib_srp: REJ received [ 830.329152] scsi host1: REJ reason 0x8 [ 
830.329221] scsi host1: reconnect attempt 15 failed (-104) [ 883.867389] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 884.087954] scsi host1: ib_srp: REJ received [ 884.087958] scsi host1: REJ reason 0x8 [ 884.088004] scsi host1: reconnect attempt 16 failed (-104) [ 945.307398] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 945.527867] scsi host1: ib_srp: REJ received [ 945.527870] scsi host1: REJ reason 0x8 [ 945.527914] scsi host1: reconnect attempt 17 failed (-104) [ 1016.987360] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 1017.207882] scsi host1: ib_srp: REJ received [ 1017.207885] scsi host1: REJ reason 0x8 [ 1017.207937] scsi host1: reconnect attempt 18 failed (-104) [ 1098.907436] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 1099.127947] scsi host1: ib_srp: REJ received [ 1099.127951] scsi host1: REJ reason 0x8 [ 1099.128003] scsi host1: reconnect attempt 19 failed (-104) > > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c > index 4a6789e4398b..1b7647722ec0 100644 > --- a/block/blk-mq-sched.c > +++ b/block/blk-mq-sched.c > @@ -429,7 +429,8 @@ void blk_mq_sched_insert_request(struct request *rq, bool at_head, > struct blk_mq_ctx *ctx = rq->mq_ctx; > struct blk_mq_hw_ctx *hctx = rq->mq_hctx; > > - WARN_ON(e && (rq->tag != BLK_MQ_NO_TAG)); > + if (e && (rq->tag != BLK_MQ_NO_TAG)) > + printk("tag=%d/%d, e=%lx, rq cmd_flags %x, rq_flags %x\n", rq->tag, rq->internal_tag, (long) e, rq->cmd_flags, rq->rq_flags); > > if (blk_mq_sched_bypass_insert(hctx, rq)) { > /* > > -- > Jens Axboe > -- Best Regards, Yi Zhang ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 2:21 ` Yi Zhang @ 2021-11-03 3:21 ` Jens Axboe 2021-11-03 3:51 ` Ming Lei [not found] ` <CGME20211103032116epcas2p13b9f3fad0fe84f58c9b7f36320c71854@epcms2p2> 1 sibling, 1 reply; 24+ messages in thread From: Jens Axboe @ 2021-11-03 3:21 UTC (permalink / raw) To: Yi Zhang; +Cc: Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi On 11/2/21 8:21 PM, Yi Zhang wrote: >> >> Can either one of you try with this patch? Won't fix anything, but it'll >> hopefully shine a bit of light on the issue. >> > Hi Jens > > Here is the full log: Thanks! I think I see what it could be - can you try this one as well, would like to confirm that the condition I think is triggering is what is triggering. diff --git a/block/blk-mq.c b/block/blk-mq.c index 07eb1412760b..81dede885231 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -2515,6 +2515,8 @@ void blk_mq_submit_bio(struct bio *bio) if (plug && plug->cached_rq) { rq = rq_list_pop(&plug->cached_rq); INIT_LIST_HEAD(&rq->queuelist); + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); } else { struct blk_mq_alloc_data data = { .q = q, @@ -2535,6 +2537,8 @@ void blk_mq_submit_bio(struct bio *bio) bio_wouldblock_error(bio); goto queue_exit; } + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); } trace_block_getrq(bio); -- Jens Axboe ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 3:21 ` Jens Axboe @ 2021-11-03 3:51 ` Ming Lei 2021-11-03 3:54 ` Jens Axboe 0 siblings, 1 reply; 24+ messages in thread From: Ming Lei @ 2021-11-03 3:51 UTC (permalink / raw) To: Jens Axboe Cc: Yi Zhang, Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi On Tue, Nov 02, 2021 at 09:21:10PM -0600, Jens Axboe wrote: > On 11/2/21 8:21 PM, Yi Zhang wrote: > >> > >> Can either one of you try with this patch? Won't fix anything, but it'll > >> hopefully shine a bit of light on the issue. > >> > > Hi Jens > > > > Here is the full log: > > Thanks! I think I see what it could be - can you try this one as well, > would like to confirm that the condition I think is triggering is what > is triggering. > > diff --git a/block/blk-mq.c b/block/blk-mq.c > index 07eb1412760b..81dede885231 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -2515,6 +2515,8 @@ void blk_mq_submit_bio(struct bio *bio) > if (plug && plug->cached_rq) { > rq = rq_list_pop(&plug->cached_rq); > INIT_LIST_HEAD(&rq->queuelist); > + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); > + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); > } else { > struct blk_mq_alloc_data data = { > .q = q, > @@ -2535,6 +2537,8 @@ void blk_mq_submit_bio(struct bio *bio) > bio_wouldblock_error(bio); > goto queue_exit; > } > + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); > + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); Hello Jens, I guess the issue could be the following code run without grabbing ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx(). .rq_flags = q->elevator ? RQF_ELV : 0, then elevator is switched to real one from none, and check on q->elevator becomes not consistent. Thanks, Ming ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 3:51 ` Ming Lei @ 2021-11-03 3:54 ` Jens Axboe 2021-11-03 4:00 ` Yi Zhang 2021-11-03 11:59 ` Jens Axboe 0 siblings, 2 replies; 24+ messages in thread From: Jens Axboe @ 2021-11-03 3:54 UTC (permalink / raw) To: Ming Lei Cc: Yi Zhang, Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi On Nov 2, 2021, at 9:52 PM, Ming Lei <ming.lei@redhat.com> wrote: > > On Tue, Nov 02, 2021 at 09:21:10PM -0600, Jens Axboe wrote: >>> On 11/2/21 8:21 PM, Yi Zhang wrote: >>>>> >>>>> Can either one of you try with this patch? Won't fix anything, but it'll >>>>> hopefully shine a bit of light on the issue. >>>>> >>> Hi Jens >>> >>> Here is the full log: >> >> Thanks! I think I see what it could be - can you try this one as well, >> would like to confirm that the condition I think is triggering is what >> is triggering. >> >> diff --git a/block/blk-mq.c b/block/blk-mq.c >> index 07eb1412760b..81dede885231 100644 >> --- a/block/blk-mq.c >> +++ b/block/blk-mq.c >> @@ -2515,6 +2515,8 @@ void blk_mq_submit_bio(struct bio *bio) >> if (plug && plug->cached_rq) { >> rq = rq_list_pop(&plug->cached_rq); >> INIT_LIST_HEAD(&rq->queuelist); >> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); >> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); >> } else { >> struct blk_mq_alloc_data data = { >> .q = q, >> @@ -2535,6 +2537,8 @@ void blk_mq_submit_bio(struct bio *bio) >> bio_wouldblock_error(bio); >> goto queue_exit; >> } >> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); >> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); > > Hello Jens, > > I guess the issue could be the following code run without grabbing > ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx(). > > .rq_flags = q->elevator ? RQF_ELV : 0, > > then elevator is switched to real one from none, and check on q->elevator > becomes not consistent. 
Indeed, that’s where I was going with this. I have a patch, testing it locally but it’s getting late. Will send it out tomorrow. The nice benefit is that it allows dropping the weird ref get on plug flush, and batches getting the refs as well.
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 3:54 ` Jens Axboe @ 2021-11-03 4:00 ` Yi Zhang 2021-11-03 19:03 ` Jens Axboe 2021-11-03 11:59 ` Jens Axboe 1 sibling, 1 reply; 24+ messages in thread From: Yi Zhang @ 2021-11-03 4:00 UTC (permalink / raw) To: Jens Axboe Cc: Ming Lei, Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi > > > > Hello Jens, > > > > I guess the issue could be the following code run without grabbing > > ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx(). > > > > .rq_flags = q->elevator ? RQF_ELV : 0, > > > > then elevator is switched to real one from none, and check on q->elevator > > becomes not consistent. > > Indeed, that’s where I was going with this. I have a patch, testing it locally but it’s getting late. Will send it out tomorrow. The nice benefit is that it allows dropping the weird ref get on plug flush, and batches getting the refs as well. > Hi Jens Here is the log in case you still need it. :) [ 147.962222] run blktests srp/001 at 2021-11-02 23:57:02 [ 148.220309] alua: device handler registered [ 148.223332] emc: device handler registered [ 148.226203] rdac: device handler registered [ 148.231724] null_blk: module loaded [ 150.275727] rdma_rxe: loaded [ 150.281728] infiniband enc8000_rxe: set active [ 150.281732] infiniband enc8000_rxe: added enc8000 [ 150.381380] scsi_debug:sdebug_add_store: dif_storep 524288 bytes @ 0000000098 4c2b06 [ 150.382109] scsi_debug:sdebug_driver_probe: scsi_debug: trim poll_queues to 0 . 
poll_q/nr_hw = (0/1) [ 150.382112] scsi_debug:sdebug_driver_probe: host protection DIF3 DIX3 [ 150.382116] scsi host0: scsi_debug: version 0190 [20200710] [ 150.382116] dev_size_mb=32, opts=0x0, submit_queues=1, statistics=0 [ 150.382802] scsi 0:0:0:0: Direct-Access Linux scsi_debug 0190 PQ: 0 ANSI: 7 [ 150.383007] sd 0:0:0:0: Power-on or device reset occurred [ 150.383029] sd 0:0:0:0: [sda] Enabling DIF Type 3 protection [ 150.383053] sd 0:0:0:0: [sda] 65536 512-byte logical blocks: (33.6 MB/32.0 MiB) [ 150.383061] sd 0:0:0:0: [sda] Write Protect is off [ 150.383075] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA [ 150.383093] sd 0:0:0:0: [sda] Optimal transfer size 524288 bytes [ 150.383104] sd 0:0:0:0: Attached scsi generic sg0 type 0 [ 150.467327] sd 0:0:0:0: [sda] Enabling DIX T10-DIF-TYPE3-CRC protection [ 150.467332] sd 0:0:0:0: [sda] DIF application tag size 6 [ 150.547390] sd 0:0:0:0: [sda] Attached SCSI disk [ 150.655136] Rounding down aligned max_sectors from 4294967295 to 4294967288 [ 151.972911] Rounding down aligned max_sectors from 255 to 248 [ 151.982162] Rounding down aligned max_sectors from 255 to 248 [ 151.991500] Rounding down aligned max_sectors from 4294967295 to 4294967288 [ 153.254537] ib_srpt Received SRP_LOGIN_REQ with i_port_id fe80:0000:0000:0000:00de:bdff:febe:ef80, t_port_id 00de:bdff:febe:ef80:00de:bdff:febe:ef80 and it_iu_len 8260 on port 1 (guid=fe80:0000:0000:0000:00de:bdff:febe:ef80); pkey 0xffff [ 153.262242] ib_srpt Received SRP_LOGIN_REQ with i_port_id fe80:0000:0000:0000:00de:bdff:febe:ef80, t_port_id 00de:bdff:febe:ef80:00de:bdff:febe:ef80 and it_iu_len 8260 on port 1 (guid=fe80:0000:0000:0000:00de:bdff:febe:ef80); pkey 0xffff [ 153.264644] scsi host1: SRP.T10:00DEBDFFFEBEEF80 [ 153.265188] scsi 1:0:0:0: Direct-Access LIO-ORG IBLOCK 4.0 PQ: 0 ANSI: 6 [ 153.265611] scsi 1:0:0:0: alua: supports implicit and explicit TPGS [ 153.265618] scsi 1:0:0:0: alua: device
naa.60014056e756c6c62300000000000000 port group 0 rel port 1 [ 153.265765] sd 1:0:0:0: Warning! Received an indication that the LUN assignments on this target have changed. The Linux SCSI layer does not automatical [ 153.265782] sd 1:0:0:0: Attached scsi generic sg1 type 0 [ 153.287283] sd 1:0:0:0: alua: transition timeout set to 60 seconds [ 153.287299] sd 1:0:0:0: alua: port group 00 state A non-preferred supports TOlUSNA [ 153.287479] scsi 1:0:0:2: Direct-Access LIO-ORG IBLOCK 4.0 PQ: 0 ANSI: 6 [ 153.287656] sd 1:0:0:0: [sdb] 65536 512-byte logical blocks: (33.6 MB/32.0 MiB) [ 153.287687] scsi 1:0:0:2: alua: supports implicit and explicit TPGS [ 153.287691] scsi 1:0:0:2: alua: device naa.60014057363736964626700000000000 port group 0 rel port 1 [ 153.287824] sd 1:0:0:2: Attached scsi generic sg2 type 0 [ 153.288006] sd 1:0:0:0: [sdb] Write Protect is off [ 153.288133] scsi 1:0:0:1: Direct-Access LIO-ORG IBLOCK 4.0 PQ: 0 ANSI: 6 [ 153.288161] sd 1:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA [ 153.288171] srpt/10.16.69.39: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION. [ 153.288817] sd 1:0:0:0: [sdb] Optimal transfer size 126976 bytes [ 153.289825] sd 1:0:0:2: [sdc] 65536 512-byte logical blocks: (33.6 MB/32.0 MiB) [ 153.289853] scsi 1:0:0:1: alua: supports implicit and explicit TPGS [ 153.289857] scsi 1:0:0:1: alua: device naa.60014056e756c6c62310000000000000 port group 0 rel port 1 [ 153.289981] sd 1:0:0:1: Warning! Received an indication that the LUN assignments on this target have changed. The Linux SCSI layer does not automatical [ 153.289993] sd 1:0:0:1: Attached scsi generic sg3 type 0 [ 153.291415] sd 1:0:0:2: [sdc] Write Protect is off [ 153.291574] sd 1:0:0:2: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA [ 153.291584] srpt/10.16.69.39: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.
[ 153.291634] sd 1:0:0:2: [sdc] Optimal transfer size 524288 bytes [ 153.293587] scsi host2: ib_srp: Already connected to target port with id_ext=00debdfffebeef80;ioc_guid=00debdfffebeef80;dest=2620:0052:0000:1040:00de:bdff:febe:ef80 [ 153.327364] sd 1:0:0:2: alua: transition timeout set to 60 seconds [ 153.327371] sd 1:0:0:2: alua: port group 00 state A non-preferred supports TOlUSNA [ 153.329782] scsi host2: ib_srp: Already connected to target port with id_ext=00debdfffebeef80;ioc_guid=00debdfffebeef80;dest=fe80:0000:0000:0000:00de:bdff:febe:ef80 [ 153.347178] sd 1:0:0:1: alua: transition timeout set to 60 seconds [ 153.347183] sd 1:0:0:1: alua: port group 00 state A non-preferred supports TOlUSNA [ 153.347301] sd 1:0:0:1: [sdd] 65536 512-byte logical blocks: (33.6 MB/32.0 MiB) [ 153.347327] sd 1:0:0:1: [sdd] Write Protect is off [ 153.347376] sd 1:0:0:1: [sdd] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA [ 153.347386] srpt/10.16.69.39: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.
[ 153.347423] sd 1:0:0:1: [sdd] Optimal transfer size 126976 bytes [ 153.397703] ------------[ cut here ]------------ [ 153.397707] WARNING: CPU: 1 PID: 38 at block/blk-mq-sched.c:432 blk_mq_sched_ insert_request+0x54/0x178 [ 153.397719] Modules linked in: ib_srp scsi_transport_srp target_core_pscsi ta rget_core_file ib_srpt target_core_iblock target_core_mod rdma_cm iw_cm ib_cm ib _umad scsi_debug rdma_rxe ib_uverbs ip6_udp_tunnel udp_tunnel null_blk scsi_dh_r dac scsi_dh_emc scsi_dh_alua dm_multipath ib_core sunrpc qeth_l2 bridge stp llc qeth qdio ccwgroup vfio_ccw mdev vfio_iommu_type1 vfio zcrypt_cex4 drm fb font d rm_panel_orientation_quirks i2c_core fuse backlight zram ip_tables xfs crc32_vx_ s390 ghash_s390 prng aes_s390 des_s390 libdes sha512_s390 sha256_s390 sha1_s390 sha_common dasd_eckd_mod dasd_mod pkey zcrypt [ 153.397770] CPU: 1 PID: 38 Comm: kworker/u128:1 Not tainted 5.15.0.v2+ #5 [ 153.397774] Hardware name: IBM 2964 N96 400 (z/VM 6.4.0) [ 153.397776] Workqueue: events_unbound async_run_entry_fn [ 153.397783] Krnl PSW : 0704e00180000000 000000002bd6e0c0 (blk_mq_sched_insert _request+0x58/0x178) [ 153.397788] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI: 0 EA:3 [ 153.397792] Krnl GPRS: 0000000000000004 0000000000000022 000000001da68800 000 0000000000001 [ 153.397794] 0000000000000001 0000000000000000 000000000c99f800 000 00000168b1c00 [ 153.397796] 0000000000000000 0000000000000001 0000000000000001 000 000001da68800 [ 153.397799] 0000000000f5c200 000003ff7e82b000 0000038000353830 00 00380003537c0 [ 153.397807] Krnl Code: 000000002bd6e0b4: a71effff chi %r1,-1 [ 153.397807] 000000002bd6e0b8: a7840004 brc 8,000000 002bd6e0c0 [ 153.397807] #000000002bd6e0bc: af000000 mc 0,0 [ 153.397807] >000000002bd6e0c0: 5810b01c l %r1,28(% r11) [ 153.397807] 000000002bd6e0c4: ec213bbb0055 risbg %r2,%r1, 59,187,0 [ 153.397807] 000000002bd6e0ca: a7740057 brc 7,000000 002bd6e178 [ 153.397807] 000000002bd6e0ce: 5810b018 l %r1,24(% r11) [ 153.397807] 
000000002bd6e0d2: c01b000000ff nilf %r1,255 [ 153.397827] Call Trace: [ 153.397829] [<000000002bd6e0c0>] blk_mq_sched_insert_request+0x58/0x178 [ 153.397834] [<000000002bd5e926>] blk_execute_rq+0x56/0xd8 [ 153.397840] [<000000002bf20d50>] __scsi_execute+0x110/0x230 [ 153.397846] [<000000002bf20fb2>] scsi_mode_sense+0x142/0x340 [ 153.397849] [<000000002bf322c6>] sd_revalidate_disk.isra.0+0x74e/0x2240 [ 153.397853] [<000000002bf342ea>] sd_probe+0x312/0x4b0 [ 153.397856] [<000000002bee76e0>] really_probe+0xd0/0x4b0 [ 153.397862] [<000000002bee7c70>] driver_probe_device+0x40/0xf0 [ 153.397865] [<000000002bee837c>] __device_attach_driver+0xa4/0x128 [ 153.397869] [<000000002bee4a80>] bus_for_each_drv+0x88/0xc0 [ 153.397872] [<000000002bee6be0>] __device_attach_async_helper+0x90/0xf0 [ 153.397875] [<000000002b7c0d1e>] async_run_entry_fn+0x4e/0x1b0 [ 153.397878] [<000000002b7b362a>] process_one_work+0x21a/0x498 [ 153.397881] [<000000002b7b3dd4>] worker_thread+0x64/0x498 [ 153.397884] [<000000002b7bc71c>] kthread+0x184/0x190 [ 153.397889] [<000000002b747468>] __ret_from_fork+0x40/0x58 [ 153.397892] [<000000002c2952fa>] ret_from_fork+0xa/0x30 [ 153.397899] Last Breaking-Event-Address: [ 153.397901] [<0000000000000000>] 0x0 [ 153.397903] ---[ end trace e8c7933cbb1a7d90 ]--- [ 153.398050] sd 1:0:0:0: [sdb] Attached SCSI disk [ 153.428144] ------------[ cut here ]------------ [ 153.428148] WARNING: CPU: 0 PID: 7 at block/blk-mq-sched.c:432 blk_mq_sched_i nsert_request+0x54/0x178 [ 153.428154] Modules linked in: ib_srp scsi_transport_srp target_core_pscsi ta rget_core_file ib_srpt target_core_iblock target_core_mod rdma_cm iw_cm ib_cm i _umad scsi_debug rdma_rxe ib_uverbs ip6_udp_tunnel udp_tunnel null_blk scsi_dh_r dac scsi_dh_emc scsi_dh_alua dm_multipath ib_core sunrpc qeth_l2 bridge stp llc qeth qdio ccwgroup vfio_ccw mdev vfio_iommu_type1 vfio zcrypt_cex4 drm fb font d rm_panel_orientation_quirks i2c_core fuse backlight zram ip_tables xfs crc32_vx_ s390 ghash_s390 prng 
aes_s390 des_s390 libdes sha512_s390 sha256_s390 sha1_s390 sha_common dasd_eckd_mod dasd_mod pkey zcrypt [ 153.428200] CPU: 0 PID: 7 Comm: kworker/u128:0 Tainted: G W 5. 15.0.v2+ #5 [ 153.428203] Hardware name: IBM 2964 N96 400 (z/VM 6.4.0) [ 153.428205] Workqueue: events_unbound async_run_entry_fn [ 153.428208] Krnl PSW : 0704e00180000000 000000002bd6e0c0 (blk_mq_sched_insert _request+0x58/0x178) [ 153.428214] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI: 0 EA:3 [ 153.428218] Krnl GPRS: 0000000000000004 000000000000003a 000000001d86e800 000 0000000000001 [ 153.428220] 0000000000000001 0000000000000000 000000001d83d800 000 000001e74dc00 [ 153.428222] 0000000000000000 0000000000000001 0000000000000001 000 000001d86e800 [ 153.428225] 00000000006ba100 000003ff7e80b700 000003800003f830 000 003800003f7c0 [ 153.428231] Krnl Code: 000000002bd6e0b4: a71effff chi %r1,-1 [ 153.428231] 000000002bd6e0b8: a7840004 brc 8,000000 002bd6e0c0 [ 153.428231] #000000002bd6e0bc: af000000 mc 0,0 [ 153.428231] >000000002bd6e0c0: 5810b01c l %r1,28(% r11) [ 153.428231] 000000002bd6e0c4: ec213bbb0055 risbg %r2,%r1, 59,187,0 [ 153.428231] 000000002bd6e0ca: a7740057 brc 7,000000 002bd6e178 [ 153.428231] 000000002bd6e0ce: 5810b018 l %r1,24(% r11) [ 153.428231] 000000002bd6e0d2: c01b000000ff nilf %r1,255 [ 153.428251] Call Trace: [ 153.428253] [<000000002bd6e0c0>] blk_mq_sched_insert_request+0x58/0x178 [ 153.428257] [<000000002bd5e926>] blk_execute_rq+0x56/0xd8 [ 153.428261] [<000000002bf20d50>] __scsi_execute+0x110/0x230 [ 153.428264] [<000000002bf20fb2>] scsi_mode_sense+0x142/0x340 [ 153.428267] [<000000002bf322c6>] sd_revalidate_disk.isra.0+0x74e/0x2240 [ 153.428270] [<000000002bf342ea>] sd_probe+0x312/0x4b0 [ 153.428273] [<000000002bee76e0>] really_probe+0xd0/0x4b0 [ 153.428277] [<000000002bee7c70>] driver_probe_device+0x40/0xf0 [ 153.428280] [<000000002bee837c>] __device_attach_driver+0xa4/0x128 [ 153.428283] [<000000002bee4a80>] bus_for_each_drv+0x88/0xc0 [ 153.428286] 
[<000000002bee6be0>] __device_attach_async_helper+0x90/0xf0 [ 153.428289] [<000000002b7c0d1e>] async_run_entry_fn+0x4e/0x1b0 [ 153.428292] [<000000002b7b362a>] process_one_work+0x21a/0x498 [ 153.428295] [<000000002b7b3dd4>] worker_thread+0x64/0x498 [ 153.428298] [<000000002b7bc71c>] kthread+0x184/0x190 [ 153.428301] [<000000002b747468>] __ret_from_fork+0x40/0x58 [ 153.428304] [<000000002c2952fa>] ret_from_fork+0xa/0x30 [ 153.428308] Last Breaking-Event-Address: [ 153.428309] [<0000000000000000>] 0x0 [ 153.428311] ---[ end trace e8c7933cbb1a7d91 ]--- -- Best Regards, Yi Zhang ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 4:00 ` Yi Zhang @ 2021-11-03 19:03 ` Jens Axboe 2021-11-05 11:13 ` Yi Zhang 0 siblings, 1 reply; 24+ messages in thread From: Jens Axboe @ 2021-11-03 19:03 UTC (permalink / raw) To: Yi Zhang Cc: Ming Lei, Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi On 11/2/21 10:00 PM, Yi Zhang wrote: >>> >>> Hello Jens, >>> >>> I guess the issue could be the following code run without grabbing >>> ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx(). >>> >>> .rq_flags = q->elevator ? RQF_ELV : 0, >>> >>> then elevator is switched to real one from none, and check on q->elevator >>> becomes not consistent. >> >> Indeed, that’s where I was going with this. I have a patch, testing it locally but it’s getting late. Will send it out tomorrow. The nice benefit is that it allows dropping the weird ref get on plug flush, and batches getting the refs as well. >> > > Hi Jens > Here is the log in case you still need it. :) Can you retry with the updated for-next pulled into -git? -- Jens Axboe ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 19:03 ` Jens Axboe @ 2021-11-05 11:13 ` Yi Zhang 0 siblings, 0 replies; 24+ messages in thread From: Yi Zhang @ 2021-11-05 11:13 UTC (permalink / raw) To: Jens Axboe Cc: Ming Lei, Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi On Thu, Nov 4, 2021 at 3:03 AM Jens Axboe <axboe@kernel.dk> wrote: > > On 11/2/21 10:00 PM, Yi Zhang wrote: > >>> > >>> Hello Jens, > >>> > >>> I guess the issue could be the following code run without grabbing > >>> ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx(). > >>> > >>> .rq_flags = q->elevator ? RQF_ELV : 0, > >>> > >>> then elevator is switched to real one from none, and check on q->elevator > >>> becomes not consistent. > >> > >> Indeed, that’s where I was going with this. I have a patch, testing it locally but it’s getting late. Will send it out tomorrow. The nice benefit is that it allows dropping the weird ref get on plug flush, and batches getting the refs as well. > >> > > > > Hi Jens > > Here is the log in case you still need it. :) > > Can you retry with the updated for-next pulled into -git? Hi Jens Sorry for the delay, the issue cannot be reproduced now. > > -- > Jens Axboe > -- Best Regards, Yi Zhang ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 3:54 ` Jens Axboe 2021-11-03 4:00 ` Yi Zhang @ 2021-11-03 11:59 ` Jens Axboe 2021-11-03 13:59 ` Yi Zhang 1 sibling, 1 reply; 24+ messages in thread From: Jens Axboe @ 2021-11-03 11:59 UTC (permalink / raw) To: Ming Lei Cc: Yi Zhang, Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi On 11/2/21 9:54 PM, Jens Axboe wrote: > On Nov 2, 2021, at 9:52 PM, Ming Lei <ming.lei@redhat.com> wrote: >> >> On Tue, Nov 02, 2021 at 09:21:10PM -0600, Jens Axboe wrote: >>>> On 11/2/21 8:21 PM, Yi Zhang wrote: >>>>>> >>>>>> Can either one of you try with this patch? Won't fix anything, but it'll >>>>>> hopefully shine a bit of light on the issue. >>>>>> >>>> Hi Jens >>>> >>>> Here is the full log: >>> >>> Thanks! I think I see what it could be - can you try this one as well, >>> would like to confirm that the condition I think is triggering is what >>> is triggering. >>> >>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>> index 07eb1412760b..81dede885231 100644 >>> --- a/block/blk-mq.c >>> +++ b/block/blk-mq.c >>> @@ -2515,6 +2515,8 @@ void blk_mq_submit_bio(struct bio *bio) >>> if (plug && plug->cached_rq) { >>> rq = rq_list_pop(&plug->cached_rq); >>> INIT_LIST_HEAD(&rq->queuelist); >>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); >>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); >>> } else { >>> struct blk_mq_alloc_data data = { >>> .q = q, >>> @@ -2535,6 +2537,8 @@ void blk_mq_submit_bio(struct bio *bio) >>> bio_wouldblock_error(bio); >>> goto queue_exit; >>> } >>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); >>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); >> >> Hello Jens, >> >> I guess the issue could be the following code run without grabbing >> ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx(). >> >> .rq_flags = q->elevator ? 
RQF_ELV : 0, >> >> then elevator is switched to real one from none, and check on q->elevator >> becomes not consistent. > > Indeed, that’s where I was going with this. I have a patch, testing it > locally but it’s getting late. Will send it out tomorrow. The nice > benefit is that it allows dropping the weird ref get on plug flush, > and batches getting the refs as well. Yi/Steffen, can you try pulling this into your test kernel: git://git.kernel.dk/linux-block for-next and see if it fixes the issue for you. Thanks! -- Jens Axboe ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 11:59 ` Jens Axboe @ 2021-11-03 13:59 ` Yi Zhang 2021-11-03 14:26 ` Jens Axboe 2021-11-03 14:57 ` Ming Lei 0 siblings, 2 replies; 24+ messages in thread From: Yi Zhang @ 2021-11-03 13:59 UTC (permalink / raw) To: Jens Axboe Cc: Ming Lei, Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi On Wed, Nov 3, 2021 at 7:59 PM Jens Axboe <axboe@kernel.dk> wrote: > > On 11/2/21 9:54 PM, Jens Axboe wrote: > > On Nov 2, 2021, at 9:52 PM, Ming Lei <ming.lei@redhat.com> wrote: > >> > >> On Tue, Nov 02, 2021 at 09:21:10PM -0600, Jens Axboe wrote: > >>>> On 11/2/21 8:21 PM, Yi Zhang wrote: > >>>>>> > >>>>>> Can either one of you try with this patch? Won't fix anything, but it'll > >>>>>> hopefully shine a bit of light on the issue. > >>>>>> > >>>> Hi Jens > >>>> > >>>> Here is the full log: > >>> > >>> Thanks! I think I see what it could be - can you try this one as well, > >>> would like to confirm that the condition I think is triggering is what > >>> is triggering. 
> >>> > >>> diff --git a/block/blk-mq.c b/block/blk-mq.c > >>> index 07eb1412760b..81dede885231 100644 > >>> --- a/block/blk-mq.c > >>> +++ b/block/blk-mq.c > >>> @@ -2515,6 +2515,8 @@ void blk_mq_submit_bio(struct bio *bio) > >>> if (plug && plug->cached_rq) { > >>> rq = rq_list_pop(&plug->cached_rq); > >>> INIT_LIST_HEAD(&rq->queuelist); > >>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); > >>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); > >>> } else { > >>> struct blk_mq_alloc_data data = { > >>> .q = q, > >>> @@ -2535,6 +2537,8 @@ void blk_mq_submit_bio(struct bio *bio) > >>> bio_wouldblock_error(bio); > >>> goto queue_exit; > >>> } > >>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); > >>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); > >> > >> Hello Jens, > >> > >> I guess the issue could be the following code run without grabbing > >> ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx(). > >> > >> .rq_flags = q->elevator ? RQF_ELV : 0, > >> > >> then elevator is switched to real one from none, and check on q->elevator > >> becomes not consistent. > > > > Indeed, that’s where I was going with this. I have a patch, testing it > > locally but it’s getting late. Will send it out tomorrow. The nice > > benefit is that it allows dropping the weird ref get on plug flush, > > and batches getting the refs as well. > > Yi/Steffen, can you try pulling this into your test kernel: > > git://git.kernel.dk/linux-block for-next > > and see if it fixes the issue for you. Thanks! 
It still can be reproduced with the latest linux-block/for-next, here is the log fab2914e46eb (HEAD, new/for-next) Merge branch 'for-5.16/drivers' into for-next [ 965.892911] run blktests srp/001 at 2021-11-03 09:54:14 [ 966.069421] alua: device handler registered [ 966.072163] emc: device handler registered [ 966.074955] rdac: device handler registered [ 966.079931] null_blk: module loaded [ 966.207798] rdma_rxe: loaded [ 966.213462] infiniband enc8000_rxe: set active [ 966.213467] infiniband enc8000_rxe: added enc8000 [ 966.259104] scsi_debug:sdebug_add_store: dif_storep 524288 bytes @ 00000000340c6f55 [ 966.259306] scsi_debug:sdebug_driver_probe: scsi_debug: trim poll_queues to 0. poll_q/nr_hw = (0/1) [ 966.259309] scsi_debug:sdebug_driver_probe: host protection DIF3 DIX3 [ 966.259314] scsi host0: scsi_debug: version 0190 [20200710] dev_size_mb=32, opts=0x0, submit_queues=1, statistics=0 [ 966.259933] scsi 0:0:0:0: Direct-Access Linux scsi_debug 0190 PQ: 0 ANSI: 7 [ 966.260273] sd 0:0:0:0: Power-on or device reset occurred [ 966.260299] sd 0:0:0:0: [sda] Enabling DIF Type 3 protection [ 966.260327] sd 0:0:0:0: [sda] 65536 512-byte logical blocks: (33.6 MB/32.0 MiB) [ 966.260337] sd 0:0:0:0: [sda] Write Protect is off [ 966.260341] sd 0:0:0:0: [sda] Mode Sense: 73 00 10 08 [ 966.260352] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA [ 966.260372] sd 0:0:0:0: [sda] Optimal transfer size 524288 bytes [ 966.261748] sd 0:0:0:0: Attached scsi generic sg0 type 0 [ 966.350416] sd 0:0:0:0: [sda] Enabling DIX T10-DIF-TYPE3-CRC protection [ 966.350422] sd 0:0:0:0: [sda] DIF application tag size 6 [ 966.450576] sd 0:0:0:0: [sda] Attached SCSI disk [ 966.667676] Rounding down aligned max_sectors from 4294967295 to 4294967288 [ 966.703431] ib_srpt:srpt_add_one: ib_srpt device = 00000000d6a9642e [ 966.703441] ib_srpt:srpt_use_srq: ib_srpt srpt_use_srq(enc8000_rxe): use_srq = 0; ret = 0 [ 966.703443] ib_srpt:srpt_add_one: ib_srpt Target login 
info: id_ext=00debdfffebeef80,ioc_guid=00debdfffebeef80,pkey=ffff,service_id=00debdfffebeef80 [ 966.703499] ib_srpt:srpt_add_one: ib_srpt added enc8000_rxe. [ 967.002396] Rounding down aligned max_sectors from 255 to 248 [ 967.011605] Rounding down aligned max_sectors from 255 to 248 [ 967.037191] Rounding down aligned max_sectors from 4294967295 to 4294967288 [ 967.130091] ib_srp:srp_add_one: ib_srp: srp_add_one: 18446744073709551615 / 4096 = 4503599627370495 <> 512 [ 967.130097] ib_srp:srp_add_one: ib_srp: enc8000_rxe: mr_page_shift = 12, device->max_mr_size = 0xffffffffffffffff, device->max_fast_reg_page_list_len = 512, max_pages_per_mr = 512, mr_max_size = 0x200000 [ 967.148347] ib_srp:srp_parse_in: ib_srp: 10.16.69.39 -> 10.16.69.39:0 [ 967.148352] ib_srp:srp_parse_in: ib_srp: 10.16.69.39:5555 -> 10.16.69.39:5555 [ 967.148355] ib_srp:add_target_store: ib_srp: max_sectors = 1024; max_pages_per_mr = 512; mr_page_size = 4096; max_sectors_per_mr = 4096; mr_per_cmd = 2 [ 967.148358] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 967.148754] ib_srpt Received SRP_LOGIN_REQ with i_port_id fe80:0000:0000:0000:00de:bdff:febe:ef80, t_port_id 00de:bdff:febe:ef80:00de:bdff:febe:ef80 and it_iu_len 8260 on port 1 (guid=fe80:0000:0000:0000:00de:bdff:febe:ef80); pkey 0xffff [ 967.148856] ib_srpt:srpt_cm_req_recv: ib_srpt imm_data_offset = 68 [ 967.151639] ib_srpt:srpt_create_ch_ib: ib_srpt srpt_create_ch_ib: max_cqe= 8191 max_sge= 32 sq_size = 4096 ch= 00000000d562fdc1 [ 967.151654] ib_srpt:srpt_cm_req_recv: ib_srpt registering src addr 10.16.69.39 or i_port_id 0xfe8000000000000000debdfffebeef80 [ 967.151674] ib_srpt:srpt_cm_req_recv: ib_srpt Establish connection sess=0000000047d911b7 name=10.16.69.39 ch=00000000d562fdc1 [ 967.151699] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 967.151702] scsi host1: ib_srp: using immediate data [ 967.152088] ib_srpt:srpt_zerolength_write: ib_srpt 10.16.69.39-18: queued zerolength write [ 967.152101] 
ib_srpt:srpt_zerolength_write_done: ib_srpt 10.16.69.39-18 wc->status 0 [ 967.152444] ib_srpt Received SRP_LOGIN_REQ with i_port_id fe80:0000:0000:0000:00de:bdff:febe:ef80, t_port_id 00de:bdff:febe:ef80:00de:bdff:febe:ef80 and it_iu_len 8260 on port 1 (guid=fe80:0000:0000:0000:00de:bdff:febe:ef80); pkey 0xffff [ 967.152538] ib_srpt:srpt_cm_req_recv: ib_srpt imm_data_offset = 68 [ 967.154559] ib_srpt:srpt_create_ch_ib: ib_srpt srpt_create_ch_ib: max_cqe= 8191 max_sge= 32 sq_size = 4096 ch= 000000001d71a872 [ 967.154572] ib_srpt:srpt_cm_req_recv: ib_srpt registering src addr 10.16.69.39 or i_port_id 0xfe8000000000000000debdfffebeef80 [ 967.154586] ib_srpt:srpt_cm_req_recv: ib_srpt Establish connection sess=00000000af249bba name=10.16.69.39 ch=000000001d71a872 [ 967.154607] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260 [ 967.154609] scsi host1: ib_srp: using immediate data [ 967.155020] ib_srpt:srpt_zerolength_write: ib_srpt 10.16.69.39-20: queued zerolength write [ 967.155031] ib_srpt:srpt_zerolength_write_done: ib_srpt 10.16.69.39-20 wc->status 0 [ 967.155036] scsi host1: SRP.T10:00DEBDFFFEBEEF80 [ 967.155533] scsi 1:0:0:0: Direct-Access LIO-ORG IBLOCK 4.0 PQ: 0 ANSI: 6 [ 967.155909] scsi 1:0:0:0: alua: supports implicit and explicit TPGS [ 967.155914] scsi 1:0:0:0: alua: device naa.60014056e756c6c62300000000000000 port group 0 rel port 1 [ 967.156070] sd 1:0:0:0: Warning! Received an indication that the LUN assignments on this target have changed. 
The Linux SCSI layer does not automatical [ 967.156091] sd 1:0:0:0: Attached scsi generic sg1 type 0 [ 967.170537] sd 1:0:0:0: alua: transition timeout set to 60 seconds [ 967.170543] sd 1:0:0:0: alua: port group 00 state A non-preferred supports TOlUSNA [ 967.170806] sd 1:0:0:0: [sdb] 65536 512-byte logical blocks: (33.6 MB/32.0 MiB) [ 967.170855] sd 1:0:0:0: [sdb] Write Protect is off [ 967.170858] sd 1:0:0:0: [sdb] Mode Sense: 43 00 00 08 [ 967.170907] sd 1:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA [ 967.170918] srpt/10.16.69.39: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION. [ 967.170956] sd 1:0:0:0: [sdb] Optimal transfer size 126976 bytes [ 967.172013] scsi 1:0:0:2: Direct-Access LIO-ORG IBLOCK 4.0 PQ: 0 ANSI: 6 [ 967.172138] scsi 1:0:0:2: alua: supports implicit and explicit TPGS [ 967.172142] scsi 1:0:0:2: alua: device naa.60014057363736964626700000000000 port group 0 rel port 1 [ 967.172205] sd 1:0:0:2: Attached scsi generic sg2 type 0 [ 967.172404] sd 1:0:0:2: [sdc] 65536 512-byte logical blocks: (33.6 MB/32.0 MiB) [ 967.172425] scsi 1:0:0:1: Direct-Access LIO-ORG IBLOCK 4.0 PQ: 0 ANSI: 6 [ 967.172434] sd 1:0:0:2: [sdc] Write Protect is off [ 967.172437] sd 1:0:0:2: [sdc] Mode Sense: 43 00 10 08 [ 967.172488] sd 1:0:0:2: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA [ 967.172499] srpt/10.16.69.39: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION. 
[ 967.172538] sd 1:0:0:2: [sdc] Optimal transfer size 524288 bytes [ 967.172542] scsi 1:0:0:1: alua: supports implicit and explicit TPGS [ 967.172546] scsi 1:0:0:1: alua: device naa.60014056e756c6c62310000000000000 port group 0 rel port 1 [ 967.172624] sd 1:0:0:1: Attached scsi generic sg3 type 0 [ 967.172648] ib_srp:srp_add_target: ib_srp: host1: SCSI scan succeeded - detected 3 LUNs [ 967.172650] scsi host1: ib_srp: new target: id_ext 00debdfffebeef80 ioc_guid 00debdfffebeef80 sgid fe80:0000:0000:0000:00de:bdff:febe:ef80 dest 10.16.69.39 [ 967.173447] sd 1:0:0:1: Warning! Received an indication that the LUN assignments on this target have changed. The Linux SCSI layer does not automatical [ 967.175069] ib_srp:srp_parse_in: ib_srp: 10.16.69.39 -> 10.16.69.39:0 [ 967.175073] ib_srp:srp_parse_in: ib_srp: 10.16.69.39:5555 -> 10.16.69.39:5555 [ 967.175080] ib_srp:srp_parse_in: ib_srp: [2620:52:0:1040:de:bdff:febe:ef80] -> [2620:52:0:1040:de:bdff:febe:ef80]:0/168838439%0 [ 967.175085] ib_srp:srp_parse_in: ib_srp: [2620:52:0:1040:de:bdff:febe:ef80]:5555 -> [2620:52:0:1040:de:bdff:febe:ef80]:5555/168838439%0 [ 967.175087] scsi host2: ib_srp: Already connected to target port with id_ext=00debdfffebeef80;ioc_guid=00debdfffebeef80;dest=2620:0052:0000:1040:00de:bdff:febe:ef80 [ 967.190459] sd 1:0:0:1: alua: transition timeout set to 60 seconds [ 967.190464] sd 1:0:0:1: alua: port group 00 state A non-preferred supports TOlUSNA [ 967.190612] sd 1:0:0:1: [sdd] 65536 512-byte logical blocks: (33.6 MB/32.0 MiB) [ 967.190639] sd 1:0:0:1: [sdd] Write Protect is off [ 967.190642] sd 1:0:0:1: [sdd] Mode Sense: 43 00 00 08 [ 967.190688] sd 1:0:0:1: [sdd] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA [ 967.190698] srpt/10.16.69.39: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION. 
[ 967.190735] sd 1:0:0:1: [sdd] Optimal transfer size 126976 bytes [ 967.230346] sd 1:0:0:2: alua: transition timeout set to 60 seconds [ 967.230351] sd 1:0:0:2: alua: port group 00 state A non-preferred supports TOlUSNA [ 967.232123] ib_srp:srp_parse_in: ib_srp: 10.16.69.39 -> 10.16.69.39:0 [ 967.232127] ib_srp:srp_parse_in: ib_srp: 10.16.69.39:5555 -> 10.16.69.39:5555 [ 967.232133] ib_srp:srp_parse_in: ib_srp: [2620:52:0:1040:de:bdff:febe:ef80] -> [2620:52:0:1040:de:bdff:febe:ef80]:0/168838439%0 [ 967.232137] ib_srp:srp_parse_in: ib_srp: [2620:52:0:1040:de:bdff:febe:ef80]:5555 -> [2620:52:0:1040:de:bdff:febe:ef80]:5555/168838439%0 [ 967.232143] ib_srp:srp_parse_in: ib_srp: [fe80::de:bdff:febe:ef80%2] -> [fe80::de:bdff:febe:ef80]:0/168838439%2 [ 967.232147] ib_srp:srp_parse_in: ib_srp: [fe80::de:bdff:febe:ef80%2]:5555 -> [fe80::de:bdff:febe:ef80]:5555/168838439%2 [ 967.232150] scsi host2: ib_srp: Already connected to target port with id_ext=00debdfffebeef80;ioc_guid=00debdfffebeef80;dest=fe80:0000:0000:0000:00de:bdff:febe:ef80 [ 967.295512] ------------[ cut here ]------------ [ 967.295517] WARNING: CPU: 1 PID: 8 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 [ 967.295529] Modules linked in: ib_srp scsi_transport_srp target_core_pscsi target_core_file ib_srpt target_core_iblock target_core_mod rdma_cm iw_cm ib_cm ib_umad scsi_debug rdma_rxe ib_uverbs ip6_udp_tunnel udp_tunnel null_blk scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_multipath ib_core sunrpc qeth_l2 bridge stp llc qeth zcrypt_cex4 qdio ccwgroup vfio_ccw mdev vfio_iommu_type1 vfio drm fb font drm_panel_orientation_quirks i2c_core fuse backlight zram ip_tables xfs crc32_vx_s390 ghash_s390 prng aes_s390 des_s390 libdes sha512_s390 sha256_s390 sha1_s390 sha_common dasd_eckd_mod dasd_mod pkey zcrypt [ 967.295579] CPU: 1 PID: 8 Comm: kworker/u128:0 Not tainted 5.15.0.v3+ #6 [ 967.295582] Hardware name: IBM 2964 N96 400 (z/VM 6.4.0) [ 967.295584] Workqueue: events_unbound async_run_entry_fn [ 
967.295591] Krnl PSW : 0704e00180000000 000000004feaa208 (blk_mq_sched_insert_request+0x58/0x178) [ 967.295596] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3 [ 967.295600] Krnl GPRS: 0000000000000040 000000000000001d 000000001eb37400 0000000000000001 [ 967.295603] 0000000000000001 0000000000000000 000000001dbea800 000000001c739800 [ 967.295605] 0000000000000000 0000000000000001 0000000000000001 000000001eb37400 [ 967.295607] 00000000006ba100 000003ff7e82da00 0000038000047818 00000380000477a8 [ 967.295615] Krnl Code: 000000004feaa1fc: a71effff chi %r1,-1 000000004feaa200: a7840004 brc 8,000000004feaa208 #000000004feaa204: af000000 mc 0,0 >000000004feaa208: 5810b01c l %r1,28(%r11) 000000004feaa20c: ec213bbb0055 risbg %r2,%r1,59,187,0 000000004feaa212: a7740057 brc 7,000000004feaa2c0 000000004feaa216: 5810b018 l %r1,24(%r11) 000000004feaa21a: c01b000000ff nilf %r1,255 [ 967.295635] Call Trace: [ 967.295637] [<000000004feaa208>] blk_mq_sched_insert_request+0x58/0x178 [ 967.295643] [<000000004fe9aa2e>] blk_execute_rq+0x56/0xd8 [ 967.295649] [<000000005005cea0>] __scsi_execute+0x110/0x230 [ 967.295654] [<00000000500534bc>] scsi_vpd_inquiry+0x7c/0xc0 [ 967.295660] [<000000005005354a>] scsi_get_vpd_page+0x4a/0xf8 [ 967.295663] [<000000005006ebdc>] sd_revalidate_disk.isra.0+0xf14/0x2240 [ 967.295667] [<000000005007043a>] sd_probe+0x312/0x4b0 [ 967.295670] [<0000000050023830>] really_probe+0xd0/0x4b0 [ 967.295675] [<0000000050023dc0>] driver_probe_device+0x40/0xf0 [ 967.295679] [<00000000500244cc>] __device_attach_driver+0xa4/0x128 [ 967.295682] [<0000000050020bd0>] bus_for_each_drv+0x88/0xc0 [ 967.295685] [<0000000050022d30>] __device_attach_async_helper+0x90/0xf0 [ 967.295688] [<000000004f8fcd1e>] async_run_entry_fn+0x4e/0x1b0 [ 967.295691] [<000000004f8ef62a>] process_one_work+0x21a/0x498 [ 967.295695] [<000000004f8efdd4>] worker_thread+0x64/0x498 [ 967.295697] [<000000004f8f871c>] kthread+0x184/0x190 [ 967.295702] [<000000004f883468>] 
__ret_from_fork+0x40/0x58 [ 967.295706] [<00000000503d144a>] ret_from_fork+0xa/0x30 [ 967.295712] Last Breaking-Event-Address: [ 967.295713] [<0000000000000003>] 0x3 [ 967.295716] ---[ end trace faff7345b32090bf ]--- > > -- > Jens Axboe > -- Best Regards, Yi Zhang ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 13:59 ` Yi Zhang @ 2021-11-03 14:26 ` Jens Axboe 2021-11-03 14:57 ` Ming Lei 1 sibling, 0 replies; 24+ messages in thread From: Jens Axboe @ 2021-11-03 14:26 UTC (permalink / raw) To: Yi Zhang Cc: Ming Lei, Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi On 11/3/21 7:59 AM, Yi Zhang wrote: > On Wed, Nov 3, 2021 at 7:59 PM Jens Axboe <axboe@kernel.dk> wrote: >> >> On 11/2/21 9:54 PM, Jens Axboe wrote: >>> On Nov 2, 2021, at 9:52 PM, Ming Lei <ming.lei@redhat.com> wrote: >>>> >>>> On Tue, Nov 02, 2021 at 09:21:10PM -0600, Jens Axboe wrote: >>>>>> On 11/2/21 8:21 PM, Yi Zhang wrote: >>>>>>>> >>>>>>>> Can either one of you try with this patch? Won't fix anything, but it'll >>>>>>>> hopefully shine a bit of light on the issue. >>>>>>>> >>>>>> Hi Jens >>>>>> >>>>>> Here is the full log: >>>>> >>>>> Thanks! I think I see what it could be - can you try this one as well, >>>>> would like to confirm that the condition I think is triggering is what >>>>> is triggering. 
>>>>> >>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>>>> index 07eb1412760b..81dede885231 100644 >>>>> --- a/block/blk-mq.c >>>>> +++ b/block/blk-mq.c >>>>> @@ -2515,6 +2515,8 @@ void blk_mq_submit_bio(struct bio *bio) >>>>> if (plug && plug->cached_rq) { >>>>> rq = rq_list_pop(&plug->cached_rq); >>>>> INIT_LIST_HEAD(&rq->queuelist); >>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); >>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); >>>>> } else { >>>>> struct blk_mq_alloc_data data = { >>>>> .q = q, >>>>> @@ -2535,6 +2537,8 @@ void blk_mq_submit_bio(struct bio *bio) >>>>> bio_wouldblock_error(bio); >>>>> goto queue_exit; >>>>> } >>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); >>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); >>>> >>>> Hello Jens, >>>> >>>> I guess the issue could be the following code run without grabbing >>>> ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx(). >>>> >>>> .rq_flags = q->elevator ? RQF_ELV : 0, >>>> >>>> then elevator is switched to real one from none, and check on q->elevator >>>> becomes not consistent. >>> >>> Indeed, that’s where I was going with this. I have a patch, testing it >>> locally but it’s getting late. Will send it out tomorrow. The nice >>> benefit is that it allows dropping the weird ref get on plug flush, >>> and batches getting the refs as well. >> >> Yi/Steffen, can you try pulling this into your test kernel: >> >> git://git.kernel.dk/linux-block for-next >> >> and see if it fixes the issue for you. Thanks! > > It still can be reproduced with the latest linux-block/for-next, here > is the log > > fab2914e46eb (HEAD, new/for-next) Merge branch 'for-5.16/drivers' into > for-next Funky! Thanks for re-testing, I guess I need to think even harder about this. Can't seem to reproduce it here at all, which makes it a bit harder to poke at. -- Jens Axboe ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 13:59 ` Yi Zhang 2021-11-03 14:26 ` Jens Axboe @ 2021-11-03 14:57 ` Ming Lei 2021-11-03 15:03 ` Jens Axboe 1 sibling, 1 reply; 24+ messages in thread From: Ming Lei @ 2021-11-03 14:57 UTC (permalink / raw) To: Yi Zhang Cc: Jens Axboe, Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi, ming.lei On Wed, Nov 03, 2021 at 09:59:02PM +0800, Yi Zhang wrote: > On Wed, Nov 3, 2021 at 7:59 PM Jens Axboe <axboe@kernel.dk> wrote: > > > > On 11/2/21 9:54 PM, Jens Axboe wrote: > > > On Nov 2, 2021, at 9:52 PM, Ming Lei <ming.lei@redhat.com> wrote: > > >> > > >> On Tue, Nov 02, 2021 at 09:21:10PM -0600, Jens Axboe wrote: > > >>>> On 11/2/21 8:21 PM, Yi Zhang wrote: > > >>>>>> > > >>>>>> Can either one of you try with this patch? Won't fix anything, but it'll > > >>>>>> hopefully shine a bit of light on the issue. > > >>>>>> > > >>>> Hi Jens > > >>>> > > >>>> Here is the full log: > > >>> > > >>> Thanks! I think I see what it could be - can you try this one as well, > > >>> would like to confirm that the condition I think is triggering is what > > >>> is triggering. 
> > >>> > > >>> diff --git a/block/blk-mq.c b/block/blk-mq.c > > >>> index 07eb1412760b..81dede885231 100644 > > >>> --- a/block/blk-mq.c > > >>> +++ b/block/blk-mq.c > > >>> @@ -2515,6 +2515,8 @@ void blk_mq_submit_bio(struct bio *bio) > > >>> if (plug && plug->cached_rq) { > > >>> rq = rq_list_pop(&plug->cached_rq); > > >>> INIT_LIST_HEAD(&rq->queuelist); > > >>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); > > >>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); > > >>> } else { > > >>> struct blk_mq_alloc_data data = { > > >>> .q = q, > > >>> @@ -2535,6 +2537,8 @@ void blk_mq_submit_bio(struct bio *bio) > > >>> bio_wouldblock_error(bio); > > >>> goto queue_exit; > > >>> } > > >>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); > > >>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); > > >> > > >> Hello Jens, > > >> > > >> I guess the issue could be the following code run without grabbing > > >> ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx(). > > >> > > >> .rq_flags = q->elevator ? RQF_ELV : 0, > > >> > > >> then elevator is switched to real one from none, and check on q->elevator > > >> becomes not consistent. > > > > > > Indeed, that’s where I was going with this. I have a patch, testing it > > > locally but it’s getting late. Will send it out tomorrow. The nice > > > benefit is that it allows dropping the weird ref get on plug flush, > > > and batches getting the refs as well. > > > > Yi/Steffen, can you try pulling this into your test kernel: > > > > git://git.kernel.dk/linux-block for-next > > > > and see if it fixes the issue for you. Thanks! 
> > It still can be reproduced with the latest linux-block/for-next, here is the log > > fab2914e46eb (HEAD, new/for-next) Merge branch 'for-5.16/drivers' into for-next Hi Yi, Please try the following change: diff --git a/block/blk-mq.c b/block/blk-mq.c index e1e64964a31b..eb634a9c61ff 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -494,7 +494,6 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, .q = q, .flags = flags, .cmd_flags = op, - .rq_flags = q->elevator ? RQF_ELV : 0, .nr_tags = 1, }; struct request *rq; @@ -504,6 +503,7 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, if (ret) return ERR_PTR(ret); + data.rq_flags = q->elevator ? RQF_ELV : 0, rq = __blk_mq_alloc_requests(&data); if (!rq) goto out_queue_exit; @@ -524,7 +524,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, .q = q, .flags = flags, .cmd_flags = op, - .rq_flags = q->elevator ? RQF_ELV : 0, .nr_tags = 1, }; u64 alloc_time_ns = 0; @@ -551,6 +550,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, ret = blk_queue_enter(q, flags); if (ret) return ERR_PTR(ret); + data.rq_flags = q->elevator ? RQF_ELV : 0, /* * Check if the hardware context is actually mapped to anything. Thanks, Ming ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 14:57 ` Ming Lei @ 2021-11-03 15:03 ` Jens Axboe 2021-11-03 15:09 ` Ming Lei 2021-11-03 15:10 ` Jens Axboe 0 siblings, 2 replies; 24+ messages in thread From: Jens Axboe @ 2021-11-03 15:03 UTC (permalink / raw) To: Ming Lei, Yi Zhang Cc: Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi On 11/3/21 8:57 AM, Ming Lei wrote: > On Wed, Nov 03, 2021 at 09:59:02PM +0800, Yi Zhang wrote: >> On Wed, Nov 3, 2021 at 7:59 PM Jens Axboe <axboe@kernel.dk> wrote: >>> >>> On 11/2/21 9:54 PM, Jens Axboe wrote: >>>> On Nov 2, 2021, at 9:52 PM, Ming Lei <ming.lei@redhat.com> wrote: >>>>> >>>>> On Tue, Nov 02, 2021 at 09:21:10PM -0600, Jens Axboe wrote: >>>>>>> On 11/2/21 8:21 PM, Yi Zhang wrote: >>>>>>>>> >>>>>>>>> Can either one of you try with this patch? Won't fix anything, but it'll >>>>>>>>> hopefully shine a bit of light on the issue. >>>>>>>>> >>>>>>> Hi Jens >>>>>>> >>>>>>> Here is the full log: >>>>>> >>>>>> Thanks! I think I see what it could be - can you try this one as well, >>>>>> would like to confirm that the condition I think is triggering is what >>>>>> is triggering. 
>>>>>> >>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>>>>> index 07eb1412760b..81dede885231 100644 >>>>>> --- a/block/blk-mq.c >>>>>> +++ b/block/blk-mq.c >>>>>> @@ -2515,6 +2515,8 @@ void blk_mq_submit_bio(struct bio *bio) >>>>>> if (plug && plug->cached_rq) { >>>>>> rq = rq_list_pop(&plug->cached_rq); >>>>>> INIT_LIST_HEAD(&rq->queuelist); >>>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); >>>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); >>>>>> } else { >>>>>> struct blk_mq_alloc_data data = { >>>>>> .q = q, >>>>>> @@ -2535,6 +2537,8 @@ void blk_mq_submit_bio(struct bio *bio) >>>>>> bio_wouldblock_error(bio); >>>>>> goto queue_exit; >>>>>> } >>>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); >>>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); >>>>> >>>>> Hello Jens, >>>>> >>>>> I guess the issue could be the following code run without grabbing >>>>> ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx(). >>>>> >>>>> .rq_flags = q->elevator ? RQF_ELV : 0, >>>>> >>>>> then elevator is switched to real one from none, and check on q->elevator >>>>> becomes not consistent. >>>> >>>> Indeed, that’s where I was going with this. I have a patch, testing it >>>> locally but it’s getting late. Will send it out tomorrow. The nice >>>> benefit is that it allows dropping the weird ref get on plug flush, >>>> and batches getting the refs as well. >>> >>> Yi/Steffen, can you try pulling this into your test kernel: >>> >>> git://git.kernel.dk/linux-block for-next >>> >>> and see if it fixes the issue for you. Thanks! 
>> >> It still can be reproduced with the latest linux-block/for-next, here is the log >> >> fab2914e46eb (HEAD, new/for-next) Merge branch 'for-5.16/drivers' into for-next > > Hi Yi, > > Please try the following change: > > > diff --git a/block/blk-mq.c b/block/blk-mq.c > index e1e64964a31b..eb634a9c61ff 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -494,7 +494,6 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, > .q = q, > .flags = flags, > .cmd_flags = op, > - .rq_flags = q->elevator ? RQF_ELV : 0, > .nr_tags = 1, > }; > struct request *rq; > @@ -504,6 +503,7 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, > if (ret) > return ERR_PTR(ret); > > + data.rq_flags = q->elevator ? RQF_ELV : 0, > rq = __blk_mq_alloc_requests(&data); > if (!rq) > goto out_queue_exit; > @@ -524,7 +524,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, > .q = q, > .flags = flags, > .cmd_flags = op, > - .rq_flags = q->elevator ? RQF_ELV : 0, > .nr_tags = 1, > }; > u64 alloc_time_ns = 0; > @@ -551,6 +550,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, > ret = blk_queue_enter(q, flags); > if (ret) > return ERR_PTR(ret); > + data.rq_flags = q->elevator ? RQF_ELV : 0, Don't think that will compile, but I guess the point is that we can't do this assignment before queue enter, in case we're in the midst of switching schedulers. Which is indeed a valid concern. -- Jens Axboe ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 15:03 ` Jens Axboe @ 2021-11-03 15:09 ` Ming Lei 2021-11-03 15:12 ` Jens Axboe 2021-11-03 15:10 ` Jens Axboe 1 sibling, 1 reply; 24+ messages in thread From: Ming Lei @ 2021-11-03 15:09 UTC (permalink / raw) To: Jens Axboe Cc: Yi Zhang, Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi On Wed, Nov 03, 2021 at 09:03:02AM -0600, Jens Axboe wrote: > On 11/3/21 8:57 AM, Ming Lei wrote: > > On Wed, Nov 03, 2021 at 09:59:02PM +0800, Yi Zhang wrote: > >> On Wed, Nov 3, 2021 at 7:59 PM Jens Axboe <axboe@kernel.dk> wrote: > >>> > >>> On 11/2/21 9:54 PM, Jens Axboe wrote: > >>>> On Nov 2, 2021, at 9:52 PM, Ming Lei <ming.lei@redhat.com> wrote: > >>>>> > >>>>> On Tue, Nov 02, 2021 at 09:21:10PM -0600, Jens Axboe wrote: > >>>>>>> On 11/2/21 8:21 PM, Yi Zhang wrote: > >>>>>>>>> > >>>>>>>>> Can either one of you try with this patch? Won't fix anything, but it'll > >>>>>>>>> hopefully shine a bit of light on the issue. > >>>>>>>>> > >>>>>>> Hi Jens > >>>>>>> > >>>>>>> Here is the full log: > >>>>>> > >>>>>> Thanks! I think I see what it could be - can you try this one as well, > >>>>>> would like to confirm that the condition I think is triggering is what > >>>>>> is triggering. 
> >>>>>> > >>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c > >>>>>> index 07eb1412760b..81dede885231 100644 > >>>>>> --- a/block/blk-mq.c > >>>>>> +++ b/block/blk-mq.c > >>>>>> @@ -2515,6 +2515,8 @@ void blk_mq_submit_bio(struct bio *bio) > >>>>>> if (plug && plug->cached_rq) { > >>>>>> rq = rq_list_pop(&plug->cached_rq); > >>>>>> INIT_LIST_HEAD(&rq->queuelist); > >>>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); > >>>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); > >>>>>> } else { > >>>>>> struct blk_mq_alloc_data data = { > >>>>>> .q = q, > >>>>>> @@ -2535,6 +2537,8 @@ void blk_mq_submit_bio(struct bio *bio) > >>>>>> bio_wouldblock_error(bio); > >>>>>> goto queue_exit; > >>>>>> } > >>>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); > >>>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); > >>>>> > >>>>> Hello Jens, > >>>>> > >>>>> I guess the issue could be the following code run without grabbing > >>>>> ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx(). > >>>>> > >>>>> .rq_flags = q->elevator ? RQF_ELV : 0, > >>>>> > >>>>> then elevator is switched to real one from none, and check on q->elevator > >>>>> becomes not consistent. > >>>> > >>>> Indeed, that’s where I was going with this. I have a patch, testing it > >>>> locally but it’s getting late. Will send it out tomorrow. The nice > >>>> benefit is that it allows dropping the weird ref get on plug flush, > >>>> and batches getting the refs as well. > >>> > >>> Yi/Steffen, can you try pulling this into your test kernel: > >>> > >>> git://git.kernel.dk/linux-block for-next > >>> > >>> and see if it fixes the issue for you. Thanks! 
> >> > >> It still can be reproduced with the latest linux-block/for-next, here is the log > >> > >> fab2914e46eb (HEAD, new/for-next) Merge branch 'for-5.16/drivers' into for-next > > > > Hi Yi, > > > > Please try the following change: > > > > > > diff --git a/block/blk-mq.c b/block/blk-mq.c > > index e1e64964a31b..eb634a9c61ff 100644 > > --- a/block/blk-mq.c > > +++ b/block/blk-mq.c > > @@ -494,7 +494,6 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, > > .q = q, > > .flags = flags, > > .cmd_flags = op, > > - .rq_flags = q->elevator ? RQF_ELV : 0, > > .nr_tags = 1, > > }; > > struct request *rq; > > @@ -504,6 +503,7 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, > > if (ret) > > return ERR_PTR(ret); > > > > + data.rq_flags = q->elevator ? RQF_ELV : 0, > > rq = __blk_mq_alloc_requests(&data); > > if (!rq) > > goto out_queue_exit; > > @@ -524,7 +524,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, > > .q = q, > > .flags = flags, > > .cmd_flags = op, > > - .rq_flags = q->elevator ? RQF_ELV : 0, > > .nr_tags = 1, > > }; > > u64 alloc_time_ns = 0; > > @@ -551,6 +550,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, > > ret = blk_queue_enter(q, flags); > > if (ret) > > return ERR_PTR(ret); > > + data.rq_flags = q->elevator ? RQF_ELV : 0, > > Don't think that will compile, but I guess the point is that we can't do It can compile. > this assignment before queue enter, in case we're in the midst of > switching schedulers. Which is indeed a valid concern. Yeah, for scsi, real io sched is switched when adding disk, before that, the passthrough command need to see consistent q->elevator. Thanks, Ming ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 15:09 ` Ming Lei @ 2021-11-03 15:12 ` Jens Axboe 0 siblings, 0 replies; 24+ messages in thread From: Jens Axboe @ 2021-11-03 15:12 UTC (permalink / raw) To: Ming Lei Cc: Yi Zhang, Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi On 11/3/21 9:09 AM, Ming Lei wrote: > On Wed, Nov 03, 2021 at 09:03:02AM -0600, Jens Axboe wrote: >> On 11/3/21 8:57 AM, Ming Lei wrote: >>> On Wed, Nov 03, 2021 at 09:59:02PM +0800, Yi Zhang wrote: >>>> On Wed, Nov 3, 2021 at 7:59 PM Jens Axboe <axboe@kernel.dk> wrote: >>>>> >>>>> On 11/2/21 9:54 PM, Jens Axboe wrote: >>>>>> On Nov 2, 2021, at 9:52 PM, Ming Lei <ming.lei@redhat.com> wrote: >>>>>>> >>>>>>> On Tue, Nov 02, 2021 at 09:21:10PM -0600, Jens Axboe wrote: >>>>>>>>> On 11/2/21 8:21 PM, Yi Zhang wrote: >>>>>>>>>>> >>>>>>>>>>> Can either one of you try with this patch? Won't fix anything, but it'll >>>>>>>>>>> hopefully shine a bit of light on the issue. >>>>>>>>>>> >>>>>>>>> Hi Jens >>>>>>>>> >>>>>>>>> Here is the full log: >>>>>>>> >>>>>>>> Thanks! I think I see what it could be - can you try this one as well, >>>>>>>> would like to confirm that the condition I think is triggering is what >>>>>>>> is triggering. 
>>>>>>>> >>>>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>>>>>>> index 07eb1412760b..81dede885231 100644 >>>>>>>> --- a/block/blk-mq.c >>>>>>>> +++ b/block/blk-mq.c >>>>>>>> @@ -2515,6 +2515,8 @@ void blk_mq_submit_bio(struct bio *bio) >>>>>>>> if (plug && plug->cached_rq) { >>>>>>>> rq = rq_list_pop(&plug->cached_rq); >>>>>>>> INIT_LIST_HEAD(&rq->queuelist); >>>>>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); >>>>>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); >>>>>>>> } else { >>>>>>>> struct blk_mq_alloc_data data = { >>>>>>>> .q = q, >>>>>>>> @@ -2535,6 +2537,8 @@ void blk_mq_submit_bio(struct bio *bio) >>>>>>>> bio_wouldblock_error(bio); >>>>>>>> goto queue_exit; >>>>>>>> } >>>>>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); >>>>>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); >>>>>>> >>>>>>> Hello Jens, >>>>>>> >>>>>>> I guess the issue could be the following code run without grabbing >>>>>>> ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx(). >>>>>>> >>>>>>> .rq_flags = q->elevator ? RQF_ELV : 0, >>>>>>> >>>>>>> then elevator is switched to real one from none, and check on q->elevator >>>>>>> becomes not consistent. >>>>>> >>>>>> Indeed, that’s where I was going with this. I have a patch, testing it >>>>>> locally but it’s getting late. Will send it out tomorrow. The nice >>>>>> benefit is that it allows dropping the weird ref get on plug flush, >>>>>> and batches getting the refs as well. >>>>> >>>>> Yi/Steffen, can you try pulling this into your test kernel: >>>>> >>>>> git://git.kernel.dk/linux-block for-next >>>>> >>>>> and see if it fixes the issue for you. Thanks! 
>>>> >>>> It still can be reproduced with the latest linux-block/for-next, here is the log >>>> >>>> fab2914e46eb (HEAD, new/for-next) Merge branch 'for-5.16/drivers' into for-next >>> >>> Hi Yi, >>> >>> Please try the following change: >>> >>> >>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>> index e1e64964a31b..eb634a9c61ff 100644 >>> --- a/block/blk-mq.c >>> +++ b/block/blk-mq.c >>> @@ -494,7 +494,6 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, >>> .q = q, >>> .flags = flags, >>> .cmd_flags = op, >>> - .rq_flags = q->elevator ? RQF_ELV : 0, >>> .nr_tags = 1, >>> }; >>> struct request *rq; >>> @@ -504,6 +503,7 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, >>> if (ret) >>> return ERR_PTR(ret); >>> >>> + data.rq_flags = q->elevator ? RQF_ELV : 0, >>> rq = __blk_mq_alloc_requests(&data); >>> if (!rq) >>> goto out_queue_exit; >>> @@ -524,7 +524,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, >>> .q = q, >>> .flags = flags, >>> .cmd_flags = op, >>> - .rq_flags = q->elevator ? RQF_ELV : 0, >>> .nr_tags = 1, >>> }; >>> u64 alloc_time_ns = 0; >>> @@ -551,6 +550,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, >>> ret = blk_queue_enter(q, flags); >>> if (ret) >>> return ERR_PTR(ret); >>> + data.rq_flags = q->elevator ? RQF_ELV : 0, >> >> Don't think that will compile, but I guess the point is that we can't do > > It can compile. s/,/; for the new assignments. >> this assignment before queue enter, in case we're in the midst of >> switching schedulers. Which is indeed a valid concern. > > Yeah, for scsi, real io sched is switched when adding disk, before > that, the passthrough command need to see consistent q->elevator. Yeah, I agree that the problem is most certainly there. Guess I'm just surprised the timing works out reliably, but it sure looks like it. -- Jens Axboe ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 15:03 ` Jens Axboe 2021-11-03 15:09 ` Ming Lei @ 2021-11-03 15:10 ` Jens Axboe 2021-11-03 15:16 ` Ming Lei 1 sibling, 1 reply; 24+ messages in thread From: Jens Axboe @ 2021-11-03 15:10 UTC (permalink / raw) To: Ming Lei, Yi Zhang Cc: Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi On 11/3/21 9:03 AM, Jens Axboe wrote: > On 11/3/21 8:57 AM, Ming Lei wrote: >> On Wed, Nov 03, 2021 at 09:59:02PM +0800, Yi Zhang wrote: >>> On Wed, Nov 3, 2021 at 7:59 PM Jens Axboe <axboe@kernel.dk> wrote: >>>> >>>> On 11/2/21 9:54 PM, Jens Axboe wrote: >>>>> On Nov 2, 2021, at 9:52 PM, Ming Lei <ming.lei@redhat.com> wrote: >>>>>> >>>>>> On Tue, Nov 02, 2021 at 09:21:10PM -0600, Jens Axboe wrote: >>>>>>>> On 11/2/21 8:21 PM, Yi Zhang wrote: >>>>>>>>>> >>>>>>>>>> Can either one of you try with this patch? Won't fix anything, but it'll >>>>>>>>>> hopefully shine a bit of light on the issue. >>>>>>>>>> >>>>>>>> Hi Jens >>>>>>>> >>>>>>>> Here is the full log: >>>>>>> >>>>>>> Thanks! I think I see what it could be - can you try this one as well, >>>>>>> would like to confirm that the condition I think is triggering is what >>>>>>> is triggering. 
>>>>>>> >>>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>>>>>> index 07eb1412760b..81dede885231 100644 >>>>>>> --- a/block/blk-mq.c >>>>>>> +++ b/block/blk-mq.c >>>>>>> @@ -2515,6 +2515,8 @@ void blk_mq_submit_bio(struct bio *bio) >>>>>>> if (plug && plug->cached_rq) { >>>>>>> rq = rq_list_pop(&plug->cached_rq); >>>>>>> INIT_LIST_HEAD(&rq->queuelist); >>>>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); >>>>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); >>>>>>> } else { >>>>>>> struct blk_mq_alloc_data data = { >>>>>>> .q = q, >>>>>>> @@ -2535,6 +2537,8 @@ void blk_mq_submit_bio(struct bio *bio) >>>>>>> bio_wouldblock_error(bio); >>>>>>> goto queue_exit; >>>>>>> } >>>>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); >>>>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); >>>>>> >>>>>> Hello Jens, >>>>>> >>>>>> I guess the issue could be the following code run without grabbing >>>>>> ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx(). >>>>>> >>>>>> .rq_flags = q->elevator ? RQF_ELV : 0, >>>>>> >>>>>> then elevator is switched to real one from none, and check on q->elevator >>>>>> becomes not consistent. >>>>> >>>>> Indeed, that’s where I was going with this. I have a patch, testing it >>>>> locally but it’s getting late. Will send it out tomorrow. The nice >>>>> benefit is that it allows dropping the weird ref get on plug flush, >>>>> and batches getting the refs as well. >>>> >>>> Yi/Steffen, can you try pulling this into your test kernel: >>>> >>>> git://git.kernel.dk/linux-block for-next >>>> >>>> and see if it fixes the issue for you. Thanks! 
>>> >>> It still can be reproduced with the latest linux-block/for-next, here is the log >>> >>> fab2914e46eb (HEAD, new/for-next) Merge branch 'for-5.16/drivers' into for-next >> >> Hi Yi, >> >> Please try the following change: >> >> >> diff --git a/block/blk-mq.c b/block/blk-mq.c >> index e1e64964a31b..eb634a9c61ff 100644 >> --- a/block/blk-mq.c >> +++ b/block/blk-mq.c >> @@ -494,7 +494,6 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, >> .q = q, >> .flags = flags, >> .cmd_flags = op, >> - .rq_flags = q->elevator ? RQF_ELV : 0, >> .nr_tags = 1, >> }; >> struct request *rq; >> @@ -504,6 +503,7 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, >> if (ret) >> return ERR_PTR(ret); >> >> + data.rq_flags = q->elevator ? RQF_ELV : 0, >> rq = __blk_mq_alloc_requests(&data); >> if (!rq) >> goto out_queue_exit; >> @@ -524,7 +524,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, >> .q = q, >> .flags = flags, >> .cmd_flags = op, >> - .rq_flags = q->elevator ? RQF_ELV : 0, >> .nr_tags = 1, >> }; >> u64 alloc_time_ns = 0; >> @@ -551,6 +550,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, >> ret = blk_queue_enter(q, flags); >> if (ret) >> return ERR_PTR(ret); >> + data.rq_flags = q->elevator ? RQF_ELV : 0, > > Don't think that will compile, but I guess the point is that we can't do > this assignment before queue enter, in case we're in the midst of > switching schedulers. Which is indeed a valid concern. Something like the below. Maybe? On top of the for-next that was already pulled in. 
diff --git a/block/blk-mq.c b/block/blk-mq.c index b01e05e02277..121f1898d529 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -433,9 +433,11 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data) if (data->cmd_flags & REQ_NOWAIT) data->flags |= BLK_MQ_REQ_NOWAIT; - if (data->rq_flags & RQF_ELV) { + if (q->elevator) { struct elevator_queue *e = q->elevator; + data->rq_flags |= RQF_ELV; + /* * Flush/passthrough requests are special and go directly to the * dispatch list. Don't include reserved tags in the @@ -494,7 +496,6 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, .q = q, .flags = flags, .cmd_flags = op, - .rq_flags = q->elevator ? RQF_ELV : 0, .nr_tags = 1, }; struct request *rq; @@ -524,7 +525,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, .q = q, .flags = flags, .cmd_flags = op, - .rq_flags = q->elevator ? RQF_ELV : 0, .nr_tags = 1, }; u64 alloc_time_ns = 0; @@ -565,6 +565,8 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, if (!q->elevator) blk_mq_tag_busy(data.hctx); + else + data.rq_flags |= RQF_ELV; ret = -EWOULDBLOCK; tag = blk_mq_get_tag(&data); @@ -2560,7 +2562,6 @@ void blk_mq_submit_bio(struct bio *bio) .q = q, .nr_tags = 1, .cmd_flags = bio->bi_opf, - .rq_flags = q->elevator ? RQF_ELV : 0, }; if (unlikely(!blk_try_enter_queue(q, false) && -- Jens Axboe ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 15:10 ` Jens Axboe @ 2021-11-03 15:16 ` Ming Lei 2021-11-03 15:41 ` Jens Axboe 0 siblings, 1 reply; 24+ messages in thread From: Ming Lei @ 2021-11-03 15:16 UTC (permalink / raw) To: Jens Axboe Cc: Yi Zhang, Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi On Wed, Nov 03, 2021 at 09:10:17AM -0600, Jens Axboe wrote: > On 11/3/21 9:03 AM, Jens Axboe wrote: > > On 11/3/21 8:57 AM, Ming Lei wrote: > >> On Wed, Nov 03, 2021 at 09:59:02PM +0800, Yi Zhang wrote: > >>> On Wed, Nov 3, 2021 at 7:59 PM Jens Axboe <axboe@kernel.dk> wrote: > >>>> > >>>> On 11/2/21 9:54 PM, Jens Axboe wrote: > >>>>> On Nov 2, 2021, at 9:52 PM, Ming Lei <ming.lei@redhat.com> wrote: > >>>>>> > >>>>>> On Tue, Nov 02, 2021 at 09:21:10PM -0600, Jens Axboe wrote: > >>>>>>>> On 11/2/21 8:21 PM, Yi Zhang wrote: > >>>>>>>>>> > >>>>>>>>>> Can either one of you try with this patch? Won't fix anything, but it'll > >>>>>>>>>> hopefully shine a bit of light on the issue. > >>>>>>>>>> > >>>>>>>> Hi Jens > >>>>>>>> > >>>>>>>> Here is the full log: > >>>>>>> > >>>>>>> Thanks! I think I see what it could be - can you try this one as well, > >>>>>>> would like to confirm that the condition I think is triggering is what > >>>>>>> is triggering. 
> >>>>>>> > >>>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c > >>>>>>> index 07eb1412760b..81dede885231 100644 > >>>>>>> --- a/block/blk-mq.c > >>>>>>> +++ b/block/blk-mq.c > >>>>>>> @@ -2515,6 +2515,8 @@ void blk_mq_submit_bio(struct bio *bio) > >>>>>>> if (plug && plug->cached_rq) { > >>>>>>> rq = rq_list_pop(&plug->cached_rq); > >>>>>>> INIT_LIST_HEAD(&rq->queuelist); > >>>>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); > >>>>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); > >>>>>>> } else { > >>>>>>> struct blk_mq_alloc_data data = { > >>>>>>> .q = q, > >>>>>>> @@ -2535,6 +2537,8 @@ void blk_mq_submit_bio(struct bio *bio) > >>>>>>> bio_wouldblock_error(bio); > >>>>>>> goto queue_exit; > >>>>>>> } > >>>>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); > >>>>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); > >>>>>> > >>>>>> Hello Jens, > >>>>>> > >>>>>> I guess the issue could be the following code run without grabbing > >>>>>> ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx(). > >>>>>> > >>>>>> .rq_flags = q->elevator ? RQF_ELV : 0, > >>>>>> > >>>>>> then elevator is switched to real one from none, and check on q->elevator > >>>>>> becomes not consistent. > >>>>> > >>>>> Indeed, that’s where I was going with this. I have a patch, testing it > >>>>> locally but it’s getting late. Will send it out tomorrow. The nice > >>>>> benefit is that it allows dropping the weird ref get on plug flush, > >>>>> and batches getting the refs as well. > >>>> > >>>> Yi/Steffen, can you try pulling this into your test kernel: > >>>> > >>>> git://git.kernel.dk/linux-block for-next > >>>> > >>>> and see if it fixes the issue for you. Thanks! 
> >>> > >>> It still can be reproduced with the latest linux-block/for-next, here is the log > >>> > >>> fab2914e46eb (HEAD, new/for-next) Merge branch 'for-5.16/drivers' into for-next > >> > >> Hi Yi, > >> > >> Please try the following change: > >> > >> > >> diff --git a/block/blk-mq.c b/block/blk-mq.c > >> index e1e64964a31b..eb634a9c61ff 100644 > >> --- a/block/blk-mq.c > >> +++ b/block/blk-mq.c > >> @@ -494,7 +494,6 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, > >> .q = q, > >> .flags = flags, > >> .cmd_flags = op, > >> - .rq_flags = q->elevator ? RQF_ELV : 0, > >> .nr_tags = 1, > >> }; > >> struct request *rq; > >> @@ -504,6 +503,7 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, > >> if (ret) > >> return ERR_PTR(ret); > >> > >> + data.rq_flags = q->elevator ? RQF_ELV : 0, > >> rq = __blk_mq_alloc_requests(&data); > >> if (!rq) > >> goto out_queue_exit; > >> @@ -524,7 +524,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, > >> .q = q, > >> .flags = flags, > >> .cmd_flags = op, > >> - .rq_flags = q->elevator ? RQF_ELV : 0, > >> .nr_tags = 1, > >> }; > >> u64 alloc_time_ns = 0; > >> @@ -551,6 +550,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, > >> ret = blk_queue_enter(q, flags); > >> if (ret) > >> return ERR_PTR(ret); > >> + data.rq_flags = q->elevator ? RQF_ELV : 0, > > > > Don't think that will compile, but I guess the point is that we can't do > > this assignment before queue enter, in case we're in the midst of > > switching schedulers. Which is indeed a valid concern. > > Something like the below. Maybe? On top of the for-next that was already > pulled in. 
> > > diff --git a/block/blk-mq.c b/block/blk-mq.c > index b01e05e02277..121f1898d529 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -433,9 +433,11 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data) > if (data->cmd_flags & REQ_NOWAIT) > data->flags |= BLK_MQ_REQ_NOWAIT; > > - if (data->rq_flags & RQF_ELV) { > + if (q->elevator) { > struct elevator_queue *e = q->elevator; > > + data->rq_flags |= RQF_ELV; > + > /* > * Flush/passthrough requests are special and go directly to the > * dispatch list. Don't include reserved tags in the > @@ -494,7 +496,6 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, > .q = q, > .flags = flags, > .cmd_flags = op, > - .rq_flags = q->elevator ? RQF_ELV : 0, > .nr_tags = 1, > }; > struct request *rq; > @@ -524,7 +525,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, > .q = q, > .flags = flags, > .cmd_flags = op, > - .rq_flags = q->elevator ? RQF_ELV : 0, > .nr_tags = 1, > }; > u64 alloc_time_ns = 0; > @@ -565,6 +565,8 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, > > if (!q->elevator) > blk_mq_tag_busy(data.hctx); > + else > + data.rq_flags |= RQF_ELV; > > ret = -EWOULDBLOCK; > tag = blk_mq_get_tag(&data); > @@ -2560,7 +2562,6 @@ void blk_mq_submit_bio(struct bio *bio) > .q = q, > .nr_tags = 1, > .cmd_flags = bio->bi_opf, > - .rq_flags = q->elevator ? RQF_ELV : 0, > }; The above patch looks fine. BTW, 9ede85cb670c ("block: move queue enter logic into blk_mq_submit_bio()") moves the queue enter into blk_mq_submit_bio(), which seems dangerous, especially blk_mq_sched_bio_merge() needs hctx/ctx which requires q_usage_counter to be grabbed. Thanks, Ming ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 15:16 ` Ming Lei @ 2021-11-03 15:41 ` Jens Axboe 2021-11-03 15:49 ` Jens Axboe 0 siblings, 1 reply; 24+ messages in thread From: Jens Axboe @ 2021-11-03 15:41 UTC (permalink / raw) To: Ming Lei Cc: Yi Zhang, Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi On 11/3/21 9:16 AM, Ming Lei wrote: > On Wed, Nov 03, 2021 at 09:10:17AM -0600, Jens Axboe wrote: >> On 11/3/21 9:03 AM, Jens Axboe wrote: >>> On 11/3/21 8:57 AM, Ming Lei wrote: >>>> On Wed, Nov 03, 2021 at 09:59:02PM +0800, Yi Zhang wrote: >>>>> On Wed, Nov 3, 2021 at 7:59 PM Jens Axboe <axboe@kernel.dk> wrote: >>>>>> >>>>>> On 11/2/21 9:54 PM, Jens Axboe wrote: >>>>>>> On Nov 2, 2021, at 9:52 PM, Ming Lei <ming.lei@redhat.com> wrote: >>>>>>>> >>>>>>>> On Tue, Nov 02, 2021 at 09:21:10PM -0600, Jens Axboe wrote: >>>>>>>>>> On 11/2/21 8:21 PM, Yi Zhang wrote: >>>>>>>>>>>> >>>>>>>>>>>> Can either one of you try with this patch? Won't fix anything, but it'll >>>>>>>>>>>> hopefully shine a bit of light on the issue. >>>>>>>>>>>> >>>>>>>>>> Hi Jens >>>>>>>>>> >>>>>>>>>> Here is the full log: >>>>>>>>> >>>>>>>>> Thanks! I think I see what it could be - can you try this one as well, >>>>>>>>> would like to confirm that the condition I think is triggering is what >>>>>>>>> is triggering. 
>>>>>>>>> >>>>>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>>>>>>>> index 07eb1412760b..81dede885231 100644 >>>>>>>>> --- a/block/blk-mq.c >>>>>>>>> +++ b/block/blk-mq.c >>>>>>>>> @@ -2515,6 +2515,8 @@ void blk_mq_submit_bio(struct bio *bio) >>>>>>>>> if (plug && plug->cached_rq) { >>>>>>>>> rq = rq_list_pop(&plug->cached_rq); >>>>>>>>> INIT_LIST_HEAD(&rq->queuelist); >>>>>>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); >>>>>>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); >>>>>>>>> } else { >>>>>>>>> struct blk_mq_alloc_data data = { >>>>>>>>> .q = q, >>>>>>>>> @@ -2535,6 +2537,8 @@ void blk_mq_submit_bio(struct bio *bio) >>>>>>>>> bio_wouldblock_error(bio); >>>>>>>>> goto queue_exit; >>>>>>>>> } >>>>>>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); >>>>>>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); >>>>>>>> >>>>>>>> Hello Jens, >>>>>>>> >>>>>>>> I guess the issue could be the following code run without grabbing >>>>>>>> ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx(). >>>>>>>> >>>>>>>> .rq_flags = q->elevator ? RQF_ELV : 0, >>>>>>>> >>>>>>>> then elevator is switched to real one from none, and check on q->elevator >>>>>>>> becomes not consistent. >>>>>>> >>>>>>> Indeed, that’s where I was going with this. I have a patch, testing it >>>>>>> locally but it’s getting late. Will send it out tomorrow. The nice >>>>>>> benefit is that it allows dropping the weird ref get on plug flush, >>>>>>> and batches getting the refs as well. >>>>>> >>>>>> Yi/Steffen, can you try pulling this into your test kernel: >>>>>> >>>>>> git://git.kernel.dk/linux-block for-next >>>>>> >>>>>> and see if it fixes the issue for you. Thanks! 
>>>>> >>>>> It still can be reproduced with the latest linux-block/for-next, here is the log >>>>> >>>>> fab2914e46eb (HEAD, new/for-next) Merge branch 'for-5.16/drivers' into for-next >>>> >>>> Hi Yi, >>>> >>>> Please try the following change: >>>> >>>> >>>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>>> index e1e64964a31b..eb634a9c61ff 100644 >>>> --- a/block/blk-mq.c >>>> +++ b/block/blk-mq.c >>>> @@ -494,7 +494,6 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, >>>> .q = q, >>>> .flags = flags, >>>> .cmd_flags = op, >>>> - .rq_flags = q->elevator ? RQF_ELV : 0, >>>> .nr_tags = 1, >>>> }; >>>> struct request *rq; >>>> @@ -504,6 +503,7 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, >>>> if (ret) >>>> return ERR_PTR(ret); >>>> >>>> + data.rq_flags = q->elevator ? RQF_ELV : 0, >>>> rq = __blk_mq_alloc_requests(&data); >>>> if (!rq) >>>> goto out_queue_exit; >>>> @@ -524,7 +524,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, >>>> .q = q, >>>> .flags = flags, >>>> .cmd_flags = op, >>>> - .rq_flags = q->elevator ? RQF_ELV : 0, >>>> .nr_tags = 1, >>>> }; >>>> u64 alloc_time_ns = 0; >>>> @@ -551,6 +550,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, >>>> ret = blk_queue_enter(q, flags); >>>> if (ret) >>>> return ERR_PTR(ret); >>>> + data.rq_flags = q->elevator ? RQF_ELV : 0, >>> >>> Don't think that will compile, but I guess the point is that we can't do >>> this assignment before queue enter, in case we're in the midst of >>> switching schedulers. Which is indeed a valid concern. >> >> Something like the below. Maybe? On top of the for-next that was already >> pulled in. 
>> >> >> diff --git a/block/blk-mq.c b/block/blk-mq.c >> index b01e05e02277..121f1898d529 100644 >> --- a/block/blk-mq.c >> +++ b/block/blk-mq.c >> @@ -433,9 +433,11 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data) >> if (data->cmd_flags & REQ_NOWAIT) >> data->flags |= BLK_MQ_REQ_NOWAIT; >> >> - if (data->rq_flags & RQF_ELV) { >> + if (q->elevator) { >> struct elevator_queue *e = q->elevator; >> >> + data->rq_flags |= RQF_ELV; >> + >> /* >> * Flush/passthrough requests are special and go directly to the >> * dispatch list. Don't include reserved tags in the >> @@ -494,7 +496,6 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, >> .q = q, >> .flags = flags, >> .cmd_flags = op, >> - .rq_flags = q->elevator ? RQF_ELV : 0, >> .nr_tags = 1, >> }; >> struct request *rq; >> @@ -524,7 +525,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, >> .q = q, >> .flags = flags, >> .cmd_flags = op, >> - .rq_flags = q->elevator ? RQF_ELV : 0, >> .nr_tags = 1, >> }; >> u64 alloc_time_ns = 0; >> @@ -565,6 +565,8 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, >> >> if (!q->elevator) >> blk_mq_tag_busy(data.hctx); >> + else >> + data.rq_flags |= RQF_ELV; >> >> ret = -EWOULDBLOCK; >> tag = blk_mq_get_tag(&data); >> @@ -2560,7 +2562,6 @@ void blk_mq_submit_bio(struct bio *bio) >> .q = q, >> .nr_tags = 1, >> .cmd_flags = bio->bi_opf, >> - .rq_flags = q->elevator ? RQF_ELV : 0, >> }; > > The above patch looks fine. > > BTW, 9ede85cb670c ("block: move queue enter logic into > blk_mq_submit_bio()") moves the queue enter into blk_mq_submit_bio(), > which seems dangerous, especially blk_mq_sched_bio_merge() needs > hctx/ctx which requires q_usage_counter to be grabbed. I think the best solution is to enter just for that as well, and just retain that enter state. I'll update the patch, there's some real fixes in there too for the batched alloc. Will post them later today. 
-- Jens Axboe ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 15:41 ` Jens Axboe @ 2021-11-03 15:49 ` Jens Axboe 2021-11-03 16:09 ` Ming Lei 0 siblings, 1 reply; 24+ messages in thread From: Jens Axboe @ 2021-11-03 15:49 UTC (permalink / raw) To: Ming Lei Cc: Yi Zhang, Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi On 11/3/21 9:41 AM, Jens Axboe wrote: > On 11/3/21 9:16 AM, Ming Lei wrote: >> On Wed, Nov 03, 2021 at 09:10:17AM -0600, Jens Axboe wrote: >>> On 11/3/21 9:03 AM, Jens Axboe wrote: >>>> On 11/3/21 8:57 AM, Ming Lei wrote: >>>>> On Wed, Nov 03, 2021 at 09:59:02PM +0800, Yi Zhang wrote: >>>>>> On Wed, Nov 3, 2021 at 7:59 PM Jens Axboe <axboe@kernel.dk> wrote: >>>>>>> >>>>>>> On 11/2/21 9:54 PM, Jens Axboe wrote: >>>>>>>> On Nov 2, 2021, at 9:52 PM, Ming Lei <ming.lei@redhat.com> wrote: >>>>>>>>> >>>>>>>>> On Tue, Nov 02, 2021 at 09:21:10PM -0600, Jens Axboe wrote: >>>>>>>>>>> On 11/2/21 8:21 PM, Yi Zhang wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Can either one of you try with this patch? Won't fix anything, but it'll >>>>>>>>>>>>> hopefully shine a bit of light on the issue. >>>>>>>>>>>>> >>>>>>>>>>> Hi Jens >>>>>>>>>>> >>>>>>>>>>> Here is the full log: >>>>>>>>>> >>>>>>>>>> Thanks! I think I see what it could be - can you try this one as well, >>>>>>>>>> would like to confirm that the condition I think is triggering is what >>>>>>>>>> is triggering. 
>>>>>>>>>> >>>>>>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>>>>>>>>> index 07eb1412760b..81dede885231 100644 >>>>>>>>>> --- a/block/blk-mq.c >>>>>>>>>> +++ b/block/blk-mq.c >>>>>>>>>> @@ -2515,6 +2515,8 @@ void blk_mq_submit_bio(struct bio *bio) >>>>>>>>>> if (plug && plug->cached_rq) { >>>>>>>>>> rq = rq_list_pop(&plug->cached_rq); >>>>>>>>>> INIT_LIST_HEAD(&rq->queuelist); >>>>>>>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); >>>>>>>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); >>>>>>>>>> } else { >>>>>>>>>> struct blk_mq_alloc_data data = { >>>>>>>>>> .q = q, >>>>>>>>>> @@ -2535,6 +2537,8 @@ void blk_mq_submit_bio(struct bio *bio) >>>>>>>>>> bio_wouldblock_error(bio); >>>>>>>>>> goto queue_exit; >>>>>>>>>> } >>>>>>>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); >>>>>>>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); >>>>>>>>> >>>>>>>>> Hello Jens, >>>>>>>>> >>>>>>>>> I guess the issue could be the following code run without grabbing >>>>>>>>> ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx(). >>>>>>>>> >>>>>>>>> .rq_flags = q->elevator ? RQF_ELV : 0, >>>>>>>>> >>>>>>>>> then elevator is switched to real one from none, and check on q->elevator >>>>>>>>> becomes not consistent. >>>>>>>> >>>>>>>> Indeed, that’s where I was going with this. I have a patch, testing it >>>>>>>> locally but it’s getting late. Will send it out tomorrow. The nice >>>>>>>> benefit is that it allows dropping the weird ref get on plug flush, >>>>>>>> and batches getting the refs as well. >>>>>>> >>>>>>> Yi/Steffen, can you try pulling this into your test kernel: >>>>>>> >>>>>>> git://git.kernel.dk/linux-block for-next >>>>>>> >>>>>>> and see if it fixes the issue for you. Thanks! 
>>>>>> >>>>>> It still can be reproduced with the latest linux-block/for-next, here is the log >>>>>> >>>>>> fab2914e46eb (HEAD, new/for-next) Merge branch 'for-5.16/drivers' into for-next >>>>> >>>>> Hi Yi, >>>>> >>>>> Please try the following change: >>>>> >>>>> >>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>>>> index e1e64964a31b..eb634a9c61ff 100644 >>>>> --- a/block/blk-mq.c >>>>> +++ b/block/blk-mq.c >>>>> @@ -494,7 +494,6 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, >>>>> .q = q, >>>>> .flags = flags, >>>>> .cmd_flags = op, >>>>> - .rq_flags = q->elevator ? RQF_ELV : 0, >>>>> .nr_tags = 1, >>>>> }; >>>>> struct request *rq; >>>>> @@ -504,6 +503,7 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, >>>>> if (ret) >>>>> return ERR_PTR(ret); >>>>> >>>>> + data.rq_flags = q->elevator ? RQF_ELV : 0, >>>>> rq = __blk_mq_alloc_requests(&data); >>>>> if (!rq) >>>>> goto out_queue_exit; >>>>> @@ -524,7 +524,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, >>>>> .q = q, >>>>> .flags = flags, >>>>> .cmd_flags = op, >>>>> - .rq_flags = q->elevator ? RQF_ELV : 0, >>>>> .nr_tags = 1, >>>>> }; >>>>> u64 alloc_time_ns = 0; >>>>> @@ -551,6 +550,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, >>>>> ret = blk_queue_enter(q, flags); >>>>> if (ret) >>>>> return ERR_PTR(ret); >>>>> + data.rq_flags = q->elevator ? RQF_ELV : 0, >>>> >>>> Don't think that will compile, but I guess the point is that we can't do >>>> this assignment before queue enter, in case we're in the midst of >>>> switching schedulers. Which is indeed a valid concern. >>> >>> Something like the below. Maybe? On top of the for-next that was already >>> pulled in. 
>>> >>> >>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>> index b01e05e02277..121f1898d529 100644 >>> --- a/block/blk-mq.c >>> +++ b/block/blk-mq.c >>> @@ -433,9 +433,11 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data) >>> if (data->cmd_flags & REQ_NOWAIT) >>> data->flags |= BLK_MQ_REQ_NOWAIT; >>> >>> - if (data->rq_flags & RQF_ELV) { >>> + if (q->elevator) { >>> struct elevator_queue *e = q->elevator; >>> >>> + data->rq_flags |= RQF_ELV; >>> + >>> /* >>> * Flush/passthrough requests are special and go directly to the >>> * dispatch list. Don't include reserved tags in the >>> @@ -494,7 +496,6 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, >>> .q = q, >>> .flags = flags, >>> .cmd_flags = op, >>> - .rq_flags = q->elevator ? RQF_ELV : 0, >>> .nr_tags = 1, >>> }; >>> struct request *rq; >>> @@ -524,7 +525,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, >>> .q = q, >>> .flags = flags, >>> .cmd_flags = op, >>> - .rq_flags = q->elevator ? RQF_ELV : 0, >>> .nr_tags = 1, >>> }; >>> u64 alloc_time_ns = 0; >>> @@ -565,6 +565,8 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, >>> >>> if (!q->elevator) >>> blk_mq_tag_busy(data.hctx); >>> + else >>> + data.rq_flags |= RQF_ELV; >>> >>> ret = -EWOULDBLOCK; >>> tag = blk_mq_get_tag(&data); >>> @@ -2560,7 +2562,6 @@ void blk_mq_submit_bio(struct bio *bio) >>> .q = q, >>> .nr_tags = 1, >>> .cmd_flags = bio->bi_opf, >>> - .rq_flags = q->elevator ? RQF_ELV : 0, >>> }; >> >> The above patch looks fine. >> >> BTW, 9ede85cb670c ("block: move queue enter logic into >> blk_mq_submit_bio()") moves the queue enter into blk_mq_submit_bio(), >> which seems dangerous, especially blk_mq_sched_bio_merge() needs >> hctx/ctx which requires q_usage_counter to be grabbed. > > I think the best solution is to enter just for that as well, and just > retain that enter state. 
I'll update the patch, there's some real fixes > in there too for the batched alloc. Will post them later today. Is it needed, though? As far as I can tell, it's only needed persistently for having the IO inflight, otherwise if the premise is that the queue can just go away, we're in trouble before that too. And I don't think that's the case. -- Jens Axboe ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 15:49 ` Jens Axboe @ 2021-11-03 16:09 ` Ming Lei 2021-11-03 16:36 ` Jens Axboe 0 siblings, 1 reply; 24+ messages in thread From: Ming Lei @ 2021-11-03 16:09 UTC (permalink / raw) To: Jens Axboe Cc: Yi Zhang, Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi On Wed, Nov 03, 2021 at 09:49:20AM -0600, Jens Axboe wrote: > On 11/3/21 9:41 AM, Jens Axboe wrote: > > On 11/3/21 9:16 AM, Ming Lei wrote: > >> On Wed, Nov 03, 2021 at 09:10:17AM -0600, Jens Axboe wrote: > >>> On 11/3/21 9:03 AM, Jens Axboe wrote: > >>>> On 11/3/21 8:57 AM, Ming Lei wrote: > >>>>> On Wed, Nov 03, 2021 at 09:59:02PM +0800, Yi Zhang wrote: > >>>>>> On Wed, Nov 3, 2021 at 7:59 PM Jens Axboe <axboe@kernel.dk> wrote: > >>>>>>> > >>>>>>> On 11/2/21 9:54 PM, Jens Axboe wrote: > >>>>>>>> On Nov 2, 2021, at 9:52 PM, Ming Lei <ming.lei@redhat.com> wrote: > >>>>>>>>> > >>>>>>>>> On Tue, Nov 02, 2021 at 09:21:10PM -0600, Jens Axboe wrote: > >>>>>>>>>>> On 11/2/21 8:21 PM, Yi Zhang wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>> Can either one of you try with this patch? Won't fix anything, but it'll > >>>>>>>>>>>>> hopefully shine a bit of light on the issue. > >>>>>>>>>>>>> > >>>>>>>>>>> Hi Jens > >>>>>>>>>>> > >>>>>>>>>>> Here is the full log: > >>>>>>>>>> > >>>>>>>>>> Thanks! I think I see what it could be - can you try this one as well, > >>>>>>>>>> would like to confirm that the condition I think is triggering is what > >>>>>>>>>> is triggering. 
> >>>>>>>>>> > >>>>>>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c > >>>>>>>>>> index 07eb1412760b..81dede885231 100644 > >>>>>>>>>> --- a/block/blk-mq.c > >>>>>>>>>> +++ b/block/blk-mq.c > >>>>>>>>>> @@ -2515,6 +2515,8 @@ void blk_mq_submit_bio(struct bio *bio) > >>>>>>>>>> if (plug && plug->cached_rq) { > >>>>>>>>>> rq = rq_list_pop(&plug->cached_rq); > >>>>>>>>>> INIT_LIST_HEAD(&rq->queuelist); > >>>>>>>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); > >>>>>>>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); > >>>>>>>>>> } else { > >>>>>>>>>> struct blk_mq_alloc_data data = { > >>>>>>>>>> .q = q, > >>>>>>>>>> @@ -2535,6 +2537,8 @@ void blk_mq_submit_bio(struct bio *bio) > >>>>>>>>>> bio_wouldblock_error(bio); > >>>>>>>>>> goto queue_exit; > >>>>>>>>>> } > >>>>>>>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); > >>>>>>>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); > >>>>>>>>> > >>>>>>>>> Hello Jens, > >>>>>>>>> > >>>>>>>>> I guess the issue could be the following code run without grabbing > >>>>>>>>> ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx(). > >>>>>>>>> > >>>>>>>>> .rq_flags = q->elevator ? RQF_ELV : 0, > >>>>>>>>> > >>>>>>>>> then elevator is switched to real one from none, and check on q->elevator > >>>>>>>>> becomes not consistent. > >>>>>>>> > >>>>>>>> Indeed, that’s where I was going with this. I have a patch, testing it > >>>>>>>> locally but it’s getting late. Will send it out tomorrow. The nice > >>>>>>>> benefit is that it allows dropping the weird ref get on plug flush, > >>>>>>>> and batches getting the refs as well. > >>>>>>> > >>>>>>> Yi/Steffen, can you try pulling this into your test kernel: > >>>>>>> > >>>>>>> git://git.kernel.dk/linux-block for-next > >>>>>>> > >>>>>>> and see if it fixes the issue for you. Thanks! 
> >>>>>> > >>>>>> It still can be reproduced with the latest linux-block/for-next, here is the log > >>>>>> > >>>>>> fab2914e46eb (HEAD, new/for-next) Merge branch 'for-5.16/drivers' into for-next > >>>>> > >>>>> Hi Yi, > >>>>> > >>>>> Please try the following change: > >>>>> > >>>>> > >>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c > >>>>> index e1e64964a31b..eb634a9c61ff 100644 > >>>>> --- a/block/blk-mq.c > >>>>> +++ b/block/blk-mq.c > >>>>> @@ -494,7 +494,6 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, > >>>>> .q = q, > >>>>> .flags = flags, > >>>>> .cmd_flags = op, > >>>>> - .rq_flags = q->elevator ? RQF_ELV : 0, > >>>>> .nr_tags = 1, > >>>>> }; > >>>>> struct request *rq; > >>>>> @@ -504,6 +503,7 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, > >>>>> if (ret) > >>>>> return ERR_PTR(ret); > >>>>> > >>>>> + data.rq_flags = q->elevator ? RQF_ELV : 0, > >>>>> rq = __blk_mq_alloc_requests(&data); > >>>>> if (!rq) > >>>>> goto out_queue_exit; > >>>>> @@ -524,7 +524,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, > >>>>> .q = q, > >>>>> .flags = flags, > >>>>> .cmd_flags = op, > >>>>> - .rq_flags = q->elevator ? RQF_ELV : 0, > >>>>> .nr_tags = 1, > >>>>> }; > >>>>> u64 alloc_time_ns = 0; > >>>>> @@ -551,6 +550,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, > >>>>> ret = blk_queue_enter(q, flags); > >>>>> if (ret) > >>>>> return ERR_PTR(ret); > >>>>> + data.rq_flags = q->elevator ? RQF_ELV : 0, > >>>> > >>>> Don't think that will compile, but I guess the point is that we can't do > >>>> this assignment before queue enter, in case we're in the midst of > >>>> switching schedulers. Which is indeed a valid concern. > >>> > >>> Something like the below. Maybe? On top of the for-next that was already > >>> pulled in. 
> >>> > >>> > >>> diff --git a/block/blk-mq.c b/block/blk-mq.c > >>> index b01e05e02277..121f1898d529 100644 > >>> --- a/block/blk-mq.c > >>> +++ b/block/blk-mq.c > >>> @@ -433,9 +433,11 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data) > >>> if (data->cmd_flags & REQ_NOWAIT) > >>> data->flags |= BLK_MQ_REQ_NOWAIT; > >>> > >>> - if (data->rq_flags & RQF_ELV) { > >>> + if (q->elevator) { > >>> struct elevator_queue *e = q->elevator; > >>> > >>> + data->rq_flags |= RQF_ELV; > >>> + > >>> /* > >>> * Flush/passthrough requests are special and go directly to the > >>> * dispatch list. Don't include reserved tags in the > >>> @@ -494,7 +496,6 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, > >>> .q = q, > >>> .flags = flags, > >>> .cmd_flags = op, > >>> - .rq_flags = q->elevator ? RQF_ELV : 0, > >>> .nr_tags = 1, > >>> }; > >>> struct request *rq; > >>> @@ -524,7 +525,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, > >>> .q = q, > >>> .flags = flags, > >>> .cmd_flags = op, > >>> - .rq_flags = q->elevator ? RQF_ELV : 0, > >>> .nr_tags = 1, > >>> }; > >>> u64 alloc_time_ns = 0; > >>> @@ -565,6 +565,8 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, > >>> > >>> if (!q->elevator) > >>> blk_mq_tag_busy(data.hctx); > >>> + else > >>> + data.rq_flags |= RQF_ELV; > >>> > >>> ret = -EWOULDBLOCK; > >>> tag = blk_mq_get_tag(&data); > >>> @@ -2560,7 +2562,6 @@ void blk_mq_submit_bio(struct bio *bio) > >>> .q = q, > >>> .nr_tags = 1, > >>> .cmd_flags = bio->bi_opf, > >>> - .rq_flags = q->elevator ? RQF_ELV : 0, > >>> }; > >> > >> The above patch looks fine. > >> > >> BTW, 9ede85cb670c ("block: move queue enter logic into > >> blk_mq_submit_bio()") moves the queue enter into blk_mq_submit_bio(), > >> which seems dangerous, especially blk_mq_sched_bio_merge() needs > >> hctx/ctx which requires q_usage_counter to be grabbed. 
>
> > I think the best solution is to enter just for that as well, and just
> > retain that enter state. I'll update the patch, there's some real fixes
> > in there too for the batched alloc. Will post them later today.
>
> Is it needed, though? As far as I can tell, it's only needed
> persistently for having the IO inflight, otherwise if the premise is
> that the queue can just go away, we're in trouble before that too. And I
> don't think that's the case.

An inflight bio only means that the bdev is open, so the request queue
won't go away.

But a lot of things can still happen: an elevator switch, an
nr_hw_queues update, request queue cleanup. So it looks like
blk_mq_sched_bio_merge() is not safe without grabbing ->q_usage_counter?

Thanks,
Ming

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 2021-11-03 16:09 ` Ming Lei @ 2021-11-03 16:36 ` Jens Axboe 0 siblings, 0 replies; 24+ messages in thread From: Jens Axboe @ 2021-11-03 16:36 UTC (permalink / raw) To: Ming Lei Cc: Yi Zhang, Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi On 11/3/21 10:09 AM, Ming Lei wrote: > On Wed, Nov 03, 2021 at 09:49:20AM -0600, Jens Axboe wrote: >> On 11/3/21 9:41 AM, Jens Axboe wrote: >>> On 11/3/21 9:16 AM, Ming Lei wrote: >>>> On Wed, Nov 03, 2021 at 09:10:17AM -0600, Jens Axboe wrote: >>>>> On 11/3/21 9:03 AM, Jens Axboe wrote: >>>>>> On 11/3/21 8:57 AM, Ming Lei wrote: >>>>>>> On Wed, Nov 03, 2021 at 09:59:02PM +0800, Yi Zhang wrote: >>>>>>>> On Wed, Nov 3, 2021 at 7:59 PM Jens Axboe <axboe@kernel.dk> wrote: >>>>>>>>> >>>>>>>>> On 11/2/21 9:54 PM, Jens Axboe wrote: >>>>>>>>>> On Nov 2, 2021, at 9:52 PM, Ming Lei <ming.lei@redhat.com> wrote: >>>>>>>>>>> >>>>>>>>>>> On Tue, Nov 02, 2021 at 09:21:10PM -0600, Jens Axboe wrote: >>>>>>>>>>>>> On 11/2/21 8:21 PM, Yi Zhang wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Can either one of you try with this patch? Won't fix anything, but it'll >>>>>>>>>>>>>>> hopefully shine a bit of light on the issue. >>>>>>>>>>>>>>> >>>>>>>>>>>>> Hi Jens >>>>>>>>>>>>> >>>>>>>>>>>>> Here is the full log: >>>>>>>>>>>> >>>>>>>>>>>> Thanks! I think I see what it could be - can you try this one as well, >>>>>>>>>>>> would like to confirm that the condition I think is triggering is what >>>>>>>>>>>> is triggering. 
>>>>>>>>>>>> >>>>>>>>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>>>>>>>>>>> index 07eb1412760b..81dede885231 100644 >>>>>>>>>>>> --- a/block/blk-mq.c >>>>>>>>>>>> +++ b/block/blk-mq.c >>>>>>>>>>>> @@ -2515,6 +2515,8 @@ void blk_mq_submit_bio(struct bio *bio) >>>>>>>>>>>> if (plug && plug->cached_rq) { >>>>>>>>>>>> rq = rq_list_pop(&plug->cached_rq); >>>>>>>>>>>> INIT_LIST_HEAD(&rq->queuelist); >>>>>>>>>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); >>>>>>>>>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); >>>>>>>>>>>> } else { >>>>>>>>>>>> struct blk_mq_alloc_data data = { >>>>>>>>>>>> .q = q, >>>>>>>>>>>> @@ -2535,6 +2537,8 @@ void blk_mq_submit_bio(struct bio *bio) >>>>>>>>>>>> bio_wouldblock_error(bio); >>>>>>>>>>>> goto queue_exit; >>>>>>>>>>>> } >>>>>>>>>>>> + WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV)); >>>>>>>>>>>> + WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV)); >>>>>>>>>>> >>>>>>>>>>> Hello Jens, >>>>>>>>>>> >>>>>>>>>>> I guess the issue could be the following code run without grabbing >>>>>>>>>>> ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx(). >>>>>>>>>>> >>>>>>>>>>> .rq_flags = q->elevator ? RQF_ELV : 0, >>>>>>>>>>> >>>>>>>>>>> then elevator is switched to real one from none, and check on q->elevator >>>>>>>>>>> becomes not consistent. >>>>>>>>>> >>>>>>>>>> Indeed, that’s where I was going with this. I have a patch, testing it >>>>>>>>>> locally but it’s getting late. Will send it out tomorrow. The nice >>>>>>>>>> benefit is that it allows dropping the weird ref get on plug flush, >>>>>>>>>> and batches getting the refs as well. >>>>>>>>> >>>>>>>>> Yi/Steffen, can you try pulling this into your test kernel: >>>>>>>>> >>>>>>>>> git://git.kernel.dk/linux-block for-next >>>>>>>>> >>>>>>>>> and see if it fixes the issue for you. Thanks! 
>>>>>>>> >>>>>>>> It still can be reproduced with the latest linux-block/for-next, here is the log >>>>>>>> >>>>>>>> fab2914e46eb (HEAD, new/for-next) Merge branch 'for-5.16/drivers' into for-next >>>>>>> >>>>>>> Hi Yi, >>>>>>> >>>>>>> Please try the following change: >>>>>>> >>>>>>> >>>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>>>>>> index e1e64964a31b..eb634a9c61ff 100644 >>>>>>> --- a/block/blk-mq.c >>>>>>> +++ b/block/blk-mq.c >>>>>>> @@ -494,7 +494,6 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, >>>>>>> .q = q, >>>>>>> .flags = flags, >>>>>>> .cmd_flags = op, >>>>>>> - .rq_flags = q->elevator ? RQF_ELV : 0, >>>>>>> .nr_tags = 1, >>>>>>> }; >>>>>>> struct request *rq; >>>>>>> @@ -504,6 +503,7 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, >>>>>>> if (ret) >>>>>>> return ERR_PTR(ret); >>>>>>> >>>>>>> + data.rq_flags = q->elevator ? RQF_ELV : 0, >>>>>>> rq = __blk_mq_alloc_requests(&data); >>>>>>> if (!rq) >>>>>>> goto out_queue_exit; >>>>>>> @@ -524,7 +524,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, >>>>>>> .q = q, >>>>>>> .flags = flags, >>>>>>> .cmd_flags = op, >>>>>>> - .rq_flags = q->elevator ? RQF_ELV : 0, >>>>>>> .nr_tags = 1, >>>>>>> }; >>>>>>> u64 alloc_time_ns = 0; >>>>>>> @@ -551,6 +550,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, >>>>>>> ret = blk_queue_enter(q, flags); >>>>>>> if (ret) >>>>>>> return ERR_PTR(ret); >>>>>>> + data.rq_flags = q->elevator ? RQF_ELV : 0, >>>>>> >>>>>> Don't think that will compile, but I guess the point is that we can't do >>>>>> this assignment before queue enter, in case we're in the midst of >>>>>> switching schedulers. Which is indeed a valid concern. >>>>> >>>>> Something like the below. Maybe? On top of the for-next that was already >>>>> pulled in. 
>>>>> >>>>> >>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>>>> index b01e05e02277..121f1898d529 100644 >>>>> --- a/block/blk-mq.c >>>>> +++ b/block/blk-mq.c >>>>> @@ -433,9 +433,11 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data) >>>>> if (data->cmd_flags & REQ_NOWAIT) >>>>> data->flags |= BLK_MQ_REQ_NOWAIT; >>>>> >>>>> - if (data->rq_flags & RQF_ELV) { >>>>> + if (q->elevator) { >>>>> struct elevator_queue *e = q->elevator; >>>>> >>>>> + data->rq_flags |= RQF_ELV; >>>>> + >>>>> /* >>>>> * Flush/passthrough requests are special and go directly to the >>>>> * dispatch list. Don't include reserved tags in the >>>>> @@ -494,7 +496,6 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, >>>>> .q = q, >>>>> .flags = flags, >>>>> .cmd_flags = op, >>>>> - .rq_flags = q->elevator ? RQF_ELV : 0, >>>>> .nr_tags = 1, >>>>> }; >>>>> struct request *rq; >>>>> @@ -524,7 +525,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, >>>>> .q = q, >>>>> .flags = flags, >>>>> .cmd_flags = op, >>>>> - .rq_flags = q->elevator ? RQF_ELV : 0, >>>>> .nr_tags = 1, >>>>> }; >>>>> u64 alloc_time_ns = 0; >>>>> @@ -565,6 +565,8 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, >>>>> >>>>> if (!q->elevator) >>>>> blk_mq_tag_busy(data.hctx); >>>>> + else >>>>> + data.rq_flags |= RQF_ELV; >>>>> >>>>> ret = -EWOULDBLOCK; >>>>> tag = blk_mq_get_tag(&data); >>>>> @@ -2560,7 +2562,6 @@ void blk_mq_submit_bio(struct bio *bio) >>>>> .q = q, >>>>> .nr_tags = 1, >>>>> .cmd_flags = bio->bi_opf, >>>>> - .rq_flags = q->elevator ? RQF_ELV : 0, >>>>> }; >>>> >>>> The above patch looks fine. >>>> >>>> BTW, 9ede85cb670c ("block: move queue enter logic into >>>> blk_mq_submit_bio()") moves the queue enter into blk_mq_submit_bio(), >>>> which seems dangerous, especially blk_mq_sched_bio_merge() needs >>>> hctx/ctx which requires q_usage_counter to be grabbed. 
>>>
>>> I think the best solution is to enter just for that as well, and just
>>> retain that enter state. I'll update the patch, there's some real fixes
>>> in there too for the batched alloc. Will post them later today.
>>
>> Is it needed, though? As far as I can tell, it's only needed
>> persistently for having the IO inflight, otherwise if the premise is
>> that the queue can just go away, we're in trouble before that too. And I
>> don't think that's the case.
>
> An inflight bio only means that the bdev is open, so the request queue
> won't go away.
>
> But a lot of things can still happen: an elevator switch, an
> nr_hw_queues update, request queue cleanup. So it looks like
> blk_mq_sched_bio_merge() is not safe without grabbing ->q_usage_counter?

Yes, good point - we need a consistent sched/queue view at that point.
I'll update it.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 24+ messages in thread
[parent not found: <CGME20211103032116epcas2p13b9f3fad0fe84f58c9b7f36320c71854@epcms2p2>]
* RE:(2) [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178
  [not found] ` <CGME20211103032116epcas2p13b9f3fad0fe84f58c9b7f36320c71854@epcms2p2>
@ 2021-11-03  3:28   ` Daejun Park
  0 siblings, 0 replies; 24+ messages in thread
From: Daejun Park @ 2021-11-03 3:28 UTC (permalink / raw)
  To: Jens Axboe, Yi Zhang, Daejun Park
  Cc: Steffen Maier, linux-block, Linux-Next Mailing List, linux-scsi

Hi Jens,

>>> Can either one of you try with this patch? Won't fix anything, but it'll
>>> hopefully shine a bit of light on the issue.
>>>
>> Hi Jens
>>
>> Here is the full log:
>
> Thanks! I think I see what it could be - can you try this one as well,
> would like to confirm that the condition I think is triggering is what
> is triggering.
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 07eb1412760b..81dede885231 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2515,6 +2515,8 @@ void blk_mq_submit_bio(struct bio *bio)
>  	if (plug && plug->cached_rq) {
>  		rq = rq_list_pop(&plug->cached_rq);
>  		INIT_LIST_HEAD(&rq->queuelist);
> +		WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV));
> +		WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV));
>  	} else {
>  		struct blk_mq_alloc_data data = {
>  			.q		= q,
> @@ -2535,6 +2537,8 @@ void blk_mq_submit_bio(struct bio *bio)
>  			bio_wouldblock_error(bio);
>  			goto queue_exit;
>  		}
> +		WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV));
> +		WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV));
>  	}
>
>  	trace_block_getrq(bio);
>
> --
> Jens Axboe

The first reported warning started from a call to scsi_execute(), so how
about adding the checking code in __scsi_execute()?

Thanks,
Daejun

^ permalink raw reply	[flat|nested] 24+ messages in thread
end of thread, other threads: [~2021-11-05 11:14 UTC | newest]

Thread overview: 24+ messages
     [not found] <CAHj4cs-NUKzGj5pgzRhDgdrGGbgPBqUoQ44+xgvk6njH9a_RYQ@mail.gmail.com>
2021-11-02 19:00 ` [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178 Steffen Maier
2021-11-02 19:02   ` Jens Axboe
2021-11-02 20:03     ` Jens Axboe
2021-11-03  2:21       ` Yi Zhang
2021-11-03  3:21         ` Jens Axboe
2021-11-03  3:51           ` Ming Lei
2021-11-03  3:54             ` Jens Axboe
2021-11-03  4:00               ` Yi Zhang
2021-11-03 19:03                 ` Jens Axboe
2021-11-05 11:13                   ` Yi Zhang
2021-11-03 11:59           ` Jens Axboe
2021-11-03 13:59             ` Yi Zhang
2021-11-03 14:26               ` Jens Axboe
2021-11-03 14:57                 ` Ming Lei
2021-11-03 15:03                   ` Jens Axboe
2021-11-03 15:09                     ` Ming Lei
2021-11-03 15:12                       ` Jens Axboe
2021-11-03 15:10                     ` Jens Axboe
2021-11-03 15:16                       ` Ming Lei
2021-11-03 15:41                         ` Jens Axboe
2021-11-03 15:49                           ` Jens Axboe
2021-11-03 16:09                             ` Ming Lei
2021-11-03 16:36                               ` Jens Axboe
     [not found] ` <CGME20211103032116epcas2p13b9f3fad0fe84f58c9b7f36320c71854@epcms2p2>
2021-11-03  3:28   ` Daejun Park