* first bad commit: [5795eb443060148796beeba106e4366d7f1458a6] scsi: sd_zbc: emulate ZONE_APPEND commands
@ 2020-09-11 19:53 Borislav Petkov
  2020-09-11 22:17 ` Borislav Petkov
  0 siblings, 1 reply; 12+ messages in thread
From: Borislav Petkov @ 2020-09-11 19:53 UTC
  To: Johannes Thumshirn, Damien Le Moal
  Cc: Christoph Hellwig, Martin K. Petersen, Hannes Reinecke, Jens Axboe, lkml

Tztztz, you're not my colleague anymore and now you break my box?!?! :-)))

Here's how:

current Linus master doesn't finish booting on my workstation and what I
see as the last line printed on the console is:

...
[ 6.778532] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 6.785275] ata8.00: ATA-10: WDC WD1003FZEX-00K3CA0, 01.01A01, max UDMA/133
[ 6.792247] ata8.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 32), AA
[ 6.799848] ata8.00: configured for UDMA/133
[ 6.804679] scsi 7:0:0:0: Direct-Access ATA WDC WD1003FZEX-0 1A01 PQ: 0 ANSI: 5
[ 6.812884] sd 7:0:0:0: Attached scsi generic sg2 type 0
[ 6.812934] sd 7:0:0:0: [sdc] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
[ 6.825897] sd 7:0:0:0: [sdc] 4096-byte physical blocks
[ 6.831151] sd 7:0:0:0: [sdc] Write Protect is off
[ 6.835966] sd 7:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[ 6.841052] sd 7:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 6.878559] sdc: sdc1
[ 6.881069] sd 7:0:0:0: [sdc] Attached SCSI disk
<--- EOF

I had to bisect on the workstation (yah, I know, bisecting on the
workstation is no fun...), see bisect log at the end.

Now, looking at that patch:

  5795eb443060 ("scsi: sd_zbc: emulate ZONE_APPEND commands")

yeah, that doesn't revert cleanly. But it talks about zoned-something
devices and that rings a bell because you guys broke my zoned device
once already:

[ 4.954946] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 4.972405] ata4.00: NCQ Send/Recv Log not supported
[ 4.988254] ata4.00: ATA-10: ST8000AS0022-1WL17Z, SN01, max UDMA/133
[ 4.994569] ata4.00: 15628053168 sectors, multi 16: LBA48 NCQ (depth 32), AA
[ 5.004576] ata4.00: NCQ Send/Recv Log not supported
[ 5.011398] ata4.00: configured for UDMA/133
[ 5.015727] scsi 3:0:0:0: Direct-Access ATA ST8000AS0022-1WL SN01 PQ: 0 ANSI: 5
[ 5.023849] sd 3:0:0:0: Attached scsi generic sg1 type 0
[ 5.023889] sd 3:0:0:0: [sdb] Host-aware zoned block device
[ 5.034730] sd 3:0:0:0: [sdb] 15628053168 512-byte logical blocks: (8.00 TB/7.28 TiB)
[ 5.042511] sd 3:0:0:0: [sdb] 4096-byte physical blocks
[ 5.047722] sd 3:0:0:0: [sdb] Write Protect is off
[ 5.052499] sd 3:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[ 5.057538] sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 5.143300] sdb: sdb1
[ 5.145683] sdb: disabling host aware zoned block device support due to partitions
[ 5.153311] sd 3:0:0:0: [sdb] Attached SCSI disk

so I thought, lemme disable that device and see if it boots. If it does,
it probably points more in the direction of this patch.

So I booted with "libata.force=4.00:disable" and yap, the box was up
just fine.

As to a reproducer - I'm guessing grabbing such a host-aware zoned
device and slapping a single partition on it, would probably trigger it.

If not, I can try to test patches but it'll take a while because this is
my main box.

$ git bisect start
# bad: [b3a9e3b9622ae10064826dccb4f7a52bd88c7407] Linux 5.8-rc1
git bisect bad b3a9e3b9622ae10064826dccb4f7a52bd88c7407
# good: [3d77e6a8804abcc0504c904bd6e5cdf3a5cf8162] Linux 5.7
git bisect good 3d77e6a8804abcc0504c904bd6e5cdf3a5cf8162
# bad: [ee01c4d72adffb7d424535adf630f2955748fa8b] Merge branch 'akpm' (patches from Andrew)
git bisect bad ee01c4d72adffb7d424535adf630f2955748fa8b
# bad: [16d91548d1057691979de4686693f0ff92f46000] Merge tag 'xfs-5.8-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
git bisect bad 16d91548d1057691979de4686693f0ff92f46000
# good: [cfa3b8068b09f25037146bfd5eed041b78878bee] Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
git bisect good cfa3b8068b09f25037146bfd5eed041b78878bee
# good: [3fd911b69b3117e03181262fc19ae6c3ef6962ce] Merge tag 'drm-misc-next-2020-05-07' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
git bisect good 3fd911b69b3117e03181262fc19ae6c3ef6962ce
# good: [1966391fa576e1fb2701be8bcca197d8f72737b7] mm/migrate.c: attach_page_private already does the get_page
git bisect good 1966391fa576e1fb2701be8bcca197d8f72737b7
# good: [0c8d3fceade2ab1bbac68bca013e62bfdb851d19] bcache: configure the asynchronous registertion to be experimental
git bisect good 0c8d3fceade2ab1bbac68bca013e62bfdb851d19
# bad: [f41030a20b38552a2da3b3f6bc9e7a78637d6c23] Merge tag 'selinux-pr-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
git bisect bad f41030a20b38552a2da3b3f6bc9e7a78637d6c23
# bad: [419c3d5e8012928fbf9a086b07b618146cc9277b] blk-mq: rename BLK_MQ_TAG_FAIL to BLK_MQ_NO_TAG
git bisect bad 419c3d5e8012928fbf9a086b07b618146cc9277b
# bad: [b2f609e191edc9c7a9dec603318461eeb23f8a6b] block: move the blk-mq calls out of part_in_flight{,_rw}
git bisect bad b2f609e191edc9c7a9dec603318461eeb23f8a6b
# bad: [29b2a3aa296711cfdadafbf627c2d9a388fc84ee] block: export bio_release_pages and bio_iov_iter_get_pages
git bisect bad 29b2a3aa296711cfdadafbf627c2d9a388fc84ee
# good: [02992df822e7e36685593aad10721a5a9f8d3402] block: provide fallbacks for blk_queue_zone_is_seq and blk_queue_zone_no
git bisect good 02992df822e7e36685593aad10721a5a9f8d3402
# good: [e732671aa5f67232cf760666a15242dead003362] block: Modify revalidate zones
git bisect good e732671aa5f67232cf760666a15242dead003362
# bad: [5795eb443060148796beeba106e4366d7f1458a6] scsi: sd_zbc: emulate ZONE_APPEND commands
git bisect bad 5795eb443060148796beeba106e4366d7f1458a6
# good: [02494d35ba5547562aae4d9c4df2d6ec33d29012] scsi: sd_zbc: factor out sanity checks for zoned commands
git bisect good 02494d35ba5547562aae4d9c4df2d6ec33d29012
# first bad commit: [5795eb443060148796beeba106e4366d7f1458a6] scsi: sd_zbc: emulate ZONE_APPEND commands

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

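A minimal sketch, assuming a bisect session like the one logged above was saved with `git bisect log > bisect.log` (the file name is an assumption). `git bisect log` and `git bisect replay` are real git subcommands; the helper `first_bad_commit` is a made-up name for illustration and only pulls the culprit line out of such a saved log. It assumes the log has one entry per line, as git writes it.

```shell
#!/bin/sh
# Sketch only: first_bad_commit is a hypothetical helper, not a git command.
# It prints "<sha> <subject>" for the culprit recorded in a saved bisect log.
#
# Typical workflow around it (real git subcommands):
#   git bisect log > bisect.log      # save the session
#   git bisect replay bisect.log     # redo it in another checkout
first_bad_commit() {
    # Match the summary line git appends once the bisection has converged.
    sed -n 's/^# first bad commit: \[\([0-9a-f]\{40\}\)] \(.*\)$/\1 \2/p' "$1"
}
```

Called as `first_bad_commit bisect.log`, it prints the full hash followed by the subject of the commit the bisection converged on, and prints nothing if the log has no such summary line yet.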
* Re: first bad commit: [5795eb443060148796beeba106e4366d7f1458a6] scsi: sd_zbc: emulate ZONE_APPEND commands
  2020-09-11 19:53 first bad commit: [5795eb443060148796beeba106e4366d7f1458a6] scsi: sd_zbc: emulate ZONE_APPEND commands Borislav Petkov
@ 2020-09-11 22:17 ` Borislav Petkov
  2020-09-11 22:22   ` Randy Dunlap
  2020-09-11 23:07   ` Borislav Petkov
  0 siblings, 2 replies; 12+ messages in thread
From: Borislav Petkov @ 2020-09-11 22:17 UTC
  To: Johannes Thumshirn, Damien Le Moal
  Cc: Christoph Hellwig, Martin K. Petersen, Hannes Reinecke, Jens Axboe, lkml

On Fri, Sep 11, 2020 at 09:53:12PM +0200, Borislav Petkov wrote:
> Now, looking at that patch:
>
> 5795eb443060 ("scsi: sd_zbc: emulate ZONE_APPEND commands")
>
> yeah, that doesn't revert cleanly. But it talks about zoned-something
> devices and that rings a bell because you guys broke my zoned device
> once already:

Ok, so Johannes and I poked a bit on IRC and here it is:

# CONFIG_BLK_DEV_ZONED is not set.

Enabling it, fixes the issue.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

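A minimal sketch for checking which way this option is set in a given kernel configuration, assuming a standard .config-style file. The helper name `zoned_config_state` is made up for illustration, and where the config file lives (the build tree's `.config`, or a distro copy such as `/boot/config-$(uname -r)`) is an assumption that varies between systems.

```shell
#!/bin/sh
# Sketch only: zoned_config_state is a hypothetical helper.
# It reports how CONFIG_BLK_DEV_ZONED appears in a kernel .config-style file.
zoned_config_state() {
    if grep -q '^CONFIG_BLK_DEV_ZONED=y$' "$1"; then
        # The option is a bool, so "=y" is the only enabled form.
        echo enabled
    elif grep -q '^# CONFIG_BLK_DEV_ZONED is not set$' "$1"; then
        # This is the form Kconfig writes for a disabled option.
        echo disabled
    else
        # Option not mentioned at all (e.g. a dependency is missing).
        echo unknown
    fi
}
```

For the configuration described in this message, `zoned_config_state .config` would be expected to print `disabled`.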
* Re: first bad commit: [5795eb443060148796beeba106e4366d7f1458a6] scsi: sd_zbc: emulate ZONE_APPEND commands
  2020-09-11 22:17 ` Borislav Petkov
@ 2020-09-11 22:22   ` Randy Dunlap
  2020-09-11 22:26     ` Johannes Thumshirn
  2020-09-11 23:07   ` Borislav Petkov
  1 sibling, 1 reply; 12+ messages in thread
From: Randy Dunlap @ 2020-09-11 22:22 UTC
  To: Borislav Petkov, Johannes Thumshirn, Damien Le Moal
  Cc: Christoph Hellwig, Martin K. Petersen, Hannes Reinecke, Jens Axboe, lkml

On 9/11/20 3:17 PM, Borislav Petkov wrote:
> On Fri, Sep 11, 2020 at 09:53:12PM +0200, Borislav Petkov wrote:
>> Now, looking at that patch:
>>
>> 5795eb443060 ("scsi: sd_zbc: emulate ZONE_APPEND commands")
>>
>> yeah, that doesn't revert cleanly. But it talks about zoned-something
>> devices and that rings a bell because you guys broke my zoned device
>> once already:
>
> Ok, so Johannes and I poked a bit on IRC and here it is:
>
> # CONFIG_BLK_DEV_ZONED is not set.
>
> Enabling it, fixes the issue.
>

Uh, you are not saying that enabling that CONFIG_ is the final fix, are you?

If so, do I need to enable it, even if I don't have a zoned block device?

-- 
~Randy

* Re: first bad commit: [5795eb443060148796beeba106e4366d7f1458a6] scsi: sd_zbc: emulate ZONE_APPEND commands
  2020-09-11 22:22   ` Randy Dunlap
@ 2020-09-11 22:26     ` Johannes Thumshirn
  2020-09-11 22:38       ` Randy Dunlap
  0 siblings, 1 reply; 12+ messages in thread
From: Johannes Thumshirn @ 2020-09-11 22:26 UTC
  To: Randy Dunlap, Borislav Petkov, Damien Le Moal
  Cc: Christoph Hellwig, Martin K. Petersen, Hannes Reinecke, Jens Axboe, lkml

On 12/09/2020 00:22, Randy Dunlap wrote:
> On 9/11/20 3:17 PM, Borislav Petkov wrote:
>> On Fri, Sep 11, 2020 at 09:53:12PM +0200, Borislav Petkov wrote:
>>> Now, looking at that patch:
>>>
>>> 5795eb443060 ("scsi: sd_zbc: emulate ZONE_APPEND commands")
>>>
>>> yeah, that doesn't revert cleanly. But it talks about zoned-something
>>> devices and that rings a bell because you guys broke my zoned device
>>> once already:
>>
>> Ok, so Johannes and I poked a bit on IRC and here it is:
>>
>> # CONFIG_BLK_DEV_ZONED is not set.
>>
>> Enabling it, fixes the issue.
>>
>
> Uh, you are not saying that enabling that CONFIG_ is the final fix, are you?
>
> If so, do I need to enable it, even if I don't have a zoned block device?
>

No he does have a zoned block device and no this is not the final fix, I
think one of the stubbed out functions is broken, but it's midnight here
so we're calling it a day and chime back in on Monday.

And this setup is a bit special, as Boris is using partitions on a host-aware
zoned block device which is somewhat exotic (see add_partition()).

Byte,
	Johannes

* Re: first bad commit: [5795eb443060148796beeba106e4366d7f1458a6] scsi: sd_zbc: emulate ZONE_APPEND commands
  2020-09-11 22:26     ` Johannes Thumshirn
@ 2020-09-11 22:38       ` Randy Dunlap
  0 siblings, 0 replies; 12+ messages in thread
From: Randy Dunlap @ 2020-09-11 22:38 UTC
  To: Johannes Thumshirn, Borislav Petkov, Damien Le Moal
  Cc: Christoph Hellwig, Martin K. Petersen, Hannes Reinecke, Jens Axboe, lkml

On 9/11/20 3:26 PM, Johannes Thumshirn wrote:
> On 12/09/2020 00:22, Randy Dunlap wrote:
>> On 9/11/20 3:17 PM, Borislav Petkov wrote:
>>> On Fri, Sep 11, 2020 at 09:53:12PM +0200, Borislav Petkov wrote:
>>>> Now, looking at that patch:
>>>>
>>>> 5795eb443060 ("scsi: sd_zbc: emulate ZONE_APPEND commands")
>>>>
>>>> yeah, that doesn't revert cleanly. But it talks about zoned-something
>>>> devices and that rings a bell because you guys broke my zoned device
>>>> once already:
>>>
>>> Ok, so Johannes and I poked a bit on IRC and here it is:
>>>
>>> # CONFIG_BLK_DEV_ZONED is not set.
>>>
>>> Enabling it, fixes the issue.
>>>
>>
>> Uh, you are not saying that enabling that CONFIG_ is the final fix, are you?
>>
>> If so, do I need to enable it, even if I don't have a zoned block device?
>>
>
> No he does have a zoned block device and no this is not the final fix, I
> think one of the stubbed out functions is broken, but it's midnight here
> so we're calling it a day and chime back in on Monday.

Sure, thanks for the answers.

> And this setup is a bit special, as Boris is using partitions on a host-aware
> zoned block device which is somewhat exotic (see add_partition()).
>
> Byte,

? :)

-- 
~Randy

* Re: first bad commit: [5795eb443060148796beeba106e4366d7f1458a6] scsi: sd_zbc: emulate ZONE_APPEND commands
  2020-09-11 22:17 ` Borislav Petkov
  2020-09-11 22:22   ` Randy Dunlap
@ 2020-09-11 23:07   ` Borislav Petkov
  2020-09-12  2:31     ` Damien Le Moal
  1 sibling, 1 reply; 12+ messages in thread
From: Borislav Petkov @ 2020-09-11 23:07 UTC
  To: Johannes Thumshirn, Damien Le Moal
  Cc: Christoph Hellwig, Martin K. Petersen, Hannes Reinecke, Jens Axboe, lkml

On Sat, Sep 12, 2020 at 12:17:59AM +0200, Borislav Petkov wrote:
> Enabling it, fixes the issue.

Btw, I just hit the below warn with 5.8, while booting with the above
config option enabled. Looks familiar and I didn't trigger it with
5.9-rc4+ so you guys either fixed it or something changed in-between:

[ 5.124321] ata4.00: NCQ Send/Recv Log not supported
[ 5.131484] ata4.00: configured for UDMA/133
[ 5.135847] scsi 3:0:0:0: Direct-Access ATA ST8000AS0022-1WL SN01 PQ: 0 ANSI: 5
[ 5.143972] sd 3:0:0:0: Attached scsi generic sg1 type 0
[ 5.144033] sd 3:0:0:0: [sdb] Host-aware zoned block device
[ 5.177105] sd 3:0:0:0: [sdb] 15628053168 512-byte logical blocks: (8.00 TB/7.28 TiB)
[ 5.184880] sd 3:0:0:0: [sdb] 4096-byte physical blocks
[ 5.190084] sd 3:0:0:0: [sdb] 29808 zones of 524288 logical blocks + 1 runt zone
[ 5.197439] sd 3:0:0:0: [sdb] Write Protect is off
[ 5.202220] sd 3:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[ 5.207260] sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 5.356631] sdb: sdb1
[ 5.359014] sdb: disabling host aware zoned block device support due to partitions
[ 5.389941] ------------[ cut here ]------------
[ 5.394557] WARNING: CPU: 8 PID: 164 at block/blk-settings.c:236 blk_queue_max_zone_append_sectors+0x12/0x40
[ 5.404300] Modules linked in:
[ 5.407365] CPU: 8 PID: 164 Comm: kworker/u32:6 Not tainted 5.8.0 #7
[ 5.413682] Hardware name: Micro-Star International Co., Ltd. MS-7B79/X470 GAMING PRO (MS-7B79), BIOS 1.70 01/23/2019
[ 5.424191] Workqueue: events_unbound async_run_entry_fn
[ 5.429482] RIP: 0010:blk_queue_max_zone_append_sectors+0x12/0x40
[ 5.435543] Code: fe 0f 00 00 53 48 89 fb 0f 86 3d 07 00 00 48 89 b3 e0 03 00 00 5b c3 90 0f 1f 44 00 00 8b 87 40 04 00 00 ff c8 83 f8 01 76 03 <0f> 0b c3 8b 87 f8 03 00 00 39 87 f0 03 00 00 0f 46 87 f0 03 00 00
[ 5.454099] RSP: 0018:ffffc90000697c60 EFLAGS: 00010282
[ 5.459306] RAX: 00000000ffffffff RBX: ffff8887fa0a9400 RCX: 0000000000000000
[ 5.466390] RDX: ffff8887faf0d400 RSI: 0000000000000540 RDI: ffff8887f0dde6c8
[ 5.473474] RBP: 0000000000007471 R08: 00000000001d1c40 R09: ffff8887fee29ad0
[ 5.480559] R10: 00000001434bac00 R11: 0000000000358275 R12: 0000000000080000
[ 5.487643] R13: ffff8887f0dde6c8 R14: ffff8887fa0a9738 R15: 0000000000000000
[ 5.494726] FS: 0000000000000000(0000) GS:ffff8887fee00000(0000) knlGS:0000000000000000
[ 5.502757] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5.508474] CR2: 0000000000000000 CR3: 0000000002209000 CR4: 00000000003406e0
[ 5.515558] Call Trace:
[ 5.518026]  sd_zbc_read_zones+0x323/0x480
[ 5.522122]  sd_revalidate_disk+0x122b/0x2000
[ 5.526472]  ? __device_add_disk+0x2f7/0x4e0
[ 5.530738]  sd_probe+0x347/0x44b
[ 5.534058]  really_probe+0x2c4/0x3f0
[ 5.537720]  driver_probe_device+0xe1/0x150
[ 5.541902]  ? driver_allows_async_probing+0x50/0x50
[ 5.546852]  bus_for_each_drv+0x6a/0xa0
[ 5.550683]  __device_attach_async_helper+0x8c/0xd0
[ 5.555547]  async_run_entry_fn+0x4a/0x180
[ 5.559636]  process_one_work+0x1a5/0x3a0
[ 5.563637]  worker_thread+0x50/0x3a0
[ 5.567300]  ? process_one_work+0x3a0/0x3a0
[ 5.571480]  kthread+0x117/0x160
[ 5.574715]  ? kthread_park+0x90/0x90
[ 5.578377]  ret_from_fork+0x22/0x30
[ 5.581960] ---[ end trace 94141003236730cf ]---
[ 5.586578] sd 3:0:0:0: [sdb] Attached SCSI disk
[ 6.186783] ata5: failed to resume link (SControl 0)
[ 6.191818] ata5: SATA link down (SStatus 0 SControl 0)

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: first bad commit: [5795eb443060148796beeba106e4366d7f1458a6] scsi: sd_zbc: emulate ZONE_APPEND commands
  2020-09-11 23:07   ` Borislav Petkov
@ 2020-09-12  2:31     ` Damien Le Moal
  2020-09-12  8:37       ` Borislav Petkov
  2020-09-12  9:09       ` Johannes Thumshirn
  0 siblings, 2 replies; 12+ messages in thread
From: Damien Le Moal @ 2020-09-12 2:31 UTC
  To: Borislav Petkov, Johannes Thumshirn
  Cc: Christoph Hellwig, Martin K. Petersen, Hannes Reinecke, Jens Axboe, lkml

On 2020/09/12 8:07, Borislav Petkov wrote:
> On Sat, Sep 12, 2020 at 12:17:59AM +0200, Borislav Petkov wrote:
>> Enabling it, fixes the issue.
>
> Btw, I just hit the below warn with 5.8, while booting with the above
> config option enabled. Looks familiar and I didn't trigger it with
> 5.9-rc4+ so you guys either fixed it or something changed in-between:
>
> [ 5.124321] ata4.00: NCQ Send/Recv Log not supported
> [ 5.131484] ata4.00: configured for UDMA/133
> [ 5.135847] scsi 3:0:0:0: Direct-Access ATA ST8000AS0022-1WL SN01 PQ: 0 ANSI: 5
> [ 5.143972] sd 3:0:0:0: Attached scsi generic sg1 type 0
> [ 5.144033] sd 3:0:0:0: [sdb] Host-aware zoned block device
> [ 5.177105] sd 3:0:0:0: [sdb] 15628053168 512-byte logical blocks: (8.00 TB/7.28 TiB)
> [ 5.184880] sd 3:0:0:0: [sdb] 4096-byte physical blocks
> [ 5.190084] sd 3:0:0:0: [sdb] 29808 zones of 524288 logical blocks + 1 runt zone
> [ 5.197439] sd 3:0:0:0: [sdb] Write Protect is off
> [ 5.202220] sd 3:0:0:0: [sdb] Mode Sense: 00 3a 00 00
> [ 5.207260] sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> [ 5.356631] sdb: sdb1
> [ 5.359014] sdb: disabling host aware zoned block device support due to partitions
> [ 5.389941] ------------[ cut here ]------------
> [ 5.394557] WARNING: CPU: 8 PID: 164 at block/blk-settings.c:236 blk_queue_max_zone_append_sectors+0x12/0x40
> [ 5.404300] Modules linked in:
> [ 5.407365] CPU: 8 PID: 164 Comm: kworker/u32:6 Not tainted 5.8.0 #7
> [ 5.413682] Hardware name: Micro-Star International Co., Ltd. MS-7B79/X470 GAMING PRO (MS-7B79), BIOS 1.70 01/23/2019
> [ 5.424191] Workqueue: events_unbound async_run_entry_fn
> [ 5.429482] RIP: 0010:blk_queue_max_zone_append_sectors+0x12/0x40
> [ 5.435543] Code: fe 0f 00 00 53 48 89 fb 0f 86 3d 07 00 00 48 89 b3 e0 03 00 00 5b c3 90 0f 1f 44 00 00 8b 87 40 04 00 00 ff c8 83 f8 01 76 03 <0f> 0b c3 8b 87 f8 03 00 00 39 87 f0 03 00 00 0f 46 87 f0 03 00 00
> [ 5.454099] RSP: 0018:ffffc90000697c60 EFLAGS: 00010282
> [ 5.459306] RAX: 00000000ffffffff RBX: ffff8887fa0a9400 RCX: 0000000000000000
> [ 5.466390] RDX: ffff8887faf0d400 RSI: 0000000000000540 RDI: ffff8887f0dde6c8
> [ 5.473474] RBP: 0000000000007471 R08: 00000000001d1c40 R09: ffff8887fee29ad0
> [ 5.480559] R10: 00000001434bac00 R11: 0000000000358275 R12: 0000000000080000
> [ 5.487643] R13: ffff8887f0dde6c8 R14: ffff8887fa0a9738 R15: 0000000000000000
> [ 5.494726] FS: 0000000000000000(0000) GS:ffff8887fee00000(0000) knlGS:0000000000000000
> [ 5.502757] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 5.508474] CR2: 0000000000000000 CR3: 0000000002209000 CR4: 00000000003406e0
> [ 5.515558] Call Trace:
> [ 5.518026]  sd_zbc_read_zones+0x323/0x480
> [ 5.522122]  sd_revalidate_disk+0x122b/0x2000
> [ 5.526472]  ? __device_add_disk+0x2f7/0x4e0
> [ 5.530738]  sd_probe+0x347/0x44b
> [ 5.534058]  really_probe+0x2c4/0x3f0
> [ 5.537720]  driver_probe_device+0xe1/0x150
> [ 5.541902]  ? driver_allows_async_probing+0x50/0x50
> [ 5.546852]  bus_for_each_drv+0x6a/0xa0
> [ 5.550683]  __device_attach_async_helper+0x8c/0xd0
> [ 5.555547]  async_run_entry_fn+0x4a/0x180
> [ 5.559636]  process_one_work+0x1a5/0x3a0
> [ 5.563637]  worker_thread+0x50/0x3a0
> [ 5.567300]  ? process_one_work+0x3a0/0x3a0
> [ 5.571480]  kthread+0x117/0x160
> [ 5.574715]  ? kthread_park+0x90/0x90
> [ 5.578377]  ret_from_fork+0x22/0x30
> [ 5.581960] ---[ end trace 94141003236730cf ]---
> [ 5.586578] sd 3:0:0:0: [sdb] Attached SCSI disk
> [ 6.186783] ata5: failed to resume link (SControl 0)
> [ 6.191818] ata5: SATA link down (SStatus 0 SControl 0)
>

Can you try this:

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 95018e650f2d..620539162ef1 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2968,8 +2968,13 @@ static void sd_read_block_characteristics(struct scsi_disk *sdkp)
 	} else {
 		sdkp->zoned = (buffer[8] >> 4) & 3;
 		if (sdkp->zoned == 1 && !disk_has_partitions(sdkp->disk)) {
+#ifdef CONFIG_BLK_DEV_ZONED
 			/* Host-aware */
 			q->limits.zoned = BLK_ZONED_HA;
+#else
+			/* Host-aware drive is treated as a regular disk */
+			q->limits.zoned = BLK_ZONED_NONE;
+#endif
 		} else {
 			/*
 			 * Treat drive-managed devices and host-aware devices
@@ -3404,12 +3409,12 @@ static int sd_probe(struct device *dev)
 	sdkp->first_scan = 1;
 	sdkp->max_medium_access_timeouts = SD_MAX_MEDIUM_TIMEOUTS;
 
+	sd_revalidate_disk(gd);
+
 	error = sd_zbc_init_disk(sdkp);
 	if (error)
 		goto out_free_index;
 
-	sd_revalidate_disk(gd);
-
 	gd->flags = GENHD_FL_EXT_DEVT;
 	if (sdp->removable) {
 		gd->flags |= GENHD_FL_REMOVABLE;
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 4933e7daf17d..f4dc81d48a01 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -241,6 +241,8 @@ static inline void sd_zbc_release_disk(struct scsi_disk *sdkp) {}
 static inline int sd_zbc_read_zones(struct scsi_disk *sdkp,
 				    unsigned char *buf)
 {
+	if (sd_is_zoned(sdkp))
+		sdkp->capacity = 0;
 	return 0;
 }

That should fix the above as well as the hang on boot with CONFIG_BLK_DEV_ZONED
disabled (for that one I do not totally understand what is going on...).

We do not have any host-aware disk for testing (as far as I know, nobody is
selling these anymore), so our test setup is a bit lame in this area.
We'll rig something up with tcmu-runner emulation to add tests for these
devices to avoid a repeat of such problem. And we'll make sure to add a test
for host-aware+partitions, since we at least know for sure there is one user :)

Johannes: The "goto out_free_index;" on sd_zbc_init_disk() failure is wrong I
think: the disk is already added and a ref taken on the dev, but out_free_index
does not seem to do cleanup for that. Need to revisit this.

Cheers.

-- 
Damien Le Moal
Western Digital Research

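One way to check how the block layer ended up classifying a drive, without reading dmesg, is the `queue/zoned` sysfs attribute, which reports `none`, `host-aware` or `host-managed`. The sketch below is an illustration only: the helper name `zoned_model` and its optional second argument are made up, and the device name `sdb` is taken from the log quoted in this thread.

```shell
#!/bin/sh
# Sketch only: zoned_model is a hypothetical helper.
# It prints the zone model the block layer reports for a disk, read from
# <sysfs>/block/<dev>/queue/zoned ("none", "host-aware" or "host-managed").
# The second argument (sysfs root, default /sys) exists only to ease testing.
zoned_model() {
    attr="${2:-/sys}/block/$1/queue/zoned"
    if [ -r "$attr" ]; then
        cat "$attr"
    else
        # Device absent, or the attribute is not readable on this kernel.
        echo unknown
    fi
}
```

With a host-aware drive that the kernel treats as a regular disk (because it carries partitions, or because CONFIG_BLK_DEV_ZONED is disabled and a change like the one above is applied), `zoned_model sdb` would be expected to print `none`.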
* Re: first bad commit: [5795eb443060148796beeba106e4366d7f1458a6] scsi: sd_zbc: emulate ZONE_APPEND commands
  2020-09-12  2:31     ` Damien Le Moal
@ 2020-09-12  8:37       ` Borislav Petkov
  2020-09-12 12:18         ` Damien Le Moal
  2020-09-12  9:09       ` Johannes Thumshirn
  1 sibling, 1 reply; 12+ messages in thread
From: Borislav Petkov @ 2020-09-12 8:37 UTC
  To: Damien Le Moal
  Cc: Johannes Thumshirn, Christoph Hellwig, Martin K. Petersen, Hannes Reinecke, Jens Axboe, lkml

Hi Damien,

On Sat, Sep 12, 2020 at 02:31:55AM +0000, Damien Le Moal wrote:
> Can you try this:

sure, but it is white-space damaged:

checking file drivers/scsi/sd.c
patch: **** malformed patch at line 86: scsi_disk *sdkp)

Welcome to the world of outlook and how sending patches with it never
works. You guys might need linux.wdc.com now :-)))

> That should fix the above as well as the hang on boot with CONFIG_BLK_DEV_ZONED
> disabled (for that one I do not totally understand what is going on...).
>
> We do not have any host-aware disk for testing (as far as I know, nobody is
> selling these anymore),

Yeah, so Johannes said. I love it how a (relatively) brand new
technology gets immediately deprecated :-\

> so our test setup is a bit lame in this area. We'll rig something up
> with tcmu-runner emulation to add tests for these devices to avoid
> a repeat of such problem. And we'll make sure to add a test for
> host-aware+partitions, since we at least know for sure there is one
> user :)

Bah, I use it as a big data dump so if you say, I should make use of it
as a proper zoned device (I've skimmed through http://zonedstorage.io/ a
bit last night), I can try to find some time...

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: first bad commit: [5795eb443060148796beeba106e4366d7f1458a6] scsi: sd_zbc: emulate ZONE_APPEND commands
  2020-09-12  8:37       ` Borislav Petkov
@ 2020-09-12 12:18         ` Damien Le Moal
  2020-09-12 13:05           ` Borislav Petkov
  0 siblings, 1 reply; 12+ messages in thread
From: Damien Le Moal @ 2020-09-12 12:18 UTC
  To: Borislav Petkov
  Cc: Johannes Thumshirn, Christoph Hellwig, Martin K. Petersen, Hannes Reinecke, Jens Axboe, lkml

On 2020/09/12 17:37, Borislav Petkov wrote:
> Hi Damien,
>
> On Sat, Sep 12, 2020 at 02:31:55AM +0000, Damien Le Moal wrote:
>> Can you try this:
>
> sure, but it is white-space damaged:
>
> checking file drivers/scsi/sd.c
> patch: **** malformed patch at line 86: scsi_disk *sdkp)
>
> Welcome to the world of outlook and how sending patches with it never
> works. You guys might need linux.wdc.com now :-)))

Working on it :)

But it was Thunderbird, getting real plain text emails with outlook is
impossible. Corruption I think came from the copy-paste from the Mac bash
terminal... Tabs get replaced by spacers.

>> That should fix the above as well as the hang on boot with CONFIG_BLK_DEV_ZONED
>> disabled (for that one I do not totally understand what is going on...).
>>
>> We do not have any host-aware disk for testing (as far as I know, nobody is
>> selling these anymore),
>
> Yeah, so Johannes said. I love it how a (relatively) brand new
> technology gets immediately deprecated :-\

Host-managed is still a thing, getting bigger. But host-aware never really
gained a lot of traction due to, I think, the potentially very weird performance
profile they can get into (Hmmm... similar to recent drive-managed noise...)

>> so our test setup is a bit lame in this area. We'll rig something up
>> with tcmu-runner emulation to add tests for these devices to avoid
>> a repeat of such problem. And we'll make sure to add a test for
>> host-aware+partitions, since we at least know for sure there is one
>> user :)
>
> Bah, I use it as a big data dump so if you say, I should make use of it
> as a proper zoned device (I've skimmed through http://zonedstorage.io/ a
> bit last night), I can try to find some time...

No worries, we will fix the mess (sorry we hit you again !).
Also, Naohiro just posted btrfs zone support v7 !! Luckily, we can get that into
5.11.

The patch was space corrupted, but could you still try it ? Did it solve your
problem ? I can recend it (minus space corruption) if needed.

Cheers.

-- 
Damien Le Moal
Western Digital Research

* Re: first bad commit: [5795eb443060148796beeba106e4366d7f1458a6] scsi: sd_zbc: emulate ZONE_APPEND commands
  2020-09-12 12:18         ` Damien Le Moal
@ 2020-09-12 13:05           ` Borislav Petkov
  0 siblings, 0 replies; 12+ messages in thread
From: Borislav Petkov @ 2020-09-12 13:05 UTC
  To: Damien Le Moal
  Cc: Johannes Thumshirn, Christoph Hellwig, Martin K. Petersen, Hannes Reinecke, Jens Axboe, lkml

On Sat, Sep 12, 2020 at 12:18:52PM +0000, Damien Le Moal wrote:
> But it was Thunderbird, getting real plain text emails with outlook is
> impossible.

Yeap.

> Corruption I think came from the copy-paste from the Mac bash
> terminal... Tabs get replaced by spacers.

Yeah, never copy-paste hunks. I go "git diff > /tmp/diff" and then load
it into the editor which has the mail opened ":r /tmp/diff" (vim).

> Host-managed is still a thing, getting bigger. But host-aware never really
> gained a lot of traction due to, I think, the potentially very weird performance
> profile they can get into (Hmmm... similar to recent drive-managed noise...)

Yeah, I had the suspicion that it would be some raisins like that.

> No worries, we will fix the mess (sorry we hit you again !).

Yeah, thanks and no probs.

> The patch was space corrupted, but could you still try it ? Did it solve your
> problem ? I can recend it (minus space corruption) if needed.

Yeah, I see

[ 3.263400] ata4: SATA max UDMA/133 abar m131072@0xfe380000 port 0xfe380280 irq 45
[ 4.943083] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 4.951590] ata4.00: NCQ Send/Recv Log not supported
[ 4.961868] ata4.00: ATA-10: ST8000AS0022-1WL17Z, SN01, max UDMA/133
[ 4.977167] ata4.00: 15628053168 sectors, multi 16: LBA48 NCQ (depth 32), AA
[ 4.987217] ata4.00: NCQ Send/Recv Log not supported
[ 5.004230] ata4.00: configured for UDMA/133

but no "sdc" device or a partition or so. So I can't even do fdisk -l on
it.

I see your other version - I'll test that later because real life awaits
and it is weekend and so on...

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: first bad commit: [5795eb443060148796beeba106e4366d7f1458a6] scsi: sd_zbc: emulate ZONE_APPEND commands
  2020-09-12  2:31     ` Damien Le Moal
  2020-09-12  8:37       ` Borislav Petkov
@ 2020-09-12  9:09       ` Johannes Thumshirn
  2020-09-12 12:59         ` Damien Le Moal
  1 sibling, 1 reply; 12+ messages in thread
From: Johannes Thumshirn @ 2020-09-12 9:09 UTC
  To: Damien Le Moal, Borislav Petkov
  Cc: Christoph Hellwig, Martin K. Petersen, Hannes Reinecke, Jens Axboe, lkml

On 12/09/2020 04:31, Damien Le Moal wrote:
> On 2020/09/12 8:07, Borislav Petkov wrote:
>> On Sat, Sep 12, 2020 at 12:17:59AM +0200, Borislav Petkov wrote:
>>> Enabling it, fixes the issue.
>>
>> Btw, I just hit the below warn with 5.8, while booting with the above
>> config option enabled. Looks familiar and I didn't trigger it with
>> 5.9-rc4+ so you guys either fixed it or something changed in-between:
>>
>> [ 5.124321] ata4.00: NCQ Send/Recv Log not supported
>> [ 5.131484] ata4.00: configured for UDMA/133
>> [ 5.135847] scsi 3:0:0:0: Direct-Access ATA ST8000AS0022-1WL SN01 PQ: 0 ANSI: 5
>> [ 5.143972] sd 3:0:0:0: Attached scsi generic sg1 type 0
>> [ 5.144033] sd 3:0:0:0: [sdb] Host-aware zoned block device
>> [ 5.177105] sd 3:0:0:0: [sdb] 15628053168 512-byte logical blocks: (8.00 TB/7.28 TiB)
>> [ 5.184880] sd 3:0:0:0: [sdb] 4096-byte physical blocks
>> [ 5.190084] sd 3:0:0:0: [sdb] 29808 zones of 524288 logical blocks + 1 runt zone
>> [ 5.197439] sd 3:0:0:0: [sdb] Write Protect is off
>> [ 5.202220] sd 3:0:0:0: [sdb] Mode Sense: 00 3a 00 00
>> [ 5.207260] sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>> [ 5.356631] sdb: sdb1
>> [ 5.359014] sdb: disabling host aware zoned block device support due to partitions
>> [ 5.389941] ------------[ cut here ]------------
>> [ 5.394557] WARNING: CPU: 8 PID: 164 at block/blk-settings.c:236 blk_queue_max_zone_append_sectors+0x12/0x40
>> [ 5.404300] Modules linked in:
>> [ 5.407365] CPU: 8 PID: 164 Comm: kworker/u32:6 Not tainted 5.8.0 #7
>> [ 5.413682] Hardware name: Micro-Star International Co., Ltd. MS-7B79/X470 GAMING PRO (MS-7B79), BIOS 1.70 01/23/2019
>> [ 5.424191] Workqueue: events_unbound async_run_entry_fn
>> [ 5.429482] RIP: 0010:blk_queue_max_zone_append_sectors+0x12/0x40
>> [ 5.435543] Code: fe 0f 00 00 53 48 89 fb 0f 86 3d 07 00 00 48 89 b3 e0 03 00 00 5b c3 90 0f 1f 44 00 00 8b 87 40 04 00 00 ff c8 83 f8 01 76 03 <0f> 0b c3 8b 87 f8 03 00 00 39 87 f0 03 00 00 0f 46 87 f0 03 00 00
>> [ 5.454099] RSP: 0018:ffffc90000697c60 EFLAGS: 00010282
>> [ 5.459306] RAX: 00000000ffffffff RBX: ffff8887fa0a9400 RCX: 0000000000000000
>> [ 5.466390] RDX: ffff8887faf0d400 RSI: 0000000000000540 RDI: ffff8887f0dde6c8
>> [ 5.473474] RBP: 0000000000007471 R08: 00000000001d1c40 R09: ffff8887fee29ad0
>> [ 5.480559] R10: 00000001434bac00 R11: 0000000000358275 R12: 0000000000080000
>> [ 5.487643] R13: ffff8887f0dde6c8 R14: ffff8887fa0a9738 R15: 0000000000000000
>> [ 5.494726] FS: 0000000000000000(0000) GS:ffff8887fee00000(0000) knlGS:0000000000000000
>> [ 5.502757] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 5.508474] CR2: 0000000000000000 CR3: 0000000002209000 CR4: 00000000003406e0
>> [ 5.515558] Call Trace:
>> [ 5.518026]  sd_zbc_read_zones+0x323/0x480
>> [ 5.522122]  sd_revalidate_disk+0x122b/0x2000
>> [ 5.526472]  ? __device_add_disk+0x2f7/0x4e0
>> [ 5.530738]  sd_probe+0x347/0x44b
>> [ 5.534058]  really_probe+0x2c4/0x3f0
>> [ 5.537720]  driver_probe_device+0xe1/0x150
>> [ 5.541902]  ? driver_allows_async_probing+0x50/0x50
>> [ 5.546852]  bus_for_each_drv+0x6a/0xa0
>> [ 5.550683]  __device_attach_async_helper+0x8c/0xd0
>> [ 5.555547]  async_run_entry_fn+0x4a/0x180
>> [ 5.559636]  process_one_work+0x1a5/0x3a0
>> [ 5.563637]  worker_thread+0x50/0x3a0
>> [ 5.567300]  ? process_one_work+0x3a0/0x3a0
>> [ 5.571480]  kthread+0x117/0x160
>> [ 5.574715]  ? kthread_park+0x90/0x90
>> [ 5.578377]  ret_from_fork+0x22/0x30
>> [ 5.581960] ---[ end trace 94141003236730cf ]---
>> [ 5.586578] sd 3:0:0:0: [sdb] Attached SCSI disk
>> [ 6.186783] ata5: failed to resume link (SControl 0)
>> [ 6.191818] ata5: SATA link down (SStatus 0 SControl 0)
>>

This looks like we're trying to configure zone append max sectors on a device
that doesn't have the zoned flag set.

>
> Can you try this:
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 95018e650f2d..620539162ef1 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -2968,8 +2968,13 @@ static void sd_read_block_characteristics(struct
> scsi_disk *sdkp)
>  	} else {
>  		sdkp->zoned = (buffer[8] >> 4) & 3;
>  		if (sdkp->zoned == 1 && !disk_has_partitions(sdkp->disk)) {
> +#ifdef CONFIG_BLK_DEV_ZONED
>  			/* Host-aware */
>  			q->limits.zoned = BLK_ZONED_HA;
> +#else
> +			/* Host-aware drive is treated as a regular disk */
> +			q->limits.zoned = BLK_ZONED_NONE;
> +#endif
>  		} else {
>  			/*
>  			 * Treat drive-managed devices and host-aware devices
> @@ -3404,12 +3409,12 @@ static int sd_probe(struct device *dev)
>  	sdkp->first_scan = 1;
>  	sdkp->max_medium_access_timeouts = SD_MAX_MEDIUM_TIMEOUTS;
>
> +	sd_revalidate_disk(gd);
> +
>  	error = sd_zbc_init_disk(sdkp);
>  	if (error)
>  		goto out_free_index;
>
> -	sd_revalidate_disk(gd);
> -

I don't get how my patch may have broken this. If we have
CONFIG_BLK_DEV_ZONED=n, sd_zbc_init_disk() is stubbed out and return 0
unconditionally. So the call path will remain exactly the same.

> gd->flags = GENHD_FL_EXT_DEVT; > if (sdp->removable) { > gd->flags |= GENHD_FL_REMOVABLE; > diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h > index 4933e7daf17d..f4dc81d48a01 100644 > --- a/drivers/scsi/sd.h > +++ b/drivers/scsi/sd.h > @@ -241,6 +241,8 @@ static inline void sd_zbc_release_disk(struct scsi_disk > *sdkp) {} > static inline int sd_zbc_read_zones(struct scsi_disk *sdkp, > unsigned char *buf) > { > + if (sd_is_zoned(sdkp)) > + sdkp->capacity = 0; > return 0; > } > > That should fix the above as well as the hang on boot with CONFIG_BLK_DEV_ZONED > disabled (for that one I do not totally understand what is going on...). > > We do not have any host-aware disk for testing (as far as I know, nobody is > selling these anymore), so our test setup is a bit lame in this area. We'll rig > something up with tcmu-runner emulation to add tests for these devices to avoid > a repeat of such problem. And we'll make sure to add a test for > host-aware+partitions, since we at least know for sure there is one user :) > > Johannes: The "goto out_free_index;" on sd_zbc_init_disk() failure is wrong I > think: the disk is already added and a ref taken on the dev, but out_free_index > does not seem to do cleanup for that. Need to revisit this. Yes just seen it as well, will be cooking a fix for that. I'll build a test env to nail this down. Byte, Johannes ^ permalink raw reply [flat|nested] 12+ messages in thread
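Editor's note: the sd.c hunk quoted above decides which zoned model the block layer gets to see for a host-aware drive. As a rough userspace sketch of that decision only (not kernel code), the logic could be modeled as below. The enum and the function `effective_model()` are hypothetical names made up for illustration; only the value 1 for host-aware and 2 for drive-managed come from the patch and its log messages.

```c
#include <stdbool.h>

/* What the drive reports in the ZONED field, per the patch:
 * 1 = host-aware, 2 = drive-managed SMR, anything else = regular. */
enum drive_zoned {
	DRIVE_REGULAR = 0,
	DRIVE_HOST_AWARE = 1,
	DRIVE_MANAGED_SMR = 2,
};

/* Hypothetical stand-ins for BLK_ZONED_NONE and BLK_ZONED_HA. */
enum queue_model {
	QUEUE_NONE,
	QUEUE_HOST_AWARE,
};

/*
 * Model of the decision in the patched sd_read_block_characteristics():
 * a host-aware drive is exposed as zoned only if it has no partitions
 * and zoned block device support is compiled in. Every other
 * combination is treated as a regular disk.
 */
static enum queue_model effective_model(enum drive_zoned drive,
					bool has_partitions,
					bool zoned_support_built)
{
	if (drive == DRIVE_HOST_AWARE && !has_partitions && zoned_support_built)
		return QUEUE_HOST_AWARE;
	return QUEUE_NONE;
}
```

With this model, the reporter's disk (host-aware, one partition) maps to `QUEUE_NONE` whether or not zoned support is built in, which matches the "disabling host aware zoned block device support due to partitions" message in the log.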
* Re: first bad commit: [5795eb443060148796beeba106e4366d7f1458a6] scsi: sd_zbc: emulate ZONE_APPEND commands
  2020-09-12  9:09 ` Johannes Thumshirn
@ 2020-09-12 12:59 ` Damien Le Moal
  0 siblings, 0 replies; 12+ messages in thread
From: Damien Le Moal @ 2020-09-12 12:59 UTC
To: Johannes Thumshirn, Borislav Petkov
Cc: Christoph Hellwig, Martin K. Petersen, Hannes Reinecke, Jens Axboe, lkml

On 2020/09/12 18:09, Johannes Thumshirn wrote:
> On 12/09/2020 04:31, Damien Le Moal wrote:
>> On 2020/09/12 8:07, Borislav Petkov wrote:
>>> On Sat, Sep 12, 2020 at 12:17:59AM +0200, Borislav Petkov wrote:
>>>> Enabling it, fixes the issue.
>>>
>>> Btw, I just hit the below warn with 5.8, while booting with the above
>>> config option enabled. Looks familiar and I didn't trigger it with
>>> 5.9-rc4+ so you guys either fixed it or something changed in-between:
>>>
>>> [ 5.124321] ata4.00: NCQ Send/Recv Log not supported
>>> [ 5.131484] ata4.00: configured for UDMA/133
>>> [ 5.135847] scsi 3:0:0:0: Direct-Access ATA ST8000AS0022-1WL SN01 PQ: 0 ANSI: 5
>>> [ 5.143972] sd 3:0:0:0: Attached scsi generic sg1 type 0
>>> [ 5.144033] sd 3:0:0:0: [sdb] Host-aware zoned block device
>>> [ 5.177105] sd 3:0:0:0: [sdb] 15628053168 512-byte logical blocks: (8.00 TB/7.28 TiB)
>>> [ 5.184880] sd 3:0:0:0: [sdb] 4096-byte physical blocks
>>> [ 5.190084] sd 3:0:0:0: [sdb] 29808 zones of 524288 logical blocks + 1 runt zone
>>> [ 5.197439] sd 3:0:0:0: [sdb] Write Protect is off
>>> [ 5.202220] sd 3:0:0:0: [sdb] Mode Sense: 00 3a 00 00
>>> [ 5.207260] sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>>> [ 5.356631] sdb: sdb1
>>> [ 5.359014] sdb: disabling host aware zoned block device support due to partitions
>>> [ 5.389941] ------------[ cut here ]------------
>>> [ 5.394557] WARNING: CPU: 8 PID: 164 at block/blk-settings.c:236 blk_queue_max_zone_append_sectors+0x12/0x40
>>> [ 5.404300] Modules linked in:
>>> [ 5.407365] CPU: 8 PID: 164 Comm: kworker/u32:6 Not tainted 5.8.0 #7
>>> [ 5.413682] Hardware name: Micro-Star International Co., Ltd. MS-7B79/X470 GAMING PRO (MS-7B79), BIOS 1.70 01/23/2019
>>> [ 5.424191] Workqueue: events_unbound async_run_entry_fn
>>> [ 5.429482] RIP: 0010:blk_queue_max_zone_append_sectors+0x12/0x40
>>> [ 5.435543] Code: fe 0f 00 00 53 48 89 fb 0f 86 3d 07 00 00 48 89 b3 e0 03 00 00 5b c3 90 0f 1f 44 00 00 8b 87 40 04 00 00 ff c8 83 f8 01 76 03 <0f> 0b c3 8b 87 f8 03 00 00 39 87 f0 03 00 00 0f 46 87 f0 03 00 00
>>> [ 5.454099] RSP: 0018:ffffc90000697c60 EFLAGS: 00010282
>>> [ 5.459306] RAX: 00000000ffffffff RBX: ffff8887fa0a9400 RCX: 0000000000000000
>>> [ 5.466390] RDX: ffff8887faf0d400 RSI: 0000000000000540 RDI: ffff8887f0dde6c8
>>> [ 5.473474] RBP: 0000000000007471 R08: 00000000001d1c40 R09: ffff8887fee29ad0
>>> [ 5.480559] R10: 00000001434bac00 R11: 0000000000358275 R12: 0000000000080000
>>> [ 5.487643] R13: ffff8887f0dde6c8 R14: ffff8887fa0a9738 R15: 0000000000000000
>>> [ 5.494726] FS: 0000000000000000(0000) GS:ffff8887fee00000(0000) knlGS:0000000000000000
>>> [ 5.502757] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 5.508474] CR2: 0000000000000000 CR3: 0000000002209000 CR4: 00000000003406e0
>>> [ 5.515558] Call Trace:
>>> [ 5.518026] sd_zbc_read_zones+0x323/0x480
>>> [ 5.522122] sd_revalidate_disk+0x122b/0x2000
>>> [ 5.526472] ? __device_add_disk+0x2f7/0x4e0
>>> [ 5.530738] sd_probe+0x347/0x44b
>>> [ 5.534058] really_probe+0x2c4/0x3f0
>>> [ 5.537720] driver_probe_device+0xe1/0x150
>>> [ 5.541902] ? driver_allows_async_probing+0x50/0x50
>>> [ 5.546852] bus_for_each_drv+0x6a/0xa0
>>> [ 5.550683] __device_attach_async_helper+0x8c/0xd0
>>> [ 5.555547] async_run_entry_fn+0x4a/0x180
>>> [ 5.559636] process_one_work+0x1a5/0x3a0
>>> [ 5.563637] worker_thread+0x50/0x3a0
>>> [ 5.567300] ? process_one_work+0x3a0/0x3a0
>>> [ 5.571480] kthread+0x117/0x160
>>> [ 5.574715] ? kthread_park+0x90/0x90
>>> [ 5.578377] ret_from_fork+0x22/0x30
>>> [ 5.581960] ---[ end trace 94141003236730cf ]---
>>> [ 5.586578] sd 3:0:0:0: [sdb] Attached SCSI disk
>>> [ 6.186783] ata5: failed to resume link (SControl 0)
>>> [ 6.191818] ata5: SATA link down (SStatus 0 SControl 0)
>>>
>
>
> This looks like we're trying to configure zone append max sectors
> on a device that doesn't have the zoned flag set.

Yep. That's because the sd_zbc_revalidate_zones() entry test uses sd_is_zoned()
and does not look at queue->limits.zoned, which was changed to BLK_ZONED_NONE
while sd_is_zoned() still correctly says that the drive is host-aware. We need
to change the entry test to use blk_queue_is_zoned() instead of sd_is_zoned().

>
>>
>> Can you try this:
>>
>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>> index 95018e650f2d..620539162ef1 100644
>> --- a/drivers/scsi/sd.c
>> +++ b/drivers/scsi/sd.c
>> @@ -2968,8 +2968,13 @@ static void sd_read_block_characteristics(struct
>> scsi_disk *sdkp)
>>  	} else {
>>  		sdkp->zoned = (buffer[8] >> 4) & 3;
>>  		if (sdkp->zoned == 1 && !disk_has_partitions(sdkp->disk)) {
>> +#ifdef CONFIG_BLK_DEV_ZONED
>>  			/* Host-aware */
>>  			q->limits.zoned = BLK_ZONED_HA;
>> +#else
>> +			/* Host-aware drive is treated as a regular disk */
>> +			q->limits.zoned = BLK_ZONED_NONE;
>> +#endif
>>  		} else {
>>  			/*
>>  			 * Treat drive-managed devices and host-aware devices
>> @@ -3404,12 +3409,12 @@ static int sd_probe(struct device *dev)
>>  	sdkp->first_scan = 1;
>>  	sdkp->max_medium_access_timeouts = SD_MAX_MEDIUM_TIMEOUTS;
>>
>> +	sd_revalidate_disk(gd);
>> +
>>  	error = sd_zbc_init_disk(sdkp);
>>  	if (error)
>>  		goto out_free_index;
>>
>> -	sd_revalidate_disk(gd);
>> -
>
>
> I don't get how my patch may have broken this. If we have
> CONFIG_BLK_DEV_ZONED=n, sd_zbc_init_disk() is stubbed out and return 0
> unconditionally. So the call path will remain exactly the same.

Yes, I am not 100% sure what is going on with CONFIG_BLK_DEV_ZONED=n. As far as
I can see, everything should be OK, but clearly not... Let's fix that Monday
first thing.

Also, partitions/core.c may need some attention: with the zone append
initialization already done, if a partition is created, changing the disk from
HA to NONE should be done properly, cleaning up the zone offset array and max
zone append sectors... But we can't call into scsi directly, so we need to check
that revalidation is triggered and detect the change... Not pretty. And the
reverse too: if partitions are deleted, we need to go back to zoned==HA and so
reinitialize zone append emulation.

At this point, I really would love to simply treat host-aware disks as regular
disks, always... That would make things *a lot* more simple.

>
>>  	gd->flags = GENHD_FL_EXT_DEVT;
>>  	if (sdp->removable) {
>>  		gd->flags |= GENHD_FL_REMOVABLE;
>> diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
>> index 4933e7daf17d..f4dc81d48a01 100644
>> --- a/drivers/scsi/sd.h
>> +++ b/drivers/scsi/sd.h
>> @@ -241,6 +241,8 @@ static inline void sd_zbc_release_disk(struct scsi_disk
>> *sdkp) {}
>>  static inline int sd_zbc_read_zones(struct scsi_disk *sdkp,
>>  				    unsigned char *buf)
>>  {
>> +	if (sd_is_zoned(sdkp))
>> +		sdkp->capacity = 0;
>>  	return 0;
>>  }
>>
>> That should fix the above as well as the hang on boot with CONFIG_BLK_DEV_ZONED
>> disabled (for that one I do not totally understand what is going on...).
>>
>> We do not have any host-aware disk for testing (as far as I know, nobody is
>> selling these anymore), so our test setup is a bit lame in this area. We'll rig
>> something up with tcmu-runner emulation to add tests for these devices to avoid
>> a repeat of such problem. And we'll make sure to add a test for
>> host-aware+partitions, since we at least know for sure there is one user :)
>>
>> Johannes: The "goto out_free_index;" on sd_zbc_init_disk() failure is wrong I
>> think: the disk is already added and a ref taken on the dev, but out_free_index
>> does not seem to do cleanup for that. Need to revisit this.
>
> Yes just seen it as well, will be cooking a fix for that.
>
> I'll build a test env to nail this down.

Here is my patch, revisited. Not complete yet, but no space corruption this time!

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 95018e650f2d..49d6dc374fb6 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2968,22 +2968,36 @@ static void sd_read_block_characteristics(struct scsi_disk *sdkp)
 	} else {
 		sdkp->zoned = (buffer[8] >> 4) & 3;
 		if (sdkp->zoned == 1 && !disk_has_partitions(sdkp->disk)) {
+#ifdef CONFIG_BLK_DEV_ZONED
 			/* Host-aware */
 			q->limits.zoned = BLK_ZONED_HA;
+#else
+			/* Host-aware drive is treated as a regular disk */
+			q->limits.zoned = BLK_ZONED_NONE;
+#endif
 		} else {
 			/*
 			 * Treat drive-managed devices and host-aware devices
 			 * with partitions as regular block devices.
 			 */
 			q->limits.zoned = BLK_ZONED_NONE;
-			if (sdkp->zoned == 2 && sdkp->first_scan)
-				sd_printk(KERN_NOTICE, sdkp,
-					  "Drive-managed SMR disk\n");
 		}
 	}
-	if (blk_queue_is_zoned(q) && sdkp->first_scan)
+
+	if (!sdkp->first_scan)
+		goto out;
+
+	if (blk_queue_is_zoned(q)) {
 		sd_printk(KERN_NOTICE, sdkp, "Host-%s zoned block device\n",
 		      q->limits.zoned == BLK_ZONED_HM ? "managed" : "aware");
+	} else {
+		if (sdkp->zoned == 1)
+			sd_printk(KERN_NOTICE, sdkp,
+				  "Host-aware SMR disk used as regular disk\n");
+		else if (sdkp->zoned == 2)
+			sd_printk(KERN_NOTICE, sdkp,
+				  "Drive-managed SMR disk\n");
+	}
 
  out:
 	kfree(buffer);
@@ -3404,12 +3418,12 @@ static int sd_probe(struct device *dev)
 	sdkp->first_scan = 1;
 	sdkp->max_medium_access_timeouts = SD_MAX_MEDIUM_TIMEOUTS;
 
+	sd_revalidate_disk(gd);
+
 	error = sd_zbc_init_disk(sdkp);
 	if (error)
 		goto out_free_index;
 
-	sd_revalidate_disk(gd);
-
 	gd->flags = GENHD_FL_EXT_DEVT;
 	if (sdp->removable) {
 		gd->flags |= GENHD_FL_REMOVABLE;
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 4933e7daf17d..f4dc81d48a01 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -241,6 +241,8 @@ static inline void sd_zbc_release_disk(struct scsi_disk *sdkp) {}
 static inline int sd_zbc_read_zones(struct scsi_disk *sdkp,
 				    unsigned char *buf)
 {
+	if (sd_is_zoned(sdkp))
+		sdkp->capacity = 0;
 	return 0;
 }
 
-- 
Damien Le Moal
Western Digital Research
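Editor's note: Damien's explanation of the warning comes down to two predicates that can disagree: what the drive reports versus what the request queue was told. The following is a small userspace model of that mismatch, offered as a sketch under stated assumptions rather than the actual kernel code. Every identifier here (`struct model_disk`, `drive_is_zoned()`, `queue_is_zoned()`, `set_max_zone_append()`, and the two `revalidate_zones_*()` variants) is hypothetical; they only stand in for the roles of sd_is_zoned(), blk_queue_is_zoned(), blk_queue_max_zone_append_sectors() and the entry test discussed in the message. The real sd_is_zoned() also covers host-managed devices, which this model leaves out.

```c
#include <stdbool.h>

/* Simplified stand-in for the disk plus its request queue. */
struct model_disk {
	unsigned int drive_zoned;		/* reported by the drive, 1 = host-aware */
	bool queue_zoned;			/* what the block layer was told */
	unsigned int max_zone_append_sectors;
	unsigned int warnings;			/* counts would-be WARN splats */
};

/* Role of sd_is_zoned(): looks at the drive only (host-aware case). */
static bool drive_is_zoned(const struct model_disk *d)
{
	return d->drive_zoned == 1;
}

/* Role of blk_queue_is_zoned(): looks at the queue limits only. */
static bool queue_is_zoned(const struct model_disk *d)
{
	return d->queue_zoned;
}

/* Role of the setter that warned: it refuses a non-zoned queue. */
static void set_max_zone_append(struct model_disk *d, unsigned int sectors)
{
	if (!queue_is_zoned(d)) {
		d->warnings++;
		return;
	}
	d->max_zone_append_sectors = sectors;
}

/* Entry test on the drive: proceeds even if the queue is not zoned. */
static void revalidate_zones_drive_test(struct model_disk *d, unsigned int sectors)
{
	if (!drive_is_zoned(d))
		return;
	set_max_zone_append(d, sectors);
}

/* Entry test on the queue, as proposed in the message above. */
static void revalidate_zones_queue_test(struct model_disk *d, unsigned int sectors)
{
	if (!queue_is_zoned(d))
		return;
	set_max_zone_append(d, sectors);
}
```

For a host-aware drive whose queue was switched to non-zoned because of partitions, the drive-based entry test reaches the setter and trips the warning, while the queue-based entry test returns early. The sector count passed in is arbitrary in this model.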
end of thread, other threads:[~2020-09-12 13:05 UTC | newest]

Thread overview: 12+ messages
2020-09-11 19:53 first bad commit: [5795eb443060148796beeba106e4366d7f1458a6] scsi: sd_zbc: emulate ZONE_APPEND commands Borislav Petkov
2020-09-11 22:17 ` Borislav Petkov
2020-09-11 22:22 ` Randy Dunlap
2020-09-11 22:26 ` Johannes Thumshirn
2020-09-11 22:38 ` Randy Dunlap
2020-09-11 23:07 ` Borislav Petkov
2020-09-12  2:31 ` Damien Le Moal
2020-09-12  8:37 ` Borislav Petkov
2020-09-12 12:18 ` Damien Le Moal
2020-09-12 13:05 ` Borislav Petkov
2020-09-12  9:09 ` Johannes Thumshirn
2020-09-12 12:59 ` Damien Le Moal