Linux-SCSI Archive on lore.kernel.org
 help / color / Atom feed
From: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
To: Damien Le Moal <Damien.LeMoal@wdc.com>, Jens Axboe <axboe@kernel.dk>
Cc: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"Martin K . Petersen" <martin.petersen@oracle.com>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>
Subject: Re: [PATCH] block: don't do revalidate zones on invalid devices
Date: Fri, 31 Jul 2020 08:41:27 +0000
Message-ID: <SN4PR0401MB359839E29558531A690AD3A19B4E0@SN4PR0401MB3598.namprd04.prod.outlook.com> (raw)
In-Reply-To: <MWHPR04MB37588ABB75E6A8B1E587AEDDE74E0@MWHPR04MB3758.namprd04.prod.outlook.com>

On 31/07/2020 10:01, Damien Le Moal wrote:
> On 2020/07/31 16:43, Johannes Thumshirn wrote:
>> On 30/07/2020 14:33, Damien Le Moal wrote:
>>> On 2020/07/30 20:25, Johannes Thumshirn wrote:
>>>> When we loose a device for whatever reason while (re)scanning zones, we
>>>> trip over a NULL pointer in blk_revalidate_zone_cb, like in the following
>>>> log:
>>>>
>>>> sd 0:0:0:0: [sda] 3418095616 4096-byte logical blocks: (14.0 TB/12.7 TiB)
>>>> sd 0:0:0:0: [sda] 52156 zones of 65536 logical blocks
>>>> sd 0:0:0:0: [sda] Write Protect is off
>>>> sd 0:0:0:0: [sda] Mode Sense: 37 00 00 08
>>>> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>>>> sd 0:0:0:0: [sda] REPORT ZONES start lba 1065287680 failed
>>>> sd 0:0:0:0: [sda] REPORT ZONES: Result: hostbyte=0x00 driverbyte=0x08
>>>> sd 0:0:0:0: [sda] Sense Key : 0xb [current]
>>>> sd 0:0:0:0: [sda] ASC=0x0 ASCQ=0x6
>>>> sda: failed to revalidate zones
>>>> sd 0:0:0:0: [sda] 0 4096-byte logical blocks: (0 B/0 B)
>>>> sda: detected capacity change from 14000519643136 to 0
>>>> ==================================================================
>>>> BUG: KASAN: null-ptr-deref in blk_revalidate_zone_cb+0x1b7/0x550
>>>> Write of size 8 at addr 0000000000000010 by task kworker/u4:1/58
>>>>
>>>> CPU: 1 PID: 58 Comm: kworker/u4:1 Not tainted 5.8.0-rc1 #692
>>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014
>>>> Workqueue: events_unbound async_run_entry_fn
>>>> Call Trace:
>>>>  dump_stack+0x7d/0xb0
>>>>  ? blk_revalidate_zone_cb+0x1b7/0x550
>>>>  kasan_report.cold+0x5/0x37
>>>>  ? blk_revalidate_zone_cb+0x1b7/0x550
>>>>  check_memory_region+0x145/0x1a0
>>>>  blk_revalidate_zone_cb+0x1b7/0x550
>>>>  sd_zbc_parse_report+0x1f1/0x370
>>>>  ? blk_req_zone_write_trylock+0x200/0x200
>>>>  ? sectors_to_logical+0x60/0x60
>>>>  ? blk_req_zone_write_trylock+0x200/0x200
>>>>  ? blk_req_zone_write_trylock+0x200/0x200
>>>>  sd_zbc_report_zones+0x3c4/0x5e0
>>>>  ? sd_dif_config_host+0x500/0x500
>>>>  blk_revalidate_disk_zones+0x231/0x44d
>>>>  ? _raw_write_lock_irqsave+0xb0/0xb0
>>>>  ? blk_queue_free_zone_bitmaps+0xd0/0xd0
>>>>  sd_zbc_read_zones+0x8cf/0x11a0
>>>>  sd_revalidate_disk+0x305c/0x64e0
>>>>  ? __device_add_disk+0x776/0xf20
>>>>  ? read_capacity_16.part.0+0x1080/0x1080
>>>>  ? blk_alloc_devt+0x250/0x250
>>>>  ? create_object.isra.0+0x595/0xa20
>>>>  ? kasan_unpoison_shadow+0x33/0x40
>>>>  sd_probe+0x8dc/0xcd2
>>>>  really_probe+0x20e/0xaf0
>>>>  __driver_attach_async_helper+0x249/0x2d0
>>>>  async_run_entry_fn+0xbe/0x560
>>>>  process_one_work+0x764/0x1290
>>>>  ? _raw_read_unlock_irqrestore+0x30/0x30
>>>>  worker_thread+0x598/0x12f0
>>>>  ? __kthread_parkme+0xc6/0x1b0
>>>>  ? schedule+0xed/0x2c0
>>>>  ? process_one_work+0x1290/0x1290
>>>>  kthread+0x36b/0x440
>>>>  ? kthread_create_worker_on_cpu+0xa0/0xa0
>>>>  ret_from_fork+0x22/0x30
>>>> ==================================================================
>>>>
>>>> When the device is already gone we end up with the following scenario:
>>>> The device's capacity is 0 and thus the number of zones will be 0 as well. When
>>>> allocating the bitmap for the conventional zones, we then trip over a NULL
>>>> pointer.
>>>>
>>>> So if we encounter a zoned block device with a 0 capacity, don't dare to
>>>> revalidate the zones sizes.
>>>>
>>>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>>>> ---
>>>>
>>>> Note: This is a hot-fix for 5.8, we're working on something to make a
>>>> recoverable error recoverable.
>>>>
>>>>
>>>>  block/blk-zoned.c | 3 +++
>>>>  1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
>>>> index 23831fa8701d..480dfff69a00 100644
>>>> --- a/block/blk-zoned.c
>>>> +++ b/block/blk-zoned.c
>>>> @@ -497,6 +497,9 @@ int blk_revalidate_disk_zones(struct gendisk *disk,
>>>>  	if (WARN_ON_ONCE(!queue_is_mq(q)))
>>>>  		return -EIO;
>>>>  
>>>> +	if (!get_capacity(disk))
>>>> +		return -EIO;
>>>> +
>>>>  	/*
>>>>  	 * Ensure that all memory allocations in this context are done as if
>>>>  	 * GFP_NOIO was specified.
>>>>
>>>
>>> I reworked sd_zbc_read_zones() and sd_zbc_revalidate_zones() to allow recovering
>>> from simple temporary errors and avoid this problem. Will send the patch
>>> tomorrow or so after some more testing.
>>>
>>> But even with that patch applied, I think this patch makes the generic block
>>> code more solid. So:
>>>
>>> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
>>>
>>>
>>
>> Jens any chance we can still get this into 5.8?
> 
> By the way, this needs a "fixes" tag too. And probably cc stable for 5.7.
> Looking at 5.4 LTS, the bug is not present since there is a test on !nr_zones
> and the entire revalidation is different anyway (callback was introduced in 5.5
> if I remember correctly).
> 
> 

The callback got introduced with commit d41003513e61 ("block: rework zone reporting") in
v5.5-rc1 but looking at blk_revalidate_zone_cb() at this commit I think passing in a 
gendisk with 0 capacity won't do much harm.

The correct fixes tag will be:

Fixes: 6c6b35491422 ("block: set the zone size in blk_revalidate_disk_zones atomically")

Starting with this commit we're calculating the number of zones based on the disk's capacity
unconditional of what the capacity could be.

But that commit landed in v5.5 as well.

  reply index

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20200730112517.12816-1-johannes.thumshirn@wdc.com>
2020-07-30 12:32 ` Damien Le Moal
2020-07-31  7:43   ` Johannes Thumshirn
2020-07-31  8:01     ` Damien Le Moal
2020-07-31  8:41       ` Johannes Thumshirn [this message]
2020-07-31 22:33     ` Jens Axboe
2020-08-03  8:49       ` Johannes Thumshirn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=SN4PR0401MB359839E29558531A690AD3A19B4E0@SN4PR0401MB3598.namprd04.prod.outlook.com \
    --to=johannes.thumshirn@wdc.com \
    --cc=Damien.LeMoal@wdc.com \
    --cc=axboe@kernel.dk \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-SCSI Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-scsi/0 linux-scsi/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-scsi linux-scsi/ https://lore.kernel.org/linux-scsi \
		linux-scsi@vger.kernel.org
	public-inbox-index linux-scsi

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-scsi


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git