From: Damien Le Moal <Damien.LeMoal@wdc.com>
To: "Javier González" <javier@javigon.com>
Cc: Keith Busch <kbusch@kernel.org>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"hch@lst.de" <hch@lst.de>, "sagi@grimberg.me" <sagi@grimberg.me>,
	"axboe@kernel.dk" <axboe@kernel.dk>,
	SelvaKumar S <selvakuma.s1@samsung.com>,
	Kanchan Joshi <joshi.k@samsung.com>,
	Nitesh Shetty <nj.shetty@samsung.com>
Subject: Re: [PATCH 6/6] nvme: Add consistency check for zone count
Date: Fri, 26 Jun 2020 07:42:04 +0000
Message-ID: <CY4PR04MB3751D51C40A39FF0ACC79DE0E7930@CY4PR04MB3751.namprd04.prod.outlook.com>
In-Reply-To: 20200626072900.rjigm3wiya4sdufv@mpHalley.localdomain

On 2020/06/26 16:29, Javier González wrote:
> On 26.06.2020 07:09, Damien Le Moal wrote:
>> On 2020/06/26 15:55, Javier González wrote:
>>> On 26.06.2020 06:49, Damien Le Moal wrote:
>>>> On 2020/06/26 15:13, Javier González wrote:
>>>>> On 26.06.2020 00:04, Damien Le Moal wrote:
>>>>>> On 2020/06/26 6:49, Keith Busch wrote:
>>>>>>> On Thu, Jun 25, 2020 at 02:21:52PM +0200, Javier González wrote:
>>>>>>>>  drivers/nvme/host/zns.c | 7 +++++++
>>>>>>>>  1 file changed, 7 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/nvme/host/zns.c b/drivers/nvme/host/zns.c
>>>>>>>> index 7d8381fe7665..de806788a184 100644
>>>>>>>> --- a/drivers/nvme/host/zns.c
>>>>>>>> +++ b/drivers/nvme/host/zns.c
>>>>>>>> @@ -234,6 +234,13 @@ static int nvme_ns_report_zones(struct nvme_ns *ns, sector_t sector,
>>>>>>>>  		sector += ns->zsze * nz;
>>>>>>>>  	}
>>>>>>>>
>>>>>>>> +	if (nr_zones < 0 && zone_idx != ns->nr_zones) {
>>>>>>>> +		dev_err(ns->ctrl->device, "inconsistent zone count %u/%u\n",
>>>>>>>> +				zone_idx, ns->nr_zones);
>>>>>>>> +		ret = -EINVAL;
>>>>>>>> +		goto out_free;
>>>>>>>> +	}
>>>>>>>> +
>>>>>>>>  	ret = zone_idx;
>>>>>>>
>>>>>>> nr_zones is unsigned, so it's never < 0.
>>>>>>>
>>>>>>> The API we're providing doesn't require zone_idx equal the namespace's
>>>>>>> nr_zones at the end, though. A subset of the total number of zones can
>>>>>>> be requested here.
>>>>>>>
>>>>>
>>>>> I did see nr_zones coming with -1; guess it is my compiler.
>>>>
>>>> See include/linux/blkdev.h. -1 is:
>>>>
>>>> #define BLK_ALL_ZONES  ((unsigned int)-1)
>>>>
>>>> Which is documented in block/blk-zoned.c:
>>>>
>>>> /**
>>>> * blkdev_report_zones - Get zones information
>>>> * @bdev:       Target block device
>>>> * @sector:     Sector from which to report zones
>>>> * @nr_zones:   Maximum number of zones to report
>>>> * @cb:         Callback function called for each reported zone
>>>> * @data:       Private data for the callback
>>>> *
>>>> * Description:
>>>> *    Get zone information starting from the zone containing @sector for at most
>>>> *    @nr_zones, and call @cb for each zone reported by the device.
>>>> *    To report all zones in a device starting from @sector, the BLK_ALL_ZONES
>>>> *    constant can be passed to @nr_zones.
>>>> *    Returns the number of zones reported by the device, or a negative errno
>>>> *    value in case of failure.
>>>> *
>>>> *    Note: The caller must use memalloc_noXX_save/restore() calls to control
>>>> *    memory allocations done within this function.
>>>> */
>>>> int blkdev_report_zones(struct block_device *bdev, sector_t sector,
>>>>                        unsigned int nr_zones, report_zones_cb cb, void *data)
>>>>
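
To make the unsigned point concrete, here is a minimal userspace sketch (not
kernel code). BLK_ALL_ZONES is copied from the define quoted above;
wants_all_zones() and zones_to_report() are hypothetical helper names used only
for illustration. Because nr_zones is an unsigned int, a caller passing -1
arrives as UINT_MAX, so a `nr_zones < 0` test never fires and the request has to
be compared against the sentinel instead.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Same definition as the one quoted from include/linux/blkdev.h. */
#define BLK_ALL_ZONES ((unsigned int)-1)

/* Converting -1 to unsigned int wraps to the largest value, never below 0. */
static_assert(BLK_ALL_ZONES == UINT_MAX, "sentinel is UINT_MAX");

/* Hypothetical helper: did the caller ask for every zone? */
static bool wants_all_zones(unsigned int nr_zones)
{
	return nr_zones == BLK_ALL_ZONES;
}

/*
 * Hypothetical helper: nr_zones is only a maximum, so the number of zones
 * actually reported is capped by what is available from the start sector.
 */
static unsigned int zones_to_report(unsigned int nr_zones,
				    unsigned int zones_available)
{
	return nr_zones < zones_available ? nr_zones : zones_available;
}
```

With this reading, BLK_ALL_ZONES needs no special casing when capping the
request: it is simply larger than any real zone count.
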
>>>>>
>>>>>>
>>>>>> Yes, absolutely. zone_idx is not an absolute zone number. It is the index of the
>>>>>> reported zone descriptor in the current report range requested by the user,
>>>>>> which is not necessarily for the entire drive (i.e., provided nr zones is less
>>>>>> than the total number of zones of the disk and/or start sector is > 0). So
>>>>>> zone_idx indicates the actual number of zones reported, it is not the total
>>>>>
>>>>> I see. As I can see, when nr_zones comes undefined I believed we could
>>>>> assume that zone_idx is absolute, but I can be wrong.
>>>>
>>>> No. zone_idx is *always* the index of the zone in the current report. Whatever
>>>> that report is, regardless of the report starting point and number of zones
>>>> requested. E.g. For a single zone report (nr_zones = 1), you will always see
>>>> zone_idx = 0. For a full report, zone_idx will correspond to the zone number.
>>>> This is used for example in blk_revalidate_disk_zones() to initialize the zone
>>>> bitmaps.
>>>>
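
The relative nature of zone_idx can be sketched with a toy report loop in plain
C. This is an illustration under simplifying assumptions (fixed zone size, no
device I/O), not the nvme or block layer implementation; toy_report_zones(),
zone_cb and record_last_idx() are hypothetical names.

```c
#include <assert.h>

typedef unsigned long long sector_t;	/* stand-in for the kernel type */

/* Hypothetical callback type, loosely modelled on report_zones_cb. */
typedef int (*zone_cb)(sector_t zone_start, unsigned int idx, void *data);

/*
 * Toy report over a device of total_zones zones, each zone_sectors long.
 * The idx handed to cb counts zones within this report only.
 * Returns the number of zones reported, or the callback's negative error.
 */
static int toy_report_zones(sector_t sector, unsigned int nr_zones,
			    unsigned int total_zones, sector_t zone_sectors,
			    zone_cb cb, void *data)
{
	unsigned int first, idx = 0;

	assert(zone_sectors > 0);
	first = (unsigned int)(sector / zone_sectors);

	while (idx < nr_zones && first + idx < total_zones) {
		int ret = cb((sector_t)(first + idx) * zone_sectors, idx, data);

		if (ret < 0)
			return ret;
		idx++;
	}
	return (int)idx;
}

/* Records the last idx seen so that a caller can inspect it afterwards. */
static int record_last_idx(sector_t zone_start, unsigned int idx, void *data)
{
	(void)zone_start;
	*(unsigned int *)data = idx;
	return 0;
}
```

In this sketch the absolute zone number is first + idx, which equals idx only
when the report starts at sector 0.
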
>>>>> Does it make sense to support this check with an additional counter and
>>>>> an explicit nr_zones initialization when undefined, or do you
>>>>> prefer to just remove it as Matias suggested?
>>>>
>>>> The check is not needed at all.
>>>>
>>>> If the device is buggy and reports more zones than the device capacity or any
>>>> other bugs, the driver can catch that when it processes the report.
>>>> blk_revalidate_disk_zones() also has many checks.
>>>
>>> I have managed to create a QEMU ZNS device that gave me a headache with
>>> a little bit of extra capacity that triggered an additional zone report.
>>> This was the motivation for the patch.
>>
>> The device emulation sounds buggy... If the capacity is wrong, then the report
>> will be too since zones are all supposed to be sequential (no holes between
>> zones) and up to the disk capacity only (last zone start + len = capacity + 1)
>>
>> If one or the other is wrong, this should be easy to detect. Normally,
>> blk_revalidate_disk_zones() should be able to catch that.
> 
> We have the capability to select the reported device capacity manually
> for a number of reasons. One of the different test configurations in our
> CI did go through.

If you change the drive capacity on the fly (e.g. with a low-level format?),
you must revalidate the disk/drive to pick up the changed capacity. A lot of
things will break otherwise. It is not just the zone report that will be
incorrect.
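
As a rough sketch of the kind of consistency check discussed earlier in this
thread (zones sequential, no holes, ending at the device capacity), the
following plain C function walks a list of zone descriptors. It is not the
blk_revalidate_disk_zones() code; struct toy_zone and toy_check_zones() are
hypothetical, and capacity is assumed here to be a sector count, so the last
zone must end exactly at capacity.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long long sector_t;	/* stand-in for the kernel type */

/* Hypothetical, simplified zone descriptor (not struct blk_zone). */
struct toy_zone {
	sector_t start;	/* first sector of the zone */
	sector_t len;	/* zone length in sectors */
};

/*
 * Returns 0 if the zones tile [0, capacity) with no hole or overlap,
 * -1 otherwise.
 */
static int toy_check_zones(const struct toy_zone *zones, size_t n,
			   sector_t capacity)
{
	sector_t expected = 0;	/* where the next zone must start */
	size_t i;

	assert(zones != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (zones[i].len == 0 || zones[i].start != expected)
			return -1;	/* empty zone, hole, or overlap */
		if (zones[i].len > capacity - expected)
			return -1;	/* zone runs past the capacity */
		expected += zones[i].len;
	}

	/* Any capacity left uncovered means the capacity or report is wrong. */
	return expected == capacity ? 0 : -1;
}
```

A device that advertises a little extra capacity beyond its last zone fails the
final comparison in this sketch, which is the mismatch described above.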

> 
> But it is OK, I will remove the check on V2.
> 
> Javier
> 


-- 
Damien Le Moal
Western Digital Research
