From: Damien Le Moal <Damien.LeMoal@wdc.com>
To: "Javier González" <javier@javigon.com>
Cc: "linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
"hch@lst.de" <hch@lst.de>,
"kbusch@kernel.org" <kbusch@kernel.org>,
"sagi@grimberg.me" <sagi@grimberg.me>,
"axboe@kernel.dk" <axboe@kernel.dk>,
SelvaKumar S <selvakuma.s1@samsung.com>,
Kanchan Joshi <joshi.k@samsung.com>,
Nitesh Shetty <nj.shetty@samsung.com>
Subject: Re: [PATCH 2/6] block: add support for selecting all zones
Date: Fri, 26 Jun 2020 07:06:34 +0000 [thread overview]
Message-ID: <CY4PR04MB3751CD082BC0F77A5284EC34E7930@CY4PR04MB3751.namprd04.prod.outlook.com> (raw)
In-Reply-To: 20200626065241.u4fd3m7624kdsonw@mpHalley.localdomain
On 2020/06/26 15:52, Javier González wrote:
> On 26.06.2020 06:35, Damien Le Moal wrote:
>> On 2020/06/26 14:58, Javier González wrote:
>>> On 26.06.2020 01:27, Damien Le Moal wrote:
>>>> On 2020/06/25 21:22, Javier González wrote:
>>>>> From: Javier González <javier.gonz@samsung.com>
>>>>>
>>>>> Add flag to allow selecting all zones on a single zone management
>>>>> operation
>>>>>
>>>>> Signed-off-by: Javier González <javier.gonz@samsung.com>
>>>>> Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com>
>>>>> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
>>>>> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
>>>>> ---
>>>>> block/blk-zoned.c | 3 +++
>>>>> include/linux/blk_types.h | 3 ++-
>>>>> include/uapi/linux/blkzoned.h | 9 +++++++++
>>>>> 3 files changed, 14 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
>>>>> index e87c60004dc5..29194388a1bb 100644
>>>>> --- a/block/blk-zoned.c
>>>>> +++ b/block/blk-zoned.c
>>>>> @@ -420,6 +420,9 @@ int blkdev_zone_mgmt_ioctl(struct block_device *bdev, fmode_t mode,
>>>>> return -ENOTTY;
>>>>> }
>>>>>
>>>>> + if (zmgmt.flags & BLK_ZONE_SELECT_ALL)
>>>>> + op |= REQ_ZONE_ALL;
>>>>> +
>>>>> return blkdev_zone_mgmt(bdev, op, zmgmt.sector, zmgmt.nr_sectors,
>>>>> GFP_KERNEL);
>>>>> }
>>>>> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
>>>>> index ccb895f911b1..16b57fb2b99c 100644
>>>>> --- a/include/linux/blk_types.h
>>>>> +++ b/include/linux/blk_types.h
>>>>> @@ -351,6 +351,7 @@ enum req_flag_bits {
>>>>> * work item to avoid such priority inversions.
>>>>> */
>>>>> __REQ_CGROUP_PUNT,
>>>>> + __REQ_ZONE_ALL, /* apply zone operation to all zones */
>>>>>
>>>>> /* command specific flags for REQ_OP_WRITE_ZEROES: */
>>>>> __REQ_NOUNMAP, /* do not free blocks when zeroing */
>>>>> @@ -378,7 +379,7 @@ enum req_flag_bits {
>>>>> #define REQ_BACKGROUND (1ULL << __REQ_BACKGROUND)
>>>>> #define REQ_NOWAIT (1ULL << __REQ_NOWAIT)
>>>>> #define REQ_CGROUP_PUNT (1ULL << __REQ_CGROUP_PUNT)
>>>>> -
>>>>> +#define REQ_ZONE_ALL (1ULL << __REQ_ZONE_ALL)
>>>>> #define REQ_NOUNMAP (1ULL << __REQ_NOUNMAP)
>>>>> #define REQ_HIPRI (1ULL << __REQ_HIPRI)
>>>>>
>>>>> diff --git a/include/uapi/linux/blkzoned.h b/include/uapi/linux/blkzoned.h
>>>>> index 07b5fde21d9f..a8c89fe58f97 100644
>>>>> --- a/include/uapi/linux/blkzoned.h
>>>>> +++ b/include/uapi/linux/blkzoned.h
>>>>> @@ -157,6 +157,15 @@ enum blk_zone_action {
>>>>> BLK_ZONE_MGMT_RESET = 0x4,
>>>>> };
>>>>>
>>>>> +/**
>>>>> + * enum blk_zone_mgmt_flags - Flags for blk_zone_mgmt
>>>>> + *
>>>>> + * BLK_ZONE_SELECT_ALL: Select all zones for current zone action
>>>>> + */
>>>>> +enum blk_zone_mgmt_flags {
>>>>> + BLK_ZONE_SELECT_ALL = 1 << 0,
>>>>> +};
>>>>> +
>>>>> /**
>>>>> * struct blk_zone_mgmt - Extended zoned management
>>>>> *
>>>>>
>>>>
>>>> NACK.
>>>>
>>>> Details:
>>>> 1) REQ_OP_ZONE_RESET together with REQ_ZONE_ALL is the same as
>>>> REQ_OP_ZONE_RESET_ALL, isn't it ? You are duplicating a functionality that
>>>> already exists.
>>>> 2) The patch introduces REQ_ZONE_ALL at the block layer only without defining
>>>> how it ties into SCSI and NVMe driver use of it. Is REQ_ZONE_ALL indicating that
>>>> the zone management commands are to be executed with the ALL bit set ? If yes,
>>>> that will break device-mapper. See the special code for handling
>>>> REQ_OP_ZONE_RESET_ALL. That code is in place for a reason: the target block
>>>> device may not be an entire physical device. In that case, applying a zone
>>>> management command to all zones of the physical drive is wrong.
>>>> 3) REQ_ZONE_ALL seems completely equivalent to specifying a sector range of [0
>>>> .. drive capacity]. So what is the point ? The current interface handles that.
>>>> That is how we chose between REQ_OP_ZONE_RESET and REQ_OP_ZONE_RESET_ALL right now.
>>>> 4) Without any in-kernel user, I do not see the point. And for applications, I
>>>> do not see any good use case for doing open all, close all, offline all or
>>>> finish all. If you have any such good use case, please elaborate.
>>>>
>>>
>>> The main use if reset all, but without having to look through all zones,
>>> as it imposes an overhead when we have a large number of zones. Having
>>> the possibility to offload it to HW is more efficient.
>>>
>>> I had not thought about the device mapper use case. Would it be an
>>> option to translate this into REQ_OP_ZONE_RESET_ALL when we have a
>>> device mapper (or any other case where this might break) and then leave
>>> the bit go to the driver if it applies to the whole device?
>>
>> But REQ_OP_ZONE_RESET_ALL is already implemented and supported and will reset
>> all zones of a drive using a single command if the ioctl is called for the
>> entire sector range of a physical drive. For device mapper with a partial
>> mapping, the per zone reset loop will be used. If you have no other use case for
>> the REQ_ZONE_ALL flag, what is the point here ? Reset is already optimized for
>> the all zones case
>
> OK. I might have missed this. I thought we were sending several commands
> instead of a single reset with the bit. I will check again. Thanks for
> pointing at this.
In block/blk-zoned.c, there is:
static inline bool blkdev_allow_reset_all_zones(struct block_device *bdev,
sector_t sector,
sector_t nr_sectors)
{
if (!blk_queue_zone_resetall(bdev_get_queue(bdev)))
return false;
/*
* REQ_OP_ZONE_RESET_ALL can be executed only if the number of sectors
* of the applicable zone range is the entire disk.
*/
return !sector && nr_sectors == get_capacity(bdev->bd_disk);
}
And:
int blkdev_zone_mgmt(struct block_device *bdev, enum req_opf op,
sector_t sector, sector_t nr_sectors,
gfp_t gfp_mask)
{
...
while (sector < end_sector) {
bio = blk_next_bio(bio, 0, gfp_mask);
bio_set_dev(bio, bdev);
/*
* Special case for the zone reset operation that reset all
* zones, this is useful for applications like mkfs.
*/
if (op == REQ_OP_ZONE_RESET &&
blkdev_allow_reset_all_zones(bdev, sector, nr_sectors)) {
bio->bi_opf = REQ_OP_ZONE_RESET_ALL;
break;
^^^^^ This means one command only...
}
bio->bi_opf = op | REQ_SYNC;
bio->bi_iter.bi_sector = sector;
sector += zone_sectors;
/* This may take a while, so be nice to others */
cond_resched();
}
ret = submit_bio_wait(bio);
bio_put(bio);
return ret;
}
And see scsi/sd_zbc.c and zns.c. REQ_OP_ZONE_RESET_ALL end up setting the ALL
bit for reset command.
>
> Javier
>
--
Damien Le Moal
Western Digital Research
next prev parent reply other threads:[~2020-06-26 7:06 UTC|newest]
Thread overview: 70+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-06-25 12:21 [PATCH 0/6] ZNS: Extra features for current patches Javier González
2020-06-25 12:21 ` [PATCH 1/6] block: introduce IOCTL for zone mgmt Javier González
2020-06-26 1:17 ` Damien Le Moal
2020-06-26 6:01 ` Javier González
2020-06-26 6:37 ` Damien Le Moal
2020-06-26 6:51 ` Javier González
2020-06-26 7:03 ` Damien Le Moal
2020-06-26 7:08 ` Javier González
2020-06-25 12:21 ` [PATCH 2/6] block: add support for selecting all zones Javier González
2020-06-26 1:27 ` Damien Le Moal
2020-06-26 5:58 ` Javier González
2020-06-26 6:35 ` Damien Le Moal
2020-06-26 6:52 ` Javier González
2020-06-26 7:06 ` Damien Le Moal [this message]
2020-06-25 12:21 ` [PATCH 3/6] block: add support for zone offline transition Javier González
2020-06-25 14:12 ` Matias Bjørling
2020-06-25 19:48 ` Javier González
2020-06-26 1:14 ` Damien Le Moal
2020-06-26 6:18 ` Javier González
2020-06-26 9:11 ` hch
2020-06-26 9:15 ` Damien Le Moal
2020-06-26 9:17 ` hch
2020-06-26 10:02 ` Javier González
2020-06-26 9:07 ` Christoph Hellwig
2020-06-26 1:34 ` Damien Le Moal
2020-06-26 6:08 ` Javier González
2020-06-26 6:42 ` Damien Le Moal
2020-06-26 6:58 ` Javier González
2020-06-26 7:17 ` Damien Le Moal
2020-06-26 7:26 ` Javier González
2020-06-25 12:21 ` [PATCH 4/6] block: introduce IOCTL to report dev properties Javier González
2020-06-25 13:10 ` Matias Bjørling
2020-06-25 19:42 ` Javier González
2020-06-25 19:58 ` Matias Bjørling
2020-06-26 6:24 ` Javier González
2020-06-25 20:25 ` Keith Busch
2020-06-26 6:28 ` Javier González
2020-06-26 15:52 ` Keith Busch
2020-06-26 16:25 ` Javier González
2020-06-26 0:57 ` Damien Le Moal
2020-06-26 6:27 ` Javier González
2020-06-26 1:38 ` Damien Le Moal
2020-06-26 6:22 ` Javier González
2020-06-25 12:21 ` [PATCH 5/6] block: add zone attr. to zone mgmt IOCTL struct Javier González
2020-06-25 15:13 ` Matias Bjørling
2020-06-25 19:51 ` Javier González
2020-06-26 1:45 ` Damien Le Moal
2020-06-26 6:03 ` Javier González
2020-06-26 6:38 ` Damien Le Moal
2020-06-26 6:49 ` Javier González
2020-06-26 9:14 ` Christoph Hellwig
2020-06-26 10:01 ` Javier González
2020-06-25 12:21 ` [PATCH 6/6] nvme: Add consistency check for zone count Javier González
2020-06-25 13:16 ` Matias Bjørling
2020-06-25 19:45 ` Javier González
2020-06-25 21:49 ` Keith Busch
2020-06-26 0:04 ` Damien Le Moal
2020-06-26 6:13 ` Javier González
2020-06-26 6:49 ` Damien Le Moal
2020-06-26 6:55 ` Javier González
2020-06-26 7:09 ` Damien Le Moal
2020-06-26 7:29 ` Javier González
2020-06-26 7:42 ` Damien Le Moal
2020-06-26 9:16 ` Christoph Hellwig
2020-06-26 10:03 ` Javier González
2020-06-25 13:04 ` [PATCH 0/6] ZNS: Extra features for current patches Matias Bjørling
2020-06-25 14:48 ` Matias Bjørling
2020-06-25 19:39 ` Javier González
2020-06-25 19:53 ` Matias Bjørling
2020-06-26 6:26 ` Javier González
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CY4PR04MB3751CD082BC0F77A5284EC34E7930@CY4PR04MB3751.namprd04.prod.outlook.com \
--to=damien.lemoal@wdc.com \
--cc=axboe@kernel.dk \
--cc=hch@lst.de \
--cc=javier@javigon.com \
--cc=joshi.k@samsung.com \
--cc=kbusch@kernel.org \
--cc=linux-block@vger.kernel.org \
--cc=linux-nvme@lists.infradead.org \
--cc=nj.shetty@samsung.com \
--cc=sagi@grimberg.me \
--cc=selvakuma.s1@samsung.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).