linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Damien Le Moal <Damien.LeMoal@wdc.com>
To: "Javier González" <javier@javigon.com>
Cc: "linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"hch@lst.de" <hch@lst.de>,
	"kbusch@kernel.org" <kbusch@kernel.org>,
	"sagi@grimberg.me" <sagi@grimberg.me>,
	"axboe@kernel.dk" <axboe@kernel.dk>,
	"mb@lightnvm.io" <mb@lightnvm.io>,
	SelvaKumar S <selvakuma.s1@samsung.com>,
	Kanchan Joshi <joshi.k@samsung.com>,
	Nitesh Shetty <nj.shetty@samsung.com>
Subject: Re: [PATCH 1/4] block: Add zone flags to queue zone prop.
Date: Fri, 3 Jul 2020 08:35:48 +0000	[thread overview]
Message-ID: <CY4PR04MB3751C8F2A0ECAFD045B30AA7E76A0@CY4PR04MB3751.namprd04.prod.outlook.com> (raw)
In-Reply-To: 20200703062815.qfo7tvonmgqqfcha@mpHalley.local

On 2020/07/03 15:28, Javier González wrote:
> On 03.07.2020 05:20, Damien Le Moal wrote:
>> On 2020/07/02 19:27, Javier González wrote:
>>> On 02.07.2020 08:49, Damien Le Moal wrote:
>>>> On 2020/07/02 17:34, Javier González wrote:
>>>>> On 02.07.2020 07:54, Damien Le Moal wrote:
>>>>>> On 2020/07/02 15:55, Javier González wrote:
>>>>>>> From: Javier González <javier.gonz@samsung.com>
>>>>>>>
>>>>>>> As the zoned block device will have to deal with features that are
>>>>>>> optional for the backend device, add a flag field to inform the block
>>>>>>> layer about supported features. This builds on top of
>>>>>>> blk_zone_report_flags and extendes to the zone report code.
>>>>>>>
>>>>>>> Signed-off-by: Javier González <javier.gonz@samsung.com>
>>>>>>> Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com>
>>>>>>> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
>>>>>>> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
>>>>>>> ---
>>>>>>>  block/blk-zoned.c              | 3 ++-
>>>>>>>  drivers/block/null_blk_zoned.c | 2 ++
>>>>>>>  drivers/nvme/host/zns.c        | 1 +
>>>>>>>  drivers/scsi/sd.c              | 2 ++
>>>>>>>  include/linux/blkdev.h         | 3 +++
>>>>>>>  5 files changed, 10 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
>>>>>>> index 81152a260354..0f156e96e48f 100644
>>>>>>> --- a/block/blk-zoned.c
>>>>>>> +++ b/block/blk-zoned.c
>>>>>>> @@ -312,7 +312,8 @@ int blkdev_report_zones_ioctl(struct block_device *bdev, fmode_t mode,
>>>>>>>  		return ret;
>>>>>>>
>>>>>>>  	rep.nr_zones = ret;
>>>>>>> -	rep.flags = BLK_ZONE_REP_CAPACITY;
>>>>>>> +	rep.flags = q->zone_flags;
>>>>>>
>>>>>> zone_flags seem to be a fairly generic flags field while rep.flags is only about
>>>>>> the reported descriptors structure. So you may want to define a
>>>>>> BLK_ZONE_REP_FLAGS_MASK and have:
>>>>>>
>>>>>> 	rep.flags = q->zone_flags & BLK_ZONE_REP_FLAGS_MASK;
>>>>>>
>>>>>> to avoid flags meaningless for the user being set.
>>>>>>
>>>>>> In any case, since *all* zoned block devices now report capacity, I do not
>>>>>> really see the point to add BLK_ZONE_REP_FLAGS_MASK to q->zone_flags, especially
>>>>>> considering that you set the flag for all zoned device types, including scsi
>>>>>> which does not have zone capacity. This makes q->zone_flags rather confusing
>>>>>> instead of clearly defining the device features as you mentioned in the commit
>>>>>> message.
>>>>>>
>>>>>> I think it may be better to just drop this patch, and if needed, introduce the
>>>>>> zone_flags field where it may be needed (e.g. OFFLINE zone ioctl support).
>>>>>
>>>>> I am using this as a way to pass the OFFLINE support flag to the block
>>>>> layer. I used this too for the attributes. Are you thinking of a
>>>>> different way to send this?
>>>>>
>>>>> I believe this fits too for capacity, but we can just pass it in
>>>>> all report as before if you prefer.
>>>>
>>>> The point is that this patch as is does nothing and is needed as a preparatory
>>>> patch if we want to have the offline flag set in the report. But:
>>>> 1) As commented in the offline ioctl patch, I am not sure the flag makes a lot
>>>> of sense. sysfs or nothing at all may be OK as well. When we introduced the new
>>>> open/close/finish ioctls, we did not add flags to signal that the device
>>>> supports them. Granted, it was for nullblk and scsi, and both had the support.
>>>> But running an application using these on an old kernel, and you will get
>>>> -ENOTTY, meaning, not supported. So simply introducing the offline ioctl without
>>>> any flag would be OK I think.
>>>
>>> I see. My understanding after some comments from Christoph was that we
>>> should use these bits to signal any optional features / capabilities
>>> that would depend on the underlying driver, just as it is done with the
>>> capacity flag today.
>>>
>>> If not for the offline transition, for the attributes, I see it exactly
>>> as the same use case as capacity, where we signal that a new field is
>>> reported in the report structure.
>>>
>>> Am I missing something here?
>>
>> Using the report flags for reporting supported operations/features is fine, but
>> as already mentioned, then document that by changing the description of enum
>> blk_zone_report_flags. Right now, it still says:
>>
>> * enum blk_zone_report_flags - Feature flags of reported zone descriptors.
>>
>> Which kind of really point to things the zone descriptor itself has compared to
>> earlier versions of the descriptor rather than what the device can do or not.
> 
> I see how the offline transition collides with this model. But can we
> agree that the zone attributes is something that fits here?

Flags may be fine. I still do not see it though due to your patches missing
device mapper support. Please send patches that cover all zoned block device
types, including device mapper to show that this is a good approach.

>> And as I argued already, using a flag is fine only for letting a user know about
>> a supported feature, but that is far from ideal to have device-mapper easily
>> discover what a target can support or not. For that, stacked queue limits are
>> much simpler. Which leads to exporting that limit in sysfs rather than using a
>> flag for the user interface.
> 
> See below
> 
>>
>>>> Since DM support for this would be nice too and we now are in a situation where
>>>> not all devices support the  ioctl, instead of a report flag (a little out of
>>>> place), a queue limit exported through sysfs may be a cleaner way to both
>>>> support DM correctly (stacked limits) and signal the support to the user. If you
>>>> choose this approach, then this patch is not needed. The zoned_flags, or a
>>>> regular queue flag like for RESET_ALL can be added in the offline ioctl patch.
>>>
>>> I see. If we can reach an agreement that the above is a bad
>>> understanding on my side, I will be happy to translate this into a sysfs
>>> entry. If it is OK, I'll give it this week in the mailing list and send
>>> a V4 next week.
>>
>> It is all about device mapper support... How are you planning to do it using a
>> queue flag without said flags being stackable as queue limits ?
> 
> I guess what I am struggle with here is that you see the capacity
> feature as a different extension than the attributes and the offline
> transition.
> 
> The way I see it, there are things the device supports and things that
> the driver supports. ZAC/ZBC does not support a size != capacity, but
> the driver sets these values to be the same. One thing I struggle with
> is that saying that we support capacity and setting size = capacity puts
> the burden in the user to check these values across all zones to
> determine if they can support the driver or not. I think it would be
> clear to have a feature explicitly stating that capacity != size.

From the user (application or in-kernel) perspective, *all* zoned device now
have a zone capacity. I do not think it matters if that comes from the device or
from the driver. It is unconditionally supported. The report flag
BLK_ZONE_REP_CAPACITY exist to signal to user applications that the zone
descriptor has the capactiy field, to allow for backward compatibility with
older kernels. See patches for fio or util-linux blkzone posted to see its use.

Having a flag that explicitly states that zone capacity != zone size would be
useful only and only if *all* zones of the device have that property. What if
only some zones have a capacity lower than the zone size ? The user would still
need a zone report to sort things out.

> For the attributes, this is very similar - some apply, some don't, and
> we only report the ones that make sense for each driver. I do not see
> how device mappers are treated differently here than in the existing
> capacity features.

The capacity feature is supported by *all* zoned block device. Support coming
from the device or from the driver does not matter. The API is consistent for
all device types. So in my opinion, there is no need to consider zone capacity
as an attribute. It is supported by everything, it is now part of the API,
signaled with BLK_ZONE_REP_CAPACITY for backward compatibility handling.

You are mixing up the report flags describing the structure of the zone
descriptor API with optional device features attributes. Adding an attribute
field to the report zones descriptors is fine. The presence of that field can be
signaled with a BLK_ZONE_REP_ATTR report flag as you already proposed. No problem.

But then, imagine a device mapper target using multiple devices with different
optional features. How do you propose that the device mapper target sorts this
out and come up with a coherent set of attributes for its target report zones ?
Please propose a solution for that. Your patches so far are all missing that.

> For the offline transition, I see that the current patches do not set
> flags properly and we would need to inherit the underlying device if we
> want to do it this way. I can understand from your comments that this is
> not a good way of doing it. I see in one of your comments from V1 that
> we could deal with this transition in the scsi driver and keep it in
> memory (i.e., device is still read-only, but driver transitions to
> offline) - can you comment on this?

That would not be persistent across reboot, unlike for ZNS drive. So forget
about this. That was wishful thinking from me. This will not work.

Explicit zone offlining is truly an optional feature of the device. Signaling
its support to the user is necessary, but again, I would like that to be done in
a way that does not force device mapper targets to forever report nothing for
the optional features. A flag field can certainly be used, but it has to  be
handled in blk_queue_stack_limits(). That may be as simple as taking a bitwise
AND of each device zone attribute flags.


-- 
Damien Le Moal
Western Digital Research

  reply	other threads:[~2020-07-03  8:35 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-02  6:54 [V2 PATCH 0/4] ZNS: Extra features for current patche Javier González
2020-07-02  6:54 ` [PATCH 1/4] block: Add zone flags to queue zone prop Javier González
2020-07-02  7:54   ` Damien Le Moal
2020-07-02  8:34     ` Javier González
2020-07-02  8:49       ` Damien Le Moal
2020-07-02 10:27         ` Javier González
2020-07-03  5:20           ` Damien Le Moal
2020-07-03  6:28             ` Javier González
2020-07-03  8:35               ` Damien Le Moal [this message]
2020-07-02  6:54 ` [PATCH 2/4] block: add support for zone offline transition Javier González
2020-07-02  8:10   ` Damien Le Moal
2020-07-02  8:39     ` Javier González
2020-07-02  8:52       ` Damien Le Moal
2020-07-02  6:54 ` [PATCH 3/4] nvme: Add consistency check for zone count Javier González
2020-07-02  8:16   ` Damien Le Moal
2020-07-02  8:19   ` Johannes Thumshirn
2020-07-02  8:27     ` Javier González
2020-07-02  6:54 ` [PATCH 4/4] block: add attributes to zone report Javier González
2020-07-02  8:27   ` Damien Le Moal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CY4PR04MB3751C8F2A0ECAFD045B30AA7E76A0@CY4PR04MB3751.namprd04.prod.outlook.com \
    --to=damien.lemoal@wdc.com \
    --cc=axboe@kernel.dk \
    --cc=hch@lst.de \
    --cc=javier@javigon.com \
    --cc=joshi.k@samsung.com \
    --cc=kbusch@kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=mb@lightnvm.io \
    --cc=nj.shetty@samsung.com \
    --cc=sagi@grimberg.me \
    --cc=selvakuma.s1@samsung.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).