* [LSF/MM/BPF TOPIC] Hybrid SMR HDDs / Zone Domains & Realms
@ 2023-03-01 23:34 Khazhy Kumykov
  2023-03-01 23:50 ` Bart Van Assche
  2023-03-02  0:02 ` Damien Le Moal
  0 siblings, 2 replies; 9+ messages in thread
From: Khazhy Kumykov @ 2023-03-01 23:34 UTC (permalink / raw)
  To: lsf-pc, linux-block, linux-scsi

HSMR HDDs are a type of SMR HDD that allows a dynamic mixture of
CMR and SMR zones, letting users convert regions of the disk
between the two. As specified by the SCSI ZBC-2 specification (ZAC-2
for ATA), the device exposes a set of “CMR” regions and “SMR”
regions. These may be grouped into “realms” that may, as a group, be
online or offline. Zone management can bring online a domain/zone and
offline any corresponding domains/zones.

I’d like to discuss what path makes sense for supporting these
devices, and also how to avoid potential issues specific to the “mixed
CMR & SMR IO traffic” use case - particularly around latency due to
potentially unneeded (from the perspective of an application) zone
management commands.

Points of Discussion
====

 - There’s already support in the kernel for marking zones
online/offline and cmr/smr, but this is fixed, not dynamic. Would
there be hiccups with allowing zones to come online/offline while
running?

 - There may be multiple CMR “zones” that are contiguous in LBA space.
A benefit of HSMR disks is, to a certain extent, software which is
designed for all-CMR disks can work similarly on a contiguous CMR area
of the HSMR disk (modulo handling “resizes”). This may result in IO
that can straddle two CMR “zones”. It’s not a problem for writes to
span CMR zones, but it is for SMR zones, so this distinction is useful
to have in the block layer.

 - What makes sense as an interface for managing these types of
not-quite-CMR and not-quite-SMR disks? Some of the feature set overlaps
with existing SMR support in blkdev_zone_mgmt_ioctl, so perhaps the
additional conversion commands could be added there?

 - Mitigating and limiting tail latency effects due to report zones
commands, and limiting “unnecessary” zone management calls.

Thanks,
Khazhy


* Re: [LSF/MM/BPF TOPIC] Hybrid SMR HDDs / Zone Domains & Realms
  2023-03-01 23:34 [LSF/MM/BPF TOPIC] Hybrid SMR HDDs / Zone Domains & Realms Khazhy Kumykov
@ 2023-03-01 23:50 ` Bart Van Assche
  2023-03-02  0:06   ` Damien Le Moal
  2023-03-02  0:02 ` Damien Le Moal
  1 sibling, 1 reply; 9+ messages in thread
From: Bart Van Assche @ 2023-03-01 23:50 UTC (permalink / raw)
  To: Khazhy Kumykov, lsf-pc, linux-block, linux-scsi

On 3/1/23 15:34, Khazhy Kumykov wrote:
>   - There’s already support in the kernel for marking zones
> online/offline and cmr/smr, but this is fixed, not dynamic. Would
> there be hiccups with allowing zones to come online/offline while
> running?

It may be easier to convince HDD vendors to modify their firmware such 
that the conventional and SMR zones are reported to the Linux kernel as 
different logical units rather than adding domains & realms support in 
the Linux kernel. If anyone else has another opinion, feel free to share 
that opinion.

Whether or not others agree with my opinion, I think this is a good 
topic for LSF/MM.

Thanks,

Bart.



* Re: [LSF/MM/BPF TOPIC] Hybrid SMR HDDs / Zone Domains & Realms
  2023-03-01 23:34 [LSF/MM/BPF TOPIC] Hybrid SMR HDDs / Zone Domains & Realms Khazhy Kumykov
  2023-03-01 23:50 ` Bart Van Assche
@ 2023-03-02  0:02 ` Damien Le Moal
  1 sibling, 0 replies; 9+ messages in thread
From: Damien Le Moal @ 2023-03-02  0:02 UTC (permalink / raw)
  To: Khazhy Kumykov, lsf-pc, linux-block, linux-scsi

On 3/2/23 08:34, Khazhy Kumykov wrote:
> HSMR HDDs are a type of SMR HDD that allows a dynamic mixture of
> CMR and SMR zones, letting users convert regions of the disk
> between the two. As specified by the SCSI ZBC-2 specification (ZAC-2
> for ATA), the device exposes a set of “CMR” regions and “SMR”
> regions. These may be grouped into “realms” that may, as a group, be
> online or offline. Zone management can bring online a domain/zone and
> offline any corresponding domains/zones.
> 
> I’d like to discuss what path makes sense for supporting these
> devices, and also how to avoid potential issues specific to the “mixed
> CMR & SMR IO traffic” use case - particularly around latency due to
> potentially unneeded (from the perspective of an application) zone
> management commands.

Hard no on supporting these. See below.

> 
> Points of Discussion
> ====
> 
>  - There’s already support in the kernel for marking zones
> online/offline and cmr/smr, but this is fixed, not dynamic. Would
> there be hiccups with allowing zones to come online/offline while
> running?

No, there is no support for "marking" zones offline (or read-only): transitions
into these states are not triggered explicitly by any command execution, but are
determined by the drive and are asynchronous as far as the host is concerned.
There is support for *detecting* offline zones though, so that FSes do not
attempt to use these dead zones. But that is more a part of error processing
than of the regular IO path, because seeing offline zones is not expected; it is
rather the result of a drive going bad. HSMR would essentially allow users to
explicitly offline zones, wrecking the IO path and potentially generating lots
of IO errors.

So HSMR support should only be allowed (if it ever is) to be controlled by a
file system, not by the user. And if the user wants to do raw block device IOs,
then they can use passthrough commands to control the activation state of zones.
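
A minimal sketch of the offline-zone detection mentioned above, assuming the
caller already holds a buffer of zone descriptors laid out like struct blk_zone
from <linux/blkzoned.h> (for example the descriptor array filled in by a report
zones request). The helper name find_unusable_zones is hypothetical, and the
format string reflects my reading of the 64-byte UAPI layout rather than
anything stated in this thread:

```python
import struct

# Assumed layout of struct blk_zone (64 bytes): start, len, wp as u64 sector
# counts; type, cond, non_seq, reset as u8; 4 reserved bytes; capacity as u64;
# 24 reserved bytes. "=" selects native byte order without padding.
BLK_ZONE_FMT = "=QQQBBBB4xQ24x"
BLK_ZONE_SIZE = struct.calcsize(BLK_ZONE_FMT)

# Zone condition codes as defined in <linux/blkzoned.h>.
BLK_ZONE_COND_READONLY = 0xD
BLK_ZONE_COND_OFFLINE = 0xF


def find_unusable_zones(buf):
    """Return (start_sector, cond) for each read-only or offline zone in buf."""
    unusable = []
    # Walk the buffer one whole descriptor at a time; ignore a trailing partial.
    for off in range(0, len(buf) - BLK_ZONE_SIZE + 1, BLK_ZONE_SIZE):
        start, _len, _wp, _type, cond, *_rest = struct.unpack_from(
            BLK_ZONE_FMT, buf, off
        )
        if cond in (BLK_ZONE_COND_READONLY, BLK_ZONE_COND_OFFLINE):
            unusable.append((start, cond))
    return unusable
```

A file system would run a check of this kind as part of error handling, not on
every IO, which matches the point that offline zones are an exceptional event.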

>  - There may be multiple CMR “zones” that are contiguous in LBA space.
> A benefit of HSMR disks is, to a certain extent, software which is
> designed for all-CMR disks can work similarly on a contiguous CMR area
> of the HSMR disk (modulo handling “resizes”). This may result in IO
> that can straddle two CMR “zones”. It’s not a problem for writes to
> span CMR zones, but it is for SMR zones, so this distinction is useful
> to have in the block layer.

Writes to CMR zones on regular host-managed SMR can straddle CMR zone boundaries
too (but not a CMR-to-SMR boundary). We do not allow it because micro-optimizing
for this case is not worth the overhead it introduces. So hard no on this.
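
To make the straddling case concrete, here is a small sketch of the arithmetic,
assuming a uniform zone size expressed in sectors. Both function names are
hypothetical, and this is a model of the rule, not the block layer's code:

```python
def spans_zone_boundary(sector, nr_sectors, zone_sectors):
    """True if the range [sector, sector + nr_sectors) touches two or more zones."""
    if nr_sectors <= 0 or zone_sectors <= 0:
        raise ValueError("nr_sectors and zone_sectors must be positive")
    first_zone = sector // zone_sectors
    last_zone = (sector + nr_sectors - 1) // zone_sectors
    return first_zone != last_zone


def split_at_zone_boundaries(sector, nr_sectors, zone_sectors):
    """Split one IO into (sector, nr_sectors) pieces that each stay in one zone."""
    pieces = []
    end = sector + nr_sectors
    while sector < end:
        # First sector of the next zone after the current position.
        zone_end = (sector // zone_sectors + 1) * zone_sectors
        chunk = min(end, zone_end) - sector
        pieces.append((sector, chunk))
        sector += chunk
    return pieces
```

Treating every zone the same way, whatever its type, keeps this check to one
comparison; special-casing adjacent CMR zones would need a per-zone type lookup
on each IO.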

>  - What makes sense as an interface for managing these types of
> not-quite CMR and not quite SMR disks? Some of the featureset overlaps
> with existing SMR support in blkdev_zone_mgmt_ioctl, so perhaps the
> additional conversion commands could be added there?

Passthrough commands. There are no kernel-internal users of this, so I do not
see any need to add an interface to activate/deactivate zones. libzbc v6 is
coming soon with an API for zone domains/zone realms commands (already available
in the zone-domains branch of the source code).

>  - mitigating & limiting tail latency effects due to report zones
> commands / limiting “unnecessary” zone management calls.

There are no implicit zone management commands issued by the kernel, except the
one report zones done on disk scan/revalidate. Any zone management command is
explicit, asked for by the user or FS using the drive. So it is up to the user
to limit these to control the overhead.

In general, support for hybrid SMR (Zone domains / zone realms) is a hard no
from me. This feature set is a total nightmare to deal with in the kernel. It
opens a ton of corner cases that will require lots of checks in the hot path. We
definitely do not want that.

-- 
Damien Le Moal
Western Digital Research



* Re: [LSF/MM/BPF TOPIC] Hybrid SMR HDDs / Zone Domains & Realms
  2023-03-01 23:50 ` Bart Van Assche
@ 2023-03-02  0:06   ` Damien Le Moal
  2023-03-02  0:51     ` Bart Van Assche
  0 siblings, 1 reply; 9+ messages in thread
From: Damien Le Moal @ 2023-03-02  0:06 UTC (permalink / raw)
  To: Bart Van Assche, Khazhy Kumykov, lsf-pc, linux-block, linux-scsi

On 3/2/23 08:50, Bart Van Assche wrote:
> On 3/1/23 15:34, Khazhy Kumykov wrote:
>>   - There’s already support in the kernel for marking zones
>> online/offline and cmr/smr, but this is fixed, not dynamic. Would
>> there be hiccups with allowing zones to come online/offline while
>> running?
> 
> It may be easier to convince HDD vendors to modify their firmware such 
> that the conventional and SMR zones are reported to the Linux kernel as 
> different logical units rather than adding domains & realms support in 
> the Linux kernel. If anyone else has another opinion, feel free to share 
> that opinion.

That would not resolve the fact that each unit would still potentially have a
mix of active and inactive areas. Total nightmare to deal with unless a zone API
is also exposed for any user to figure out which zone is active.
That means that we would need to always expose these drives as zoned, using a
very weird zone model as zone domains/zone realms do not fit at all with the
current host-managed model. Lots of places need changes to handle these drives.
This will make things very messy all over.

> 
> Whether or not others agree with my opinion, I think this is a good 
> topic for LSF/MM.
> 
> Thanks,
> 
> Bart.
> 

-- 
Damien Le Moal
Western Digital Research



* Re: [LSF/MM/BPF TOPIC] Hybrid SMR HDDs / Zone Domains & Realms
  2023-03-02  0:06   ` Damien Le Moal
@ 2023-03-02  0:51     ` Bart Van Assche
  2023-03-02  2:03       ` Damien Le Moal
  0 siblings, 1 reply; 9+ messages in thread
From: Bart Van Assche @ 2023-03-02  0:51 UTC (permalink / raw)
  To: Damien Le Moal, Khazhy Kumykov, lsf-pc, linux-block, linux-scsi

On 3/1/23 16:06, Damien Le Moal wrote:
> On 3/2/23 08:50, Bart Van Assche wrote:
>> On 3/1/23 15:34, Khazhy Kumykov wrote:
>>>    - There’s already support in the kernel for marking zones
>>> online/offline and cmr/smr, but this is fixed, not dynamic. Would
>>> there be hiccups with allowing zones to come online/offline while
>>> running?
>>
>> It may be easier to convince HDD vendors to modify their firmware such
>> that the conventional and SMR zones are reported to the Linux kernel as
>> different logical units rather than adding domains & realms support in
>> the Linux kernel. If anyone else has another opinion, feel free to share
>> that opinion.
> 
> That would not resolve the fact that each unit would still potentially have a
> mix of active and inactive areas. Total nightmare to deal with unless a zone API
> is also exposed for any user to figure out which zone is active.
> That means that we would need to always expose these drives as zoned, using a
> very weird zone model as zone domains/zone realms do not fit at all with the
> current host-managed model. Lots of places need changes to handle these drives.
> This will make things very messy all over.

Do users need all the features that are supported by the domains & 
realms model? If not: what I had in mind is to let the HDD expose two 
logical units to the operating system that each have a contiguous range 
of active zones and hence not to support inactive zones.

Thanks,

Bart.



* Re: [LSF/MM/BPF TOPIC] Hybrid SMR HDDs / Zone Domains & Realms
  2023-03-02  0:51     ` Bart Van Assche
@ 2023-03-02  2:03       ` Damien Le Moal
  2023-03-02 18:26         ` Bart Van Assche
  0 siblings, 1 reply; 9+ messages in thread
From: Damien Le Moal @ 2023-03-02  2:03 UTC (permalink / raw)
  To: Bart Van Assche, Khazhy Kumykov, lsf-pc, linux-block, linux-scsi

On 3/2/23 09:51, Bart Van Assche wrote:
> On 3/1/23 16:06, Damien Le Moal wrote:
>> On 3/2/23 08:50, Bart Van Assche wrote:
>>> On 3/1/23 15:34, Khazhy Kumykov wrote:
>>>>    - There’s already support in the kernel for marking zones
>>>> online/offline and cmr/smr, but this is fixed, not dynamic. Would
>>>> there be hiccups with allowing zones to come online/offline while
>>>> running?
>>>
>>> It may be easier to convince HDD vendors to modify their firmware such
>>> that the conventional and SMR zones are reported to the Linux kernel as
>>> different logical units rather than adding domains & realms support in
>>> the Linux kernel. If anyone else has another opinion, feel free to share
>>> that opinion.
>>
>> That would not resolve the fact that each unit would still potentially have a
>> mix of active and inactive areas. Total nightmare to deal with unless a zone API
>> is also exposed for any user to figure out which zone is active.
>> That means that we would need to always expose these drives as zoned, using a
>> very weird zone model as zone domains/zone realms do not fit at all with the
>> current host-managed model. Lots of places need changes to handle these drives.
>> This will make things very messy all over.
> 
> Do users need all the features that are supported by the domains & 
> realms model? If not: what I had in mind is to let the HDD expose two 
> logical units to the operating system that each have a contiguous range 
> of active zones and hence not to support inactive zones.

But that is the issue: zones in the middle of each domain can be
activated/deactivated dynamically using the zone activate command. So there is
always the possibility of ending up with a Swiss cheese LUN, full of holes of
unusable LBAs, because activating zones in one domain (one LUN) deactivates the
equivalent zone(s) in the other domain (the other LUN).

With your idea, the 2 LUNs would not be independent, as both would be using
LBAs that are mapped against a single set of physical blocks. The zone activate
command controls which domain has the mapping active. So activating a zone in
one domain results in the zone(s) using the same mapping in the other domain
being deactivated.
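
The coupling between the two domains can be sketched with a toy model. It
assumes one zone per physical region in each domain, which is a simplification
(real drives need not map CMR and SMR zones one to one), and the class and
method names are hypothetical:

```python
class HybridDrive:
    """Toy model: one pool of physical regions shared by two zone domains."""

    DOMAINS = ("cmr", "smr")

    def __init__(self, nr_regions):
        # owner[i] names the domain whose zone currently maps physical region i.
        self.owner = ["cmr"] * nr_regions

    def activate(self, domain, first, count):
        """Activate zones in `domain`; the matching zones elsewhere go inactive."""
        if domain not in self.DOMAINS:
            raise ValueError(domain)
        if first < 0 or first + count > len(self.owner):
            raise IndexError("zone range out of bounds")
        for i in range(first, first + count):
            self.owner[i] = domain

    def active_map(self, domain):
        """Per-zone view of one domain: 'A' is active, '.' is an inactive hole."""
        return "".join("A" if o == domain else "." for o in self.owner)
```

Activating three SMR zones in the middle of an eight-region drive leaves the CMR
view as "AA...AAA": the hole appears in the CMR address space even though
nothing was issued against it, which is why the two views cannot be treated as
independent LUNs.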

> 
> Thanks,
> 
> Bart.
> 

-- 
Damien Le Moal
Western Digital Research



* Re: [LSF/MM/BPF TOPIC] Hybrid SMR HDDs / Zone Domains & Realms
  2023-03-02  2:03       ` Damien Le Moal
@ 2023-03-02 18:26         ` Bart Van Assche
  2023-03-03  6:57           ` Damien Le Moal
  2023-03-03 11:57           ` Hannes Reinecke
  0 siblings, 2 replies; 9+ messages in thread
From: Bart Van Assche @ 2023-03-02 18:26 UTC (permalink / raw)
  To: Damien Le Moal, Khazhy Kumykov, lsf-pc, linux-block, linux-scsi

On 3/1/23 18:03, Damien Le Moal wrote:
> But that is the issue: zones in the middle of each domain can be
> activated/deactivated dynamically using the zone activate command. So there is
> always the possibility of ending up with a Swiss cheese LUN, full of holes of
> unusable LBAs, because activating zones in one domain (one LUN) deactivates the
> equivalent zone(s) in the other domain (the other LUN).
> 
> With your idea, the 2 LUNs would not be independent, as both would be using
> LBAs that are mapped against a single set of physical blocks. The zone activate
> command controls which domain has the mapping active. So activating a zone in
> one domain results in the zone(s) using the same mapping in the other domain
> being deactivated.

Hi Damien,

Your reply made me realize that I should have provided more information. 
What I'm proposing is the following:
* Do not use any of the domains & realms features from ZBC-2.
* Do not make any zones visible to the host before configuration of the 
logical units has finished. Only make the logical units visible to the 
host after configuration of the logical units has finished. Do not 
support reconfiguration of the logical units while these are in use by 
the host.
* Only support active zones. Do not support inactive zones.
* Introduce a new mechanism for configuring the logical units.

This is not a new idea. The approach described above has been supported
by UFS devices for a considerable time. The provisioning mechanism
supported by UFS devices is defined in the UFS standard and is not based
on SCSI commands.

Bart.



* Re: [LSF/MM/BPF TOPIC] Hybrid SMR HDDs / Zone Domains & Realms
  2023-03-02 18:26         ` Bart Van Assche
@ 2023-03-03  6:57           ` Damien Le Moal
  2023-03-03 11:57           ` Hannes Reinecke
  1 sibling, 0 replies; 9+ messages in thread
From: Damien Le Moal @ 2023-03-03  6:57 UTC (permalink / raw)
  To: Bart Van Assche, Khazhy Kumykov, lsf-pc, linux-block, linux-scsi

On 3/3/23 03:26, Bart Van Assche wrote:
> On 3/1/23 18:03, Damien Le Moal wrote:
>> But that is the issue: zones in the middle of each domain can be
>> activated/deactivated dynamically using the zone activate command. So there is
>> always the possibility of ending up with a Swiss cheese LUN, full of holes of
>> unusable LBAs, because activating zones in one domain (one LUN) deactivates the
>> equivalent zone(s) in the other domain (the other LUN).
>>
>> With your idea, the 2 LUNs would not be independent, as both would be using
>> LBAs that are mapped against a single set of physical blocks. The zone activate
>> command controls which domain has the mapping active. So activating a zone in
>> one domain results in the zone(s) using the same mapping in the other domain
>> being deactivated.
> 
> Hi Damien,
> 
> Your reply made me realize that I should have provided more information. 
> What I'm proposing is the following:
> * Do not use any of the domains & realms features from ZBC-2.
> * Do not make any zones visible to the host before configuration of the 
> logical units has finished. Only make the logical units visible to the 
> host after configuration of the logical units has finished. Do not 
> support reconfiguration of the logical units while these are in use by 
> the host.
> * Only support active zones. Do not support inactive zones.
> * Introduce a new mechanism for configuring the logical units.

That is not how the zone domains/zone realms feature is defined. Matching this
would require specification changes. But an even bigger problem is that this
would not work for ATA drives (ZAC-2), as the concept of LUNs does not exist there.

> 
> This is not a new idea. The approach described above is already 
> supported since considerable time by UFS devices. The provisioning 
> mechanism supported by UFS devices is defined in the UFS standard and is 
> not based on SCSI commands.
> 
> Bart.
> 

-- 
Damien Le Moal
Western Digital Research



* Re: [LSF/MM/BPF TOPIC] Hybrid SMR HDDs / Zone Domains & Realms
  2023-03-02 18:26         ` Bart Van Assche
  2023-03-03  6:57           ` Damien Le Moal
@ 2023-03-03 11:57           ` Hannes Reinecke
  1 sibling, 0 replies; 9+ messages in thread
From: Hannes Reinecke @ 2023-03-03 11:57 UTC (permalink / raw)
  To: Bart Van Assche, Damien Le Moal, Khazhy Kumykov, lsf-pc,
	linux-block, linux-scsi

On 3/2/23 19:26, Bart Van Assche wrote:
> On 3/1/23 18:03, Damien Le Moal wrote:
>> But that is the issue: zones in the middle of each domain can be
>> activated/deactivated dynamically using the zone activate command. So there is
>> always the possibility of ending up with a Swiss cheese LUN, full of holes of
>> unusable LBAs, because activating zones in one domain (one LUN) deactivates the
>> equivalent zone(s) in the other domain (the other LUN).
>>
>> With your idea, the 2 LUNs would not be independent, as both would be using
>> LBAs that are mapped against a single set of physical blocks. The zone activate
>> command controls which domain has the mapping active. So activating a zone in
>> one domain results in the zone(s) using the same mapping in the other domain
>> being deactivated.
> 
> Hi Damien,
> 
> Your reply made me realize that I should have provided more information. 
> What I'm proposing is the following:
> * Do not use any of the domains & realms features from ZBC-2.
> * Do not make any zones visible to the host before configuration of the 
> logical units has finished. Only make the logical units visible to the 
> host after configuration of the logical units has finished. Do not 
> support reconfiguration of the logical units while these are in use by 
> the host.
> * Only support active zones. Do not support inactive zones.
> * Introduce a new mechanism for configuring the logical units.
> 
> This is not a new idea. The approach described above is already 
> supported since considerable time by UFS devices. The provisioning 
> mechanism supported by UFS devices is defined in the UFS standard and is 
> not based on SCSI commands.
> 
That really cries out for a device-mapper target.
Providing several LUNs only makes sense if the hardware supports it;
we learned that lesson when developing support for multi-actuator
HDDs. If you want to have several logical disks without hardware support
for it, device-mapper is the way to go.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer



