From: "Matias Bjørling" <m@bjorling.me>
To: Damien Le Moal <damien.lemoal@wdc.com>,
	Slava Dubeyko <Vyacheslav.Dubeyko@wdc.com>,
	Viacheslav Dubeyko <slava@dubeyko.com>,
	"lsf-pc@lists.linux-foundation.org"
	<lsf-pc@lists.linux-foundation.org>
Cc: Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>
Subject: Re: [LSF/MM TOPIC][LSF/MM ATTEND] OCSSDs - SMR, Hierarchical Interface, and Vector I/Os
Date: Wed, 4 Jan 2017 13:39:18 +0100	[thread overview]
Message-ID: <2dc2b0eb-7083-733e-61a2-e9d0ca2cfda9@bjorling.me> (raw)
In-Reply-To: <a5acdf89-99c0-774d-cb08-76cb5d846f29@wdc.com>

On 01/04/2017 08:24 AM, Damien Le Moal wrote:
> 
> Slava,
> 
> On 1/4/17 11:59, Slava Dubeyko wrote:
>> What's the goal of SMR compatibility? Any unification or interface
>> abstraction aims to hide the peculiarities of the underlying
>> hardware. But we already have the block device abstraction, which
>> hides all of the hardware's peculiarities perfectly. An FTL (or any
>> other translation layer) can likewise present the device as a
>> sequence of physical sectors without the software side knowing
>> anything about the sophisticated management activity on the device
>> side. And, finally, many users will be completely happy to use the
>> regular file systems (ext4, xfs) without having to modify the
>> software stack. But I believe the goal of the Open-channel SSD
>> approach is the complete opposite: namely, to give the software side
>> (a file system, for example) the opportunity to manage the
>> Open-channel SSD device with a smarter policy.
> 
> The Zoned Block Device API is part of the block layer. So as such, it
> does abstract many aspects of the device characteristics, as so many
> other APIs of the block layer do (look at blkdev_issue_discard or zeroout
> implementations to see how far this can be pushed).
> 
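
To be clear about the level of abstraction being referred to, here is a
minimal sketch of using that helper (the wrapper is hypothetical and error
handling is elided):

#include <linux/blkdev.h>
#include <linux/gfp.h>

/*
 * Hypothetical caller: discard an LBA range without knowing how the
 * underlying device (TRIM/UNMAP on an SSD, whatever a drive-managed SMR
 * disk does internally, etc.) actually implements it.
 */
static int example_discard_range(struct block_device *bdev,
                                 sector_t start, sector_t nr_sects)
{
        return blkdev_issue_discard(bdev, start, nr_sects, GFP_KERNEL, 0);
}
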
> Regarding the use of open channel SSDs, I think you are absolutely
> correct: (1) some users may be very happy to use a regular, unmodified
> ext4 or xfs on top of an open channel SSD, as long as the FTL
> implementation does a complete abstraction of the device special
> features and presents a regular block device to upper layers. And
> conversely, (2) some file system implementations may prefer to directly
> use those special features and characteristics of open channel SSDs. No
> arguing with this.
> 
> But you are missing the parallel with SMR. For SMR, or more correctly
> zoned block devices since the ZBC or ZAC standards can equally apply to
> HDDs and SSDs, three models exist: drive-managed, host-aware and host-managed.
> 
> Case (1) above corresponds *exactly* to the drive-managed model, with
> the difference that the abstraction of the device characteristics (SMR
> here) is in the drive FW and not in a host-level FTL implementation as
> it would be for open channel SSDs. Case (2) above corresponds to the
> host-managed model, that is, the device user has to deal with the device
> characteristics itself and use it correctly. The host-aware model lies
> in between these two extremes: it offers the possibility of complete
> abstraction by default, but also lets a user optimize its operation
> for the device by exposing the device characteristics. So this
> would correspond to a possible third way of implementing an FTL for open
> channel SSDs.
> 
>> So, my key worry is that trying to hide two different technologies
>> (SMR and NAND flash) under the same interface will result in losing
>> the opportunity to manage the device in a smarter way, because any
>> unification aims to create a simple interface. But SMR and NAND
>> flash are significantly different technologies, and if somebody
>> creates a technology-oriented file system, for example, it needs
>> access to the really special features of that technology.
>> Otherwise, the interface will be overloaded with the features of
>> both technologies and will end up looking like a mess.
> 
> I do not think so, as long as the device "model" is exposed to the user
> as the zoned block device interface does. This allows a user to adjust
> its operation depending on the device. This is true of course as long as
> each "model" has a clearly defined set of associated features. Again,
> that is the case for zoned block devices and an example of how this can
> be used is now in f2fs (which allows different operation modes for
> host-aware devices, but only one for host-managed devices). Again, I can
> see a clear parallel with open channel SSDs here.
> 
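
To make the "adjust its operation depending on the device" point concrete:
the model is directly visible to an in-kernel user, so a file system can
branch on it much like f2fs does. A minimal sketch against the 4.10-era
zoned block device API (the helper itself is hypothetical):

#include <linux/blkdev.h>
#include <linux/types.h>

/* Hypothetical mount-time check: pick a write policy based on the model. */
static bool example_requires_sequential_writes(struct block_device *bdev)
{
        struct request_queue *q = bdev_get_queue(bdev);

        switch (blk_queue_zoned_model(q)) {
        case BLK_ZONED_HM:
                return true;    /* host-managed: sequential writes mandatory */
        case BLK_ZONED_HA:
        case BLK_ZONED_NONE:
        default:
                return false;   /* host-aware/regular: random writes allowed */
        }
}
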
>> An SMR zone and a NAND flash erase block look comparable but are, in
>> the end, significantly different things. An SMR zone is usually
>> 256 MB in size, while a NAND flash erase block can vary from 512 KB
>> to 8 MB (it will get slightly larger in the future, but not more
>> than 32 MB, I suppose). It is possible to group several erase blocks
>> into an aggregated entity, but that may not be a very good policy
>> from the file system's point of view.
> 
> Why not? For f2fs, the 2MB segments are grouped together into sections
> with a size matching the device zone size. That works well and can
> actually even reduce the garbage collection overhead in some cases.
> Nothing in the kernel zoned block device support limits the zone size to
> a particular minimum or maximum. The only direct implication of the zone
> size on the block I/O stack is that BIOs and requests cannot cross zone
> boundaries. In an extreme setup, a zone size of 4KB would work too and
> result in read/write commands of 4KB at most to the device.
> 
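
To put rough numbers on that: with f2fs's 2 MB segments and a typical
256 MB zone, a section simply groups 256 MB / 2 MB = 128 segments, and the
"no BIO across a zone boundary" rule then bounds any single request by the
zone size.
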
>> Another point is that QLC devices could have trickier erase block
>> management features. Also, we have to apply an erase operation to a
>> NAND flash erase block, but that is not mandatory in the case of an
>> SMR zone.
> 
> Incorrect: host-managed devices require a zone "reset" (equivalent to
> discard/trim) to be reused after being written once. So again, the
> "tricky features" you mention will depend on the device "model",
> whatever this ends up being for an open channel SSD.
> 
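
For reference, the in-kernel reset path already mirrors the discard-style
helpers. A minimal sketch (signature as of the 4.10-era zoned block device
support; the wrapper is hypothetical):

#include <linux/blkdev.h>
#include <linux/gfp.h>

/* Hypothetical wrapper: reset one zone so it can be rewritten sequentially. */
static int example_reset_one_zone(struct block_device *bdev,
                                  sector_t zone_start, sector_t zone_sectors)
{
        /* Conceptually the zoned equivalent of discard/trim. */
        return blkdev_reset_zones(bdev, zone_start, zone_sectors, GFP_KERNEL);
}
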
>> An SMR zone could simply be re-written in sequential order if all of
>> the zone's data is invalid, for example. A conventional zone could
>> also be a really tricky point, because it is the only zone on the
>> whole device that can be updated in place. Raw NAND flash usually
>> has no such conventional zone.
> 
> Conventional zones are optional in zoned block devices. There may be
> none at all and an implementation may well decide to not support a
> device without any conventional zones if some are required.
> In the case of open channel SSDs, the FTL implementation may well decide
> to expose a particular range of LBAs as "conventional zones" and have a
> lower-level exposure for the remaining capacity which can then be
> optimally used by the file system based on the features available for
> that remaining LBA range. Again, a parallel is possible with SMR.
> 
>> Finally, if I really want to develop an SMR- or NAND-flash-oriented
>> file system, then I would like to play with the peculiarities of the
>> concrete technologies, and any unified interface will destroy the
>> opportunity to create a really efficient solution. In the end, if my
>> software solution is unable to provide some fancy and efficient
>> features, then people will prefer to use the regular stack (ext4,
>> xfs + block layer).
> 
> Not necessarily. Again think in terms of device "model" and associated
> feature set. An FS implementation may decide to support all possible
> models, with likely a resulting incredible complexity. More likely,
> similarly with what is happening with SMR, only models that make sense
> will be supported by FS implementations that can be easily modified.
> Take f2fs again as an example: changes to support SMR were rather simple,
> whereas the initial effort to support SMR with ext4 was pretty much
> abandoned as it was too complex to integrate in the existing code while
> keeping the existing on-disk format.
> 
> Your argument above is actually making the same point: you want your
> implementation to use the device features directly. That is, your
> implementation wants a "host-managed" like device model. Using ext4 will
> require a "host-aware" or "drive-managed" model, which could be provided
> through a different FTL or device-mapper implementation in the case of
> open channel SSDs.
> 
> I am not trying to argue that open channel SSDs and zoned block devices
> should be supported under the exact same API. But I can definitely see
> clear parallels worth a discussion. As a first step, I would suggest
> trying to define open channel SSD "models" and their feature set
> and see how these fit with the existing ZBC/ZAC defined models and at
> least estimate the implications on the block I/O stack. If adding the
> new models only results in the addition of a few top level functions or
> ioctls, it may be entirely feasible to integrate the two together.
> 

Thanks Damien. I couldn't have said it better myself.

The OCSSD 1.3 specification has been made with an eye towards the SMR
interface:

 - "Identification" - Follows the same "global" size definitions, and
also allows each zone to have its own local size.
 - "Get Report" command follows a very similar structure to SMR's, such
that it can sit behind the "Report Zones" interface.
 - "Erase/Prepare Block" command follows the Reset block interface.

Those should fit right in. If the layout is planar, such that the OCSSD
only exposes a set of zones, it should slot into the existing framework
with minor modifications.
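
To illustrate how thin the "Get Report" mapping could be, the translation
boils down to filling in one struct blk_zone per reported entry. This is a
rough sketch only; the ocssd_blk_desc structure and its field names are
made-up placeholders, not the 1.3 wire format:

#include <linux/types.h>
#include <linux/blkzoned.h>

/* Placeholder for whatever one "Get Report" entry carries per block. */
struct ocssd_blk_desc {
        __u64 start_lba;        /* first LBA of the block */
        __u64 nr_lbas;          /* size of the block, in LBAs */
        __u64 write_ptr;        /* next writable LBA */
};

/* Express one reported OCSSD block as a zoned-block-device zone. */
static void example_fill_zone(struct blk_zone *zone,
                              const struct ocssd_blk_desc *desc)
{
        zone->start = desc->start_lba;
        zone->len   = desc->nr_lbas;
        zone->wp    = desc->write_ptr;
        zone->type  = BLK_ZONE_TYPE_SEQWRITE_REQ;
        zone->cond  = BLK_ZONE_COND_EMPTY;      /* real code would derive this */
}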

A couple of details are added when moving towards managing multiple
parallel units, which is one of the things that requires a bit of
discussion.
