Re: [LSF/MM/BPF TOPIC] durability vs performance for flash devices (especially embedded!)

From: Ric Wheeler <ricwheeler@gmail.com>
To: Damien Le Moal <Damien.LeMoal@wdc.com>,
	Bart Van Assche <bvanassche@acm.org>,
	Matthew Wilcox <willy@infradead.org>
Cc: "lsf-pc@lists.linux-foundation.org" 
	<lsf-pc@lists.linux-foundation.org>,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>
Subject: Re: [LSF/MM/BPF TOPIC] durability vs performance for flash devices (especially embedded!)
Date: Wed, 9 Jun 2021 21:11:53 -0400
Message-ID: <751402df-606d-092d-e845-c423b69e3f84@gmail.com>
In-Reply-To: <DM6PR04MB7081477ECBE0BB4EC27D2C90E7359@DM6PR04MB7081.namprd04.prod.outlook.com>

On 6/9/21 8:16 PM, Damien Le Moal wrote:
> On 2021/06/10 3:47, Bart Van Assche wrote:
>> On 6/9/21 11:30 AM, Matthew Wilcox wrote:
>>> maybe you should read the paper.
>>>
>>> "This comparison demonstrates that using F2FS, a flash-friendly file
>>> system, does not mitigate the wear-out problem, except inasmuch as it
>>> inadvertently rate limits all I/O to the device"
>> It seems like my email was not clear enough? What I tried to make clear
>> is that I think that there is no way to solve the flash wear issue with
>> the traditional block interface. I think that F2FS in combination with
>> the zone interface is an effective solution.
>>
>> What is also relevant in this context is that the "Flash drive lifespan
>> is a problem" paper was published in 2017. I think that the first
>> commercial SSDs with a zone interface became available at a later time
>> (summer of 2020?).
> Yes, zone support in the block layer and f2fs was added with kernel 4.10
> released in Feb 2017. So the authors likely did not consider that as a solution,
> especially considering that at the time, it was all about SMR HDDs only. Now, we
> do have ZNS and things like SD-Express coming which may allow NVMe/ZNS on even
> the cheapest of consumer devices.
>
> That said, I do not think that f2fs is yet an ideal solution as is, since all
> its metadata needs to be updated in place and is therefore subject to the
> drive's implementation of FTL/wear leveling. And the quality of this varies
> between devices and vendors...
>
> btrfs zone support improves that as even the super blocks are not updated in
> place on zoned devices. Everything is copy-on-write, sequential write into
> zones. While the current block allocator is rather simple for now, it could be
> tweaked to add some wear leveling awareness, eventually (per zone wear leveling
> is something much easier to do inside the drive though, so the host should not
> care).
>
> In the context of zoned storage, the discussion could be around how to best
> support file systems. Do we keep modifying one file system after another to
> support zones, or implement wear leveling? That is *very* hard to do and
> sometimes not reasonably feasible depending on the FS design.
>
> I do remember Dave Chinner's talk back at LSF/MM 2018 (was it?) where he
> discussed the idea of having block allocation moved out of FSes and turned into
> a kind of library common to many file systems. In the context of consumer flash
> wear leveling, and eventually zones (likely with some remapping needed), this
> may be something interesting to discuss again.
>
Some of the other bits that make this hard in the embedded space include 
layering on top of device mapper (using dm-verity, for example) and our usual 
problem of having apps that drive too many small IOs down to service SQLite 
transactions.

I am looking to get some measurements done to show the write amplification: 
measure the total amount of writes done by applications and what that 
translates into for device requests. Anything done for metadata, logging, etc. 
all counts as "write amplification" when viewed this way.
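
A rough sketch of that host-side measurement, assuming a Linux system where the 
block layer counters in /sys/block/<dev>/stat are readable (the seventh field 
is the number of 512-byte sectors written). The function names and the idea of 
sampling the counter before and after a workload are illustrative assumptions, 
not an existing tool, and the application byte count has to come from the 
workload itself:

```python
from pathlib import Path

# /sys/block/<dev>/stat reports sectors in fixed 512-byte units,
# independent of the device's logical block size.
SECTOR_BYTES = 512


def parse_sectors_written(stat_text: str) -> int:
    """Return the 'write sectors' counter (7th field) of a block stat line."""
    return int(stat_text.split()[6])


def device_bytes_written(dev: str) -> int:
    """Bytes the block layer has written to `dev` (e.g. "mmcblk0") so far."""
    stat_text = Path(f"/sys/block/{dev}/stat").read_text()
    return parse_sectors_written(stat_text) * SECTOR_BYTES


def write_amplification(app_bytes: int, dev_before: int, dev_after: int) -> float:
    """Device bytes written per application byte over one measurement window.

    app_bytes is what the applications asked to write (hypothetical input,
    gathered by the workload); dev_before/dev_after are two samples of
    device_bytes_written() taken around the workload.
    """
    if app_bytes <= 0:
        raise ValueError("app_bytes must be positive")
    return (dev_after - dev_before) / app_bytes
```

Sampling device_bytes_written() before and after a run, with a sync in between 
so cached data reaches the device, gives the two values to pass to 
write_amplification(). File system metadata, journaling and dm-verity overhead 
all show up in the device counter, which is the point.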

That would be useful for figuring out what the best-case durability of parts 
would be for specific workloads.

Measuring write amplification inside of a device is often possible as well, so 
we could end up getting a pretty clear picture.
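
To illustrate how the two measurements might combine, here is a 
back-of-the-envelope sketch. It assumes the host-side and in-device 
amplification factors simply multiply, and that a part's endurance can be 
approximated as capacity times rated program/erase cycles. Both are 
simplifications, and all names and parameters are hypothetical:

```python
def end_to_end_wa(host_wa: float, device_wa: float) -> float:
    """Total NAND bytes written per application byte.

    host_wa:   device requests / application writes (file system, logging, ...)
    device_wa: NAND writes / host writes (FTL garbage collection, ...)
    """
    return host_wa * device_wa


def best_case_lifetime_days(capacity_bytes: int, pe_cycles: int,
                            app_bytes_per_day: float,
                            host_wa: float, device_wa: float) -> float:
    """Days until the rated P/E budget is used up, assuming perfect wear leveling."""
    nand_budget = capacity_bytes * pe_cycles  # total NAND bytes the part can absorb
    nand_per_day = app_bytes_per_day * end_to_end_wa(host_wa, device_wa)
    return nand_budget / nand_per_day
```

For example, a 32 GB part rated for 3000 cycles, with applications writing 
1 GB per day at a host factor of 4 and a device factor of 2, would come out at 
12000 days under these assumptions. Real parts will do worse since wear 
leveling is never perfect.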

Regards,

Ric