From: Viacheslav Dubeyko <email@example.com>
To: Ric Wheeler <firstname.lastname@example.org>
Cc: Jaegeuk Kim <email@example.com>,
Bart Van Assche <firstname.lastname@example.org>,
Matthew Wilcox <email@example.com>,
Linux FS Devel <firstname.lastname@example.org>,
Subject: Re: [LSF/MM/BPF TOPIC] durability vs performance for flash devices (especially embedded!)
Date: Thu, 10 Jun 2021 10:57:35 -0700
Message-ID: <973FD16E-0F60-4709-924E-8D15245C4EDB@dubeyko.com>
> On Jun 10, 2021, at 9:22 AM, Ric Wheeler <email@example.com> wrote:
> On 6/9/21 5:32 PM, Jaegeuk Kim wrote:
>> On Wed, Jun 9, 2021 at 11:47 AM Bart Van Assche <firstname.lastname@example.org> wrote:
>> On 6/9/21 11:30 AM, Matthew Wilcox wrote:
>> > maybe you should read the paper.
>> > "This comparison demonstrates that using F2FS, a flash-friendly file
>> > system, does not mitigate the wear-out problem, except inasmuch as it
>> > inadvertently rate limits all I/O to the device"
>> Do you agree with that statement based on your insight? At least to me, that
>> paper is missing the fundamental GC problem which was supposed to be
>> evaluated by real workloads instead of using a simple benchmark generating
>> 4KB random writes only. And, they had to investigate more details in FTL/IO
>> patterns including UNMAP and LBA alignment between host and storage, which
>> all affect WAF. Based on that, the point of the zoned device is quite promising
>> to me, since it can address LBA alignment entirely and give a way that host
>> SW stack can control QoS.
> Just a note, using a pretty simple and optimal streaming write pattern, I have been able to burn out emmc parts in a little over a week.
> My test case creates a 1GB file (filled with random data just in case the device was looking for zero blocks to ignore) and then loops, running cp and sync on that file until the eMMC device lifetime is shown as exhausted.
> This was a clean, best case sequential write so this is not just an issue with small, random writes.
> Of course, this is normal to wear them out, but for the super low end parts, taking away any of the device writes in our stack is costly given how little life they have....
I think we need to distinguish several cases here. If the volume is fairly aged, then GC plays an important role in write amplification. I believe F2FS still does not have a very efficient GC subsystem, and there is potentially competition between the file system’s GC and the FTL’s GC. So the F2FS GC subsystem could be optimized to reduce write amplification and GC competition. But I believe the fundamental design of the F2FS GC subsystem offers no way to eliminate write amplification completely. However, if GC is not active, this source of write amplification can be excluded from consideration.
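To make the GC cost concrete, here is a rough sketch of the usual back-of-the-envelope model. It is a simplification of mine, not a measurement of F2FS or of any FTL. It assumes every cleaned segment still holds the same fraction of valid data, and that the file system's GC and the FTL's GC act independently, so their factors multiply. The function names are made up for illustration.

```python
def gc_write_amplification(valid_fraction: float) -> float:
    """Write amplification when each cleaned segment is `valid_fraction` live.

    Cleaning one segment copies `valid_fraction` of it elsewhere and frees
    only (1 - valid_fraction) of it for new writes, so each unit of new
    data costs 1 / (1 - valid_fraction) units of physical writes.
    """
    if not 0.0 <= valid_fraction < 1.0:
        raise ValueError("valid_fraction must be in [0, 1)")
    return 1.0 / (1.0 - valid_fraction)


def stacked_waf(fs_valid_fraction: float, ftl_valid_fraction: float) -> float:
    """Combined factor, ASSUMING FS-level and FTL-level GC are independent.

    Every block the file system's GC rewrites is an ordinary host write
    from the FTL's point of view, so the FTL amplifies it again.
    """
    return (gc_write_amplification(fs_valid_fraction)
            * gc_write_amplification(ftl_valid_fraction))


if __name__ == "__main__":
    for u in (0.0, 0.5, 0.75):
        print(f"valid fraction {u:.2f}: WAF {gc_write_amplification(u):.1f}")
    print("both layers at 0.50:", stacked_waf(0.5, 0.5))
```

With no valid data left in the victim segments the factor is 1, which is the "GC is not active" case above. The model says nothing about how the two GC layers interfere with each other in practice.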
The F2FS in-place update area is another source of write amplification, one that is expected to be managed by the FTL. This architectural decision leaves little room for optimization, unless some of the metadata is moved into an area governed by a Copy-On-Write policy. But that could be a hard and time-consuming change.
Another source of write amplification in F2FS is the block mapping technique. Every update of a logical block holding user data results in an update of the block mapping metadata. So this architectural decision also leaves little room for optimization, unless some other metadata structure or mapping technique is introduced.
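A toy model may help show why the mapping overhead depends so much on the update pattern. This is only an illustration and does not model the real F2FS node/NAT layout. It assumes a flat mapping table in which one metadata block covers `entries_per_mapping_block` consecutive logical blocks, and that each dirty metadata block is written once per flush. Both the function and the parameter value are hypothetical.

```python
def writes_per_flush(updated_blocks, entries_per_mapping_block):
    """Return (data_block_writes, mapping_block_writes) for one flush.

    ASSUMPTION: a flat table where mapping block k covers logical blocks
    [k * entries_per_mapping_block, (k + 1) * entries_per_mapping_block).
    """
    dirty_mapping_blocks = {
        lba // entries_per_mapping_block for lba in updated_blocks
    }
    return len(updated_blocks), len(dirty_mapping_blocks)


if __name__ == "__main__":
    entries = 500  # hypothetical number of entries per mapping block

    # 1000 consecutive logical blocks share very few mapping blocks.
    sequential = list(range(1000))
    # 1000 logical blocks spread so that each one hits its own mapping block.
    scattered = [i * entries for i in range(1000)]

    for name, pattern in (("sequential", sequential), ("scattered", scattered)):
        data, meta = writes_per_flush(pattern, entries)
        print(f"{name}: {data} data + {meta} mapping block writes, "
              f"factor {(data + meta) / data:.3f}")
```

In this model sequential updates amortize the metadata cost almost completely, while scattered updates roughly double the number of blocks written.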
So, if we set aside GC, the in-place area, the block mapping technique, and other architectural decisions, the next possible direction for decreasing write amplification is to update logical blocks less frequently or to issue fewer write operations. The most obvious solutions are: (1) compression, (2) deduplication, (3) combining several small files into one NAND page, (4) using the inline technique to store small files’ content in the inode’s area.
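As a rough illustration of point (3), the arithmetic below compares how many NAND pages a set of small files needs when each file gets its own page versus when the files are packed back to back. The 16 KiB page size and the file sizes are assumptions chosen for the example, and the sketch ignores any metadata needed to locate files inside a shared page.

```python
import math


def pages_unpacked(file_sizes, page_size):
    """Pages needed when every file starts on its own page."""
    return sum(max(1, math.ceil(size / page_size)) for size in file_sizes)


def pages_packed(file_sizes, page_size):
    """Pages needed when file contents are laid out back to back."""
    return math.ceil(sum(file_sizes) / page_size)


if __name__ == "__main__":
    page_size = 16 * 1024        # assumed NAND page size in bytes
    small_files = [512] * 100    # hypothetical workload: 100 files of 512 bytes

    print("one page per file:", pages_unpacked(small_files, page_size))
    print("packed:", pages_packed(small_files, page_size))
```

Fewer pages programmed for the same user data means fewer program/erase cycles consumed, which is the whole point of options (1) through (4).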
I believe an additional potential issue in F2FS is the metadata reservation technique. I mean that creating a volume implies reserving and initializing the metadata structures. So even if the metadata does not yet contain any valuable information, the FTL still treats it as valid data that must be managed to guarantee access to it. Eventually, the FTL will move this data among erase blocks, which could decrease the lifetime of the device. This matters especially for NAND flash with poor endurance, where read disturbance could play a significant role.