linux-f2fs-devel.lists.sourceforge.net archive mirror
* [f2fs-dev] [DISCUSSION] f2fs for desktop
@ 2023-04-04  7:36 Juhyung Park
  2023-04-04 23:35 ` Jaegeuk Kim
  2023-04-10 15:44 ` Chao Yu
  0 siblings, 2 replies; 10+ messages in thread
From: Juhyung Park @ 2023-04-04  7:36 UTC (permalink / raw)
  To: linux-f2fs-devel, Jaegeuk Kim, Chao Yu; +Cc: Alexander Koskovich

Hi everyone,

I want to start a discussion on using f2fs for regular desktops/workstations.

There is growing interest in using f2fs as a general
root file-system:
2018: https://www.phoronix.com/news/GRUB-Now-Supports-F2FS
2020: https://www.phoronix.com/news/Clear-Linux-F2FS-Root-Option
2023: https://code.launchpad.net/~nexusprism/curtin/+git/curtin/+merge/439880
2023: https://code.launchpad.net/~nexusprism/grub/+git/ubuntu/+merge/440193

I've been personally running f2fs on all of my x86 Linux boxes since
2015, and I have several concerns that I think we need to collectively
address before regular non-Android normies can use f2fs:

A. Bootloader and installer support
B. Host-side GC
C. Extended node bitmap

I'll go through each one.

=== A. Bootloader and installer support ===

It seems that both GRUB and systemd-boot support f2fs without the
need for a separate ext4-formatted /boot partition.
Some distros are seemingly disabling the f2fs module for GRUB, though,
for security reasons:
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1868664

It's ultimately up to the distro folks to enable this, and even in
the worst-case scenario, they can specify a separate /boot partition
and format it as ext4 upon installation.

Installer support for offering f2fs and calling mkfs.f2fs is
currently being worked on for Ubuntu. See the 2023 links above.

There's nothing for mainline f2fs developers to do here, imo.

=== B. Host-side GC ===

f2fs relieves most of the device-side GC burden but introduces a new
host-side GC. This is extremely confusing for people who have no
background in SSDs and flash storage, let alone the
discard/trim/erase complications.

On most consumer-grade blackbox SSDs, device-side GC is handled
automatically for various workloads. f2fs, however, leaves that
responsibility to userspace, with conservative tuning on the
kernel side by default. Android handles this with init.rc tunings and
separate code running in vold to trigger gc_urgent.
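For reference, the knob Android pokes is exposed via sysfs on any kernel
with f2fs, so a desktop-side trigger could look something like the sketch
below (the device name is just an example; check your own f2fs partition):

```shell
# Rough sketch: force urgent GC on an f2fs device during idle time.
# /sys/fs/f2fs/<dev>/gc_urgent: 1 = urgent-high (GC aggressively), 0 = normal.
DEV=nvme0n1p4   # example device name; adjust to your f2fs partition

echo 1 > /sys/fs/f2fs/$DEV/gc_urgent   # start aggressive background GC
sleep 600                              # give GC time to reclaim invalid segments
echo 0 > /sys/fs/f2fs/$DEV/gc_urgent   # return to conservative defaults
```

This is essentially what vold does on Android, minus the idle-detection
logic around it.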

On regular Linux desktop distros, f2fs just runs with the kernel's
default configuration, and unless the system is running 24/7 with
plentiful idle time, it quickly runs out of free segments and starts
triggering foreground GC. This gives people the wrong impression
that f2fs slows down far more drastically than other file-systems,
when quite the contrary is true (i.e., less fragmentation over time).

This is almost the equivalent of re-living the trim nightmare. On
SSDs with very little to no over-provisioned space, running a
file-system with no discard whatsoever (sadly still a common case
when an external SSD is used without UAS) will also drastically slow
down performance. On file-systems without asynchronous discard,
mounting with the discard option adds non-negligible overhead to
every remove/delete operation, so most distros now (thankfully) use
a systemd timer job to trigger fstrim:
https://github.com/util-linux/util-linux/commits/master/sys-utils/fstrim.timer
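For context, on a typical systemd distro the timer-based approach boils
down to the following (commands shown for illustration; the exact schedule
depends on the distro's unit file):

```shell
# Enable the periodic trim timer shipped by util-linux; it runs
# fstrim.service on a schedule instead of mounting with -o discard.
systemctl enable --now fstrim.timer

# What the service ultimately runs: trim every mounted filesystem
# that supports discard, reporting how much was trimmed.
fstrim --all --verbose
```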

This is still far from ideal. The default file-system, ext4, slows
down drastically, almost to a halt, when fstrim -a is called, especially
on SATA. For reasons that are still a mystery to me, people seem
to be happy with it. No one has bothered to improve it for years
¯\_(ツ)_/¯.

So here’s my proposal:
As Linux distros don’t have a good mechanism for hinting when to
trigger GC, introduce a new Kconfig option, CONFIG_F2FS_GC_UPON_FSTRIM,
and enable it by default.
This config will hook into ioctl(FITRIM), which is currently ignored on
f2fs (https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=master&id=e555da9f31210d2b62805cd7faf29228af7c3cfb),
to perform discard and GC on all invalid segments.
Userspace configurations with enough f2fs/GC knowledge, such as
Android, should disable it.

This will ensure that Linux distros that blindly call fstrim will at
least avoid constant slowdowns when free segments are depleted, with
only the occasional (once-a-week) slowdown, which *people are already
living with on ext4*. I'll even go further and mention that since f2fs
GC is a regular R/W workload, it doesn't cause an extreme slowdown
comparable to a full file-system trim operation.

If this is acceptable, I’ll cook up a patch.

In an ideal world, all Linux distros should have an explicit f2fs GC
trigger mechanism (akin to
https://github.com/kdave/btrfsmaintenance#distro-integration ), but
it’s practically unrealistic to expect that, given the installer
doesn’t even support f2fs for now.

=== C. Extended node bitmap ===

f2fs by default has a very limited number of allowed inodes compared
to other file-systems. Just two AOSP syncs are enough to exhaust f2fs
and result in -ENOSPC.

Here are some stats collected from systems that my colleague and I
use daily as regular desktops with a GUI, web browsing and everything:
1. Laptop
Utilization: 68% (182914850 valid blocks, 462 discard blocks)
  - Node: 10234905 (Inode: 10106526, Other: 128379)
  - Data: 172679945
  - Inline_xattr Inode: 2004827
  - Inline_data Inode: 867204
  - Inline_dentry Inode: 51456

2. Desktop #1
Utilization: 55% (133310465 valid blocks, 0 discard blocks)
  - Node: 6389660 (Inode: 6289765, Other: 99895)
  - Data: 126920805
  - Inline_xattr Inode: 2253838
  - Inline_data Inode: 1119109
  - Inline_dentry Inode: 187958

3. Desktop #2
Utilization: 83% (202222003 valid blocks, 1 discard blocks)
  - Node: 21887836 (Inode: 21757139, Other: 130697)
  - Data: 180334167
  - Inline_xattr Inode: 39292
  - Inline_data Inode: 35213
  - Inline_dentry Inode: 1127

4. Colleague
Utilization: 22% (108652929 valid blocks, 362420605 discard blocks)
  - Node: 5629348 (Inode: 5542909, Other: 86439)
  - Data: 103023581
  - Inline_xattr Inode: 655752
  - Inline_data Inode: 259900
  - Inline_dentry Inode: 193000

5. Android phone (for reference)
Utilization: 78% (36505713 valid blocks, 1074 discard blocks)
  - Node: 704698 (Inode: 683337, Other: 21361)
  - Data: 35801015
  - Inline_xattr Inode: 683333
  - Inline_data Inode: 237470
  - Inline_dentry Inode: 112177

Chao Yu added functionality to expand this via the -i flag passed to
mkfs.f2fs back in 2018:
https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-tools.git/commit/?id=baaa076b4d576042913cfe34169442dfda651ca4
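For anyone following along, the format-time fix is a one-liner (device
name is an example; see the mkfs.f2fs man page for what -i toggles on
your version of f2fs-tools):

```shell
# Format with the extended node bitmap so the inode count is no longer
# the first resource to run out on multi-million-file trees.
# /dev/nvme0n1p4 is an example device; double-check before running.
mkfs.f2fs -i /dev/nvme0n1p4
```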

I occasionally find myself in the weird position of having to tell
people "Oh, you should use the -i option of mkfs.f2fs" when they
encounter this issue only after they’ve migrated most of their data,
and they ask back "Why isn’t this enabled by default?".

While this might not be an issue for the foreseeable future on
Android, I’d argue that this is a feature that needs to be enabled by
default for desktop environments, preferably with a robust testing
infrastructure. Guarding this with #ifndef __ANDROID__ doesn’t seem to
make much sense, as it introduces more complications to how
fuzzing/testing should be done.

I’ll also add that it’s common practice for userspace mkfs tools to
introduce default changes that break older kernels (with options to
produce a legacy image, of course).

This was a lengthy email, but I hope I was being reasonable.

Jaegeuk and Chao, let me know what you think.
And as always, thanks for your hard work :)

Thanks and regards,


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


* Re: [f2fs-dev] [DISCUSSION] f2fs for desktop
  2023-04-04  7:36 [f2fs-dev] [DISCUSSION] f2fs for desktop Juhyung Park
@ 2023-04-04 23:35 ` Jaegeuk Kim
  2023-04-10  8:39   ` Juhyung Park
  2023-04-10 15:44 ` Chao Yu
  1 sibling, 1 reply; 10+ messages in thread
From: Jaegeuk Kim @ 2023-04-04 23:35 UTC (permalink / raw)
  To: Juhyung Park; +Cc: Alexander Koskovich, linux-f2fs-devel

Hi Juhyung,

On 04/04, Juhyung Park wrote:
> Hi everyone,
> 
> I want to start a discussion on using f2fs for regular desktops/workstations.
> 
> There is growing interest in using f2fs as a general
> root file-system:
> 2018: https://www.phoronix.com/news/GRUB-Now-Supports-F2FS
> 2020: https://www.phoronix.com/news/Clear-Linux-F2FS-Root-Option
> 2023: https://code.launchpad.net/~nexusprism/curtin/+git/curtin/+merge/439880
> 2023: https://code.launchpad.net/~nexusprism/grub/+git/ubuntu/+merge/440193

This is quite promising. :)

> 
> I've been personally running f2fs on all of my x86 Linux boxes since
> 2015, and I have several concerns that I think we need to collectively
> address before regular non-Android normies can use f2fs:
> 
> A. Bootloader and installer support
> B. Host-side GC
> C. Extended node bitmap
> 
> I'll go through each one.
> 
> === A. Bootloader and installer support ===
> 
> It seems that both GRUB and systemd-boot support f2fs without the
> need for a separate ext4-formatted /boot partition.
> Some distros are seemingly disabling the f2fs module for GRUB, though,
> for security reasons:
> https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1868664
> 
> It's ultimately up to the distro folks to enable this, and even in
> the worst-case scenario, they can specify a separate /boot partition
> and format it as ext4 upon installation.
> 
> Installer support for offering f2fs and calling mkfs.f2fs is
> currently being worked on for Ubuntu. See the 2023 links above.
> 
> There's nothing for mainline f2fs developers to do here, imo.
> 
> === B. Host-side GC ===
> 
> f2fs relieves most of the device-side GC burden but introduces a new
> host-side GC. This is extremely confusing for people who have no
> background in SSDs and flash storage, let alone the
> discard/trim/erase complications.
> 
> On most consumer-grade blackbox SSDs, device-side GC is handled
> automatically for various workloads. f2fs, however, leaves that
> responsibility to userspace, with conservative tuning on the
> kernel side by default. Android handles this with init.rc tunings and
> separate code running in vold to trigger gc_urgent.
> 
> On regular Linux desktop distros, f2fs just runs with the kernel's
> default configuration, and unless the system is running 24/7 with
> plentiful idle time, it quickly runs out of free segments and starts
> triggering foreground GC. This gives people the wrong impression
> that f2fs slows down far more drastically than other file-systems,
> when quite the contrary is true (i.e., less fragmentation over time).
> 
> This is almost the equivalent of re-living the trim nightmare. On
> SSDs with very little to no over-provisioned space, running a
> file-system with no discard whatsoever (sadly still a common case
> when an external SSD is used without UAS) will also drastically slow
> down performance. On file-systems without asynchronous discard,
> mounting with the discard option adds non-negligible overhead to
> every remove/delete operation, so most distros now (thankfully) use
> a systemd timer job to trigger fstrim:
> https://github.com/util-linux/util-linux/commits/master/sys-utils/fstrim.timer
> 
> This is still far from ideal. The default file-system, ext4, slows
> down drastically, almost to a halt, when fstrim -a is called, especially
> on SATA. For reasons that are still a mystery to me, people seem
> to be happy with it. No one has bothered to improve it for years
> ¯\_(ツ)_/¯.
> 
> So here’s my proposal:
> As Linux distros don’t have a good mechanism for hinting when to
> trigger GC, introduce a new Kconfig option, CONFIG_F2FS_GC_UPON_FSTRIM,
> and enable it by default.
> This config will hook into ioctl(FITRIM), which is currently ignored on
> f2fs (https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=master&id=e555da9f31210d2b62805cd7faf29228af7c3cfb),
> to perform discard and GC on all invalid segments.
> Userspace configurations with enough f2fs/GC knowledge, such as
> Android, should disable it.

How about adding an option like "memory=high" to tune background GC parameters
seamlessly?

> 
> This will ensure that Linux distros that blindly call fstrim will at
> least avoid constant slowdowns when free segments are depleted, with
> only the occasional (once-a-week) slowdown, which *people are already
> living with on ext4*. I'll even go further and mention that since f2fs
> GC is a regular R/W workload, it doesn't cause an extreme slowdown
> comparable to a full file-system trim operation.
> 
> If this is acceptable, I’ll cook up a patch.
> 
> In an ideal world, all Linux distros should have an explicit f2fs GC
> trigger mechanism (akin to
> https://github.com/kdave/btrfsmaintenance#distro-integration ), but
> it’s practically unrealistic to expect that, given the installer
> doesn’t even support f2fs for now.
> 
> === C. Extended node bitmap ===
> 
> f2fs by default has a very limited number of allowed inodes compared
> to other file-systems. Just two AOSP syncs are enough to exhaust f2fs
> and result in -ENOSPC.
> 
> Here are some stats collected from systems that my colleague and I
> use daily as regular desktops with a GUI, web browsing and everything:
> 1. Laptop
> Utilization: 68% (182914850 valid blocks, 462 discard blocks)
>   - Node: 10234905 (Inode: 10106526, Other: 128379)
>   - Data: 172679945
>   - Inline_xattr Inode: 2004827
>   - Inline_data Inode: 867204
>   - Inline_dentry Inode: 51456
> 
> 2. Desktop #1
> Utilization: 55% (133310465 valid blocks, 0 discard blocks)
>   - Node: 6389660 (Inode: 6289765, Other: 99895)
>   - Data: 126920805
>   - Inline_xattr Inode: 2253838
>   - Inline_data Inode: 1119109
>   - Inline_dentry Inode: 187958
> 
> 3. Desktop #2
> Utilization: 83% (202222003 valid blocks, 1 discard blocks)
>   - Node: 21887836 (Inode: 21757139, Other: 130697)
>   - Data: 180334167
>   - Inline_xattr Inode: 39292
>   - Inline_data Inode: 35213
>   - Inline_dentry Inode: 1127
> 
> 4. Colleague
> Utilization: 22% (108652929 valid blocks, 362420605 discard blocks)
>   - Node: 5629348 (Inode: 5542909, Other: 86439)
>   - Data: 103023581
>   - Inline_xattr Inode: 655752
>   - Inline_data Inode: 259900
>   - Inline_dentry Inode: 193000
> 
> 5. Android phone (for reference)
> Utilization: 78% (36505713 valid blocks, 1074 discard blocks)
>   - Node: 704698 (Inode: 683337, Other: 21361)
>   - Data: 35801015
>   - Inline_xattr Inode: 683333
>   - Inline_data Inode: 237470
>   - Inline_dentry Inode: 112177
> 
> Chao Yu added functionality to expand this via the -i flag passed to
> mkfs.f2fs back in 2018:
> https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-tools.git/commit/?id=baaa076b4d576042913cfe34169442dfda651ca4
> 
> I occasionally find myself in the weird position of having to tell
> people "Oh, you should use the -i option of mkfs.f2fs" when they
> encounter this issue only after they’ve migrated most of their data,
> and they ask back "Why isn’t this enabled by default?".
> 
> While this might not be an issue for the foreseeable future on
> Android, I’d argue that this is a feature that needs to be enabled by
> default for desktop environments, preferably with a robust testing
> infrastructure. Guarding this with #ifndef __ANDROID__ doesn’t seem to
> make much sense, as it introduces more complications to how
> fuzzing/testing should be done.
> 
> I’ll also add that it’s common practice for userspace mkfs tools to
> introduce default changes that break older kernels (with options to
> produce a legacy image, of course).

Do you have any measurements regarding the additional space that a
large NAT occupies?

Thanks,

> 
> This was a lengthy email, but I hope I was being reasonable.
> 
> Jaegeuk and Chao, let me know what you think.
> And as always, thanks for your hard work :)
> 
> Thanks,
> regards




* Re: [f2fs-dev] [DISCUSSION] f2fs for desktop
  2023-04-04 23:35 ` Jaegeuk Kim
@ 2023-04-10  8:39   ` Juhyung Park
  0 siblings, 0 replies; 10+ messages in thread
From: Juhyung Park @ 2023-04-10  8:39 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: Alexander Koskovich, linux-f2fs-devel

Hi Jaegeuk, sorry for the late reply.

On Wed, Apr 5, 2023 at 8:35 AM Jaegeuk Kim <jaegeuk@kernel.org> wrote:
>
> Hi Juhyung,
>
> > So here’s my proposal:
> > As Linux distros don’t have a good mechanism for hinting when to
> > trigger GC, introduce a new Kconfig option, CONFIG_F2FS_GC_UPON_FSTRIM,
> > and enable it by default.
> > This config will hook into ioctl(FITRIM), which is currently ignored on
> > f2fs (https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=master&id=e555da9f31210d2b62805cd7faf29228af7c3cfb),
> > to perform discard and GC on all invalid segments.
> > Userspace configurations with enough f2fs/GC knowledge, such as
> > Android, should disable it.
>
> How about adding an option like "memory=high" to tune background GC parameters
> seamlessly?
>

Can you elaborate on this design?
I'm not sure what "memory" and "high" mean in that context.

Even if we tune BG GC parameters, I still think the same problem
exists: we don't know which heuristic covers all workloads and aged
environments.

I like the idea of a dynamic GC tuner in the kernel (if that's what
you meant), but I think it's complementary to the proposed
CONFIG_F2FS_GC_UPON_FSTRIM, not a replacement, unless we're 100%
certain that it can cover all workloads. My proposed
CONFIG_F2FS_GC_UPON_FSTRIM can act as a safeguard to *not* introduce
any more performance slowdowns compared to other file-systems.

> >
> > === C. Extended node bitmap ===
> >
> > f2fs by default has a very limited number of allowed inodes compared
> > to other file-systems. Just two AOSP syncs are enough to exhaust f2fs
> > and result in -ENOSPC.
> >
> > Here are some stats collected from systems that my colleague and I
> > use daily as regular desktops with a GUI, web browsing and everything:
> > 1. Laptop
> > Utilization: 68% (182914850 valid blocks, 462 discard blocks)
> >   - Node: 10234905 (Inode: 10106526, Other: 128379)
> >   - Data: 172679945
> >   - Inline_xattr Inode: 2004827
> >   - Inline_data Inode: 867204
> >   - Inline_dentry Inode: 51456
> >
> > 2. Desktop #1
> > Utilization: 55% (133310465 valid blocks, 0 discard blocks)
> >   - Node: 6389660 (Inode: 6289765, Other: 99895)
> >   - Data: 126920805
> >   - Inline_xattr Inode: 2253838
> >   - Inline_data Inode: 1119109
> >   - Inline_dentry Inode: 187958
> >
> > 3. Desktop #2
> > Utilization: 83% (202222003 valid blocks, 1 discard blocks)
> >   - Node: 21887836 (Inode: 21757139, Other: 130697)
> >   - Data: 180334167
> >   - Inline_xattr Inode: 39292
> >   - Inline_data Inode: 35213
> >   - Inline_dentry Inode: 1127
> >
> > 4. Colleague
> > Utilization: 22% (108652929 valid blocks, 362420605 discard blocks)
> >   - Node: 5629348 (Inode: 5542909, Other: 86439)
> >   - Data: 103023581
> >   - Inline_xattr Inode: 655752
> >   - Inline_data Inode: 259900
> >   - Inline_dentry Inode: 193000
> >
> > 5. Android phone (for reference)
> > Utilization: 78% (36505713 valid blocks, 1074 discard blocks)
> >   - Node: 704698 (Inode: 683337, Other: 21361)
> >   - Data: 35801015
> >   - Inline_xattr Inode: 683333
> >   - Inline_data Inode: 237470
> >   - Inline_dentry Inode: 112177
> >
> > Chao Yu added functionality to expand this via the -i flag passed to
> > mkfs.f2fs back in 2018:
> > https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-tools.git/commit/?id=baaa076b4d576042913cfe34169442dfda651ca4
> >
> > I occasionally find myself in the weird position of having to tell
> > people "Oh, you should use the -i option of mkfs.f2fs" when they
> > encounter this issue only after they’ve migrated most of their data,
> > and they ask back "Why isn’t this enabled by default?".
> >
> > While this might not be an issue for the foreseeable future on
> > Android, I’d argue that this is a feature that needs to be enabled by
> > default for desktop environments, preferably with a robust testing
> > infrastructure. Guarding this with #ifndef __ANDROID__ doesn’t seem to
> > make much sense, as it introduces more complications to how
> > fuzzing/testing should be done.
> >
> > I’ll also add that it’s common practice for userspace mkfs tools to
> > introduce default changes that break older kernels (with options to
> > produce a legacy image, of course).
>
> Do you have some measurements regarding to the additional space that large NAT
> occupies?
>

Is this something that f2fs' debugfs output captures?
If not, please let me know if there's a standard practice for measuring it.

debugfs output for each system, in order:

=====[ partition info(nvme0n1p4). #0, RW, CP: Good]=====
[SBI: fs_dirty recovered]
[SB: 1] [CP: 2] [SIT: 38] [NAT: 460] [SSA: 1024] [MAIN:
522763(OverProv:2050 Resv:1007)]

Current Time Sec: 229327 / Mounted Time Sec: 0

Policy:
  - IPU: [ FSYNC ]

Utilization: 68% (182885118 valid blocks, 598 discard blocks)
  - Node: 10222831 (Inode: 10094469, Other: 128362)
  - Data: 172662287
  - Inline_xattr Inode: 244470
  - Inline_data Inode: 92414
  - Inline_dentry Inode: 16497
  - Compressed Inode: 0, Blocks: 0
  - Swapfile Inode: 0
  - Orphan/Append/Update Inode: 26, 0, 0

Main area: 522763 segs, 522763 secs 522763 zones
    TYPE            segno    secno   zoneno  dirty_seg   full_seg  valid_blk
  - COLD   data:   496287   496287   496287       2128      73057   38372383
  - WARM   data:   496410   496410   496410        661     242675  124379666
  - HOT    data:    22368    22368    22368        161      19195    9856098
  - Dir   dnode:    20964    20964    20964         70       2481    1301404
  - File  dnode:    22893    22893    22893        753      16742    8913189
  - Indir nodes:      822      822      822          3         14       8225
  - Pinned file:       -1       -1       -1
  - ATGC   data:       -1       -1       -1

  - Valid: 354170
  - Dirty: 3770
  - Prefree: 0
  - Free: 164823 (164823)

CP calls: 1600 (BG: 420)
  - cp blocks : 12986
  - sit blocks : 25046
  - nat blocks : 35336
  - ssa blocks : 8504
CP merge:
  - Queued :    0
  - Issued : 1660
  - Total : 1663
  - Cur time :    1(ms)
  - Peak time :  165(ms)
GC calls: 1155 (BG: 1156)
  - data segments : 993 (993)
  - node segments : 162 (162)
  - Reclaimed segs :
    - Normal : 7
    - Idle CB : 0
    - Idle Greedy : 0
    - Idle AT : 0
    - Urgent High : 1148
    - Urgent Mid : 0
    - Urgent Low : 0
Try to move 177920 blocks (BG: 177920)
  - data blocks : 152337 (152337)
  - node blocks : 25583 (25583)
BG skip : IO: 143, Other: 10

Extent Cache (Read):
  - Hit Count: L1-1:290494 L1-2:36316 L2:3872
  - Hit Ratio: 17% (330682 / 1885295)
  - Inner Struct Count: tree: 223660(0), node: 108586

Extent Cache (Block Age):
  - Allocated Data Blocks: 2561142
  - Hit Count: L1:0 L2:0
  - Hit Ratio: 0% (0 / 0)
  - Inner Struct Count: tree: 0(0), node: 0

Balancing F2FS Async:
  - DIO (R:    0, W:    0)
  - IO_R (Data:    0, Node:    0, Meta:    0
  - IO_W (CP:    0, Data:    0, Flush: (   0 15980    1), Discard: (
0 11499)) cmd: 24771 undiscard:592388
  - atomic IO:    0 (Max.    3)
  - compress:    0, hit:       0
  - nodes:   36 in 48840
  - dents:   19 in dirs:   5 (  45)
  - datas:  249 in files:   0
  - quota datas:    0 in quota files:   0
  - meta:    2 in 4893
  - imeta:   31
  - fsync mark:   22
  - NATs:        66/    99805
  - SITs:        35/   522763
  - free_nids:      9360/ 43357933
  - alloc_nids:         0

Distribution of User Blocks: [ valid | invalid | free ]
  [----------------------------------|-|---------------]

IPU: 30928 blocks
SSR: 157339 blocks in 3514 segments
LFS: 2548926 blocks in 4979 segments

BDF: 99, avg. vblocks: 396

Memory: 374747 KB
  - static: 126900 KB
  - cached all: 32914 KB
  - read extent cache: 26855 KB
  - block age extent cache: 0 KB
  - paged : 214932 KB

=====[ partition info(nvme0n1p3). #0, RW, CP: Good]=====
[SB: 1] [CP: 2] [SIT: 34] [NAT: 418] [SSA: 930] [MAIN:
474781(OverProv:1955 Resv:960)]

Current Time Sec: 2144822 / Mounted Time Sec: 2

Policy:
  - IPU: [ FSYNC ]

Utilization: 55% (133484583 valid blocks, 0 discard blocks)
  - Node: 6365874 (Inode: 6265128, Other: 100746)
  - Data: 127118709
  - Inline_xattr Inode: 2271411
  - Inline_data Inode: 1120167
  - Inline_dentry Inode: 226245
  - Compressed Inode: 0, Blocks: 0
  - Swapfile Inode: 0
  - Orphan/Append/Update Inode: 12, 0, 0

Main area: 474781 segs, 474781 secs 474781 zones
    TYPE            segno    secno   zoneno  dirty_seg   full_seg  valid_blk
  - COLD   data:   326480   326480   326480         59      33625   17245779
  - WARM   data:    94665    94665    94665         22     194916   99808056
  - HOT    data:    11739    11739    11739          7      19473    9973372
  - Dir   dnode:     6525     6525     6525        103       1231     682165
  - File  dnode:     8957     8957     8957        621      10477    5679709
  - Indir nodes:      609      609      609          1          7       4000
  - Pinned file:       -1       -1       -1
  - ATGC   data:       -1       -1       -1

  - Valid: 259735
  - Dirty: 807
  - Prefree: 0
  - Free: 214239 (214239)

CP calls: 39861 (BG: 34749)
  - cp blocks : 322043
  - sit blocks : 218279
  - nat blocks : 2070086
  - ssa blocks : 55514
CP merge:
  - Queued :    0
  - Issued : 40665
  - Total : 40679
  - Cur time :    0(ms)
  - Peak time :  668(ms)
GC calls: 32936 (BG: 33191)
  - data segments : 12319 (12319)
  - node segments : 20617 (20617)
  - Reclaimed segs :
    - Normal : 32936
    - Idle CB : 0
    - Idle Greedy : 0
    - Idle AT : 0
    - Urgent High : 0
    - Urgent Mid : 0
    - Urgent Low : 0
Try to move 13622714 blocks (BG: 13622714)
  - data blocks : 4354751 (4354751)
  - node blocks : 9267963 (9267963)
BG skip : IO: 740, Other: 0

Extent Cache (Read):
  - Hit Count: L1-1:4404282 L1-2:2449632 L2:61448
  - Hit Ratio: 34% (6915362 / 20237092)
  - Inner Struct Count: tree: 2024530(0), node: 848681

Extent Cache (Block Age):
  - Allocated Data Blocks: 17851435
  - Hit Count: L1:0 L2:0
  - Hit Ratio: 0% (0 / 0)
  - Inner Struct Count: tree: 0(0), node: 0

Balancing F2FS Async:
  - DIO (R:    0, W:    0)
  - IO_R (Data:    0, Node:    0, Meta:    0
  - IO_W (CP:    0, Data:    0, Flush: (   0    0    1), Discard: (
0 78857)) cmd: 1852 undiscard:2873
  - atomic IO:    0 (Max.    0)
  - compress:    0, hit:       0
  - nodes:    0 in 3568509
  - dents:    0 in dirs:   0 (   2)
  - datas:    0 in files:   0
  - quota datas:    0 in quota files:   0
  - meta:    0 in 80118
  - imeta:    0
  - fsync mark:    0
  - NATs:         0/    99802
  - SITs:         0/   474781
  - free_nids:     75842/ 42322763
  - alloc_nids:         0

Distribution of User Blocks: [ valid | invalid | free ]
  [---------------------------|-|----------------------]

IPU: 1060837 blocks
SSR: 0 blocks in 0 segments
LFS: 28422813 blocks in 55514 segments

BDF: 99, avg. vblocks: 506

Memory: 14948530 KB
  - static: 115259 KB
  - cached all: 238763 KB
  - read extent cache: 233655 KB
  - block age extent cache: 0 KB
  - paged : 14594508 KB

=====[ partition info(nvme1n1p1). #1, RW, CP: Good]=====
[SB: 1] [CP: 2] [SIT: 34] [NAT: 418] [SSA: 931] [MAIN:
475548(OverProv:1956 Resv:960)]

Current Time Sec: 2144822 / Mounted Time Sec: 3

Policy:
  - IPU: [ FSYNC ]

Utilization: 83% (202224148 valid blocks, 26 discard blocks)
  - Node: 21888175 (Inode: 21757478, Other: 130697)
  - Data: 180335973
  - Inline_xattr Inode: 39665
  - Inline_data Inode: 35530
  - Inline_dentry Inode: 1133
  - Compressed Inode: 0, Blocks: 0
  - Swapfile Inode: 0
  - Orphan/Append/Update Inode: 0, 0, 0

Main area: 475548 segs, 475548 secs 475548 zones
    TYPE            segno    secno   zoneno  dirty_seg   full_seg  valid_blk
  - COLD   data:   371394   371394   371394          4     125054   64029544
  - WARM   data:    62872    62872    62872          2     213588  109357995
  - HOT    data:    56416    56416    56416          1      13571    6948434
  - Dir   dnode:    38945    38945    38945         15       3828    1967280
  - File  dnode:    26083    26083    26083         25      38827   19891815
  - Indir nodes:    32035    32035    32035          1         56      29080
  - Pinned file:       -1       -1       -1
  - ATGC   data:       -1       -1       -1

  - Valid: 394930
  - Dirty: 42
  - Prefree: 0
  - Free: 80576 (80576)

CP calls: 2507 (BG: 9077)
  - cp blocks : 17756
  - sit blocks : 12427
  - nat blocks : 92338
  - ssa blocks : 8736
CP merge:
  - Queued :    0
  - Issued : 9091
  - Total : 9091
  - Cur time :    0(ms)
  - Peak time : 1095(ms)
GC calls: 2192 (BG: 8833)
  - data segments : 781 (781)
  - node segments : 1411 (1411)
  - Reclaimed segs :
    - Normal : 2192
    - Idle CB : 0
    - Idle Greedy : 0
    - Idle AT : 0
    - Urgent High : 0
    - Urgent Mid : 0
    - Urgent Low : 0
Try to move 816343 blocks (BG: 816343)
  - data blocks : 241539 (241539)
  - node blocks : 574804 (574804)
BG skip : IO: 65, Other: 0

Extent Cache (Read):
  - Hit Count: L1-1:154989 L1-2:162385 L2:444
  - Hit Ratio: 64% (317818 / 495270)
  - Inner Struct Count: tree: 38118(0), node: 661

Extent Cache (Block Age):
  - Allocated Data Blocks: 3447097
  - Hit Count: L1:0 L2:0
  - Hit Ratio: 0% (0 / 0)
  - Inner Struct Count: tree: 0(0), node: 0

Balancing F2FS Async:
  - DIO (R:    0, W:    0)
  - IO_R (Data:    0, Node:    0, Meta:    0
  - IO_W (CP:    0, Data:    0, Flush: (   0    0    1), Discard: (
0 25545)) cmd:    0 undiscard:   0
  - atomic IO:    0 (Max.    0)
  - compress:    0, hit:       0
  - nodes:    0 in 114151
  - dents:    0 in dirs:   0 (   0)
  - datas:    0 in files:   0
  - quota datas:    0 in quota files:   0
  - meta:    0 in 31810
  - imeta:    0
  - fsync mark:    0
  - NATs:         0/    99571
  - SITs:         0/   475548
  - free_nids:     10557/ 26800462
  - alloc_nids:         0

Distribution of User Blocks: [ valid | invalid | free ]
  [-----------------------------------------|-|--------]

IPU: 55 blocks
SSR: 0 blocks in 0 segments
LFS: 4473987 blocks in 8736 segments

BDF: 99, avg. vblocks: 480

Memory: 705967 KB
  - static: 115434 KB
  - cached all: 6689 KB
  - read extent cache: 3322 KB
  - block age extent cache: 0 KB
  - paged : 583844 KB

=====[ partition info(nvme1n1p3). #0, RW, CP: Good]=====
[SBI: fs_dirty recovered]
[SB: 1] [CP: 2] [SIT: 68] [NAT: 52] [SSA: 1861] [MAIN:
950830(OverProv:2765 Resv:1341)]

Current Time Sec: 5410 / Mounted Time Sec: 4

Utilization: 22% (108652929 valid blocks, 362420605 discard blocks)
  - Node: 5629348 (Inode: 5542909, Other: 86439)
  - Data: 103023581
  - Inline_xattr Inode: 655752
  - Inline_data Inode: 259900
  - Inline_dentry Inode: 193000
  - Compressed Inode: 0, Blocks: 0
  - Swapfile Inode: 0
  - Orphan/Append/Update Inode: 52, 0, 0

Main area: 950830 segs, 950830 secs 950830 zones
    TYPE            segno    secno   zoneno  dirty_seg   full_seg  valid_blk
  - COLD   data:   292597   292597   292597        128      50139   25718354
  - WARM   data:   685385   685385   685385       8937     142173   74591050
  - HOT    data:    21291    21291    21291       1372       4332    2689025
  - Dir   dnode:    23471    23471    23471       1041        110     476470
  - File  dnode:    21255    21255    21255      11021       3507    5149194
  - Indir nodes:    22875    22875    22875         40          1       3683
  - Pinned file:       -1       -1       -1
  - ATGC   data:       -1       -1       -1

  - Valid: 200268
  - Dirty: 22533
  - Prefree: 0
  - Free: 728029 (728029)

CP calls: 1736 (BG: 2664)
  - cp blocks : 8821
  - sit blocks : 12481
  - nat blocks : 29761
  - ssa blocks : 31070
CP merge:
  - Queued :    0
  - Issued : 3280
  - Total : 3350
  - Cur time :    0(ms)
  - Peak time :   61(ms)
GC calls: 0 (BG: 0)
  - data segments : 0 (0)
  - node segments : 0 (0)
  - Reclaimed segs :
    - Normal : 0
    - Idle CB : 0
    - Idle Greedy : 0
    - Idle AT : 0
    - Urgent High : 0
    - Urgent Mid : 0
    - Urgent Low : 0
Try to move 0 blocks (BG: 0)
  - data blocks : 0 (0)
  - node blocks : 0 (0)
BG skip : IO: 88, Other: 0

Extent Cache (Read):
  - Hit Count: L1-1:163381 L1-2:3828 L2:2956
  - Hit Ratio: 2% (170165 / 7168533)
  - Inner Struct Count: tree: 453025(0), node: 193727

Extent Cache (Block Age):
  - Allocated Data Blocks: 15227950
  - Hit Count: L1:0 L2:0
  - Hit Ratio: 0% (0 / 0)
  - Inner Struct Count: tree: 0(0), node: 0

Balancing F2FS Async:
  - DIO (R:    0, W:    0)
  - IO_R (Data:    0, Node:    0, Meta:    0
  - IO_W (CP:    0, Data:    0, Flush: (   0 3920    1), Discard: (   0  550)) cmd: 23117 undiscard:15649654
  - atomic IO:    0 (Max.    2)
  - compress:    0, hit:       0
  - nodes:   52 in 1200554
  - dents:    4 in dirs:   1 (  16)
  - datas: 1796 in files:   0
  - quota datas:    0 in quota files:   0
  - meta:    2 in 63898
  - imeta:    9
  - fsync mark:    5
  - NATs:        18/    99583
  - SITs:        90/   950830
  - free_nids:    427597/   427597
  - alloc_nids:         0

Distribution of User Blocks: [ valid | invalid | free ]
  [-----------|-|--------------------------------------]

IPU: 7883 blocks
SSR: 0 blocks in 0 segments
LFS: 15907965 blocks in 31070 segments

BDF: 98, avg. vblocks: 270

Memory: 5343760 KB
  - static: 217726 KB
  - cached all: 68225 KB
  - read extent cache: 52553 KB
  - block age extent cache: 0 KB
  - paged : 5057808 KB

=====[ partition info(sda20). #0, RW, CP: Good]=====
[SBI: fs_dirty recovered quota_need_flush]
[SB: 1] [CP: 2] [SIT: 8] [NAT: 112] [SSA: 180] [MAIN: 91857(OverProv:862 Resv:433)]

Current Time Sec: 774835 / Mounted Time Sec: 3

Policy:
  - IPU: [ FSYNC ]

Utilization: 79% (37048389 valid blocks, 430 discard blocks)
  - Node: 719422 (Inode: 697722, Other: 21700)
  - Data: 36328967
  - Inline_xattr Inode: 445204
  - Inline_data Inode: 195233
  - Inline_dentry Inode: 51879
  - Compressed Inode: 0, Blocks: 0
  - Swapfile Inode: 0
  - Orphan/Append/Update Inode: 368, 0, 0

Main area: 91857 segs, 91857 secs 91857 zones
    TYPE            segno    secno   zoneno  dirty_seg   full_seg  valid_blk
  - COLD   data:    91264    91264    91264       1893      42822   22726490
  - WARM   data:     8424     8424     8424        766      25037   12909190
  - HOT    data:     7453     7453     7453        538       1282     692980
  - Dir   dnode:     8372     8372     8372        192        111     111076
  - File  dnode:     8051     8051     8051       1309        393     608259
  - Indir nodes:      256      256      256          1          0         85
  - Pinned file:       -1       -1       -1
  - ATGC   data:       -1       -1       -1

  - Valid: 69651
  - Dirty: 4693
  - Prefree: 0
  - Free: 17513 (17513)

CP calls: 130942 (BG: 2181)
  - cp blocks : 662119
  - sit blocks : 874869
  - nat blocks : 3043527
  - ssa blocks : 136941
CP merge:
  - Queued :    0
  - Issued : 134360
  - Total : 134806
  - Cur time :   22(ms)
  - Peak time :  525(ms)
GC calls: 30425 (BG: 30455)
  - data segments : 21321 (21321)
  - node segments : 9104 (9104)
  - Reclaimed segs :
    - Normal : 417
    - Idle CB : 0
    - Idle Greedy : 0
    - Idle AT : 0
    - Urgent High : 30008
    - Urgent Mid : 0
    - Urgent Low : 0
Try to move 3257936 blocks (BG: 3257936)
  - data blocks : 2470378 (2470378)
  - node blocks : 787558 (787558)
BG skip : IO: 7577, Other: 36953

Extent Cache (Read):
  - Hit Count: L1-1:13680740 L1-2:907686 L2:450749
  - Hit Ratio: 14% (15039175 / 105839291)
  - Inner Struct Count: tree: 369427(0), node: 447

Extent Cache (Block Age):
  - Allocated Data Blocks: 36567055
  - Hit Count: L1:0 L2:0
  - Hit Ratio: 0% (0 / 0)
  - Inner Struct Count: tree: 0(0), node: 0

Balancing F2FS Async:
  - DIO (R:    0, W:    0)
  - IO_R (Data:    0, Node:    0, Meta:    0
  - IO_W (CP:    0, Data:    0, Flush: (   0    0    1), Discard: (   0 406517)) cmd: 28565 undiscard:110127
  - atomic IO:    0 (Max.    8)
  - compress:    0, hit:       0
  - nodes:    6 in 2914
  - dents:    1 in dirs:   1 (  43)
  - datas:  505 in files:   0
  - quota datas:    1 in quota files:   3
  - meta:    0 in  714
  - imeta:    6
  - fsync mark:    5
  - NATs:        12/    31418
  - SITs:         7/    91857
  - free_nids:     10902/ 12326330
  - alloc_nids:         0

Distribution of User Blocks: [ valid | invalid | free ]
  [---------------------------------------|--|---------]

IPU: 3055488 blocks
SSR: 1881000 blocks in 52764 segments
LFS: 41233030 blocks in 80532 segments

BDF: 98, avg. vblocks: 295

Memory: 73523 KB
  - static: 22853 KB
  - cached all: 36158 KB
  - read extent cache: 31779 KB
  - block age extent cache: 0 KB
  - paged : 14512 KB

Thanks,

> Thanks,
>
> >
> > This was a lengthy email, but I hope I was being reasonable.
> >
> > Jaegeuk and Chao, let me know what you think.
> > And as always, thanks for your hard work :)
> >
> > Thanks,
> > regards


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [f2fs-dev] [DISCUSSION] f2fs for desktop
  2023-04-04  7:36 [f2fs-dev] [DISCUSSION] f2fs for desktop Juhyung Park
  2023-04-04 23:35 ` Jaegeuk Kim
@ 2023-04-10 15:44 ` Chao Yu
  2023-04-10 17:03   ` Juhyung Park
  1 sibling, 1 reply; 10+ messages in thread
From: Chao Yu @ 2023-04-10 15:44 UTC (permalink / raw)
  To: Juhyung Park, linux-f2fs-devel, Jaegeuk Kim; +Cc: Alexander Koskovich

Hi Juhyung,

On 2023/4/4 15:36, Juhyung Park wrote:
> Hi everyone,
> 
> I want to start a discussion on using f2fs for regular desktops/workstations.
> 
> There is growing interest in using f2fs as the general
> root file-system:
> 2018: https://www.phoronix.com/news/GRUB-Now-Supports-F2FS
> 2020: https://www.phoronix.com/news/Clear-Linux-F2FS-Root-Option
> 2023: https://code.launchpad.net/~nexusprism/curtin/+git/curtin/+merge/439880
> 2023: https://code.launchpad.net/~nexusprism/grub/+git/ubuntu/+merge/440193
> 
> I've been personally running f2fs on all of my x86 Linux boxes since
> 2015, and I have several concerns that I think we need to collectively
> address for regular non-Android normies to use f2fs:
> 
> A. Bootloader and installer support
> B. Host-side GC
> C. Extended node bitmap
> 
> I'll go through each one.
> 
> === A. Bootloader and installer support ===
> 
> It seems that both GRUB and systemd-boot support f2fs without the
> need for a separate ext4-formatted /boot partition.
> Some distros are seemingly disabling f2fs module for GRUB though for
> security reasons:
> https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1868664
> 
> It's ultimately up to the distro folks to enable this; even in
> the worst-case scenario, they can specify a separate /boot partition
> and format it to ext4 upon installation.
> 
> Installer support for showing f2fs and calling mkfs.f2fs is currently
> being worked on in Ubuntu. See the 2023 links above.
> 
> Nothing f2fs mainline developers should do here, imo.
> 
> === B. Host-side GC ===
> 
> f2fs relieves most of the device-side GC but introduces a new
> host-side GC. This is extremely confusing for people who have no
> background in SSDs and flash storage, let alone
> discard/trim/erase complications.
> 
> In most consumer-grade blackbox SSDs, device-side GCs are handled
> automatically for various workloads. f2fs, however, leaves that
> responsibility to the userspace with conservative tuning on the

We've proposed an f2fs feature named "space-aware garbage collection"
and shipped it in Huawei/Honor devices, but forgot to try upstreaming
it. :-P

In this feature, we introduced three modes:
- performance mode: something like write-GC in an FTL; it triggers
background GC more frequently and tunes its speed according to the
free-segment and reclaimable-block ratios.
- lifetime mode: slows background GC down to avoid a high WAF when
free space is low.
- balance mode: behaves as usual.

I guess this may be helpful for Linux desktop distros, since there is
no storage service there to trigger gc_urgent.
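For desktop users who want to experiment before anything like this lands
upstream, the spirit of the three modes can be roughly approximated with the
mainline GC-thread sysfs knobs. This is a sketch with illustrative values
only (30000/60000 ms are the current kernel defaults for the GC thread's
min/max sleep; none of this is the shipped Huawei/Honor tuning):

```shell
# Approximate the three GC modes with mainline f2fs sysfs knobs.
# gc_min/max_sleep_time are in milliseconds; smaller values mean the
# background GC thread wakes up (and reclaims segments) more often.
f2fs_gc_mode() {    # usage: f2fs_gc_mode /sys/fs/f2fs/<disk> <mode>
    d=$1
    case $2 in
    performance)    # GC aggressively, like write-gc in an FTL
        echo 10000  > "$d/gc_min_sleep_time"
        echo 30000  > "$d/gc_max_sleep_time" ;;
    lifetime)       # GC rarely, trading free segments for a lower WAF
        echo 60000  > "$d/gc_min_sleep_time"
        echo 300000 > "$d/gc_max_sleep_time" ;;
    *)              # balance: the kernel defaults
        echo 30000  > "$d/gc_min_sleep_time"
        echo 60000  > "$d/gc_max_sleep_time" ;;
    esac
}
```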

> kernel-side by default. Android handles this by init.rc tunings and a
> separate code running in vold to trigger gc_urgent.
> 
> For regular Linux desktop distros, f2fs just runs with the default
> configuration set in the kernel, and unless it’s running 24/7 with
> plentiful idle time, it quickly runs out of free segments and starts
> triggering foreground GC. This gives people the wrong impression
> that f2fs slows down far more drastically than other file-systems,
> when that’s quite the contrary (i.e., less fragmentation over time).
> 
> This is almost the equivalent of re-living the nightmare of trim. On
> SSDs with very small to no over-provisioned space, running a
> file-system with no discard whatsoever (sadly still a common case
> when an external SSD is used with no UAS) will also drastically slow

What does UAS mean?

> the performance down. On file-systems with no asynchronous discard,

There is no such performance issue in f2fs, right? f2fs enables the
discard mount option by default and supports the async discard feature.

> mounting a file-system with the discard option adds a non-negligible
> overhead on every remove/delete operation, so most distros now
> (thankfully) use a timer job registered to systemd to trigger fstrim:
> https://github.com/util-linux/util-linux/commits/master/sys-utils/fstrim.timer
> 
> This is still far from ideal. The default file-system, ext4, slows
> down drastically almost to a halt when fstrim -a is called, especially
> on SATA. For some reason that is still a mystery to me, people seem
> to be happy with it. No one bothered to improve it for years
> ¯\_(ツ)_/¯.
> 
> So here’s my proposal:
> As Linux distros don’t have a good mechanism for hinting when to
> trigger GC, introduce a new Kconfig, CONFIG_F2FS_GC_UPON_FSTRIM and
> enable it by default.
> This config will hook up ioctl(FITRIM), which is currently ignored on
> f2fs - https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=master&id=e555da9f31210d2b62805cd7faf29228af7c3cfb
> , to perform discard and GC on all invalid segments.
> Userspace configurations with enough f2fs/GC knowledge, such as
> Android, should disable it.
> 
> This will ensure that Linux distros that blindly call fstrim will at
> least avoid constant slowdowns when free segments are depleted, trading
> them for the occasional (once-a-week) slowdown, which *people are already
> living with on ext4*. I'll even go further and mention that since f2fs
> GC is a regular R/W workload, it doesn't cause an extreme slowdown
> comparable to a full file-system trim operation.
> 
> If this is acceptable, I’ll cook up a patch.
> 
> In an ideal world, all Linux distros should have an explicit f2fs GC
> trigger mechanism (akin to
> https://github.com/kdave/btrfsmaintenance#distro-integration ), but
> it’s practically unrealistic to expect that, given the installer
> doesn’t even support f2fs for now.
> 
> === C. Extended node bitmap ===
> 
> f2fs by default has a very limited number of allowed inodes compared
> to other file-systems. Just 2 AOSP syncs are enough to exhaust f2fs
> and result in -ENOSPC.
> 
> Here are some of the stats collected from my machines and my
> colleague's, which we use daily as regular desktops with GUI,
> web-browsing and everything:
> 1. Laptop
> Utilization: 68% (182914850 valid blocks, 462 discard blocks)
>    - Node: 10234905 (Inode: 10106526, Other: 128379)
>    - Data: 172679945
>    - Inline_xattr Inode: 2004827
>    - Inline_data Inode: 867204
>    - Inline_dentry Inode: 51456
> 
> 2. Desktop #1
> Utilization: 55% (133310465 valid blocks, 0 discard blocks)
>    - Node: 6389660 (Inode: 6289765, Other: 99895)
>    - Data: 126920805
>    - Inline_xattr Inode: 2253838
>    - Inline_data Inode: 1119109
>    - Inline_dentry Inode: 187958
> 
> 3. Desktop #2
> Utilization: 83% (202222003 valid blocks, 1 discard blocks)
>    - Node: 21887836 (Inode: 21757139, Other: 130697)
>    - Data: 180334167
>    - Inline_xattr Inode: 39292
>    - Inline_data Inode: 35213
>    - Inline_dentry Inode: 1127
> 
> 4. Colleague
> Utilization: 22% (108652929 valid blocks, 362420605 discard blocks)
>    - Node: 5629348 (Inode: 5542909, Other: 86439)
>    - Data: 103023581
>    - Inline_xattr Inode: 655752
>    - Inline_data Inode: 259900
>    - Inline_dentry Inode: 193000
> 
> 5. Android phone (for reference)
> Utilization: 78% (36505713 valid blocks, 1074 discard blocks)
>    - Node: 704698 (Inode: 683337, Other: 21361)
>    - Data: 35801015
>    - Inline_xattr Inode: 683333
>    - Inline_data Inode: 237470
>    - Inline_dentry Inode: 112177
> 
> Chao Yu added a functionality to expand this via the -i flag passed to
> mkfs.f2fs back in 2018 -
> https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-tools.git/commit/?id=baaa076b4d576042913cfe34169442dfda651ca4
> 
> I occasionally find myself in a weird position of having to tell
> people "Oh you should use the -i option from mkfs.f2fs" when they
> encounter this issue only after they’ve migrated most of the data and
> ask back "Why isn’t this enabled by default?".
> 
> While this might not be an issue for the foreseeable future in
> Android, I’d argue that this is a feature that needs to be enabled by
> default for desktop environments, preferably with a robust testing

Yes, I guess we need to add some testcases and do some robustness tests
for the large nat_bitmap feature first, since I remember that a design
flaw in it once corrupted data. :(

Thanks,

> infrastructure. Guarding this with #ifndef __ANDROID__ doesn’t seem to
> make much sense as it introduces more complications to how
> fuzzing/testing should be done.
> 
> I’ll also add that it’s a common practice for userspace mkfs tools to
> introduce default changes that break older kernels (with options to
> produce a legacy image, of course).
> 
> This was a lengthy email, but I hope I was being reasonable.
> 
> Jaegeuk and Chao, let me know what you think.
> And as always, thanks for your hard work :)
> 
> Thanks,
> regards



* Re: [f2fs-dev] [DISCUSSION] f2fs for desktop
  2023-04-10 15:44 ` Chao Yu
@ 2023-04-10 17:03   ` Juhyung Park
  2023-04-20 16:19     ` Chao Yu
  0 siblings, 1 reply; 10+ messages in thread
From: Juhyung Park @ 2023-04-10 17:03 UTC (permalink / raw)
  To: Chao Yu; +Cc: Jaegeuk Kim, Alexander Koskovich, linux-f2fs-devel

Hi Chao,

On Tue, Apr 11, 2023 at 12:44 AM Chao Yu <chao@kernel.org> wrote:
>
> Hi Juhyung,
>
> On 2023/4/4 15:36, Juhyung Park wrote:
> > Hi everyone,
> >
> > I want to start a discussion on using f2fs for regular desktops/workstations.
> >
> > There is growing interest in using f2fs as the general
> > root file-system:
> > 2018: https://www.phoronix.com/news/GRUB-Now-Supports-F2FS
> > 2020: https://www.phoronix.com/news/Clear-Linux-F2FS-Root-Option
> > 2023: https://code.launchpad.net/~nexusprism/curtin/+git/curtin/+merge/439880
> > 2023: https://code.launchpad.net/~nexusprism/grub/+git/ubuntu/+merge/440193
> >
> > I've been personally running f2fs on all of my x86 Linux boxes since
> > 2015, and I have several concerns that I think we need to collectively
> > address for regular non-Android normies to use f2fs:
> >
> > A. Bootloader and installer support
> > B. Host-side GC
> > C. Extended node bitmap
> >
> > I'll go through each one.
> >
> > === A. Bootloader and installer support ===
> >
> > It seems that both GRUB and systemd-boot support f2fs without the
> > need for a separate ext4-formatted /boot partition.
> > Some distros are seemingly disabling f2fs module for GRUB though for
> > security reasons:
> > https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1868664
> >
> > It's ultimately up to the distro folks to enable this; even in
> > the worst-case scenario, they can specify a separate /boot partition
> > and format it to ext4 upon installation.
> >
> > Installer support for showing f2fs and calling mkfs.f2fs is currently
> > being worked on in Ubuntu. See the 2023 links above.
> >
> > Nothing f2fs mainline developers should do here, imo.
> >
> > === B. Host-side GC ===
> >
> > f2fs relieves most of the device-side GC but introduces a new
> > host-side GC. This is extremely confusing for people who have no
> > background in SSDs and flash storage, let alone
> > discard/trim/erase complications.
> >
> > In most consumer-grade blackbox SSDs, device-side GCs are handled
> > automatically for various workloads. f2fs, however, leaves that
> > responsibility to the userspace with conservative tuning on the
>
> We've proposed an f2fs feature named "space-aware garbage collection"
> and shipped it in Huawei/Honor devices, but forgot to try upstreaming
> it. :-P
>
> In this feature, we introduced three modes:
> - performance mode: something like write-GC in an FTL; it triggers
> background GC more frequently and tunes its speed according to the
> free-segment and reclaimable-block ratios.
> - lifetime mode: slows background GC down to avoid a high WAF when
> free space is low.
> - balance mode: behaves as usual.
>
> I guess this may be helpful for Linux desktop distros, since there is
> no storage service there to trigger gc_urgent.
>

That indeed sounds interesting.

If you need me to test something out, feel free to ask.

I manually trigger gc_urgent from time to time on my 2TB SSD laptop
(which, as a laptop, isn't left on 24/7, so f2fs has a bit of trouble
finding enough idle time to trigger GC sufficiently).
If I don't, I run out of free segments within a few weeks.
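For reference, that manual trigger amounts to a pair of sysfs writes. A
minimal sketch (the device directory under /sys/fs/f2fs/ is a placeholder;
use whatever your partition is named there):

```shell
# Force urgent-mode GC for a while, then drop back to normal.
# Writing 1 selects urgent-high GC (no idle wait); 0 restores defaults.
f2fs_gc_burst() {    # usage: f2fs_gc_burst /sys/fs/f2fs/<disk> <seconds>
    d=$1; secs=${2:-600}
    echo 1 > "$d/gc_urgent"
    sleep "$secs"
    echo 0 > "$d/gc_urgent"
    cat "$d/free_segments"    # should have grown after the GC pass
}
# e.g. f2fs_gc_burst /sys/fs/f2fs/nvme0n1p3 600
```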

> > kernel-side by default. Android handles this by init.rc tunings and a
> > separate code running in vold to trigger gc_urgent.
> >
> > For regular Linux desktop distros, f2fs just runs with the default
> > configuration set in the kernel, and unless it’s running 24/7 with
> > plentiful idle time, it quickly runs out of free segments and starts
> > triggering foreground GC. This gives people the wrong impression
> > that f2fs slows down far more drastically than other file-systems,
> > when that’s quite the contrary (i.e., less fragmentation over time).
> >
> > This is almost the equivalent of re-living the nightmare of trim. On
> > SSDs with very small to no over-provisioned space, running a
> > file-system with no discard whatsoever (sadly still a common case
> > when an external SSD is used with no UAS) will also drastically slow
>
> What does UAS mean?
>

USB Attached SCSI. It's a protocol that sends SCSI commands over USB.
Most SATA-to-USB and NVMe-to-USB chips support it.

AFAIK, it's the only way of sending trim commands and querying SMART
data over USB. (Plus, it's faster.)

If either the host or the chip doesn't support it, it's negotiated
through "usb-storage" (aka mass-storage), which then prevents anyone
from sending trim commands.

The external SSD shenanigans are a whole other rant for another day..
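For completeness, here's one way to tell which driver an external drive
ended up with: walk the block device's sysfs ancestors until a USB-level
driver link shows up. A sketch only, since sysfs layouts differ between
hosts:

```shell
# Print "uas" (TRIM/SMART passthrough possible), "usb-storage" (no TRIM),
# or "not-usb" for a given block device's sysfs directory.
usb_driver_for() {    # usage: usb_driver_for /sys/block/<dev>
    p=$(readlink -f "$1/device")
    while [ -n "$p" ] && [ "$p" != "/" ]; do
        if [ -L "$p/driver" ]; then
            drv=$(basename "$(readlink -f "$p/driver")")
            case $drv in
            uas|usb-storage) echo "$drv"; return 0 ;;
            esac
        fi
        p=$(dirname "$p")    # climb toward the USB interface
    done
    echo "not-usb"
}
# e.g. usb_driver_for /sys/block/sdb
```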

> > the performance down. On file-systems with no asynchronous discard,
>
> There is no such performance issue in f2fs, right? f2fs enables the
> discard mount option by default and supports the async discard feature.
>

Yup. It's one of my favorite f2fs features :))

Though imo it might be a good idea to explicitly recommend that people
NOT disable it, as a lot of "how to improve SSD performance on Linux"
guides online tell you to outright disable the "discard" mount option.
Like you said, those concerns are invalid on f2fs.

btrfs recently added discard=async and enabled it by default too, but
I'm not sure if their implementation aims to do the same as what
f2fs does.
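Since those guides are easy to follow by accident, here's a quick sanity
check that a mountpoint still has discard on. The optional mounts-file
argument exists only so the helper can be exercised against a fake table;
it defaults to /proc/mounts:

```shell
# Exit 0 if the given mountpoint is mounted with the "discard" option.
f2fs_discard_enabled() {  # usage: f2fs_discard_enabled <mountpoint> [mounts-file]
    awk -v m="$1" '
        $2 == m {                       # mountpoint matches
            n = split($4, opts, ",")    # options are comma-separated
            for (i = 1; i <= n; i++)
                if (opts[i] == "discard") found = 1
        }
        END { exit !found }
    ' "${2:-/proc/mounts}"
}
# e.g. f2fs_discard_enabled / && echo "discard on" || echo "discard OFF"
```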

> > mounting a file-system with the discard option adds a non-negligible
> > overhead on every remove/delete operation, so most distros now
> > (thankfully) use a timer job registered to systemd to trigger fstrim:
> > https://github.com/util-linux/util-linux/commits/master/sys-utils/fstrim.timer
> >
> > This is still far from ideal. The default file-system, ext4, slows
> > down drastically almost to a halt when fstrim -a is called, especially
> > on SATA. For some reason that is still a mystery to me, people seem
> > to be happy with it. No one bothered to improve it for years
> > ¯\_(ツ)_/¯.
> >
> > So here’s my proposal:
> > As Linux distros don’t have a good mechanism for hinting when to
> > trigger GC, introduce a new Kconfig, CONFIG_F2FS_GC_UPON_FSTRIM and
> > enable it by default.
> > This config will hook up ioctl(FITRIM), which is currently ignored on
> > f2fs - https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=master&id=e555da9f31210d2b62805cd7faf29228af7c3cfb
> > , to perform discard and GC on all invalid segments.
> > Userspace configurations with enough f2fs/GC knowledge, such as
> > Android, should disable it.
> >
> > This will ensure that Linux distros that blindly call fstrim will at
> > least avoid constant slowdowns when free segments are depleted, trading
> > them for the occasional (once-a-week) slowdown, which *people are already
> > living with on ext4*. I'll even go further and mention that since f2fs
> > GC is a regular R/W workload, it doesn't cause an extreme slowdown
> > comparable to a full file-system trim operation.
> >
> > If this is acceptable, I’ll cook up a patch.
> >
> > In an ideal world, all Linux distros should have an explicit f2fs GC
> > trigger mechanism (akin to
> > https://github.com/kdave/btrfsmaintenance#distro-integration ), but
> > it’s practically unrealistic to expect that, given the installer
> > doesn’t even support f2fs for now.
> >
> > === C. Extended node bitmap ===
> >
> > f2fs by default has a very limited number of allowed inodes compared
> > to other file-systems. Just 2 AOSP syncs are enough to exhaust f2fs
> > and result in -ENOSPC.
> >
> > Here are some of the stats collected from my machines and my
> > colleague's, which we use daily as regular desktops with GUI,
> > web-browsing and everything:
> > 1. Laptop
> > Utilization: 68% (182914850 valid blocks, 462 discard blocks)
> >    - Node: 10234905 (Inode: 10106526, Other: 128379)
> >    - Data: 172679945
> >    - Inline_xattr Inode: 2004827
> >    - Inline_data Inode: 867204
> >    - Inline_dentry Inode: 51456
> >
> > 2. Desktop #1
> > Utilization: 55% (133310465 valid blocks, 0 discard blocks)
> >    - Node: 6389660 (Inode: 6289765, Other: 99895)
> >    - Data: 126920805
> >    - Inline_xattr Inode: 2253838
> >    - Inline_data Inode: 1119109
> >    - Inline_dentry Inode: 187958
> >
> > 3. Desktop #2
> > Utilization: 83% (202222003 valid blocks, 1 discard blocks)
> >    - Node: 21887836 (Inode: 21757139, Other: 130697)
> >    - Data: 180334167
> >    - Inline_xattr Inode: 39292
> >    - Inline_data Inode: 35213
> >    - Inline_dentry Inode: 1127
> >
> > 4. Colleague
> > Utilization: 22% (108652929 valid blocks, 362420605 discard blocks)
> >    - Node: 5629348 (Inode: 5542909, Other: 86439)
> >    - Data: 103023581
> >    - Inline_xattr Inode: 655752
> >    - Inline_data Inode: 259900
> >    - Inline_dentry Inode: 193000
> >
> > 5. Android phone (for reference)
> > Utilization: 78% (36505713 valid blocks, 1074 discard blocks)
> >    - Node: 704698 (Inode: 683337, Other: 21361)
> >    - Data: 35801015
> >    - Inline_xattr Inode: 683333
> >    - Inline_data Inode: 237470
> >    - Inline_dentry Inode: 112177
> >
> > Chao Yu added a functionality to expand this via the -i flag passed to
> > mkfs.f2fs back in 2018 -
> > https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-tools.git/commit/?id=baaa076b4d576042913cfe34169442dfda651ca4
> >
> > I occasionally find myself in a weird position of having to tell
> > people "Oh you should use the -i option from mkfs.f2fs" when they
> > encounter this issue only after they’ve migrated most of the data and
> > ask back "Why isn’t this enabled by default?".
> >
> > While this might not be an issue for the foreseeable future in
> > Android, I’d argue that this is a feature that needs to be enabled by
> > default for desktop environments, preferably with a robust testing
>
> Yes, I guess we need to add some testcases and do some robustness tests
> for the large nat_bitmap feature first, since I remember that a design
> flaw in it once corrupted data. :(
>

And I'm glad to report everything's been rock solid ever since that
fix :) I'm actively using it on many of my systems.

One thing to note here is that my colleague (Alexander Koskovich) ran
into an fsck issue with that feature enabled on a large SSD,
preventing boot.
I didn't encounter it as I never had f2fs-tools installed on my system
(which also tells you how robust f2fs was for years on my setup with
multiple sudden power-offs without any fsck runs).

See here for the fix if you missed it:
https://lore.kernel.org/all/CAD14+f0FbTXfaD_dM-RyFiPbaong-B-6hqrms2M4riidX9yVug@mail.gmail.com/

Btw, is there a downside (e.g., more disk usage, slower performance)
to using large nat_bitmap other than legacy kernel compatibility? I
was guessing not, but might as well ask to be sure.

Thanks, regards

> Thanks,
>
> > infrastructure. Guarding this with #ifndef __ANDROID__ doesn’t seem to
> > make much sense as it introduces more complications to how
> > fuzzing/testing should be done.
> >
> > I’ll also add that it’s a common practice for userspace mkfs tools to
> > introduce default changes that break older kernels (with options to
> > produce a legacy image, of course).
> >
> > This was a lengthy email, but I hope I was being reasonable.
> >
> > Jaegeuk and Chao, let me know what you think.
> > And as always, thanks for your hard work :)
> >
> > Thanks,
> > regards



* Re: [f2fs-dev] [DISCUSSION] f2fs for desktop
  2023-04-10 17:03   ` Juhyung Park
@ 2023-04-20 16:19     ` Chao Yu
  2023-04-20 17:26       ` Juhyung Park
  0 siblings, 1 reply; 10+ messages in thread
From: Chao Yu @ 2023-04-20 16:19 UTC (permalink / raw)
  To: Juhyung Park; +Cc: Jaegeuk Kim, Alexander Koskovich, linux-f2fs-devel

Hi Juhyung,

Sorry for the delayed reply.

On 2023/4/11 1:03, Juhyung Park wrote:
> Hi Chao,
> 
> On Tue, Apr 11, 2023 at 12:44 AM Chao Yu <chao@kernel.org> wrote:
>>
>> Hi Juhyung,
>>
>> On 2023/4/4 15:36, Juhyung Park wrote:
>>> Hi everyone,
>>>
>>> I want to start a discussion on using f2fs for regular desktops/workstations.
>>>
>>> There is growing interest in using f2fs as the general
>>> root file-system:
>>> 2018: https://www.phoronix.com/news/GRUB-Now-Supports-F2FS
>>> 2020: https://www.phoronix.com/news/Clear-Linux-F2FS-Root-Option
>>> 2023: https://code.launchpad.net/~nexusprism/curtin/+git/curtin/+merge/439880
>>> 2023: https://code.launchpad.net/~nexusprism/grub/+git/ubuntu/+merge/440193
>>>
>>> I've been personally running f2fs on all of my x86 Linux boxes since
>>> 2015, and I have several concerns that I think we need to collectively
>>> address for regular non-Android normies to use f2fs:
>>>
>>> A. Bootloader and installer support
>>> B. Host-side GC
>>> C. Extended node bitmap
>>>
>>> I'll go through each one.
>>>
>>> === A. Bootloader and installer support ===
>>>
>>> It seems that both GRUB and systemd-boot support f2fs without the
>>> need for a separate ext4-formatted /boot partition.
>>> Some distros are seemingly disabling f2fs module for GRUB though for
>>> security reasons:
>>> https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1868664
>>>
>>> It's ultimately up to the distro folks to enable this; even in
>>> the worst-case scenario, they can specify a separate /boot partition
>>> and format it to ext4 upon installation.
>>>
>>> Installer support for showing f2fs and calling mkfs.f2fs is currently
>>> being worked on in Ubuntu. See the 2023 links above.
>>>
>>> Nothing f2fs mainline developers should do here, imo.
>>>
>>> === B. Host-side GC ===
>>>
>>> f2fs relieves most of the device-side GC but introduces a new
>>> host-side GC. This is extremely confusing for people who have no
>>> background in SSDs and flash storage, let alone
>>> discard/trim/erase complications.
>>>
>>> In most consumer-grade blackbox SSDs, device-side GCs are handled
>>> automatically for various workloads. f2fs, however, leaves that
>>> responsibility to the userspace with conservative tuning on the
>>
>> We've proposed an f2fs feature named "space-aware garbage collection"
>> and shipped it in Huawei/Honor devices, but forgot to try upstreaming
>> it. :-P
>>
>> In this feature, we introduced three modes:
>> - performance mode: something like write-GC in an FTL; it triggers
>> background GC more frequently and tunes its speed according to the
>> free-segment and reclaimable-block ratios.
>> - lifetime mode: slows background GC down to avoid a high WAF when
>> free space is low.
>> - balance mode: behaves as usual.
>>
>> I guess this may be helpful for Linux desktop distros, since there is
>> no storage service there to trigger gc_urgent.
>>
> 
> That indeed sounds interesting.
> 
> If you need me to test something out, feel free to ask.

Thanks a lot for that. :)

I'm trying to figure out a patch...

> 
> I manually trigger gc_urgent from time to time on my 2TB SSD laptop
> (which, as a laptop, isn't left on 24/7, so f2fs has a bit of trouble
> finding enough idle time to trigger GC sufficiently).
> If I don't, I run out of free segments within a few weeks.

Have you ever tried configuring /sys/fs/f2fs/<disk>/gc_idle_interval?

Set the value to 0 and check the free-segment decrement over one day;
from that you can infer whether free segments would be exhausted after
a few weeks.
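Scripted, that experiment looks something like this (the sysfs directory is
whatever /sys/fs/f2fs/ lists for the disk; 86400 s = one day):

```shell
# Zero gc_idle_interval, then report how free_segments changed over a
# window. A negative number means segments are consumed faster than
# background GC reclaims them, i.e. exhaustion is only a matter of time.
f2fs_free_seg_delta() {  # usage: f2fs_free_seg_delta /sys/fs/f2fs/<disk> <secs>
    d=$1; secs=${2:-86400}
    echo 0 > "$d/gc_idle_interval"
    before=$(cat "$d/free_segments")
    sleep "$secs"
    after=$(cat "$d/free_segments")
    echo $((after - before))
}
# e.g. f2fs_free_seg_delta /sys/fs/f2fs/nvme0n1p3 86400
```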

> 
>>> kernel-side by default. Android handles this by init.rc tunings and a
>>> separate code running in vold to trigger gc_urgent.
>>>
>>> For regular Linux desktop distros, f2fs just runs with the default
>>> configuration set in the kernel, and unless it’s running 24/7 with
>>> plentiful idle time, it quickly runs out of free segments and starts
>>> triggering foreground GC. This gives people the wrong impression
>>> that f2fs slows down far more drastically than other file-systems,
>>> when that’s quite the contrary (i.e., less fragmentation over time).
>>>
>>> This is almost the equivalent of re-living the nightmare of trim. On
>>> SSDs with very small to no over-provisioned space, running a
>>> file-system with no discard whatsoever (sadly still a common case
>>> when an external SSD is used with no UAS) will also drastically slow
>>
>> What does UAS mean?
>>
> 
> USB Attached SCSI. It's a protocol that sends SCSI commands over USB.
> Most SATA-to-USB and NVMe-to-USB chips support it.
> 
> AFAIK, it's the only way of sending trim commands and querying SMART
> data over USB. (Plus, it's faster.)
> 
> If either the host or the chip doesn't support it, it's negotiated
> through "usb-storage" (aka mass-storage), which then prevents anyone
> from sending trim commands.
> 
> The external SSD shenanigans are a whole other rant for another day..

Thanks for the explanation.

> 
>>> the performance down. On file-systems with no asynchronous discard,
>>
>> There is no such performance issue in f2fs, right? f2fs enables the
>> discard mount option by default and supports the async discard feature.
>>
> 
> Yup. It's one of my favorite f2fs features :))
> 
> Though imo it might be a good idea to explicitly recommend people to
> NOT disable it as a lot of "how to improve SSD performance on Linux"
> guides online tell you to outright disable the "discard" mount option.
> Like you said, those concerns are invalid on f2fs.
> 
> btrfs recently added discard=async and enabled it by default too, but
> I'm not sure if their implementation aims to do the same with what
> f2fs does.
> 
>>> mounting a file-system with the discard option adds a non-negligible
>>> overhead on every remove/delete operations, so most distros now
>>> (thankfully) use a timer job registered to systemd to trigger fstrim:
>>> https://github.com/util-linux/util-linux/commits/master/sys-utils/fstrim.timer
>>>
>>> This is still far from ideal. The default file-system, ext4, slows
>>> down drastically almost to a halt when fstrim -a is called, especially
>>> on SATA. For some reason that is still a mystery for me, people seem
>>> to be happy with it. No one bothered to improve it for years
>>> ¯\_(ツ)_/¯.
>>>
>>> So here’s my proposal:
>>> As Linux distros don’t have a good mechanism for hinting when to
>>> trigger GC, introduce a new Kconfig, CONFIG_F2FS_GC_UPON_FSTRIM and
>>> enable it by default.
>>> This config will hook up ioctl(FITRIM), which is currently ignored on
>>> f2fs - https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=master&id=e555da9f31210d2b62805cd7faf29228af7c3cfb
>>> , to perform discard and GC on all invalid segments.
>>> Userspace configuration with enough f2fs/GC knowledge such as Android
>>> should disable it.
>>>
>>> This will ensure that Linux distros that blindly call fstrim will at
>>> least avoid constant slowdowns when free segments are depleted with
>>> the occasional (once a week) slowdown, which *people are already
>>> living with on ext4*. I'll even go further and mention that since f2fs
>>> GC is a regular R/W workload, it doesn't cause an extreme slowdown
>>> comparable to a level of a full file-system trim operation.
>>>
>>> If this is acceptable, I’ll cook up a patch.
>>>
>>> In an ideal world, all Linux distros should have an explicit f2fs GC
>>> trigger mechanism (akin to
>>> https://github.com/kdave/btrfsmaintenance#distro-integration ), but
>>> it’s practically unrealistic to expect that, given the installer
>>> doesn’t even support f2fs for now.
>>>
>>> === C. Extended node bitmap ===
>>>
>>> f2fs by default has a very limited number of allowed inodes compared
>>> to other file-systems. Just 2 AOSP syncs are enough to exhaust f2fs
>>> and result in -ENOSPC.
>>>
>>> Here are some of the stats collected from me and my colleague that we
>>> use daily as a regular desktop with GUI, web-browsing and everything:
>>> 1. Laptop
>>> Utilization: 68% (182914850 valid blocks, 462 discard blocks)
>>>     - Node: 10234905 (Inode: 10106526, Other: 128379)
>>>     - Data: 172679945
>>>     - Inline_xattr Inode: 2004827
>>>     - Inline_data Inode: 867204
>>>     - Inline_dentry Inode: 51456
>>>
>>> 2. Desktop #1
>>> Utilization: 55% (133310465 valid blocks, 0 discard blocks)
>>>     - Node: 6389660 (Inode: 6289765, Other: 99895)
>>>     - Data: 126920805
>>>     - Inline_xattr Inode: 2253838
>>>     - Inline_data Inode: 1119109
>>>     - Inline_dentry Inode: 187958
>>>
>>> 3. Desktop #2
>>> Utilization: 83% (202222003 valid blocks, 1 discard blocks)
>>>     - Node: 21887836 (Inode: 21757139, Other: 130697)
>>>     - Data: 180334167
>>>     - Inline_xattr Inode: 39292
>>>     - Inline_data Inode: 35213
>>>     - Inline_dentry Inode: 1127
>>>
>>> 4. Colleague
>>> Utilization: 22% (108652929 valid blocks, 362420605 discard blocks)
>>>     - Node: 5629348 (Inode: 5542909, Other: 86439)
>>>     - Data: 103023581
>>>     - Inline_xattr Inode: 655752
>>>     - Inline_data Inode: 259900
>>>     - Inline_dentry Inode: 193000
>>>
>>> 5. Android phone (for reference)
>>> Utilization: 78% (36505713 valid blocks, 1074 discard blocks)
>>>     - Node: 704698 (Inode: 683337, Other: 21361)
>>>     - Data: 35801015
>>>     - Inline_xattr Inode: 683333
>>>     - Inline_data Inode: 237470
>>>     - Inline_dentry Inode: 112177
>>>
>>> Chao Yu added functionality to expand this via the -i flag passed to
>>> mkfs.f2fs back in 2018 -
>>> https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-tools.git/commit/?id=baaa076b4d576042913cfe34169442dfda651ca4
>>>
>>> I occasionally find myself in a weird position of having to tell
>>> people "Oh you should use the -i option from mkfs.f2fs" when they
>>> encounter this issue only after they’ve migrated most of the data and
>>> ask back "Why isn’t this enabled by default?".
>>>
>>> While this might not be an issue for the foreseeable future in
>>> Android, I’d argue that this is a feature that needs to be enabled by
>>> default for desktop environments with preferably a robust testing
>>
>> Yes, I guess we need to add some testcases and do some robustness
>> tests for the large nat_bitmap feature first, since I remember that a
>> design flaw in it once corrupted data. :(
>>
> 
> And I'm glad to report everything's been rock solid ever since that
> fix :) I'm actively using it on many of my systems.
> 
> One thing to note here is that my colleague (Alexander Koskovich) ran
> into an fsck issue with that feature enabled on a large SSD,
> preventing boot.
> I didn't encounter it as I never had f2fs-tools installed on my system
> (which also tells you how robust f2fs was for years on my setup with
> multiple sudden power-offs without any fsck runs).
> 
> See here for the fix if you missed it:
> https://lore.kernel.org/all/CAD14+f0FbTXfaD_dM-RyFiPbaong-B-6hqrms2M4riidX9yVug@mail.gmail.com/

Oh, I missed that one.

I added a comment on it, please check it.

> 
> Btw, is there a downside (e.g., more disk usage, slower performance)
> from using large nat_bitmap except for legacy kernel compatibility? I

- f2fs needs to reserve more NAT space, which may be wasted if there
are fewer nodes (including inodes), but I think that is a trade-off.

Do you see any performance issues when using nat_bitmap?
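For reference, the extended NAT bitmap is a format-time choice made via mkfs.f2fs's -i flag (per the f2fs-tools commit linked earlier). A non-destructive sketch, with the device name as a placeholder:

```shell
#!/bin/sh
# The -i flag of mkfs.f2fs enables the extended (large) NAT bitmap,
# reserving extra NAT space in exchange for a much higher inode limit.
# Echo the command instead of running it so the sketch is non-destructive.
format_f2fs_large_nat() {
    dev="$1"
    if command -v mkfs.f2fs >/dev/null 2>&1; then
        echo "would run: mkfs.f2fs -i ${dev}"
    else
        echo "mkfs.f2fs not installed (would run: mkfs.f2fs -i ${dev})"
    fi
}
format_f2fs_large_nat /dev/sdX
```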

Thanks,

> was guessing not, but might as well ask to be sure.
> 
> Thanks, regards
> 
>> Thanks,
>>
>>> infrastructure. Guarding this with #ifndef __ANDROID__ doesn’t seem to
>>> make much sense as it introduces more complications to how
>>> fuzzing/testing should be done.
>>>
>>> I’ll also add that it’s a common practice for userspace mkfs tools to
>>> introduce breaking default changes to older kernels (with options to
>>> produce a legacy image, of course).
>>>
>>> This was a lengthy email, but I hope I was being reasonable.
>>>
>>> Jaegeuk and Chao, let me know what you think.
>>> And as always, thanks for your hard work :)
>>>
>>> Thanks,
>>> regards


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [f2fs-dev] [DISCUSSION] f2fs for desktop
  2023-04-20 16:19     ` Chao Yu
@ 2023-04-20 17:26       ` Juhyung Park
  2023-05-18  7:53         ` Chao Yu
  0 siblings, 1 reply; 10+ messages in thread
From: Juhyung Park @ 2023-04-20 17:26 UTC (permalink / raw)
  To: Chao Yu; +Cc: Jaegeuk Kim, Alexander Koskovich, linux-f2fs-devel

Hi Chao,

On Fri, Apr 21, 2023 at 1:19 AM Chao Yu <chao@kernel.org> wrote:
>
> Hi JuHyung,
>
> Sorry for the delayed reply.
>
> On 2023/4/11 1:03, Juhyung Park wrote:
> > Hi Chao,
> >
> > On Tue, Apr 11, 2023 at 12:44 AM Chao Yu <chao@kernel.org> wrote:
> >>
> >> Hi Juhyung,
> >>
> >> On 2023/4/4 15:36, Juhyung Park wrote:
> >>> Hi everyone,
> >>>
> >>> I want to start a discussion on using f2fs for regular desktops/workstations.
> >>>
> >>> There is growing interest in using f2fs as the general
> >>> root file-system:
> >>> 2018: https://www.phoronix.com/news/GRUB-Now-Supports-F2FS
> >>> 2020: https://www.phoronix.com/news/Clear-Linux-F2FS-Root-Option
> >>> 2023: https://code.launchpad.net/~nexusprism/curtin/+git/curtin/+merge/439880
> >>> 2023: https://code.launchpad.net/~nexusprism/grub/+git/ubuntu/+merge/440193
> >>>
> >>> I've been personally running f2fs on all of my x86 Linux boxes since
> >>> 2015, and I have several concerns that I think we need to collectively
> >>> address for regular non-Android normies to use f2fs:
> >>>
> >>> A. Bootloader and installer support
> >>> B. Host-side GC
> >>> C. Extended node bitmap
> >>>
> >>> I'll go through each one.
> >>>
> >>> === A. Bootloader and installer support ===
> >>>
> >>> It seems that both GRUB and systemd-boot support f2fs without the
> >>> need for a separate ext4-formatted /boot partition.
> >>> Some distros are seemingly disabling f2fs module for GRUB though for
> >>> security reasons:
> >>> https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1868664
> >>>
> >>> It's ultimately up to the distro folks to enable this, and still in
> >>> the worst-case scenario, they can specify a separate /boot partition
> >>> and format it to ext4 upon installation.
> >>>
> >>> The installer itself to show f2fs and call mkfs.f2fs is being worked
> >>> on currently on Ubuntu. See the 2023 links above.
> >>>
> >>> Nothing f2fs mainline developers should do here, imo.
> >>>
> >>> === B. Host-side GC ===
> >>>
> >>> f2fs relieves most of the device-side GC but introduces a new
> >>> host-side GC. This is extremely hard to understand for people with no
> >>> background in SSDs and flash storage, let alone the
> >>> discard/trim/erase complications.
> >>>
> >>> In most consumer-grade blackbox SSDs, device-side GCs are handled
> >>> automatically for various workloads. f2fs, however, leaves that
> >>> responsibility to the userspace with conservative tuning on the
> >>
> >> We've proposed an f2fs feature named "space-aware garbage collection"
> >> and shipped it in Huawei/Honor's devices, but forgot to try upstreaming
> >> it. :-P
> >>
> >> In this feature, we introduced three modes:
> >> - performance mode: something like write-gc in ftl, it can trigger
> >> background gc more frequently and tune its speed according to free
> >> segs and reclaimable blks ratio.
> >> - lifetime mode: slow down background gc to avoid high waf if there
> >> is less free space.
> >> - balance mode: behave as usual.
> >>
> >> I guess this may be helpful for Linux desktop distros since there is
> >> no such storage service to trigger gc_urgent.
> >>
> >
> > That indeed sounds interesting.
> >
> > If you need me to test something out, feel free to ask.
>
> Thanks a lot for that. :)
>
> I'm trying to figure out a patch...
>
> >
> > I manually trigger gc_urgent from time to time on my 2TB SSD laptop
> > (which, as a laptop, isn't left on 24/7, so f2fs has a bit of trouble
> > finding enough idle time to trigger GC sufficiently).
> > If I don't, I run out of free segments within a few weeks.
>
> Have you ever tried configuring /sys/fs/f2fs/<disk>/gc_idle_interval?
>
> Set the value to 0 and check how much the free segment count drops in
> one day; from that you can infer whether free segments would be
> exhausted after a few weeks.
>

Well, I'm sure I can tune the sysfs tunables so that background GC
works sufficiently for my workload. My main concern is that this is a
manual process that not all users can perform.

My proposed way of hooking up GC to fstrim is a dirty, but fool-proof
way to ensure that f2fs is kept healthy.

But if there's a more elegant way of handling this automatically, I'm
all for it.
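For context, the fstrim path under discussion boils down to the sketch below; under the proposed (and so far hypothetical) CONFIG_F2FS_GC_UPON_FSTRIM, the same ioctl(FITRIM) call would additionally trigger f2fs GC:

```shell
#!/bin/sh
# What distros' weekly fstrim.timer effectively runs: fstrim(8) walks
# mounted filesystems and issues ioctl(FITRIM) on each. With the
# hypothetical CONFIG_F2FS_GC_UPON_FSTRIM proposed earlier, f2fs would
# also garbage-collect invalid segments at this point.
run_fstrim_all() {
    if command -v fstrim >/dev/null 2>&1; then
        fstrim --all --verbose 2>/dev/null \
            || echo "fstrim failed (needs root and a trim-capable fs)"
    else
        echo "fstrim not installed"
    fi
}
run_fstrim_all
```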

> >
> >>> kernel-side by default. Android handles this by init.rc tunings and a
> >>> separate code running in vold to trigger gc_urgent.
> >>>
> >>> For regular Linux desktop distros, f2fs just runs on the default
> >>> configuration set on the kernel and unless it’s running 24/7 with
> >>> plentiful idle time, it quickly runs out of free segments and starts
> >>> triggering foreground GC. This is giving people the wrong impression
> >>> that f2fs slows down far more drastically than other file-systems when
> >>> that’s quite the contrary (i.e., less fragmentation over time).
> >>>
> >>> This is almost the equivalent of re-living the nightmare of trim. On
> >>> SSDs with very small to no over-provisioned space, running a
> >>> file-system with no discard what-so-ever (sadly still a common case
> >>> when an external SSD is used with no UAS) will also drastically slow
> >>
> >> What does UAS mean?
> >>
> >
> > USB Attached SCSI. It's a protocol that sends SCSI commands over USB.
> > Most SATA-to-USB and NVMe-to-USB chips support it.
> >
> > AFAIK, it's the only way of sending trim commands and query SMART data
> > over USB. (Plus, it's faster.)
> >
> > If either the host or the chip doesn't support it, it's negotiated
> > through "usb-storage" (aka mass-storage), which then prevents anyone
> > from sending trim commands.
> >
> > The external SSD shenanigan is a whole another rant for another day..
>
> Thanks for the explanation.
>
> >
> >>> the performance down. On file-systems with no asynchronous discard,
> >>
> >> There is no such performance issue in f2fs, right? as f2fs enables
> >> discard mount option by default, and supports async discard feature.
> >>
> >
> > Yup. It's one of my favorite f2fs features :))
> >
> > Though imo it might be a good idea to explicitly recommend people to
> > NOT disable it as a lot of "how to improve SSD performance on Linux"
> > guides online tell you to outright disable the "discard" mount option.
> > Like you said, those concerns are invalid on f2fs.
> >
> > btrfs recently added discard=async and enabled it by default too, but
> > I'm not sure if their implementation aims to do the same with what
> > f2fs does.
> >
> >>> mounting a file-system with the discard option adds a non-negligible
> >>> overhead on every remove/delete operations, so most distros now
> >>> (thankfully) use a timer job registered to systemd to trigger fstrim:
> >>> https://github.com/util-linux/util-linux/commits/master/sys-utils/fstrim.timer
> >>>
> >>> This is still far from ideal. The default file-system, ext4, slows
> >>> down drastically almost to a halt when fstrim -a is called, especially
> >>> on SATA. For some reason that is still a mystery for me, people seem
> >>> to be happy with it. No one bothered to improve it for years
> >>> ¯\_(ツ)_/¯.
> >>>
> >>> So here’s my proposal:
> >>> As Linux distros don’t have a good mechanism for hinting when to
> >>> trigger GC, introduce a new Kconfig, CONFIG_F2FS_GC_UPON_FSTRIM and
> >>> enable it by default.
> >>> This config will hook up ioctl(FITRIM), which is currently ignored on
> >>> f2fs - https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=master&id=e555da9f31210d2b62805cd7faf29228af7c3cfb
> >>> , to perform discard and GC on all invalid segments.
> >>> Userspace configuration with enough f2fs/GC knowledge such as Android
> >>> should disable it.
> >>>
> >>> This will ensure that Linux distros that blindly call fstrim will at
> >>> least avoid constant slowdowns when free segments are depleted with
> >>> the occasional (once a week) slowdown, which *people are already
> >>> living with on ext4*. I'll even go further and mention that since f2fs
> >>> GC is a regular R/W workload, it doesn't cause an extreme slowdown
> >>> comparable to a level of a full file-system trim operation.
> >>>
> >>> If this is acceptable, I’ll cook up a patch.
> >>>
> >>> In an ideal world, all Linux distros should have an explicit f2fs GC
> >>> trigger mechanism (akin to
> >>> https://github.com/kdave/btrfsmaintenance#distro-integration ), but
> >>> it’s practically unrealistic to expect that, given the installer
> >>> doesn’t even support f2fs for now.
> >>>
> >>> === C. Extended node bitmap ===
> >>>
> >>> f2fs by default has a very limited number of allowed inodes compared
> >>> to other file-systems. Just 2 AOSP syncs are enough to exhaust f2fs
> >>> and result in -ENOSPC.
> >>>
> >>> Here are some of the stats collected from me and my colleague that we
> >>> use daily as a regular desktop with GUI, web-browsing and everything:
> >>> 1. Laptop
> >>> Utilization: 68% (182914850 valid blocks, 462 discard blocks)
> >>>     - Node: 10234905 (Inode: 10106526, Other: 128379)
> >>>     - Data: 172679945
> >>>     - Inline_xattr Inode: 2004827
> >>>     - Inline_data Inode: 867204
> >>>     - Inline_dentry Inode: 51456
> >>>
> >>> 2. Desktop #1
> >>> Utilization: 55% (133310465 valid blocks, 0 discard blocks)
> >>>     - Node: 6389660 (Inode: 6289765, Other: 99895)
> >>>     - Data: 126920805
> >>>     - Inline_xattr Inode: 2253838
> >>>     - Inline_data Inode: 1119109
> >>>     - Inline_dentry Inode: 187958
> >>>
> >>> 3. Desktop #2
> >>> Utilization: 83% (202222003 valid blocks, 1 discard blocks)
> >>>     - Node: 21887836 (Inode: 21757139, Other: 130697)
> >>>     - Data: 180334167
> >>>     - Inline_xattr Inode: 39292
> >>>     - Inline_data Inode: 35213
> >>>     - Inline_dentry Inode: 1127
> >>>
> >>> 4. Colleague
> >>> Utilization: 22% (108652929 valid blocks, 362420605 discard blocks)
> >>>     - Node: 5629348 (Inode: 5542909, Other: 86439)
> >>>     - Data: 103023581
> >>>     - Inline_xattr Inode: 655752
> >>>     - Inline_data Inode: 259900
> >>>     - Inline_dentry Inode: 193000
> >>>
> >>> 5. Android phone (for reference)
> >>> Utilization: 78% (36505713 valid blocks, 1074 discard blocks)
> >>>     - Node: 704698 (Inode: 683337, Other: 21361)
> >>>     - Data: 35801015
> >>>     - Inline_xattr Inode: 683333
> >>>     - Inline_data Inode: 237470
> >>>     - Inline_dentry Inode: 112177
> >>>
> >>> Chao Yu added functionality to expand this via the -i flag passed to
> >>> mkfs.f2fs back in 2018 -
> >>> https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-tools.git/commit/?id=baaa076b4d576042913cfe34169442dfda651ca4
> >>>
> >>> I occasionally find myself in a weird position of having to tell
> >>> people "Oh you should use the -i option from mkfs.f2fs" when they
> >>> encounter this issue only after they’ve migrated most of the data and
> >>> ask back "Why isn’t this enabled by default?".
> >>>
> >>> While this might not be an issue for the foreseeable future in
> >>> Android, I’d argue that this is a feature that needs to be enabled by
> >>> default for desktop environments with preferably a robust testing
> >>
> >> Yes, I guess we need to add some testcases and do some robustness
> >> tests for the large nat_bitmap feature first, since I remember that a
> >> design flaw in it once corrupted data. :(
> >>
> >
> > And I'm glad to report everything's been rock solid ever since that
> > fix :) I'm actively using it on many of my systems.
> >
> > One thing to note here is that my colleague (Alexander Koskovich) ran
> > into an fsck issue with that feature enabled on a large SSD,
> > preventing boot.
> > I didn't encounter it as I never had f2fs-tools installed on my system
> > (which also tells you how robust f2fs was for years on my setup with
> > multiple sudden power-offs without any fsck runs).
> >
> > See here for the fix if you missed it:
> > https://lore.kernel.org/all/CAD14+f0FbTXfaD_dM-RyFiPbaong-B-6hqrms2M4riidX9yVug@mail.gmail.com/
>
> Oh, I missed that one.
>
> I added a comment on it, please check it.
>
> >
> > Btw, is there a downside (e.g., more disk usage, slower performance)
> > from using large nat_bitmap except for legacy kernel compatibility? I
>
> - f2fs needs to reserve more NAT space, which may be wasted if there
> are fewer nodes (including inodes), but I think that is a trade-off.
>
> Do you see any performance issues when using nat_bitmap?
>

Not that I've noticed, but I haven't exactly run any benchmarks either.

Years ago, when I switched my main system from ext4 to f2fs, the only
performance hit I noticed was slow directory traversal, which was
later addressed with readdir_ra.

Years after that, I switched to the extended node bitmap, and the only
issue there was the data corruption, which was also fixed. I did not
notice any performance issues from using the extended node bitmap.

I haven't used ext4 since then personally, but I did recently switch a
production server from ext4 to f2fs to avoid performance degradation
from lack of discard.

This specifically isn't really related to extended node bitmap, but
rather related to how f2fs and ext4 each behave.
If you're interested in this workload, read on:

The production server that I've mentioned basically uses an SSD as a
temporary cache as RAM is insufficient.

Once a workload is triggered, it writes 50-100 GiB of data to the
cache (SSD) and the resulting output is stored to HDD (10-30 GiB).
After the data is stored to HDD, the corresponding cache from the SSD
is removed.
Multiple instances of said workload can co-exist, so the SSD is
subject to an immensely write-intensive workload, and we almost
immediately hit thermal throttling if we start 3-4 instances. (There
is nothing we can do to mitigate the throttling.)

Our major concern here was that, with ext4, the SSD runs out of free
blocks as the file-system doesn't issue trim, and future workloads
suffer a severe performance penalty. To mitigate that, we either had
to use the 'discard' option, which delays each file removal, or run
fstrim at each workload's end, which results in I/O stalls big enough
to impact other workloads running at the same time.

We've deployed f2fs.

At first, we didn't see much of a performance gain, as f2fs was too
conservative in issuing trim (not GC). We tuned the sysfs idle
intervals, but that didn't help much either. Maybe if idle_interval
and discard_idle_interval were in millisecond units, we could have
tuned things better.

We recently switched to always using gc_urgent low (2). f2fs now
aggressively issues trim, the free block count stays well within our
comfort zone, and we saw no noticeable performance drop as the number
of parallel workloads increased. We've stuck with gc_urgent low (2)
ever since on this particular setup, and we're happy with it :)
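For anyone with a similar cache-heavy setup, the switch itself is a one-liner against sysfs. A sketch; the device name is a placeholder and the mode values should be checked against your kernel's f2fs sysfs ABI (2 is "urgent low", as used above):

```shell
#!/bin/sh
# Put a mounted f2fs volume into a GC urgency mode (run as root).
# Per the f2fs sysfs ABI: 0 = off, 1 = urgent high, 2 = urgent low.
set_gc_urgent() {
    dev="$1"; mode="$2"
    node="/sys/fs/f2fs/${dev}/gc_urgent"
    if [ -w "$node" ]; then
        echo "$mode" > "$node" && echo "gc_urgent(${dev}) = ${mode}"
    else
        echo "cannot write ${node} (volume not mounted or not root)"
    fi
}
set_gc_urgent nvme0n1 2   # device name is a placeholder
```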

> Thanks,
>
> > was guessing not, but might as well ask to be sure.
> >
> > Thanks, regards
> >
> >> Thanks,
> >>
> >>> infrastructure. Guarding this with #ifndef __ANDROID__ doesn’t seem to
> >>> make much sense as it introduces more complications to how
> >>> fuzzing/testing should be done.
> >>>
> >>> I’ll also add that it’s a common practice for userspace mkfs tools to
> >>> introduce breaking default changes to older kernels (with options to
> >>> produce a legacy image, of course).
> >>>
> >>> This was a lengthy email, but I hope I was being reasonable.
> >>>
> >>> Jaegeuk and Chao, let me know what you think.
> >>> And as always, thanks for your hard work :)
> >>>
> >>> Thanks,
> >>> regards




* Re: [f2fs-dev] [DISCUSSION] f2fs for desktop
  2023-04-20 17:26       ` Juhyung Park
@ 2023-05-18  7:53         ` Chao Yu
  2023-05-18 18:12           ` Juhyung Park
  0 siblings, 1 reply; 10+ messages in thread
From: Chao Yu @ 2023-05-18  7:53 UTC (permalink / raw)
  To: Juhyung Park; +Cc: Jaegeuk Kim, Alexander Koskovich, linux-f2fs-devel

On 2023/4/21 1:26, Juhyung Park wrote:
> Hi Chao,
> 
> On Fri, Apr 21, 2023 at 1:19 AM Chao Yu <chao@kernel.org> wrote:
>>
>> Hi JuHyung,
>>
>> Sorry for the delayed reply.
>>
>> On 2023/4/11 1:03, Juhyung Park wrote:
>>> Hi Chao,
>>>
>>> On Tue, Apr 11, 2023 at 12:44 AM Chao Yu <chao@kernel.org> wrote:
>>>>
>>>> Hi Juhyung,
>>>>
>>>> On 2023/4/4 15:36, Juhyung Park wrote:
>>>>> Hi everyone,
>>>>>
>>>>> I want to start a discussion on using f2fs for regular desktops/workstations.
>>>>>
>>>>> There is growing interest in using f2fs as the general
>>>>> root file-system:
>>>>> 2018: https://www.phoronix.com/news/GRUB-Now-Supports-F2FS
>>>>> 2020: https://www.phoronix.com/news/Clear-Linux-F2FS-Root-Option
>>>>> 2023: https://code.launchpad.net/~nexusprism/curtin/+git/curtin/+merge/439880
>>>>> 2023: https://code.launchpad.net/~nexusprism/grub/+git/ubuntu/+merge/440193
>>>>>
>>>>> I've been personally running f2fs on all of my x86 Linux boxes since
>>>>> 2015, and I have several concerns that I think we need to collectively
>>>>> address for regular non-Android normies to use f2fs:
>>>>>
>>>>> A. Bootloader and installer support
>>>>> B. Host-side GC
>>>>> C. Extended node bitmap
>>>>>
>>>>> I'll go through each one.
>>>>>
>>>>> === A. Bootloader and installer support ===
>>>>>
>>>>> It seems that both GRUB and systemd-boot support f2fs without the
>>>>> need for a separate ext4-formatted /boot partition.
>>>>> Some distros are seemingly disabling f2fs module for GRUB though for
>>>>> security reasons:
>>>>> https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1868664
>>>>>
>>>>> It's ultimately up to the distro folks to enable this, and still in
>>>>> the worst-case scenario, they can specify a separate /boot partition
>>>>> and format it to ext4 upon installation.
>>>>>
>>>>> The installer itself to show f2fs and call mkfs.f2fs is being worked
>>>>> on currently on Ubuntu. See the 2023 links above.
>>>>>
>>>>> Nothing f2fs mainline developers should do here, imo.
>>>>>
>>>>> === B. Host-side GC ===
>>>>>
>>>>> f2fs relieves most of the device-side GC but introduces a new
>>>>> host-side GC. This is extremely hard to understand for people with no
>>>>> background in SSDs and flash storage, let alone the
>>>>> discard/trim/erase complications.
>>>>>
>>>>> In most consumer-grade blackbox SSDs, device-side GCs are handled
>>>>> automatically for various workloads. f2fs, however, leaves that
>>>>> responsibility to the userspace with conservative tuning on the
>>>>
>>>> We've proposed an f2fs feature named "space-aware garbage collection"
>>>> and shipped it in Huawei/Honor's devices, but forgot to try upstreaming
>>>> it. :-P
>>>>
>>>> In this feature, we introduced three modes:
>>>> - performance mode: something like write-gc in ftl, it can trigger
>>>> background gc more frequently and tune its speed according to free
>>>> segs and reclaimable blks ratio.
>>>> - lifetime mode: slow down background gc to avoid high waf if there
>>>> is less free space.
>>>> - balance mode: behave as usual.
>>>>
>>>> I guess this may be helpful for Linux desktop distros since there is
>>>> no such storage service to trigger gc_urgent.
>>>>
>>>
>>> That indeed sounds interesting.
>>>
>>> If you need me to test something out, feel free to ask.
>>
>> Thanks a lot for that. :)
>>
>> I'm trying to figure out a patch...

Juhyung,

Are you interested in trying this patch in distros?

https://git.kernel.org/pub/scm/linux/kernel/git/chao/linux.git/commit/?h=dev-test&id=4736e55bc967e91cf8a275b678739b006c2617f0

There are some tunable parameters; I can export them via sysfs
entries in a later update.

Thanks,




* Re: [f2fs-dev] [DISCUSSION] f2fs for desktop
  2023-05-18  7:53         ` Chao Yu
@ 2023-05-18 18:12           ` Juhyung Park
  2023-05-22 13:10             ` Chao Yu
  0 siblings, 1 reply; 10+ messages in thread
From: Juhyung Park @ 2023-05-18 18:12 UTC (permalink / raw)
  To: Chao Yu; +Cc: Jaegeuk Kim, Alexander Koskovich, linux-f2fs-devel

Hi Chao,

Thanks for the patch. I'll try it out on both my laptop and workstation soon.

One question though: would it make sense to see if it works fine on
Android too? (With userspace's explicit GC trigger disabled.)
Maybe it could be an indication of whether it works properly or not?

Thanks,

On Thu, May 18, 2023 at 4:53 PM Chao Yu <chao@kernel.org> wrote:
>
> On 2023/4/21 1:26, Juhyung Park wrote:
> > Hi Chao,
> >
> > On Fri, Apr 21, 2023 at 1:19 AM Chao Yu <chao@kernel.org> wrote:
> >>
> >> Hi JuHyung,
> >>
> >> Sorry for the delayed reply.
> >>
> >> On 2023/4/11 1:03, Juhyung Park wrote:
> >>> Hi Chao,
> >>>
> >>> On Tue, Apr 11, 2023 at 12:44 AM Chao Yu <chao@kernel.org> wrote:
> >>>>
> >>>> Hi Juhyung,
> >>>>
> >>>> On 2023/4/4 15:36, Juhyung Park wrote:
> >>>>> Hi everyone,
> >>>>>
> >>>>> I want to start a discussion on using f2fs for regular desktops/workstations.
> >>>>>
> >>>>> There is growing interest in using f2fs as the general
> >>>>> root file-system:
> >>>>> 2018: https://www.phoronix.com/news/GRUB-Now-Supports-F2FS
> >>>>> 2020: https://www.phoronix.com/news/Clear-Linux-F2FS-Root-Option
> >>>>> 2023: https://code.launchpad.net/~nexusprism/curtin/+git/curtin/+merge/439880
> >>>>> 2023: https://code.launchpad.net/~nexusprism/grub/+git/ubuntu/+merge/440193
> >>>>>
> >>>>> I've been personally running f2fs on all of my x86 Linux boxes since
> >>>>> 2015, and I have several concerns that I think we need to collectively
> >>>>> address for regular non-Android normies to use f2fs:
> >>>>>
> >>>>> A. Bootloader and installer support
> >>>>> B. Host-side GC
> >>>>> C. Extended node bitmap
> >>>>>
> >>>>> I'll go through each one.
> >>>>>
> >>>>> === A. Bootloader and installer support ===
> >>>>>
> >>>>> It seems that both GRUB and systemd-boot support f2fs without the
> >>>>> need for a separate ext4-formatted /boot partition.
> >>>>> Some distros are seemingly disabling f2fs module for GRUB though for
> >>>>> security reasons:
> >>>>> https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1868664
> >>>>>
> >>>>> It's ultimately up to the distro folks to enable this, and still in
> >>>>> the worst-case scenario, they can specify a separate /boot partition
> >>>>> and format it to ext4 upon installation.
> >>>>>
> >>>>> The installer itself to show f2fs and call mkfs.f2fs is being worked
> >>>>> on currently on Ubuntu. See the 2023 links above.
> >>>>>
> >>>>> Nothing f2fs mainline developers should do here, imo.
> >>>>>
> >>>>> === B. Host-side GC ===
> >>>>>
> >>>>> f2fs relieves most of the device-side GC but introduces a new
> >>>>> host-side GC. This is extremely hard to understand for people with no
> >>>>> background in SSDs and flash storage, let alone the
> >>>>> discard/trim/erase complications.
> >>>>>
> >>>>> In most consumer-grade blackbox SSDs, device-side GCs are handled
> >>>>> automatically for various workloads. f2fs, however, leaves that
> >>>>> responsibility to the userspace with conservative tuning on the
> >>>>
> >>>> We've proposed an f2fs feature named "space-aware garbage collection"
> >>>> and shipped it in Huawei/Honor's devices, but forgot to try upstreaming
> >>>> it. :-P
> >>>>
> >>>> In this feature, we introduced three modes:
> >>>> - performance mode: something like write-gc in ftl, it can trigger
> >>>> background gc more frequently and tune its speed according to free
> >>>> segs and reclaimable blks ratio.
> >>>> - lifetime mode: slow down background gc to avoid high waf if there
> >>>> is less free space.
> >>>> - balance mode: behave as usual.
> >>>>
> >>>> I guess this may be helpful for Linux desktop distros since there is
> >>>> no such storage service to trigger gc_urgent.
> >>>>
> >>>
> >>> That indeed sounds interesting.
> >>>
> >>> If you need me to test something out, feel free to ask.
> >>
> >> Thanks a lot for that. :)
> >>
> >> I'm trying to figure out a patch...
>
> Juhyung,
>
> Are you interested in trying this patch in distros?
>
> https://git.kernel.org/pub/scm/linux/kernel/git/chao/linux.git/commit/?h=dev-test&id=4736e55bc967e91cf8a275b678739b006c2617f0
>
> There are some tunable parameters; I can export them via sysfs
> entries in a later update.
>
> Thanks,




* Re: [f2fs-dev] [DISCUSSION] f2fs for desktop
  2023-05-18 18:12           ` Juhyung Park
@ 2023-05-22 13:10             ` Chao Yu
  0 siblings, 0 replies; 10+ messages in thread
From: Chao Yu @ 2023-05-22 13:10 UTC (permalink / raw)
  To: Juhyung Park; +Cc: Jaegeuk Kim, Alexander Koskovich, linux-f2fs-devel

Hi Juhyung,

On 2023/5/19 2:12, Juhyung Park wrote:
> Hi Chao,
> 
> Thanks for the patch. I'll try it out on both my laptop and workstation soon.

Thank you! Let me know if you have any concerns or suggestions. :)

> 
> One question though: would it make sense to see if it works fine on
> Android too? (With userspace's explicit GC trigger disabled.)

I can see that Huawei/Honor are still using this feature; as I recall, the
idle urgent GC feature was not yet available when this feature shipped in
products.

I guess so: it should be fine to use this feature in Android if manual GC
from userspace is disabled.

Thanks,

> Maybe it could be an indication of whether it works properly or not?
> 
> Thanks,
> 
> On Thu, May 18, 2023 at 4:53 PM Chao Yu <chao@kernel.org> wrote:
>>
>> On 2023/4/21 1:26, Juhyung Park wrote:
>>> Hi Chao,
>>>
>>> On Fri, Apr 21, 2023 at 1:19 AM Chao Yu <chao@kernel.org> wrote:
>>>>
>>>> Hi JuHyung,
>>>>
>>>> Sorry for the delayed reply.
>>>>
>>>> On 2023/4/11 1:03, Juhyung Park wrote:
>>>>> Hi Chao,
>>>>>
>>>>> On Tue, Apr 11, 2023 at 12:44 AM Chao Yu <chao@kernel.org> wrote:
>>>>>>
>>>>>> Hi Juhyung,
>>>>>>
>>>>>> On 2023/4/4 15:36, Juhyung Park wrote:
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> I want to start a discussion on using f2fs for regular desktops/workstations.
>>>>>>>
>>>>>>> There is growing interest in using f2fs as a general
>>>>>>> root filesystem:
>>>>>>> 2018: https://www.phoronix.com/news/GRUB-Now-Supports-F2FS
>>>>>>> 2020: https://www.phoronix.com/news/Clear-Linux-F2FS-Root-Option
>>>>>>> 2023: https://code.launchpad.net/~nexusprism/curtin/+git/curtin/+merge/439880
>>>>>>> 2023: https://code.launchpad.net/~nexusprism/grub/+git/ubuntu/+merge/440193
>>>>>>>
>>>>>>> I've been personally running f2fs on all of my x86 Linux boxes
>>>>>>> since 2015, and I have several concerns that I think we need to
>>>>>>> collectively address before regular non-Android normies can use
>>>>>>> f2fs:
>>>>>>>
>>>>>>> A. Bootloader and installer support
>>>>>>> B. Host-side GC
>>>>>>> C. Extended node bitmap
>>>>>>>
>>>>>>> I'll go through each one.
>>>>>>>
>>>>>>> === A. Bootloader and installer support ===
>>>>>>>
>>>>>>> It seems that both GRUB and systemd-boot support f2fs without
>>>>>>> the need for a separate ext4-formatted /boot partition.
>>>>>>> Some distros are seemingly disabling the f2fs module in GRUB,
>>>>>>> though, for security reasons:
>>>>>>> https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1868664
>>>>>>>
>>>>>>> It's ultimately up to the distro folks to enable this, and in
>>>>>>> the worst case they can still specify a separate /boot partition
>>>>>>> and format it as ext4 upon installation.
>>>>>>>
>>>>>>> Installer support for offering f2fs and calling mkfs.f2fs is
>>>>>>> currently being worked on for Ubuntu. See the 2023 links above.
>>>>>>>
>>>>>>> Nothing f2fs mainline developers should do here, imo.
>>>>>>>
>>>>>>> === B. Host-side GC ===
>>>>>>>
>>>>>>> f2fs relieves the device of most of its GC work but introduces a
>>>>>>> new host-side GC. This is extremely confusing for people who have
>>>>>>> no background in SSDs and flash storage, let alone the
>>>>>>> discard/trim/erase complications.
>>>>>>>
>>>>>>> In most consumer-grade black-box SSDs, device-side GC is handled
>>>>>>> automatically for various workloads. f2fs, however, leaves that
>>>>>>> responsibility to userspace with conservative tuning on the
>>>>>>
>>>>>> We've proposed an f2fs feature named "space-aware garbage
>>>>>> collection" and shipped it in Huawei/Honor's devices, but forgot
>>>>>> to try upstreaming it. :-P
>>>>>>
>>>>>> In this feature, we introduced three modes:
>>>>>> - performance mode: something like write-GC in an FTL; it triggers
>>>>>> background GC more frequently and tunes its speed according to the
>>>>>> free-segment and reclaimable-block ratios.
>>>>>> - lifetime mode: slows background GC down to avoid a high WAF when
>>>>>> there is little free space.
>>>>>> - balance mode: behaves as usual.
>>>>>>
>>>>>> I guess this may be helpful for Linux desktop distros, since there
>>>>>> is no storage service there to trigger gc_urgent.
>>>>>>
>>>>>
>>>>> That indeed sounds interesting.
>>>>>
>>>>> If you need me to test something out, feel free to ask.
>>>>
>>>> Thanks a lot for that. :)
>>>>
>>>> I'm trying to figure out a patch...
>>
>> Juhyung,
>>
>> Are you interested in trying this patch on distros?
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/chao/linux.git/commit/?h=dev-test&id=4736e55bc967e91cf8a275b678739b006c2617f0
>>
>> There are some tunable parameters; I can export them via sysfs
>> entries, let me update later.
>>
>> Thanks,




end of thread, other threads:[~2023-05-22 13:10 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
2023-04-04  7:36 [f2fs-dev] [DISCUSSION] f2fs for desktop Juhyung Park
2023-04-04 23:35 ` Jaegeuk Kim
2023-04-10  8:39   ` Juhyung Park
2023-04-10 15:44 ` Chao Yu
2023-04-10 17:03   ` Juhyung Park
2023-04-20 16:19     ` Chao Yu
2023-04-20 17:26       ` Juhyung Park
2023-05-18  7:53         ` Chao Yu
2023-05-18 18:12           ` Juhyung Park
2023-05-22 13:10             ` Chao Yu
