* [PATCH] f2fs: add sysfs entry to avoid FUA
@ 2022-05-27 20:59 Jaegeuk Kim
  2022-05-27 21:33 ` Eric Biggers
  2022-05-28  1:07 ` [RFC PATCH v2] " Jaegeuk Kim
  0 siblings, 2 replies; 10+ messages in thread
From: Jaegeuk Kim @ 2022-05-27 20:59 UTC
  To: linux-kernel, linux-f2fs-devel; +Cc: Jaegeuk Kim

Some UFS storage gives slower performance on FUA than write+cache_flush.
Let's give a way to manage it.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
 Documentation/ABI/testing/sysfs-fs-f2fs | 7 +++++++
 fs/f2fs/data.c                          | 2 ++
 fs/f2fs/f2fs.h                          | 1 +
 fs/f2fs/sysfs.c                         | 2 ++
 4 files changed, 12 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs
index 9b583dd0298b..cd96b09d7182 100644
--- a/Documentation/ABI/testing/sysfs-fs-f2fs
+++ b/Documentation/ABI/testing/sysfs-fs-f2fs
@@ -434,6 +434,7 @@ Date:		April 2020
 Contact:	"Daeho Jeong" <daehojeong@google.com>
 Description:	Give a way to change iostat_period time. 3secs by default.
 		The new iostat trace gives stats gap given the period.
+
 What:		/sys/fs/f2fs/<disk>/max_io_bytes
 Date:		December 2020
 Contact:	"Jaegeuk Kim" <jaegeuk@kernel.org>
@@ -442,6 +443,12 @@ Description:	This gives a control to limit the bio size in f2fs.
 		whereas, if it has a certain bytes value, f2fs won't submit a
 		bio larger than that size.
 
+What:		/sys/fs/f2fs/<disk>/no_fua_dio
+Date:		May 2022
+Contact:	"Jaegeuk Kim" <jaegeuk@kernel.org>
+Description:	This gives a signal to iomap, which should not use FUA for
+		direct IOs. Default: 0.
+
 What:		/sys/fs/f2fs/<disk>/stat/sb_status
 Date:		December 2020
 Contact:	"Chao Yu" <yuchao0@huawei.com>
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index f5f2b7233982..23486486eab2 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -4153,6 +4153,8 @@ static int f2fs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 	if ((inode->i_state & I_DIRTY_DATASYNC) ||
 	    offset + length > i_size_read(inode))
 		iomap->flags |= IOMAP_F_DIRTY;
+	if (F2FS_I_SB(inode)->no_fua_dio)
+		iomap->flags |= IOMAP_F_DIRTY;
 
 	return 0;
 }
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index e10838879538..c2400ea0080b 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1671,6 +1671,7 @@ struct f2fs_sb_info {
 	int dir_level;				/* directory level */
 	int readdir_ra;				/* readahead inode in readdir */
 	u64 max_io_bytes;			/* max io bytes to merge IOs */
+	int no_fua_dio;				/* avoid FUA in DIO */
 
 	block_t user_block_count;		/* # of user blocks */
 	block_t total_valid_block_count;	/* # of valid blocks */
diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
index 4c50aedd5144..24d628ca92cc 100644
--- a/fs/f2fs/sysfs.c
+++ b/fs/f2fs/sysfs.c
@@ -771,6 +771,7 @@ F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, iostat_period_ms, iostat_period_ms);
 #endif
 F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, readdir_ra, readdir_ra);
 F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, max_io_bytes, max_io_bytes);
+F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, no_fua_dio, no_fua_dio);
 F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, gc_pin_file_thresh, gc_pin_file_threshold);
 F2FS_RW_ATTR(F2FS_SBI, f2fs_super_block, extension_list, extension_list);
 #ifdef CONFIG_F2FS_FAULT_INJECTION
@@ -890,6 +891,7 @@ static struct attribute *f2fs_attrs[] = {
 #endif
 	ATTR_LIST(readdir_ra),
 	ATTR_LIST(max_io_bytes),
+	ATTR_LIST(no_fua_dio),
 	ATTR_LIST(gc_pin_file_thresh),
 	ATTR_LIST(extension_list),
 #ifdef CONFIG_F2FS_FAULT_INJECTION
-- 
2.36.1.124.g0e6072fb45-goog


* Re: [PATCH] f2fs: add sysfs entry to avoid FUA
  2022-05-27 20:59 [PATCH] f2fs: add sysfs entry to avoid FUA Jaegeuk Kim
@ 2022-05-27 21:33 ` Eric Biggers
  2022-05-27 23:55   ` Dave Chinner
  2022-05-28  1:06   ` Jaegeuk Kim
  2022-05-28  1:07 ` [RFC PATCH v2] " Jaegeuk Kim
  1 sibling, 2 replies; 10+ messages in thread
From: Eric Biggers @ 2022-05-27 21:33 UTC
  To: Jaegeuk Kim; +Cc: linux-kernel, linux-f2fs-devel, linux-block, linux-xfs

[+Cc linux-block for FUA, and linux-xfs for iomap]

On Fri, May 27, 2022 at 01:59:55PM -0700, Jaegeuk Kim wrote:
> Some UFS storage gives slower performance on FUA than write+cache_flush.
> Let's give a way to manage it.
> 
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

Should the driver even be saying that it has FUA support in this case?  If the
driver didn't claim FUA support, that would also solve this problem.

> ---
>  Documentation/ABI/testing/sysfs-fs-f2fs | 7 +++++++
>  fs/f2fs/data.c                          | 2 ++
>  fs/f2fs/f2fs.h                          | 1 +
>  fs/f2fs/sysfs.c                         | 2 ++
>  4 files changed, 12 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs
> index 9b583dd0298b..cd96b09d7182 100644
> --- a/Documentation/ABI/testing/sysfs-fs-f2fs
> +++ b/Documentation/ABI/testing/sysfs-fs-f2fs
> @@ -434,6 +434,7 @@ Date:		April 2020
>  Contact:	"Daeho Jeong" <daehojeong@google.com>
>  Description:	Give a way to change iostat_period time. 3secs by default.
>  		The new iostat trace gives stats gap given the period.
> +
>  What:		/sys/fs/f2fs/<disk>/max_io_bytes
>  Date:		December 2020
>  Contact:	"Jaegeuk Kim" <jaegeuk@kernel.org>
> @@ -442,6 +443,12 @@ Description:	This gives a control to limit the bio size in f2fs.
>  		whereas, if it has a certain bytes value, f2fs won't submit a
>  		bio larger than that size.
>  
> +What:		/sys/fs/f2fs/<disk>/no_fua_dio
> +Date:		May 2022
> +Contact:	"Jaegeuk Kim" <jaegeuk@kernel.org>
> +Description:	This gives a signal to iomap, which should not use FUA for
> +		direct IOs. Default: 0.

iomap is an implementation detail, so it shouldn't be mentioned in UAPI
documentation.  UAPI documentation should describe user-visible behavior only.

> +
>  What:		/sys/fs/f2fs/<disk>/stat/sb_status
>  Date:		December 2020
>  Contact:	"Chao Yu" <yuchao0@huawei.com>
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index f5f2b7233982..23486486eab2 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -4153,6 +4153,8 @@ static int f2fs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
>  	if ((inode->i_state & I_DIRTY_DATASYNC) ||
>  	    offset + length > i_size_read(inode))
>  		iomap->flags |= IOMAP_F_DIRTY;
> +	if (F2FS_I_SB(inode)->no_fua_dio)
> +		iomap->flags |= IOMAP_F_DIRTY;

This is overloading the IOMAP_F_DIRTY flag to mean something other than dirty.
Perhaps this flag needs to be renamed, or a new flag should be added?

> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index e10838879538..c2400ea0080b 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -1671,6 +1671,7 @@ struct f2fs_sb_info {
>  	int dir_level;				/* directory level */
>  	int readdir_ra;				/* readahead inode in readdir */
>  	u64 max_io_bytes;			/* max io bytes to merge IOs */
> +	int no_fua_dio;				/* avoid FUA in DIO */

Make this a bool?

> diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
> index 4c50aedd5144..24d628ca92cc 100644
> --- a/fs/f2fs/sysfs.c
> +++ b/fs/f2fs/sysfs.c
> @@ -771,6 +771,7 @@ F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, iostat_period_ms, iostat_period_ms);
>  #endif
>  F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, readdir_ra, readdir_ra);
>  F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, max_io_bytes, max_io_bytes);
> +F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, no_fua_dio, no_fua_dio);
>  F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, gc_pin_file_thresh, gc_pin_file_threshold);
>  F2FS_RW_ATTR(F2FS_SBI, f2fs_super_block, extension_list, extension_list);
>  #ifdef CONFIG_F2FS_FAULT_INJECTION
> @@ -890,6 +891,7 @@ static struct attribute *f2fs_attrs[] = {
>  #endif
>  	ATTR_LIST(readdir_ra),
>  	ATTR_LIST(max_io_bytes),
> +	ATTR_LIST(no_fua_dio),

Where is it validated that only valid values (0 or 1) can be written to this
file?

- Eric

* Re: [PATCH] f2fs: add sysfs entry to avoid FUA
  2022-05-27 21:33 ` Eric Biggers
@ 2022-05-27 23:55   ` Dave Chinner
  2022-05-28  0:26     ` Jaegeuk Kim
  2022-05-28  1:06   ` Jaegeuk Kim
  1 sibling, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2022-05-27 23:55 UTC
  To: Eric Biggers
  Cc: Jaegeuk Kim, linux-kernel, linux-f2fs-devel, linux-block, linux-xfs

On Fri, May 27, 2022 at 09:33:55PM +0000, Eric Biggers wrote:
> [+Cc linux-block for FUA, and linux-xfs for iomap]

linux-fsdevel should really be used for iomap stuff...

> 
> On Fri, May 27, 2022 at 01:59:55PM -0700, Jaegeuk Kim wrote:
> > Some UFS storage gives slower performance on FUA than write+cache_flush.
> > Let's give a way to manage it.
> > 
> > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> 
> Should the driver even be saying that it has FUA support in this case?  If the
> driver didn't claim FUA support, that would also solve this problem.

Agreed, this is a hardware problem that needs to be addressed with a
driver quirk to stop it advertising FUA support. The high level
fs/iomap code should always issue FUA writes where possible and
the lower layers tell the block layer whether to issue the FUA as
a FUA or write+cache flush pair.

And, quite frankly, exposing this sort of "hardware needs help" knob
as a sysfs variable is exactly the sort of thing we should never do.

Users have no idea how to tune stuff like this correctly (even if
they knew it existed!), yet we know exactly what hardware has this
problem and the kernel already has mechanisms that would allow it to
just Do The Right Thing. IOWs, we can fix this without the user even
having to know that they have garbage hardware that needs special
help....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH] f2fs: add sysfs entry to avoid FUA
  2022-05-27 23:55   ` Dave Chinner
@ 2022-05-28  0:26     ` Jaegeuk Kim
  2022-05-28  5:12       ` Dave Chinner
  0 siblings, 1 reply; 10+ messages in thread
From: Jaegeuk Kim @ 2022-05-28  0:26 UTC
  To: Dave Chinner
  Cc: Eric Biggers, linux-kernel, linux-f2fs-devel, linux-block, linux-xfs

On 05/28, Dave Chinner wrote:
> On Fri, May 27, 2022 at 09:33:55PM +0000, Eric Biggers wrote:
> > [+Cc linux-block for FUA, and linux-xfs for iomap]
> 
> linux-fsdevel should really be used for iomap stuff...
> 
> > 
> > On Fri, May 27, 2022 at 01:59:55PM -0700, Jaegeuk Kim wrote:
> > > Some UFS storage gives slower performance on FUA than write+cache_flush.
> > > Let's give a way to manage it.
> > > 
> > > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > 
> > Should the driver even be saying that it has FUA support in this case?  If the
> > driver didn't claim FUA support, that would also solve this problem.
> 
> Agreed, this is a hardware problem that needs to be addressed with a
> driver quirk to stop it advertising FUA support. The high level
> fs/iomap code should always issue FUA writes where possible and
> the lower layers tell the block layer whether to issue the FUA as
> a FUA or write+cache flush pair.

I was thinking of simply turning off FUA on the driver side, though one concern
was bandwidth vs. latency. What if the device supports FUA with
short latency but low bandwidth? In that case, we still have
room to use FUA for small writes such as filesystem metadata
writes, while avoiding DIO w/ FUA for sequential write streams. Is this just
a HW problem? Or does SW need to use FUA more efficiently?

> 
> And, quite frankly, exposing this sort of "hardware needs help" knob
> as a sysfs variable is exactly the sort of thing we should never do.
> 
> Users have no idea how to tune stuff like this correctly (even if
> they knew it existed!), yet we know exactly what hardware has this
> problem and the kernel already has mechanisms that would allow it to
> just Do The Right Thing. IOWs, we can fix this without the user even
> having to know that they have garbage hardware that needs special
> help....
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

* Re: [PATCH] f2fs: add sysfs entry to avoid FUA
  2022-05-27 21:33 ` Eric Biggers
  2022-05-27 23:55   ` Dave Chinner
@ 2022-05-28  1:06   ` Jaegeuk Kim
  2022-05-28  1:42     ` Darrick J. Wong
  2022-05-28  5:03     ` Christoph Hellwig
  1 sibling, 2 replies; 10+ messages in thread
From: Jaegeuk Kim @ 2022-05-28  1:06 UTC
  To: Eric Biggers; +Cc: linux-kernel, linux-f2fs-devel, linux-block, linux-xfs

On 05/27, Eric Biggers wrote:
> [+Cc linux-block for FUA, and linux-xfs for iomap]
> 
> On Fri, May 27, 2022 at 01:59:55PM -0700, Jaegeuk Kim wrote:
> > Some UFS storage gives slower performance on FUA than write+cache_flush.
> > Let's give a way to manage it.
> > 
> > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> 
> Should the driver even be saying that it has FUA support in this case?  If the
> driver didn't claim FUA support, that would also solve this problem.

I think there's still some benefit to use FUA such as small chunk writes
for checkpoint.

> 
> > ---
> >  Documentation/ABI/testing/sysfs-fs-f2fs | 7 +++++++
> >  fs/f2fs/data.c                          | 2 ++
> >  fs/f2fs/f2fs.h                          | 1 +
> >  fs/f2fs/sysfs.c                         | 2 ++
> >  4 files changed, 12 insertions(+)
> > 
> > diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs
> > index 9b583dd0298b..cd96b09d7182 100644
> > --- a/Documentation/ABI/testing/sysfs-fs-f2fs
> > +++ b/Documentation/ABI/testing/sysfs-fs-f2fs
> > @@ -434,6 +434,7 @@ Date:		April 2020
> >  Contact:	"Daeho Jeong" <daehojeong@google.com>
> >  Description:	Give a way to change iostat_period time. 3secs by default.
> >  		The new iostat trace gives stats gap given the period.
> > +
> >  What:		/sys/fs/f2fs/<disk>/max_io_bytes
> >  Date:		December 2020
> >  Contact:	"Jaegeuk Kim" <jaegeuk@kernel.org>
> > @@ -442,6 +443,12 @@ Description:	This gives a control to limit the bio size in f2fs.
> >  		whereas, if it has a certain bytes value, f2fs won't submit a
> >  		bio larger than that size.
> >  
> > +What:		/sys/fs/f2fs/<disk>/no_fua_dio
> > +Date:		May 2022
> > +Contact:	"Jaegeuk Kim" <jaegeuk@kernel.org>
> > +Description:	This gives a signal to iomap, which should not use FUA for
> > +		direct IOs. Default: 0.
> 
> iomap is an implementation detail, so it shouldn't be mentioned in UAPI
> documentation.  UAPI documentation should describe user-visible behavior only.

Ok.

> 
> > +
> >  What:		/sys/fs/f2fs/<disk>/stat/sb_status
> >  Date:		December 2020
> >  Contact:	"Chao Yu" <yuchao0@huawei.com>
> > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> > index f5f2b7233982..23486486eab2 100644
> > --- a/fs/f2fs/data.c
> > +++ b/fs/f2fs/data.c
> > @@ -4153,6 +4153,8 @@ static int f2fs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
> >  	if ((inode->i_state & I_DIRTY_DATASYNC) ||
> >  	    offset + length > i_size_read(inode))
> >  		iomap->flags |= IOMAP_F_DIRTY;
> > +	if (F2FS_I_SB(inode)->no_fua_dio)
> > +		iomap->flags |= IOMAP_F_DIRTY;
> 
> This is overloading the IOMAP_F_DIRTY flag to mean something other than dirty.
> Perhaps this flag needs to be renamed, or a new flag should be added?

I'm not sure it's acceptable to add another flag for f2fs only.

> 
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index e10838879538..c2400ea0080b 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -1671,6 +1671,7 @@ struct f2fs_sb_info {
> >  	int dir_level;				/* directory level */
> >  	int readdir_ra;				/* readahead inode in readdir */
> >  	u64 max_io_bytes;			/* max io bytes to merge IOs */
> > +	int no_fua_dio;				/* avoid FUA in DIO */
> 
> Make this a bool?

Done.

> 
> > diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
> > index 4c50aedd5144..24d628ca92cc 100644
> > --- a/fs/f2fs/sysfs.c
> > +++ b/fs/f2fs/sysfs.c
> > @@ -771,6 +771,7 @@ F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, iostat_period_ms, iostat_period_ms);
> >  #endif
> >  F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, readdir_ra, readdir_ra);
> >  F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, max_io_bytes, max_io_bytes);
> > +F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, no_fua_dio, no_fua_dio);
> >  F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, gc_pin_file_thresh, gc_pin_file_threshold);
> >  F2FS_RW_ATTR(F2FS_SBI, f2fs_super_block, extension_list, extension_list);
> >  #ifdef CONFIG_F2FS_FAULT_INJECTION
> > @@ -890,6 +891,7 @@ static struct attribute *f2fs_attrs[] = {
> >  #endif
> >  	ATTR_LIST(readdir_ra),
> >  	ATTR_LIST(max_io_bytes),
> > +	ATTR_LIST(no_fua_dio),
> 
> Where is it validated that only valid values (0 or 1) can be written to this
> file?

Added.

> 
> - Eric

* Re: [RFC PATCH v2] f2fs: add sysfs entry to avoid FUA
  2022-05-27 20:59 [PATCH] f2fs: add sysfs entry to avoid FUA Jaegeuk Kim
  2022-05-27 21:33 ` Eric Biggers
@ 2022-05-28  1:07 ` Jaegeuk Kim
  1 sibling, 0 replies; 10+ messages in thread
From: Jaegeuk Kim @ 2022-05-28  1:07 UTC
  To: linux-kernel, linux-f2fs-devel

Some UFS storage supporting FUA gives slower DIO write bandwidth with FUA
than write+cache_flush. But, in some small chunk writes, there's no
reason to avoid FUA for shorter latency. Let's give a way to handle it
by user.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---

 Note that this is an RFC, waiting for a better/right solution.

 Documentation/ABI/testing/sysfs-fs-f2fs | 7 +++++++
 fs/f2fs/data.c                          | 2 ++
 fs/f2fs/f2fs.h                          | 1 +
 fs/f2fs/sysfs.c                         | 9 +++++++++
 4 files changed, 19 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs
index 9b583dd0298b..8ca49f7d28ad 100644
--- a/Documentation/ABI/testing/sysfs-fs-f2fs
+++ b/Documentation/ABI/testing/sysfs-fs-f2fs
@@ -434,6 +434,7 @@ Date:		April 2020
 Contact:	"Daeho Jeong" <daehojeong@google.com>
 Description:	Give a way to change iostat_period time. 3secs by default.
 		The new iostat trace gives stats gap given the period.
+
 What:		/sys/fs/f2fs/<disk>/max_io_bytes
 Date:		December 2020
 Contact:	"Jaegeuk Kim" <jaegeuk@kernel.org>
@@ -442,6 +443,12 @@ Description:	This gives a control to limit the bio size in f2fs.
 		whereas, if it has a certain bytes value, f2fs won't submit a
 		bio larger than that size.
 
+What:		/sys/fs/f2fs/<disk>/no_fua_dio
+Date:		May 2022
+Contact:	"Jaegeuk Kim" <jaegeuk@kernel.org>
+Description:	This controls whether direct IOs attach FUA or not. Default is
+		using FUA.
+
 What:		/sys/fs/f2fs/<disk>/stat/sb_status
 Date:		December 2020
 Contact:	"Chao Yu" <yuchao0@huawei.com>
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index f5f2b7233982..23486486eab2 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -4153,6 +4153,8 @@ static int f2fs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 	if ((inode->i_state & I_DIRTY_DATASYNC) ||
 	    offset + length > i_size_read(inode))
 		iomap->flags |= IOMAP_F_DIRTY;
+	if (F2FS_I_SB(inode)->no_fua_dio)
+		iomap->flags |= IOMAP_F_DIRTY;
 
 	return 0;
 }
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index e10838879538..4897ada1929b 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1671,6 +1671,7 @@ struct f2fs_sb_info {
 	int dir_level;				/* directory level */
 	int readdir_ra;				/* readahead inode in readdir */
 	u64 max_io_bytes;			/* max io bytes to merge IOs */
+	bool no_fua_dio;			/* don't add FUA in DIO write */
 
 	block_t user_block_count;		/* # of user blocks */
 	block_t total_valid_block_count;	/* # of valid blocks */
diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
index 4c50aedd5144..199ba3e20ab0 100644
--- a/fs/f2fs/sysfs.c
+++ b/fs/f2fs/sysfs.c
@@ -608,6 +608,13 @@ static ssize_t __sbi_store(struct f2fs_attr *a,
 		return count;
 	}
 
+	if (!strcmp(a->attr.name, "no_fua_dio")) {
+		if (t != 0 && t != 1)
+			return -EINVAL;
+		sbi->no_fua_dio = t;
+		return count;
+	}
+
 	*ui = (unsigned int)t;
 
 	return count;
@@ -771,6 +778,7 @@ F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, iostat_period_ms, iostat_period_ms);
 #endif
 F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, readdir_ra, readdir_ra);
 F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, max_io_bytes, max_io_bytes);
+F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, no_fua_dio, no_fua_dio);
 F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, gc_pin_file_thresh, gc_pin_file_threshold);
 F2FS_RW_ATTR(F2FS_SBI, f2fs_super_block, extension_list, extension_list);
 #ifdef CONFIG_F2FS_FAULT_INJECTION
@@ -890,6 +898,7 @@ static struct attribute *f2fs_attrs[] = {
 #endif
 	ATTR_LIST(readdir_ra),
 	ATTR_LIST(max_io_bytes),
+	ATTR_LIST(no_fua_dio),
 	ATTR_LIST(gc_pin_file_thresh),
 	ATTR_LIST(extension_list),
 #ifdef CONFIG_F2FS_FAULT_INJECTION
-- 
2.36.1.124.g0e6072fb45-goog



* Re: [PATCH] f2fs: add sysfs entry to avoid FUA
  2022-05-28  1:06   ` Jaegeuk Kim
@ 2022-05-28  1:42     ` Darrick J. Wong
  2022-05-28  5:03     ` Christoph Hellwig
  1 sibling, 0 replies; 10+ messages in thread
From: Darrick J. Wong @ 2022-05-28  1:42 UTC
  To: Jaegeuk Kim
  Cc: Eric Biggers, linux-kernel, linux-f2fs-devel, linux-block, linux-xfs

On Fri, May 27, 2022 at 06:06:08PM -0700, Jaegeuk Kim wrote:
> On 05/27, Eric Biggers wrote:
> > [+Cc linux-block for FUA, and linux-xfs for iomap]
> > 
> > On Fri, May 27, 2022 at 01:59:55PM -0700, Jaegeuk Kim wrote:
> > > Some UFS storage gives slower performance on FUA than write+cache_flush.
> > > Let's give a way to manage it.
> > > 
> > > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > 
> > Should the driver even be saying that it has FUA support in this case?  If the
> > driver didn't claim FUA support, that would also solve this problem.
> 
> I think there's still some benefit to use FUA such as small chunk writes
> for checkpoint.
> 
> > 
> > > ---
> > >  Documentation/ABI/testing/sysfs-fs-f2fs | 7 +++++++
> > >  fs/f2fs/data.c                          | 2 ++
> > >  fs/f2fs/f2fs.h                          | 1 +
> > >  fs/f2fs/sysfs.c                         | 2 ++
> > >  4 files changed, 12 insertions(+)
> > > 
> > > diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs
> > > index 9b583dd0298b..cd96b09d7182 100644
> > > --- a/Documentation/ABI/testing/sysfs-fs-f2fs
> > > +++ b/Documentation/ABI/testing/sysfs-fs-f2fs
> > > @@ -434,6 +434,7 @@ Date:		April 2020
> > >  Contact:	"Daeho Jeong" <daehojeong@google.com>
> > >  Description:	Give a way to change iostat_period time. 3secs by default.
> > >  		The new iostat trace gives stats gap given the period.
> > > +
> > >  What:		/sys/fs/f2fs/<disk>/max_io_bytes
> > >  Date:		December 2020
> > >  Contact:	"Jaegeuk Kim" <jaegeuk@kernel.org>
> > > @@ -442,6 +443,12 @@ Description:	This gives a control to limit the bio size in f2fs.
> > >  		whereas, if it has a certain bytes value, f2fs won't submit a
> > >  		bio larger than that size.
> > >  
> > > +What:		/sys/fs/f2fs/<disk>/no_fua_dio
> > > +Date:		May 2022
> > > +Contact:	"Jaegeuk Kim" <jaegeuk@kernel.org>
> > > +Description:	This gives a signal to iomap, which should not use FUA for
> > > +		direct IOs. Default: 0.
> > 
> > iomap is an implementation detail, so it shouldn't be mentioned in UAPI
> > documentation.  UAPI documentation should describe user-visible behavior only.
> 
> Ok.
> 
> > 
> > > +
> > >  What:		/sys/fs/f2fs/<disk>/stat/sb_status
> > >  Date:		December 2020
> > >  Contact:	"Chao Yu" <yuchao0@huawei.com>
> > > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> > > index f5f2b7233982..23486486eab2 100644
> > > --- a/fs/f2fs/data.c
> > > +++ b/fs/f2fs/data.c
> > > @@ -4153,6 +4153,8 @@ static int f2fs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
> > >  	if ((inode->i_state & I_DIRTY_DATASYNC) ||
> > >  	    offset + length > i_size_read(inode))
> > >  		iomap->flags |= IOMAP_F_DIRTY;
> > > +	if (F2FS_I_SB(inode)->no_fua_dio)
> > > +		iomap->flags |= IOMAP_F_DIRTY;
> > 
> > This is overloading the IOMAP_F_DIRTY flag to mean something other than dirty.
> > Perhaps this flag needs to be renamed, or a new flag should be added?
> 
> I'm not sure it's acceptable to add another flag for f2fs only.

I think Al and willy have been throwing around patches to tell
iomap_dio_rw or someone that the caller will handle cache flushes and
that it shouldn't initiate them on its own; would that help here?

--D

> > 
> > > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > > index e10838879538..c2400ea0080b 100644
> > > --- a/fs/f2fs/f2fs.h
> > > +++ b/fs/f2fs/f2fs.h
> > > @@ -1671,6 +1671,7 @@ struct f2fs_sb_info {
> > >  	int dir_level;				/* directory level */
> > >  	int readdir_ra;				/* readahead inode in readdir */
> > >  	u64 max_io_bytes;			/* max io bytes to merge IOs */
> > > +	int no_fua_dio;				/* avoid FUA in DIO */
> > 
> > Make this a bool?
> 
> Done.
> 
> > 
> > > diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
> > > index 4c50aedd5144..24d628ca92cc 100644
> > > --- a/fs/f2fs/sysfs.c
> > > +++ b/fs/f2fs/sysfs.c
> > > @@ -771,6 +771,7 @@ F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, iostat_period_ms, iostat_period_ms);
> > >  #endif
> > >  F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, readdir_ra, readdir_ra);
> > >  F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, max_io_bytes, max_io_bytes);
> > > +F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, no_fua_dio, no_fua_dio);
> > >  F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, gc_pin_file_thresh, gc_pin_file_threshold);
> > >  F2FS_RW_ATTR(F2FS_SBI, f2fs_super_block, extension_list, extension_list);
> > >  #ifdef CONFIG_F2FS_FAULT_INJECTION
> > > @@ -890,6 +891,7 @@ static struct attribute *f2fs_attrs[] = {
> > >  #endif
> > >  	ATTR_LIST(readdir_ra),
> > >  	ATTR_LIST(max_io_bytes),
> > > +	ATTR_LIST(no_fua_dio),
> > 
> > Where is it validated that only valid values (0 or 1) can be written to this
> > file?
> 
> Added.
> 
> > 
> > - Eric

* Re: [PATCH] f2fs: add sysfs entry to avoid FUA
  2022-05-28  1:06   ` Jaegeuk Kim
  2022-05-28  1:42     ` Darrick J. Wong
@ 2022-05-28  5:03     ` Christoph Hellwig
  2022-05-31 20:15       ` Jaegeuk Kim
  1 sibling, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2022-05-28  5:03 UTC
  To: Jaegeuk Kim
  Cc: Eric Biggers, linux-kernel, linux-f2fs-devel, linux-block, linux-xfs

On Fri, May 27, 2022 at 06:06:08PM -0700, Jaegeuk Kim wrote:
> I think there's still some benefit to use FUA such as small chunk writes
> for checkpoint.

Did you measure if there is?  Because some SSDs basically implemented
FUA as an implied flush after the write, in which case it would not
really help there either (but also not hurt).

But as the previous two maintainers already said - this needs quirking
at the driver layer, not in the submitter.

* Re: [PATCH] f2fs: add sysfs entry to avoid FUA
  2022-05-28  0:26     ` Jaegeuk Kim
@ 2022-05-28  5:12       ` Dave Chinner
  0 siblings, 0 replies; 10+ messages in thread
From: Dave Chinner @ 2022-05-28  5:12 UTC
  To: Jaegeuk Kim
  Cc: Eric Biggers, linux-kernel, linux-f2fs-devel, linux-block, linux-xfs

On Fri, May 27, 2022 at 05:26:32PM -0700, Jaegeuk Kim wrote:
> On 05/28, Dave Chinner wrote:
> > On Fri, May 27, 2022 at 09:33:55PM +0000, Eric Biggers wrote:
> > > [+Cc linux-block for FUA, and linux-xfs for iomap]
> > 
> > linux-fsdevel should really be used for iomap stuff...
> > 
> > > 
> > > On Fri, May 27, 2022 at 01:59:55PM -0700, Jaegeuk Kim wrote:
> > > > Some UFS storage gives slower performance on FUA than write+cache_flush.
> > > > Let's give a way to manage it.
> > > > 
> > > > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > > 
> > > Should the driver even be saying that it has FUA support in this case?  If the
> > > driver didn't claim FUA support, that would also solve this problem.
> > 
> > Agreed, this is a hardware problem that needs to be addressed with a
> > driver quirk to stop it advertising FUA support. The high level
> > fs/iomap code should always issue FUA writes where possible and
> > the lower layers tell the block layer whether to issue the FUA as
> > a FUA or write+cache flush pair.
> 
> I was thinking of simply turning off FUA on the driver side, though one concern
> was bandwidth vs. latency. What if the device supports FUA with
> short latency but low bandwidth?

Seriously, how is a user supposed to know this sort of thing about
the hardware they are using? They don't, and to expect them to not
only know about the existence of a weird sysfs knob, let alone how it
applies to their hardware and their workload is totally
unreasonable.

If the hardware has non-deterministic FUA write performance, or
requires very careful switch over between cache flushes and FUA to
get the most out of the hardware, then that's not something we can
tune or optimise for - that's just broken hardware and the driver
should quirk the brokenness away so nobody has to care about it. Tell
the hardware manufacturer to fix their hardware, don't try to hack
around it in software and then expect the user to know how to tune
for that broken hardware.

> In that case, we still have
> room to use FUA for small writes such as filesystem metadata
> writes, while avoiding DIO w/ FUA for sequential write streams.

Strawman.

We don't use FUA for normal DIO writes - they only get used for
O_DSYNC writes, in which case we either use FUA if the device
supports it, or we do a normal write followed by a cache flush.
If there are metadata updates that the O_DSYNC needs to also flush,
we don't use FUA but let the filesystem issue a cache flush in the
most optimal way possible after the write completes.

Either way, using O_DSYNC DIO writes for streaming, sequential data
is a really poor choice for an application to make. Normal DIO
writes followed by fdatasync() to flush the metadata and caches once
will be much faster and far more efficient than a metadata and cache
flush after every single data write, FUA or not.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH] f2fs: add sysfs entry to avoid FUA
  2022-05-28  5:03     ` Christoph Hellwig
@ 2022-05-31 20:15       ` Jaegeuk Kim
  0 siblings, 0 replies; 10+ messages in thread
From: Jaegeuk Kim @ 2022-05-31 20:15 UTC
  To: Christoph Hellwig
  Cc: Eric Biggers, linux-kernel, linux-f2fs-devel, linux-block, linux-xfs

On 05/27, Christoph Hellwig wrote:
> On Fri, May 27, 2022 at 06:06:08PM -0700, Jaegeuk Kim wrote:
> > I think there's still some benefit to use FUA such as small chunk writes
> > for checkpoint.
> 
> Did you measure if there is?  Because some SSDs basically implemented
> FUA as an implied flush after the write, in which case it would not
> really help there either (but also not hurt).
> 
> But as the previous two maintainers already said - this needs quirking
> at the driver layer, not in the submitter.

Thanks, I indeed measured this using UFS, and it turned out cache_flush
is better than FUA all the time like this. Hence, I posted a quirk [1].

Write(us/KB)	4	64	256	1024	2048
FUA		873.792	754.604	995.624	1011.67	1067.99
CACHE_FLUSH	824.703	712.98	800.307	1019.5	1037.37

[1] https://lore.kernel.org/linux-scsi/20220531201053.3300018-1-jaegeuk@kernel.org/T/#u
