* [PATCH v2 RFC] f2fs: fix to force keeping write barrier for strict fsync mode
@ 2021-06-01 10:10 ` Chao Yu
  0 siblings, 0 replies; 28+ messages in thread
From: Chao Yu @ 2021-06-01 10:10 UTC (permalink / raw)
  To: jaegeuk; +Cc: linux-f2fs-devel, linux-kernel, chao, Chao Yu

[1] https://www.mail-archive.com/linux-f2fs-devel@lists.sourceforge.net/msg15126.html

As reported in [1], if the lower device doesn't support write barriers,
consider the following case:

- write page #0; persist
- overwrite page #0
- fsync
 - write data page #0 OPU into device's cache
 - write inode page into device's cache
 - issue flush

If SPO (sudden power-off) is triggered during the flush command, the inode
page can be persisted before data page #0, so that after recovery the inode
page is recovered with the new physical block address of data page #0, even
though that address may contain only dummy data.

What the user then sees is that, after overwrite & fsync + SPO, the old
data in the file is corrupted. If a user cares about this case, we can
suggest the STRICT fsync mode: in this mode, we force a preflush command
to persist the data in the device cache prior to node writeback, which
avoids potential data corruption during fsync().
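
The workload in the steps above can be driven from user space with a small
helper. This is only a sketch: the function names and the 4 KiB page size
are assumptions for illustration, not part of the patch. It does not inject
a power cut, so on its own it cannot show the corruption; it only issues the
write / fsync / overwrite / fsync sequence. Whether the second write really
goes out of place depends on the filesystem's IPU/OPU policy.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define DEMO_PAGE_SIZE 4096	/* assumed page size, for illustration only */

/* Fill page #0 with `fill`, write it at offset 0, then fsync the file. */
static int write_page0_and_fsync(int fd, unsigned char fill)
{
	unsigned char buf[DEMO_PAGE_SIZE];
	ssize_t n;

	memset(buf, fill, sizeof(buf));
	n = pwrite(fd, buf, sizeof(buf), 0);
	if (n != (ssize_t)sizeof(buf))
		return -1;
	return fsync(fd);
}

/*
 * Hypothetical helper: "write page #0; persist", then
 * "overwrite page #0" followed by "fsync".
 * Returns 0 on success, -1 on any error.
 */
int overwrite_then_fsync(const char *path)
{
	int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0644);
	int ret;

	if (fd < 0)
		return -1;
	ret = write_page0_and_fsync(fd, 0xAA);		/* initial contents */
	if (!ret)
		ret = write_page0_and_fsync(fd, 0xBB);	/* overwrite + fsync */
	if (close(fd) < 0)
		ret = -1;
	return ret;
}

/* Read page #0 back and return its first byte, or -1 on error. */
int read_page0_first_byte(const char *path)
{
	unsigned char buf[DEMO_PAGE_SIZE];
	int fd = open(path, O_RDONLY);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = pread(fd, buf, sizeof(buf), 0);
	close(fd);
	if (n != (ssize_t)sizeof(buf))
		return -1;
	return buf[0];
}
```

A test harness would call overwrite_then_fsync() on a file inside the f2fs
mount, cut power around the second fsync, and after remount check with
read_page0_first_byte() that page #0 holds either the old or the new fill
byte, never anything else.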

Signed-off-by: Chao Yu <yuchao0@huawei.com>
---
v2:
- fix this by adding additional preflush command rather than using
atomic write flow.
 fs/f2fs/file.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 7d5311d54f63..238ca2a733ac 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -301,6 +301,20 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end,
 				f2fs_exist_written_data(sbi, ino, UPDATE_INO))
 			goto flush_out;
 		goto out;
+	} else {
+		/*
+		 * for OPU case, during fsync(), node can be persisted before
+		 * data when lower device doesn't support write barrier, result
+		 * in data corruption after SPO.
+		 * So for strict fsync mode, force to trigger preflush to keep
+		 * data/node write order to avoid potential data corruption.
+		 */
+		if (F2FS_OPTION(sbi).fsync_mode == FSYNC_MODE_STRICT &&
+								!atomic) {
+			ret = f2fs_issue_flush(sbi, inode->i_ino);
+			if (ret)
+				goto out;
+		}
 	}
 go_write:
 	/*
-- 
2.29.2


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


* Re: [PATCH v2 RFC] f2fs: fix to force keeping write barrier for strict fsync mode
  2021-06-01 10:10 ` [f2fs-dev] " Chao Yu
@ 2021-06-03 16:00   ` Chao Yu
  -1 siblings, 0 replies; 28+ messages in thread
From: Chao Yu @ 2021-06-03 16:00 UTC (permalink / raw)
  To: Chao Yu, jaegeuk; +Cc: linux-f2fs-devel, linux-kernel

Jaegeuk,

Any comments on this patch?

On 2021/6/1 18:10, Chao Yu wrote:
> [1] https://www.mail-archive.com/linux-f2fs-devel@lists.sourceforge.net/msg15126.html
> 
> As [1] reported, if lower device doesn't support write barrier, in below
> case:
> 
> - write page #0; persist
> - overwrite page #0
> - fsync
>   - write data page #0 OPU into device's cache
>   - write inode page into device's cache
>   - issue flush
> 
> If SPO is triggered during flush command, inode page can be persisted
> before data page #0, so that after recovery, inode page can be recovered
> with new physical block address of data page #0, however there may
> contains dummy data in new physical block address.
> 
> Then what user will see is: after overwrite & fsync + SPO, old data in
> file was corrupted, if any user do care about such case, we can suggest
> user to use STRICT fsync mode, in this mode, we will force to trigger
> preflush command to persist data in device cache in prior to node
> writeback, it avoids potential data corruption during fsync().
> 
> Signed-off-by: Chao Yu <yuchao0@huawei.com>
> ---
> v2:
> - fix this by adding additional preflush command rather than using
> atomic write flow.
>   fs/f2fs/file.c | 14 ++++++++++++++
>   1 file changed, 14 insertions(+)
> 
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index 7d5311d54f63..238ca2a733ac 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -301,6 +301,20 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end,
>   				f2fs_exist_written_data(sbi, ino, UPDATE_INO))
>   			goto flush_out;
>   		goto out;
> +	} else {
> +		/*
> +		 * for OPU case, during fsync(), node can be persisted before
> +		 * data when lower device doesn't support write barrier, result
> +		 * in data corruption after SPO.
> +		 * So for strict fsync mode, force to trigger preflush to keep
> +		 * data/node write order to avoid potential data corruption.
> +		 */
> +		if (F2FS_OPTION(sbi).fsync_mode == FSYNC_MODE_STRICT &&
> +								!atomic) {
> +			ret = f2fs_issue_flush(sbi, inode->i_ino);
> +			if (ret)
> +				goto out;
> +		}
>   	}
>   go_write:
>   	/*
> 


* Re: [f2fs-dev] [PATCH v2 RFC] f2fs: fix to force keeping write barrier for strict fsync mode
  2021-06-03 16:00   ` [f2fs-dev] " Chao Yu
@ 2021-06-07 23:32     ` Chao Yu
  -1 siblings, 0 replies; 28+ messages in thread
From: Chao Yu @ 2021-06-07 23:32 UTC (permalink / raw)
  To: Chao Yu, jaegeuk; +Cc: linux-kernel, linux-f2fs-devel

Still no time to check this?

Thanks,

On 2021/6/4 0:00, Chao Yu wrote:
> Jaegeuk,
> 
> Any comments on this patch?
> 
> On 2021/6/1 18:10, Chao Yu wrote:
>> [1] https://www.mail-archive.com/linux-f2fs-devel@lists.sourceforge.net/msg15126.html
>>
>> As [1] reported, if lower device doesn't support write barrier, in below
>> case:
>>
>> - write page #0; persist
>> - overwrite page #0
>> - fsync
>>   - write data page #0 OPU into device's cache
>>   - write inode page into device's cache
>>   - issue flush
>>
>> If SPO is triggered during flush command, inode page can be persisted
>> before data page #0, so that after recovery, inode page can be recovered
>> with new physical block address of data page #0, however there may
>> contains dummy data in new physical block address.
>>
>> Then what user will see is: after overwrite & fsync + SPO, old data in
>> file was corrupted, if any user do care about such case, we can suggest
>> user to use STRICT fsync mode, in this mode, we will force to trigger
>> preflush command to persist data in device cache in prior to node
>> writeback, it avoids potential data corruption during fsync().
>>
>> Signed-off-by: Chao Yu <yuchao0@huawei.com>
>> ---
>> v2:
>> - fix this by adding additional preflush command rather than using
>> atomic write flow.
>>   fs/f2fs/file.c | 14 ++++++++++++++
>>   1 file changed, 14 insertions(+)
>>
>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
>> index 7d5311d54f63..238ca2a733ac 100644
>> --- a/fs/f2fs/file.c
>> +++ b/fs/f2fs/file.c
>> @@ -301,6 +301,20 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end,
>>                   f2fs_exist_written_data(sbi, ino, UPDATE_INO))
>>               goto flush_out;
>>           goto out;
>> +    } else {
>> +        /*
>> +         * for OPU case, during fsync(), node can be persisted before
>> +         * data when lower device doesn't support write barrier, result
>> +         * in data corruption after SPO.
>> +         * So for strict fsync mode, force to trigger preflush to keep
>> +         * data/node write order to avoid potential data corruption.
>> +         */
>> +        if (F2FS_OPTION(sbi).fsync_mode == FSYNC_MODE_STRICT &&
>> +                                !atomic) {
>> +            ret = f2fs_issue_flush(sbi, inode->i_ino);
>> +            if (ret)
>> +                goto out;
>> +        }
>>       }
>>   go_write:
>>       /*
>>

* Re: [PATCH v2 RFC] f2fs: fix to force keeping write barrier for strict fsync mode
  2021-06-01 10:10 ` [f2fs-dev] " Chao Yu
@ 2021-07-01 17:10   ` Jaegeuk Kim
  -1 siblings, 0 replies; 28+ messages in thread
From: Jaegeuk Kim @ 2021-07-01 17:10 UTC (permalink / raw)
  To: Chao Yu; +Cc: linux-f2fs-devel, linux-kernel, chao

On 06/01, Chao Yu wrote:
> [1] https://www.mail-archive.com/linux-f2fs-devel@lists.sourceforge.net/msg15126.html
> 
> As [1] reported, if lower device doesn't support write barrier, in below
> case:
> 
> - write page #0; persist
> - overwrite page #0
> - fsync
>  - write data page #0 OPU into device's cache
>  - write inode page into device's cache
>  - issue flush

Well, we have preflush for node writes, so I don't think this is the case.

 fio.op_flags |= REQ_PREFLUSH | REQ_FUA;

> 
> If SPO is triggered during flush command, inode page can be persisted
> before data page #0, so that after recovery, inode page can be recovered
> with new physical block address of data page #0, however there may
> contains dummy data in new physical block address.
> 
> Then what user will see is: after overwrite & fsync + SPO, old data in
> file was corrupted, if any user do care about such case, we can suggest
> user to use STRICT fsync mode, in this mode, we will force to trigger
> preflush command to persist data in device cache in prior to node
> writeback, it avoids potential data corruption during fsync().
> 
> Signed-off-by: Chao Yu <yuchao0@huawei.com>
> ---
> v2:
> - fix this by adding additional preflush command rather than using
> atomic write flow.
>  fs/f2fs/file.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index 7d5311d54f63..238ca2a733ac 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -301,6 +301,20 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end,
>  				f2fs_exist_written_data(sbi, ino, UPDATE_INO))
>  			goto flush_out;
>  		goto out;
> +	} else {
> +		/*
> +		 * for OPU case, during fsync(), node can be persisted before
> +		 * data when lower device doesn't support write barrier, result
> +		 * in data corruption after SPO.
> +		 * So for strict fsync mode, force to trigger preflush to keep
> +		 * data/node write order to avoid potential data corruption.
> +		 */
> +		if (F2FS_OPTION(sbi).fsync_mode == FSYNC_MODE_STRICT &&
> +								!atomic) {
> +			ret = f2fs_issue_flush(sbi, inode->i_ino);
> +			if (ret)
> +				goto out;
> +		}
>  	}
>  go_write:
>  	/*
> -- 
> 2.29.2


* Re: [PATCH v2 RFC] f2fs: fix to force keeping write barrier for strict fsync mode
  2021-07-01 17:10   ` [f2fs-dev] " Jaegeuk Kim
@ 2021-07-01 23:04     ` Chao Yu
  -1 siblings, 0 replies; 28+ messages in thread
From: Chao Yu @ 2021-07-01 23:04 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: linux-f2fs-devel, linux-kernel

On 2021/7/2 1:10, Jaegeuk Kim wrote:
> On 06/01, Chao Yu wrote:
>> [1] https://www.mail-archive.com/linux-f2fs-devel@lists.sourceforge.net/msg15126.html
>>
>> As [1] reported, if lower device doesn't support write barrier, in below
>> case:
>>
>> - write page #0; persist
>> - overwrite page #0
>> - fsync
>>   - write data page #0 OPU into device's cache
>>   - write inode page into device's cache
>>   - issue flush
> 
> Well, we have preflush for node writes, so I don't think this is the case.
> 
>   fio.op_flags |= REQ_PREFLUSH | REQ_FUA;

This is only used for the atomic write case, right?

I mean the common case, where f2fs_issue_flush() is called from
f2fs_do_sync_file().

And please see do_checkpoint(): we call f2fs_flush_device_cache() and
commit_checkpoint() separately to keep the persistence order of CP data.

See commit 46706d5917f4 ("f2fs: flush cp pack except cp pack 2 page at first")
for details.
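
The ordering argument can be illustrated with a toy model. This is a sketch
under stated assumptions, not f2fs code: the device is assumed to drain its
volatile cache in an order the host cannot control, power may be cut after
any number of blocks has reached media, and all names below are
hypothetical. "Corrupt" means the node block is persistent while the data
block it points to is not.

```c
#include <assert.h>
#include <stdbool.h>

enum blk { DATA = 0, NODE = 1, NBLK = 2 };

/* Node on media without its data: recovery would expose dummy data. */
static bool corrupt(const bool persisted[NBLK])
{
	return persisted[NODE] && !persisted[DATA];
}

/*
 * Data and node both sit in the device cache and one flush covers both.
 * Enumerate both drain orders and every power-cut point.
 */
bool single_flush_can_corrupt(void)
{
	static const enum blk orders[2][NBLK] = {
		{ DATA, NODE },
		{ NODE, DATA },
	};

	for (int o = 0; o < 2; o++) {
		for (int cut = 0; cut <= NBLK; cut++) {
			bool persisted[NBLK] = { false, false };

			/* The first `cut` blocks of this order reach media. */
			for (int i = 0; i < cut; i++)
				persisted[orders[o][i]] = true;
			if (corrupt(persisted))
				return true;
		}
	}
	return false;
}

/*
 * Data is flushed first, and the node is only written after that flush
 * has completed, so the node can never reach media ahead of the data.
 */
bool preflush_can_corrupt(void)
{
	for (int cut = 0; cut <= NBLK; cut++) {
		bool persisted[NBLK] = { false, false };

		if (cut >= 1)
			persisted[DATA] = true;	/* first flush completed */
		if (cut >= 2)
			persisted[NODE] = true;	/* node flush completed */
		if (corrupt(persisted))
			return true;
	}
	return false;
}
```

In this model single_flush_can_corrupt() finds the bad state (node drained
first, power cut before the data), while preflush_can_corrupt() does not,
which is the same reason the two checkpoint steps are issued separately.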

Thanks,

> 
>>
>> If SPO is triggered during flush command, inode page can be persisted
>> before data page #0, so that after recovery, inode page can be recovered
>> with new physical block address of data page #0, however there may
>> contains dummy data in new physical block address.
>>
>> Then what user will see is: after overwrite & fsync + SPO, old data in
>> file was corrupted, if any user do care about such case, we can suggest
>> user to use STRICT fsync mode, in this mode, we will force to trigger
>> preflush command to persist data in device cache in prior to node
>> writeback, it avoids potential data corruption during fsync().
>>
>> Signed-off-by: Chao Yu <yuchao0@huawei.com>
>> ---
>> v2:
>> - fix this by adding additional preflush command rather than using
>> atomic write flow.
>>   fs/f2fs/file.c | 14 ++++++++++++++
>>   1 file changed, 14 insertions(+)
>>
>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
>> index 7d5311d54f63..238ca2a733ac 100644
>> --- a/fs/f2fs/file.c
>> +++ b/fs/f2fs/file.c
>> @@ -301,6 +301,20 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end,
>>   				f2fs_exist_written_data(sbi, ino, UPDATE_INO))
>>   			goto flush_out;
>>   		goto out;
>> +	} else {
>> +		/*
>> +		 * for OPU case, during fsync(), node can be persisted before
>> +		 * data when lower device doesn't support write barrier, result
>> +		 * in data corruption after SPO.
>> +		 * So for strict fsync mode, force to trigger preflush to keep
>> +		 * data/node write order to avoid potential data corruption.
>> +		 */
>> +		if (F2FS_OPTION(sbi).fsync_mode == FSYNC_MODE_STRICT &&
>> +								!atomic) {
>> +			ret = f2fs_issue_flush(sbi, inode->i_ino);
>> +			if (ret)
>> +				goto out;
>> +		}
>>   	}
>>   go_write:
>>   	/*
>> -- 
>> 2.29.2


* Re: [PATCH v2 RFC] f2fs: fix to force keeping write barrier for strict fsync mode
  2021-07-01 23:04     ` [f2fs-dev] " Chao Yu
@ 2021-07-02  1:32       ` Jaegeuk Kim
  -1 siblings, 0 replies; 28+ messages in thread
From: Jaegeuk Kim @ 2021-07-02  1:32 UTC (permalink / raw)
  To: Chao Yu; +Cc: linux-f2fs-devel, linux-kernel

On 07/02, Chao Yu wrote:
> On 2021/7/2 1:10, Jaegeuk Kim wrote:
> > On 06/01, Chao Yu wrote:
> > > [1] https://www.mail-archive.com/linux-f2fs-devel@lists.sourceforge.net/msg15126.html
> > > 
> > > As [1] reported, if lower device doesn't support write barrier, in below
> > > case:
> > > 
> > > - write page #0; persist
> > > - overwrite page #0
> > > - fsync
> > >   - write data page #0 OPU into device's cache
> > >   - write inode page into device's cache
> > >   - issue flush
> > 
> > Well, we have preflush for node writes, so I don't think this is the case.
> > 
> >   fio.op_flags |= REQ_PREFLUSH | REQ_FUA;
> 
> This is only used for atomic write case, right?
> 
> I mean the common case which is called from f2fs_issue_flush() in
> f2fs_do_sync_file().

How about adding PREFLUSH when writing node blocks aligned to the above set?

> 
> And please see do_checkpoint(), we call f2fs_flush_device_cache() and
> commit_checkpoint() separately to keep persistence order of CP datas.
> 
> See commit 46706d5917f4 ("f2fs: flush cp pack except cp pack 2 page at first")
> for details.
> 
> Thanks,
> 
> > 
> > > 
> > > If SPO is triggered during flush command, inode page can be persisted
> > > before data page #0, so that after recovery, inode page can be recovered
> > > with new physical block address of data page #0, however there may
> > > contains dummy data in new physical block address.
> > > 
> > > Then what user will see is: after overwrite & fsync + SPO, old data in
> > > file was corrupted, if any user do care about such case, we can suggest
> > > user to use STRICT fsync mode, in this mode, we will force to trigger
> > > preflush command to persist data in device cache in prior to node
> > > writeback, it avoids potential data corruption during fsync().
> > > 
> > > Signed-off-by: Chao Yu <yuchao0@huawei.com>
> > > ---
> > > v2:
> > > - fix this by adding additional preflush command rather than using
> > > atomic write flow.
> > >   fs/f2fs/file.c | 14 ++++++++++++++
> > >   1 file changed, 14 insertions(+)
> > > 
> > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > > index 7d5311d54f63..238ca2a733ac 100644
> > > --- a/fs/f2fs/file.c
> > > +++ b/fs/f2fs/file.c
> > > @@ -301,6 +301,20 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end,
> > >   				f2fs_exist_written_data(sbi, ino, UPDATE_INO))
> > >   			goto flush_out;
> > >   		goto out;
> > > +	} else {
> > > +		/*
> > > +		 * for OPU case, during fsync(), node can be persisted before
> > > +		 * data when lower device doesn't support write barrier, result
> > > +		 * in data corruption after SPO.
> > > +		 * So for strict fsync mode, force to trigger preflush to keep
> > > +		 * data/node write order to avoid potential data corruption.
> > > +		 */
> > > +		if (F2FS_OPTION(sbi).fsync_mode == FSYNC_MODE_STRICT &&
> > > +								!atomic) {
> > > +			ret = f2fs_issue_flush(sbi, inode->i_ino);
> > > +			if (ret)
> > > +				goto out;
> > > +		}
> > >   	}
> > >   go_write:
> > >   	/*
> > > -- 
> > > 2.29.2


* Re: [PATCH v2 RFC] f2fs: fix to force keeping write barrier for strict fsync mode
  2021-07-02  1:32       ` [f2fs-dev] " Jaegeuk Kim
@ 2021-07-02 15:49         ` Chao Yu
  -1 siblings, 0 replies; 28+ messages in thread
From: Chao Yu @ 2021-07-02 15:49 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: linux-f2fs-devel, linux-kernel

On 2021/7/2 9:32, Jaegeuk Kim wrote:
> On 07/02, Chao Yu wrote:
>> On 2021/7/2 1:10, Jaegeuk Kim wrote:
>>> On 06/01, Chao Yu wrote:
>>>> [1] https://www.mail-archive.com/linux-f2fs-devel@lists.sourceforge.net/msg15126.html
>>>>
>>>> As [1] reported, if lower device doesn't support write barrier, in below
>>>> case:
>>>>
>>>> - write page #0; persist
>>>> - overwrite page #0
>>>> - fsync
>>>>    - write data page #0 OPU into device's cache
>>>>    - write inode page into device's cache
>>>>    - issue flush
>>>
>>> Well, we have preflush for node writes, so I don't think this is the case.
>>>
>>>    fio.op_flags |= REQ_PREFLUSH | REQ_FUA;
>>
>> This is only used for atomic write case, right?
>>
>> I mean the common case which is called from f2fs_issue_flush() in
>> f2fs_do_sync_file().
> 
> How about adding PREFLUSH when writing node blocks aligned to the above set?

You mean an implementation like v1, as below?

https://lore.kernel.org/linux-f2fs-devel/20200120100045.70210-1-yuchao0@huawei.com/

Thanks,

> 
>>
>> And please see do_checkpoint(), we call f2fs_flush_device_cache() and
>> commit_checkpoint() separately to keep persistence order of CP datas.
>>
>> See commit 46706d5917f4 ("f2fs: flush cp pack except cp pack 2 page at first")
>> for details.
>>
>> Thanks,
>>
>>>
>>>>
>>>> If SPO is triggered during flush command, inode page can be persisted
>>>> before data page #0, so that after recovery, inode page can be recovered
>>>> with new physical block address of data page #0, however there may
>>>> contains dummy data in new physical block address.
>>>>
>>>> Then what user will see is: after overwrite & fsync + SPO, old data in
>>>> file was corrupted, if any user do care about such case, we can suggest
>>>> user to use STRICT fsync mode, in this mode, we will force to trigger
>>>> preflush command to persist data in device cache in prior to node
>>>> writeback, it avoids potential data corruption during fsync().
>>>>
>>>> Signed-off-by: Chao Yu <yuchao0@huawei.com>
>>>> ---
>>>> v2:
>>>> - fix this by adding additional preflush command rather than using
>>>> atomic write flow.
>>>>    fs/f2fs/file.c | 14 ++++++++++++++
>>>>    1 file changed, 14 insertions(+)
>>>>
>>>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
>>>> index 7d5311d54f63..238ca2a733ac 100644
>>>> --- a/fs/f2fs/file.c
>>>> +++ b/fs/f2fs/file.c
>>>> @@ -301,6 +301,20 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end,
>>>>    				f2fs_exist_written_data(sbi, ino, UPDATE_INO))
>>>>    			goto flush_out;
>>>>    		goto out;
>>>> +	} else {
>>>> +		/*
>>>> +		 * for OPU case, during fsync(), node can be persisted before
>>>> +		 * data when lower device doesn't support write barrier, result
>>>> +		 * in data corruption after SPO.
>>>> +		 * So for strict fsync mode, force to trigger preflush to keep
>>>> +		 * data/node write order to avoid potential data corruption.
>>>> +		 */
>>>> +		if (F2FS_OPTION(sbi).fsync_mode == FSYNC_MODE_STRICT &&
>>>> +								!atomic) {
>>>> +			ret = f2fs_issue_flush(sbi, inode->i_ino);
>>>> +			if (ret)
>>>> +				goto out;
>>>> +		}
>>>>    	}
>>>>    go_write:
>>>>    	/*
>>>> -- 
>>>> 2.29.2

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 RFC] f2fs: fix to force keeping write barrier for strict fsync mode
  2021-07-02 15:49         ` [f2fs-dev] " Chao Yu
@ 2021-07-07 17:48           ` Jaegeuk Kim
  -1 siblings, 0 replies; 28+ messages in thread
From: Jaegeuk Kim @ 2021-07-07 17:48 UTC (permalink / raw)
  To: Chao Yu; +Cc: linux-f2fs-devel, linux-kernel

On 07/02, Chao Yu wrote:
> On 2021/7/2 9:32, Jaegeuk Kim wrote:
> > On 07/02, Chao Yu wrote:
> > > On 2021/7/2 1:10, Jaegeuk Kim wrote:
> > > > On 06/01, Chao Yu wrote:
> > > > > [1] https://www.mail-archive.com/linux-f2fs-devel@lists.sourceforge.net/msg15126.html
> > > > > 
> > > > > As [1] reported, if lower device doesn't support write barrier, in below
> > > > > case:
> > > > > 
> > > > > - write page #0; persist
> > > > > - overwrite page #0
> > > > > - fsync
> > > > >    - write data page #0 OPU into device's cache
> > > > >    - write inode page into device's cache
> > > > >    - issue flush
> > > > 
> > > > Well, we have preflush for node writes, so I don't think this is the case.
> > > > 
> > > >    fio.op_flags |= REQ_PREFLUSH | REQ_FUA;
> > > 
> > > This is only used for atomic write case, right?
> > > 
> > > I mean the common case which is called from f2fs_issue_flush() in
> > > f2fs_do_sync_file().
> > 
> > How about adding PREFLUSH when writing node blocks aligned to the above set?
> 
> You mean implementation like v1 as below?
> 
> https://lore.kernel.org/linux-f2fs-devel/20200120100045.70210-1-yuchao0@huawei.com/

Yea, I think so. :P

> 
> Thanks,
> 
> > 
> > > 
> > > And please see do_checkpoint(), we call f2fs_flush_device_cache() and
> > > commit_checkpoint() separately to keep persistence order of CP datas.
> > > 
> > > See commit 46706d5917f4 ("f2fs: flush cp pack except cp pack 2 page at first")
> > > for details.
> > > 
> > > Thanks,
> > > 
> > > > 
> > > > > 
> > > > > If SPO is triggered during flush command, inode page can be persisted
> > > > > before data page #0, so that after recovery, inode page can be recovered
> > > > > with new physical block address of data page #0, however there may
> > > > > contains dummy data in new physical block address.
> > > > > 
> > > > > Then what user will see is: after overwrite & fsync + SPO, old data in
> > > > > file was corrupted, if any user do care about such case, we can suggest
> > > > > user to use STRICT fsync mode, in this mode, we will force to trigger
> > > > > preflush command to persist data in device cache in prior to node
> > > > > writeback, it avoids potential data corruption during fsync().
> > > > > 
> > > > > Signed-off-by: Chao Yu <yuchao0@huawei.com>
> > > > > ---
> > > > > v2:
> > > > > - fix this by adding additional preflush command rather than using
> > > > > atomic write flow.
> > > > >    fs/f2fs/file.c | 14 ++++++++++++++
> > > > >    1 file changed, 14 insertions(+)
> > > > > 
> > > > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > > > > index 7d5311d54f63..238ca2a733ac 100644
> > > > > --- a/fs/f2fs/file.c
> > > > > +++ b/fs/f2fs/file.c
> > > > > @@ -301,6 +301,20 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end,
> > > > >    				f2fs_exist_written_data(sbi, ino, UPDATE_INO))
> > > > >    			goto flush_out;
> > > > >    		goto out;
> > > > > +	} else {
> > > > > +		/*
> > > > > +		 * for OPU case, during fsync(), node can be persisted before
> > > > > +		 * data when lower device doesn't support write barrier, result
> > > > > +		 * in data corruption after SPO.
> > > > > +		 * So for strict fsync mode, force to trigger preflush to keep
> > > > > +		 * data/node write order to avoid potential data corruption.
> > > > > +		 */
> > > > > +		if (F2FS_OPTION(sbi).fsync_mode == FSYNC_MODE_STRICT &&
> > > > > +								!atomic) {
> > > > > +			ret = f2fs_issue_flush(sbi, inode->i_ino);
> > > > > +			if (ret)
> > > > > +				goto out;
> > > > > +		}
> > > > >    	}
> > > > >    go_write:
> > > > >    	/*
> > > > > -- 
> > > > > 2.29.2

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 RFC] f2fs: fix to force keeping write barrier for strict fsync mode
  2021-07-07 17:48           ` [f2fs-dev] " Jaegeuk Kim
@ 2021-07-13  9:23             ` Chao Yu
  -1 siblings, 0 replies; 28+ messages in thread
From: Chao Yu @ 2021-07-13  9:23 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: linux-f2fs-devel, linux-kernel

On 2021/7/8 1:48, Jaegeuk Kim wrote:
> On 07/02, Chao Yu wrote:
>> On 2021/7/2 9:32, Jaegeuk Kim wrote:
>>> On 07/02, Chao Yu wrote:
>>>> On 2021/7/2 1:10, Jaegeuk Kim wrote:
>>>>> On 06/01, Chao Yu wrote:
>>>>>> [1] https://www.mail-archive.com/linux-f2fs-devel@lists.sourceforge.net/msg15126.html
>>>>>>
>>>>>> As [1] reported, if lower device doesn't support write barrier, in below
>>>>>> case:
>>>>>>
>>>>>> - write page #0; persist
>>>>>> - overwrite page #0
>>>>>> - fsync
>>>>>>     - write data page #0 OPU into device's cache
>>>>>>     - write inode page into device's cache
>>>>>>     - issue flush
>>>>>
>>>>> Well, we have preflush for node writes, so I don't think this is the case.
>>>>>
>>>>>     fio.op_flags |= REQ_PREFLUSH | REQ_FUA;
>>>>
>>>> This is only used for atomic write case, right?
>>>>
>>>> I mean the common case which is called from f2fs_issue_flush() in
>>>> f2fs_do_sync_file().
>>>
>>> How about adding PREFLUSH when writing node blocks aligned to the above set?
>>
>> You mean implementation like v1 as below?
>>
>> https://lore.kernel.org/linux-f2fs-devel/20200120100045.70210-1-yuchao0@huawei.com/
> 
> Yea, I think so. :P

I prefer v2, since it leaves room for several schemes to improve performance, e.g.
- use in-place IO to avoid the newly added preflush
- use the flush_merge option to avoid redundant preflushes
- if the lower device supports barrier IO, skip the newly added preflush
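
For readers following the thread, the ordering hazard that the added preflush
is meant to close can be sketched with a toy model. This is not f2fs code:
`toy_dev`, `toy_write()`, `toy_flush()` and `node_without_data_possible()` are
hypothetical names, and the model assumes a device that may persist any subset
of its volatile cache when power is cut.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only (hypothetical names, not kernel code). */
enum blk { BLK_DATA = 1 << 0, BLK_NODE = 1 << 1 };

struct toy_dev {
	unsigned int cached;	/* blocks sitting in the volatile write cache */
	unsigned int durable;	/* blocks already on stable media */
};

/* A write only reaches the volatile cache. */
static void toy_write(struct toy_dev *dev, enum blk b)
{
	assert((dev->durable & b) == 0);	/* each block is written once here */
	dev->cached |= b;
}

/* A completed flush moves everything cached to stable media. */
static void toy_flush(struct toy_dev *dev)
{
	dev->durable |= dev->cached;
	dev->cached = 0;
}

/*
 * Replay the fsync sequence up to the point where the final flush is
 * issued, then cut power. Return true if some outcome leaves the node
 * block durable while the data block it points to is not.
 */
static bool node_without_data_possible(bool preflush)
{
	const unsigned int all = BLK_DATA | BLK_NODE;
	struct toy_dev dev = { 0, 0 };
	unsigned int subset;

	toy_write(&dev, BLK_DATA);
	if (preflush)
		toy_flush(&dev);	/* the extra flush discussed in the thread */
	toy_write(&dev, BLK_NODE);

	/* Assumption: on power cut, any subset of the cache may have landed. */
	for (subset = 0; subset <= all; subset++) {
		unsigned int after;

		if (subset & ~dev.cached)
			continue;	/* not a subset of what was cached */
		after = dev.durable | subset;
		if ((after & BLK_NODE) && !(after & BLK_DATA))
			return true;
	}
	return false;
}
```

In this model `node_without_data_possible(false)` returns true, because the
node block alone can reach stable media, while
`node_without_data_possible(true)` returns false, because the data block is
already durable before the node block enters the cache.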

Thanks,

> 
>>
>> Thanks,
>>
>>>
>>>>
>>>> And please see do_checkpoint(), we call f2fs_flush_device_cache() and
>>>> commit_checkpoint() separately to keep persistence order of CP datas.
>>>>
>>>> See commit 46706d5917f4 ("f2fs: flush cp pack except cp pack 2 page at first")
>>>> for details.
>>>>
>>>> Thanks,
>>>>
>>>>>
>>>>>>
>>>>>> If SPO is triggered during flush command, inode page can be persisted
>>>>>> before data page #0, so that after recovery, inode page can be recovered
>>>>>> with new physical block address of data page #0, however there may
>>>>>> contains dummy data in new physical block address.
>>>>>>
>>>>>> Then what user will see is: after overwrite & fsync + SPO, old data in
>>>>>> file was corrupted, if any user do care about such case, we can suggest
>>>>>> user to use STRICT fsync mode, in this mode, we will force to trigger
>>>>>> preflush command to persist data in device cache in prior to node
>>>>>> writeback, it avoids potential data corruption during fsync().
>>>>>>
>>>>>> Signed-off-by: Chao Yu <yuchao0@huawei.com>
>>>>>> ---
>>>>>> v2:
>>>>>> - fix this by adding additional preflush command rather than using
>>>>>> atomic write flow.
>>>>>>     fs/f2fs/file.c | 14 ++++++++++++++
>>>>>>     1 file changed, 14 insertions(+)
>>>>>>
>>>>>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
>>>>>> index 7d5311d54f63..238ca2a733ac 100644
>>>>>> --- a/fs/f2fs/file.c
>>>>>> +++ b/fs/f2fs/file.c
>>>>>> @@ -301,6 +301,20 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end,
>>>>>>     				f2fs_exist_written_data(sbi, ino, UPDATE_INO))
>>>>>>     			goto flush_out;
>>>>>>     		goto out;
>>>>>> +	} else {
>>>>>> +		/*
>>>>>> +		 * for OPU case, during fsync(), node can be persisted before
>>>>>> +		 * data when lower device doesn't support write barrier, result
>>>>>> +		 * in data corruption after SPO.
>>>>>> +		 * So for strict fsync mode, force to trigger preflush to keep
>>>>>> +		 * data/node write order to avoid potential data corruption.
>>>>>> +		 */
>>>>>> +		if (F2FS_OPTION(sbi).fsync_mode == FSYNC_MODE_STRICT &&
>>>>>> +								!atomic) {
>>>>>> +			ret = f2fs_issue_flush(sbi, inode->i_ino);
>>>>>> +			if (ret)
>>>>>> +				goto out;
>>>>>> +		}
>>>>>>     	}
>>>>>>     go_write:
>>>>>>     	/*
>>>>>> -- 
>>>>>> 2.29.2

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 RFC] f2fs: fix to force keeping write barrier for strict fsync mode
  2021-07-13  9:23             ` [f2fs-dev] " Chao Yu
@ 2021-07-13 23:34               ` Jaegeuk Kim
  -1 siblings, 0 replies; 28+ messages in thread
From: Jaegeuk Kim @ 2021-07-13 23:34 UTC (permalink / raw)
  To: Chao Yu; +Cc: linux-f2fs-devel, linux-kernel

On 07/13, Chao Yu wrote:
> On 2021/7/8 1:48, Jaegeuk Kim wrote:
> > On 07/02, Chao Yu wrote:
> > > On 2021/7/2 9:32, Jaegeuk Kim wrote:
> > > > On 07/02, Chao Yu wrote:
> > > > > On 2021/7/2 1:10, Jaegeuk Kim wrote:
> > > > > > On 06/01, Chao Yu wrote:
> > > > > > > [1] https://www.mail-archive.com/linux-f2fs-devel@lists.sourceforge.net/msg15126.html
> > > > > > > 
> > > > > > > As [1] reported, if lower device doesn't support write barrier, in below
> > > > > > > case:
> > > > > > > 
> > > > > > > - write page #0; persist
> > > > > > > - overwrite page #0
> > > > > > > - fsync
> > > > > > >     - write data page #0 OPU into device's cache
> > > > > > >     - write inode page into device's cache
> > > > > > >     - issue flush
> > > > > > 
> > > > > > Well, we have preflush for node writes, so I don't think this is the case.
> > > > > > 
> > > > > >     fio.op_flags |= REQ_PREFLUSH | REQ_FUA;
> > > > > 
> > > > > This is only used for atomic write case, right?
> > > > > 
> > > > > I mean the common case which is called from f2fs_issue_flush() in
> > > > > f2fs_do_sync_file().
> > > > 
> > > > How about adding PREFLUSH when writing node blocks aligned to the above set?
> > > 
> > > You mean implementation like v1 as below?
> > > 
> > > https://lore.kernel.org/linux-f2fs-devel/20200120100045.70210-1-yuchao0@huawei.com/
> > 
> > Yea, I think so. :P
> 
> I prefer v2, we may have several schemes to improve performance with v2, e.g.
> - use inplace IO to avoid newly added preflush
> - use flush_merge option to avoid redundant preflush
> - if lower device supports barrier IO, we can avoid newly added preflush

Doesn't v2 issue one more flush than v1? Why take the worse one and then try to
improve it back? The benefit of v2 isn't clear to me.
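
The flush-count question can be made concrete with a small counting sketch.
It rests on assumptions that are not confirmed by the patches themselves: v1
is modeled as a node write carrying REQ_PREFLUSH | REQ_FUA (the flag set quoted
earlier in the thread), v2 as an explicit flush before the node write plus the
usual flush that ends fsync, and a device without native FUA is assumed to pay
one extra post-flush for the FUA write. The enum and function names are
hypothetical, and flush_merge is ignored.

```c
#include <stdbool.h>

/* Hypothetical names; a counting model only, not f2fs code. */
enum scheme {
	SCHEME_V1_NODE_PREFLUSH_FUA,	/* node write tagged PREFLUSH | FUA */
	SCHEME_V2_EXPLICIT_PREFLUSH	/* flush, plain node write, flush */
};

/* Number of cache-flush commands the device sees for one fsync(). */
static int cache_flushes_per_fsync(enum scheme s, bool dev_has_fua)
{
	switch (s) {
	case SCHEME_V1_NODE_PREFLUSH_FUA:
		/*
		 * One preflush ahead of the node write. FUA costs nothing
		 * extra on native FUA; otherwise it is assumed to be
		 * emulated with one post-flush.
		 */
		return 1 + (dev_has_fua ? 0 : 1);
	case SCHEME_V2_EXPLICIT_PREFLUSH:
		/* Explicit flush before node writeback, plus the final one. */
		return 2;
	}
	return -1;	/* unknown scheme */
}
```

Under these assumptions v2 costs one more flush than v1 only on devices with
native FUA; without it, both schemes end up at two flushes per fsync().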

> 
> Thanks,
> 
> > 
> > > 
> > > Thanks,
> > > 
> > > > 
> > > > > 
> > > > > And please see do_checkpoint(), we call f2fs_flush_device_cache() and
> > > > > commit_checkpoint() separately to keep persistence order of CP datas.
> > > > > 
> > > > > See commit 46706d5917f4 ("f2fs: flush cp pack except cp pack 2 page at first")
> > > > > for details.
> > > > > 
> > > > > Thanks,
> > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > If SPO is triggered during flush command, inode page can be persisted
> > > > > > > before data page #0, so that after recovery, inode page can be recovered
> > > > > > > with new physical block address of data page #0, however there may
> > > > > > > contains dummy data in new physical block address.
> > > > > > > 
> > > > > > > Then what user will see is: after overwrite & fsync + SPO, old data in
> > > > > > > file was corrupted, if any user do care about such case, we can suggest
> > > > > > > user to use STRICT fsync mode, in this mode, we will force to trigger
> > > > > > > preflush command to persist data in device cache in prior to node
> > > > > > > writeback, it avoids potential data corruption during fsync().
> > > > > > > 
> > > > > > > Signed-off-by: Chao Yu <yuchao0@huawei.com>
> > > > > > > ---
> > > > > > > v2:
> > > > > > > - fix this by adding additional preflush command rather than using
> > > > > > > atomic write flow.
> > > > > > >     fs/f2fs/file.c | 14 ++++++++++++++
> > > > > > >     1 file changed, 14 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > > > > > > index 7d5311d54f63..238ca2a733ac 100644
> > > > > > > --- a/fs/f2fs/file.c
> > > > > > > +++ b/fs/f2fs/file.c
> > > > > > > @@ -301,6 +301,20 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end,
> > > > > > >     				f2fs_exist_written_data(sbi, ino, UPDATE_INO))
> > > > > > >     			goto flush_out;
> > > > > > >     		goto out;
> > > > > > > +	} else {
> > > > > > > +		/*
> > > > > > > +		 * for OPU case, during fsync(), node can be persisted before
> > > > > > > +		 * data when lower device doesn't support write barrier, result
> > > > > > > +		 * in data corruption after SPO.
> > > > > > > +		 * So for strict fsync mode, force to trigger preflush to keep
> > > > > > > +		 * data/node write order to avoid potential data corruption.
> > > > > > > +		 */
> > > > > > > +		if (F2FS_OPTION(sbi).fsync_mode == FSYNC_MODE_STRICT &&
> > > > > > > +								!atomic) {
> > > > > > > +			ret = f2fs_issue_flush(sbi, inode->i_ino);
> > > > > > > +			if (ret)
> > > > > > > +				goto out;
> > > > > > > +		}
> > > > > > >     	}
> > > > > > >     go_write:
> > > > > > >     	/*
> > > > > > > -- 
> > > > > > > 2.29.2

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [f2fs-dev] [PATCH v2 RFC] f2fs: fix to force keeping write barrier for strict fsync mode
@ 2021-07-13 23:34               ` Jaegeuk Kim
  0 siblings, 0 replies; 28+ messages in thread
From: Jaegeuk Kim @ 2021-07-13 23:34 UTC (permalink / raw)
  To: Chao Yu; +Cc: linux-kernel, linux-f2fs-devel

On 07/13, Chao Yu wrote:
> On 2021/7/8 1:48, Jaegeuk Kim wrote:
> > On 07/02, Chao Yu wrote:
> > > On 2021/7/2 9:32, Jaegeuk Kim wrote:
> > > > On 07/02, Chao Yu wrote:
> > > > > On 2021/7/2 1:10, Jaegeuk Kim wrote:
> > > > > > On 06/01, Chao Yu wrote:
> > > > > > > [1] https://www.mail-archive.com/linux-f2fs-devel@lists.sourceforge.net/msg15126.html
> > > > > > > 
> > > > > > > As [1] reported, if lower device doesn't support write barrier, in below
> > > > > > > case:
> > > > > > > 
> > > > > > > - write page #0; persist
> > > > > > > - overwrite page #0
> > > > > > > - fsync
> > > > > > >     - write data page #0 OPU into device's cache
> > > > > > >     - write inode page into device's cache
> > > > > > >     - issue flush
> > > > > > 
> > > > > > Well, we have preflush for node writes, so I don't think this is the case.
> > > > > > 
> > > > > >     fio.op_flags |= REQ_PREFLUSH | REQ_FUA;
> > > > > 
> > > > > This is only used for atomic write case, right?
> > > > > 
> > > > > I mean the common case which is called from f2fs_issue_flush() in
> > > > > f2fs_do_sync_file().
> > > > 
> > > > How about adding PREFLUSH when writing node blocks aligned to the above set?
> > > 
> > > You mean implementation like v1 as below?
> > > 
> > > https://lore.kernel.org/linux-f2fs-devel/20200120100045.70210-1-yuchao0@huawei.com/
> > 
> > Yea, I think so. :P
> 
> I prefer v2, we may have several schemes to improve performance with v2, e.g.
> - use inplace IO to avoid newly added preflush
> - use flush_merge option to avoid redundant preflush
> - if lower device supports barrier IO, we can avoid newly added preflush

Doesn't v2 issue one more flush than v1? Why take the worse option and then try
to improve it back? The benefit of v2 isn't clear to me.

> 
> Thanks,
> 
> > 
> > > 
> > > Thanks,
> > > 
> > > > 
> > > > > 
> > > > > And please see do_checkpoint(), we call f2fs_flush_device_cache() and
> > > > > commit_checkpoint() separately to keep persistence order of CP datas.
> > > > > 
> > > > > See commit 46706d5917f4 ("f2fs: flush cp pack except cp pack 2 page at first")
> > > > > for details.
> > > > > 
> > > > > Thanks,
> > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > If SPO is triggered during flush command, inode page can be persisted
> > > > > > > before data page #0, so that after recovery, inode page can be recovered
> > > > > > > with new physical block address of data page #0, however there may
> > > > > > > contains dummy data in new physical block address.
> > > > > > > 
> > > > > > > Then what user will see is: after overwrite & fsync + SPO, old data in
> > > > > > > file was corrupted, if any user do care about such case, we can suggest
> > > > > > > user to use STRICT fsync mode, in this mode, we will force to trigger
> > > > > > > preflush command to persist data in device cache in prior to node
> > > > > > > writeback, it avoids potential data corruption during fsync().
> > > > > > > 
> > > > > > > Signed-off-by: Chao Yu <yuchao0@huawei.com>
> > > > > > > ---
> > > > > > > v2:
> > > > > > > - fix this by adding additional preflush command rather than using
> > > > > > > atomic write flow.
> > > > > > >     fs/f2fs/file.c | 14 ++++++++++++++
> > > > > > >     1 file changed, 14 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > > > > > > index 7d5311d54f63..238ca2a733ac 100644
> > > > > > > --- a/fs/f2fs/file.c
> > > > > > > +++ b/fs/f2fs/file.c
> > > > > > > @@ -301,6 +301,20 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end,
> > > > > > >     				f2fs_exist_written_data(sbi, ino, UPDATE_INO))
> > > > > > >     			goto flush_out;
> > > > > > >     		goto out;
> > > > > > > +	} else {
> > > > > > > +		/*
> > > > > > > +		 * for OPU case, during fsync(), node can be persisted before
> > > > > > > +		 * data when lower device doesn't support write barrier, result
> > > > > > > +		 * in data corruption after SPO.
> > > > > > > +		 * So for strict fsync mode, force to trigger preflush to keep
> > > > > > > +		 * data/node write order to avoid potential data corruption.
> > > > > > > +		 */
> > > > > > > +		if (F2FS_OPTION(sbi).fsync_mode == FSYNC_MODE_STRICT &&
> > > > > > > +								!atomic) {
> > > > > > > +			ret = f2fs_issue_flush(sbi, inode->i_ino);
> > > > > > > +			if (ret)
> > > > > > > +				goto out;
> > > > > > > +		}
> > > > > > >     	}
> > > > > > >     go_write:
> > > > > > >     	/*
> > > > > > > -- 
> > > > > > > 2.29.2


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


* Re: [PATCH v2 RFC] f2fs: fix to force keeping write barrier for strict fsync mode
  2021-07-13 23:34               ` [f2fs-dev] " Jaegeuk Kim
@ 2021-07-14  1:15                 ` Chao Yu
  -1 siblings, 0 replies; 28+ messages in thread
From: Chao Yu @ 2021-07-14  1:15 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: linux-f2fs-devel, linux-kernel

On 2021/7/14 7:34, Jaegeuk Kim wrote:
> On 07/13, Chao Yu wrote:
>> On 2021/7/8 1:48, Jaegeuk Kim wrote:
>>> On 07/02, Chao Yu wrote:
>>>> On 2021/7/2 9:32, Jaegeuk Kim wrote:
>>>>> On 07/02, Chao Yu wrote:
>>>>>> On 2021/7/2 1:10, Jaegeuk Kim wrote:
>>>>>>> On 06/01, Chao Yu wrote:
>>>>>>>> [1] https://www.mail-archive.com/linux-f2fs-devel@lists.sourceforge.net/msg15126.html
>>>>>>>>
>>>>>>>> As [1] reported, if lower device doesn't support write barrier, in below
>>>>>>>> case:
>>>>>>>>
>>>>>>>> - write page #0; persist
>>>>>>>> - overwrite page #0
>>>>>>>> - fsync
>>>>>>>>      - write data page #0 OPU into device's cache
>>>>>>>>      - write inode page into device's cache
>>>>>>>>      - issue flush
>>>>>>>
>>>>>>> Well, we have preflush for node writes, so I don't think this is the case.
>>>>>>>
>>>>>>>      fio.op_flags |= REQ_PREFLUSH | REQ_FUA;
>>>>>>
>>>>>> This is only used for atomic write case, right?
>>>>>>
>>>>>> I mean the common case which is called from f2fs_issue_flush() in
>>>>>> f2fs_do_sync_file().
>>>>>
>>>>> How about adding PREFLUSH when writing node blocks aligned to the above set?
>>>>
>>>> You mean implementation like v1 as below?
>>>>
>>>> https://lore.kernel.org/linux-f2fs-devel/20200120100045.70210-1-yuchao0@huawei.com/
>>>
>>> Yea, I think so. :P
>>
>> I prefer v2, we may have several schemes to improve performance with v2, e.g.
>> - use inplace IO to avoid newly added preflush
>> - use flush_merge option to avoid redundant preflush
>> - if lower device supports barrier IO, we can avoid newly added preflush
> 
> Doesn't v2 give one more flush than v1? Why do you want to take worse one and

FUA implies an extra preflush command, or a similar mechanism in the lower device,
to make the data in the bio persistent before this command completes.

Also, if the lower device doesn't support FUA natively, the block layer turns it
into an empty PREFLUSH command.

So it's hard to say which one will win the benchmark game. We probably need some
performance data before making the choice, though it depends on the device's
characteristics.
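
As a rough illustration of the flush counting (a toy userspace model, not kernel
code; `sk_device`, `sk_flushes_for_write()` and the `SK_*` flags are made-up
names, and it assumes that FUA costs one extra cache flush when the device lacks
native support):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Made-up flag bits for this sketch; these are not the kernel's REQ_* values. */
#define SK_PREFLUSH	(1u << 0)
#define SK_FUA		(1u << 1)

/* Hypothetical description of what the lower device can do. */
struct sk_device {
	bool volatile_cache;	/* device has a volatile write cache */
	bool native_fua;	/* device honours FUA without help */
};

/*
 * Count the cache flushes the device would see for a single write carrying
 * the given flags, under the assumptions stated above.
 */
static int sk_flushes_for_write(const struct sk_device *dev, unsigned int flags)
{
	int flushes = 0;

	assert(dev != NULL);

	/* Without a volatile cache there is nothing to flush. */
	if (!dev->volatile_cache)
		return 0;

	/* PREFLUSH: drain the cache before the write itself. */
	if (flags & SK_PREFLUSH)
		flushes++;

	/* FUA without native support: assume one extra flush stands in. */
	if ((flags & SK_FUA) && !dev->native_fua)
		flushes++;

	return flushes;
}
```

Under these assumptions, a node write flagged `SK_PREFLUSH | SK_FUA` costs two
flushes on a cached device without native FUA and one on a device with it, so
the comparison can go either way depending on the device.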

> try to improve back? Not clear the benefit on v2.

Well, if users suffer from and complain about a performance regression with v1, is
there any plan to improve it?

I was just thinking about plans B/C/D regardless of whether we take v1 or v2.

Thanks,

> 
>>
>> Thanks,
>>
>>>
>>>>
>>>> Thanks,
>>>>
>>>>>
>>>>>>
>>>>>> And please see do_checkpoint(), we call f2fs_flush_device_cache() and
>>>>>> commit_checkpoint() separately to keep persistence order of CP datas.
>>>>>>
>>>>>> See commit 46706d5917f4 ("f2fs: flush cp pack except cp pack 2 page at first")
>>>>>> for details.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> If SPO is triggered during flush command, inode page can be persisted
>>>>>>>> before data page #0, so that after recovery, inode page can be recovered
>>>>>>>> with new physical block address of data page #0, however there may
>>>>>>>> contains dummy data in new physical block address.
>>>>>>>>
>>>>>>>> Then what user will see is: after overwrite & fsync + SPO, old data in
>>>>>>>> file was corrupted, if any user do care about such case, we can suggest
>>>>>>>> user to use STRICT fsync mode, in this mode, we will force to trigger
>>>>>>>> preflush command to persist data in device cache in prior to node
>>>>>>>> writeback, it avoids potential data corruption during fsync().
>>>>>>>>
>>>>>>>> Signed-off-by: Chao Yu <yuchao0@huawei.com>
>>>>>>>> ---
>>>>>>>> v2:
>>>>>>>> - fix this by adding additional preflush command rather than using
>>>>>>>> atomic write flow.
>>>>>>>>      fs/f2fs/file.c | 14 ++++++++++++++
>>>>>>>>      1 file changed, 14 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
>>>>>>>> index 7d5311d54f63..238ca2a733ac 100644
>>>>>>>> --- a/fs/f2fs/file.c
>>>>>>>> +++ b/fs/f2fs/file.c
>>>>>>>> @@ -301,6 +301,20 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end,
>>>>>>>>      				f2fs_exist_written_data(sbi, ino, UPDATE_INO))
>>>>>>>>      			goto flush_out;
>>>>>>>>      		goto out;
>>>>>>>> +	} else {
>>>>>>>> +		/*
>>>>>>>> +		 * for OPU case, during fsync(), node can be persisted before
>>>>>>>> +		 * data when lower device doesn't support write barrier, result
>>>>>>>> +		 * in data corruption after SPO.
>>>>>>>> +		 * So for strict fsync mode, force to trigger preflush to keep
>>>>>>>> +		 * data/node write order to avoid potential data corruption.
>>>>>>>> +		 */
>>>>>>>> +		if (F2FS_OPTION(sbi).fsync_mode == FSYNC_MODE_STRICT &&
>>>>>>>> +								!atomic) {
>>>>>>>> +			ret = f2fs_issue_flush(sbi, inode->i_ino);
>>>>>>>> +			if (ret)
>>>>>>>> +				goto out;
>>>>>>>> +		}
>>>>>>>>      	}
>>>>>>>>      go_write:
>>>>>>>>      	/*
>>>>>>>> -- 
>>>>>>>> 2.29.2



* Re: [PATCH v2 RFC] f2fs: fix to force keeping write barrier for strict fsync mode
  2021-07-14  1:15                 ` [f2fs-dev] " Chao Yu
@ 2021-07-14  2:19                   ` Jaegeuk Kim
  -1 siblings, 0 replies; 28+ messages in thread
From: Jaegeuk Kim @ 2021-07-14  2:19 UTC (permalink / raw)
  To: Chao Yu; +Cc: linux-f2fs-devel, linux-kernel

On 07/14, Chao Yu wrote:
> On 2021/7/14 7:34, Jaegeuk Kim wrote:
> > On 07/13, Chao Yu wrote:
> > > On 2021/7/8 1:48, Jaegeuk Kim wrote:
> > > > On 07/02, Chao Yu wrote:
> > > > > On 2021/7/2 9:32, Jaegeuk Kim wrote:
> > > > > > On 07/02, Chao Yu wrote:
> > > > > > > On 2021/7/2 1:10, Jaegeuk Kim wrote:
> > > > > > > > On 06/01, Chao Yu wrote:
> > > > > > > > > [1] https://www.mail-archive.com/linux-f2fs-devel@lists.sourceforge.net/msg15126.html
> > > > > > > > > 
> > > > > > > > > As [1] reported, if lower device doesn't support write barrier, in below
> > > > > > > > > case:
> > > > > > > > > 
> > > > > > > > > - write page #0; persist
> > > > > > > > > - overwrite page #0
> > > > > > > > > - fsync
> > > > > > > > >      - write data page #0 OPU into device's cache
> > > > > > > > >      - write inode page into device's cache
> > > > > > > > >      - issue flush
> > > > > > > > 
> > > > > > > > Well, we have preflush for node writes, so I don't think this is the case.
> > > > > > > > 
> > > > > > > >      fio.op_flags |= REQ_PREFLUSH | REQ_FUA;
> > > > > > > 
> > > > > > > This is only used for atomic write case, right?
> > > > > > > 
> > > > > > > I mean the common case which is called from f2fs_issue_flush() in
> > > > > > > f2fs_do_sync_file().
> > > > > > 
> > > > > > How about adding PREFLUSH when writing node blocks aligned to the above set?
> > > > > 
> > > > > You mean implementation like v1 as below?
> > > > > 
> > > > > https://lore.kernel.org/linux-f2fs-devel/20200120100045.70210-1-yuchao0@huawei.com/
> > > > 
> > > > Yea, I think so. :P
> > > 
> > > I prefer v2, we may have several schemes to improve performance with v2, e.g.
> > > - use inplace IO to avoid newly added preflush
> > > - use flush_merge option to avoid redundant preflush
> > > - if lower device supports barrier IO, we can avoid newly added preflush
> > 
> > Doesn't v2 give one more flush than v1? Why do you want to take worse one and
> 
> FUA implies an extra preflush command or similar mechanism in lower device to keep data
> in bio being persistent before this command's completion.
> 
> Also if lower device doesn't support FUA natively, block layer turns it into an empty
> PREFLUSH command.
> 
> So, it's hard to say which one will win the benchmark game, maybe we need some
> performance data before making the choice, but you know, it depends on device's
> character.

I was looking at the number of bios.

> 
> > try to improve back? Not clear the benefit on v2.
> 
> Well, if user suffer and complain performance regression with v1, any plan to improve it?
> 
> I just thought about plan B/C/D for no matter v1 or v2.

I assumed you wanted v2 because it could be used for the B/C/D improvements, but
it seems that wasn't the case. My point is to save one bio and instead piggyback
the flag down to the device driver.
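
Roughly, counting only the bios needed to order data before node (a toy model
with made-up names such as `sk_bio` and `sk_build_*()`, not f2fs code; the
trailing flush of fsync is deliberately left out, and the exact flags used by v1
are an assumption based on the `REQ_PREFLUSH | REQ_FUA` line quoted above):

```c
#include <assert.h>
#include <stddef.h>

/* Made-up flag bits for this sketch; not the kernel's REQ_* values. */
#define SK_F_PREFLUSH	0x1u
#define SK_F_FUA	0x2u

enum sk_bio_kind { SK_BIO_DATA, SK_BIO_FLUSH, SK_BIO_NODE };

/* Hypothetical, heavily simplified stand-in for a submitted bio. */
struct sk_bio {
	enum sk_bio_kind kind;
	unsigned int flags;
};

/* v2-like ordering: a standalone empty flush bio sits between data and node. */
static size_t sk_build_separate_flush(struct sk_bio *out, size_t cap)
{
	assert(out != NULL && cap >= 3);
	out[0] = (struct sk_bio){ SK_BIO_DATA, 0 };
	out[1] = (struct sk_bio){ SK_BIO_FLUSH, SK_F_PREFLUSH };
	out[2] = (struct sk_bio){ SK_BIO_NODE, 0 };
	return 3;
}

/* v1-like ordering: the node write itself carries the flush flags. */
static size_t sk_build_piggyback(struct sk_bio *out, size_t cap)
{
	assert(out != NULL && cap >= 2);
	out[0] = (struct sk_bio){ SK_BIO_DATA, 0 };
	out[1] = (struct sk_bio){ SK_BIO_NODE, SK_F_PREFLUSH | SK_F_FUA };
	return 2;
}
```

In this model the piggyback variant submits one bio fewer for the same ordering
point. Whether the device ends up doing less work is a separate question.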

> 
> Thanks,
> 
> > 
> > > 
> > > Thanks,
> > > 
> > > > 
> > > > > 
> > > > > Thanks,
> > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > And please see do_checkpoint(), we call f2fs_flush_device_cache() and
> > > > > > > commit_checkpoint() separately to keep persistence order of CP datas.
> > > > > > > 
> > > > > > > See commit 46706d5917f4 ("f2fs: flush cp pack except cp pack 2 page at first")
> > > > > > > for details.
> > > > > > > 
> > > > > > > Thanks,
> > > > > > > 
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > If SPO is triggered during flush command, inode page can be persisted
> > > > > > > > > before data page #0, so that after recovery, inode page can be recovered
> > > > > > > > > with new physical block address of data page #0, however there may
> > > > > > > > > contains dummy data in new physical block address.
> > > > > > > > > 
> > > > > > > > > Then what user will see is: after overwrite & fsync + SPO, old data in
> > > > > > > > > file was corrupted, if any user do care about such case, we can suggest
> > > > > > > > > user to use STRICT fsync mode, in this mode, we will force to trigger
> > > > > > > > > preflush command to persist data in device cache in prior to node
> > > > > > > > > writeback, it avoids potential data corruption during fsync().
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Chao Yu <yuchao0@huawei.com>
> > > > > > > > > ---
> > > > > > > > > v2:
> > > > > > > > > - fix this by adding additional preflush command rather than using
> > > > > > > > > atomic write flow.
> > > > > > > > >      fs/f2fs/file.c | 14 ++++++++++++++
> > > > > > > > >      1 file changed, 14 insertions(+)
> > > > > > > > > 
> > > > > > > > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > > > > > > > > index 7d5311d54f63..238ca2a733ac 100644
> > > > > > > > > --- a/fs/f2fs/file.c
> > > > > > > > > +++ b/fs/f2fs/file.c
> > > > > > > > > @@ -301,6 +301,20 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end,
> > > > > > > > >      				f2fs_exist_written_data(sbi, ino, UPDATE_INO))
> > > > > > > > >      			goto flush_out;
> > > > > > > > >      		goto out;
> > > > > > > > > +	} else {
> > > > > > > > > +		/*
> > > > > > > > > +		 * for OPU case, during fsync(), node can be persisted before
> > > > > > > > > +		 * data when lower device doesn't support write barrier, result
> > > > > > > > > +		 * in data corruption after SPO.
> > > > > > > > > +		 * So for strict fsync mode, force to trigger preflush to keep
> > > > > > > > > +		 * data/node write order to avoid potential data corruption.
> > > > > > > > > +		 */
> > > > > > > > > +		if (F2FS_OPTION(sbi).fsync_mode == FSYNC_MODE_STRICT &&
> > > > > > > > > +								!atomic) {
> > > > > > > > > +			ret = f2fs_issue_flush(sbi, inode->i_ino);
> > > > > > > > > +			if (ret)
> > > > > > > > > +				goto out;
> > > > > > > > > +		}
> > > > > > > > >      	}
> > > > > > > > >      go_write:
> > > > > > > > >      	/*
> > > > > > > > > -- 
> > > > > > > > > 2.29.2



* Re: [PATCH v2 RFC] f2fs: fix to force keeping write barrier for strict fsync mode
  2021-07-14  2:19                   ` [f2fs-dev] " Jaegeuk Kim
@ 2021-07-14  2:51                     ` Chao Yu
  -1 siblings, 0 replies; 28+ messages in thread
From: Chao Yu @ 2021-07-14  2:51 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: linux-f2fs-devel, linux-kernel

On 2021/7/14 10:19, Jaegeuk Kim wrote:
> On 07/14, Chao Yu wrote:
>> On 2021/7/14 7:34, Jaegeuk Kim wrote:
>>> On 07/13, Chao Yu wrote:
>>>> On 2021/7/8 1:48, Jaegeuk Kim wrote:
>>>>> On 07/02, Chao Yu wrote:
>>>>>> On 2021/7/2 9:32, Jaegeuk Kim wrote:
>>>>>>> On 07/02, Chao Yu wrote:
>>>>>>>> On 2021/7/2 1:10, Jaegeuk Kim wrote:
>>>>>>>>> On 06/01, Chao Yu wrote:
>>>>>>>>>> [1] https://www.mail-archive.com/linux-f2fs-devel@lists.sourceforge.net/msg15126.html
>>>>>>>>>>
>>>>>>>>>> As [1] reported, if lower device doesn't support write barrier, in below
>>>>>>>>>> case:
>>>>>>>>>>
>>>>>>>>>> - write page #0; persist
>>>>>>>>>> - overwrite page #0
>>>>>>>>>> - fsync
>>>>>>>>>>       - write data page #0 OPU into device's cache
>>>>>>>>>>       - write inode page into device's cache
>>>>>>>>>>       - issue flush
>>>>>>>>>
>>>>>>>>> Well, we have preflush for node writes, so I don't think this is the case.
>>>>>>>>>
>>>>>>>>>       fio.op_flags |= REQ_PREFLUSH | REQ_FUA;
>>>>>>>>
>>>>>>>> This is only used for atomic write case, right?
>>>>>>>>
>>>>>>>> I mean the common case which is called from f2fs_issue_flush() in
>>>>>>>> f2fs_do_sync_file().
>>>>>>>
>>>>>>> How about adding PREFLUSH when writing node blocks aligned to the above set?
>>>>>>
>>>>>> You mean implementation like v1 as below?
>>>>>>
>>>>>> https://lore.kernel.org/linux-f2fs-devel/20200120100045.70210-1-yuchao0@huawei.com/
>>>>>
>>>>> Yea, I think so. :P
>>>>
>>>> I prefer v2, we may have several schemes to improve performance with v2, e.g.
>>>> - use inplace IO to avoid newly added preflush
>>>> - use flush_merge option to avoid redundant preflush
>>>> - if lower device supports barrier IO, we can avoid newly added preflush
>>>
>>> Doesn't v2 give one more flush than v1? Why do you want to take worse one and
>>
>> FUA implies an extra preflush command or similar mechanism in lower device to keep data
>> in bio being persistent before this command's completion.
>>
>> Also if lower device doesn't support FUA natively, block layer turns it into an empty
>> PREFLUSH command.
>>
>> So, it's hard to say which one will win the benchmark game, maybe we need some
>> performance data before making the choice, but you know, it depends on device's
>> character.
> 
> I was looking at # of bios.
> 
>>
>>> try to improve back? Not clear the benefit on v2.
>>
>> Well, if user suffer and complain performance regression with v1, any plan to improve it?
>>
>> I just thought about plan B/C/D for no matter v1 or v2.
> 
> I assumed you wanted v2 since it might be used for B/C/D improvements. But, it
> seems it wasn't. My point is to save one bio, but piggyback the flag to the
> device driver.

I doubt that conclusion, but we need some data to prove it either way.

I think the right way is to merge v1 now to fix the bug first, and let me compare
the two a little later to see whether we need another implementation. Thoughts?

Thanks,

> 
>>
>> Thanks,
>>
>>>
>>>>
>>>> Thanks,
>>>>
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> And please see do_checkpoint(), we call f2fs_flush_device_cache() and
>>>>>>>> commit_checkpoint() separately to keep persistence order of CP datas.
>>>>>>>>
>>>>>>>> See commit 46706d5917f4 ("f2fs: flush cp pack except cp pack 2 page at first")
>>>>>>>> for details.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> If SPO is triggered during flush command, inode page can be persisted
>>>>>>>>>> before data page #0, so that after recovery, inode page can be recovered
>>>>>>>>>> with new physical block address of data page #0, however there may
>>>>>>>>>> contains dummy data in new physical block address.
>>>>>>>>>>
>>>>>>>>>> Then what user will see is: after overwrite & fsync + SPO, old data in
>>>>>>>>>> file was corrupted, if any user do care about such case, we can suggest
>>>>>>>>>> user to use STRICT fsync mode, in this mode, we will force to trigger
>>>>>>>>>> preflush command to persist data in device cache in prior to node
>>>>>>>>>> writeback, it avoids potential data corruption during fsync().
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Chao Yu <yuchao0@huawei.com>
>>>>>>>>>> ---
>>>>>>>>>> v2:
>>>>>>>>>> - fix this by adding additional preflush command rather than using
>>>>>>>>>> atomic write flow.
>>>>>>>>>>       fs/f2fs/file.c | 14 ++++++++++++++
>>>>>>>>>>       1 file changed, 14 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
>>>>>>>>>> index 7d5311d54f63..238ca2a733ac 100644
>>>>>>>>>> --- a/fs/f2fs/file.c
>>>>>>>>>> +++ b/fs/f2fs/file.c
>>>>>>>>>> @@ -301,6 +301,20 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end,
>>>>>>>>>>       				f2fs_exist_written_data(sbi, ino, UPDATE_INO))
>>>>>>>>>>       			goto flush_out;
>>>>>>>>>>       		goto out;
>>>>>>>>>> +	} else {
>>>>>>>>>> +		/*
>>>>>>>>>> +		 * for OPU case, during fsync(), node can be persisted before
>>>>>>>>>> +		 * data when lower device doesn't support write barrier, result
>>>>>>>>>> +		 * in data corruption after SPO.
>>>>>>>>>> +		 * So for strict fsync mode, force to trigger preflush to keep
>>>>>>>>>> +		 * data/node write order to avoid potential data corruption.
>>>>>>>>>> +		 */
>>>>>>>>>> +		if (F2FS_OPTION(sbi).fsync_mode == FSYNC_MODE_STRICT &&
>>>>>>>>>> +								!atomic) {
>>>>>>>>>> +			ret = f2fs_issue_flush(sbi, inode->i_ino);
>>>>>>>>>> +			if (ret)
>>>>>>>>>> +				goto out;
>>>>>>>>>> +		}
>>>>>>>>>>       	}
>>>>>>>>>>       go_write:
>>>>>>>>>>       	/*
>>>>>>>>>> -- 
>>>>>>>>>> 2.29.2

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 RFC] f2fs: fix to force keeping write barrier for strict fsync mode
  2021-07-14  2:51                     ` [f2fs-dev] " Chao Yu
@ 2021-07-19 18:38                       ` Jaegeuk Kim
  -1 siblings, 0 replies; 28+ messages in thread
From: Jaegeuk Kim @ 2021-07-19 18:38 UTC (permalink / raw)
  To: Chao Yu; +Cc: linux-f2fs-devel, linux-kernel

On 07/14, Chao Yu wrote:
> On 2021/7/14 10:19, Jaegeuk Kim wrote:
> > On 07/14, Chao Yu wrote:
> > > On 2021/7/14 7:34, Jaegeuk Kim wrote:
> > > > On 07/13, Chao Yu wrote:
> > > > > On 2021/7/8 1:48, Jaegeuk Kim wrote:
> > > > > > On 07/02, Chao Yu wrote:
> > > > > > > On 2021/7/2 9:32, Jaegeuk Kim wrote:
> > > > > > > > On 07/02, Chao Yu wrote:
> > > > > > > > > On 2021/7/2 1:10, Jaegeuk Kim wrote:
> > > > > > > > > > On 06/01, Chao Yu wrote:
> > > > > > > > > > > [1] https://www.mail-archive.com/linux-f2fs-devel@lists.sourceforge.net/msg15126.html
> > > > > > > > > > > 
> > > > > > > > > > > As [1] reported, if lower device doesn't support write barrier, in below
> > > > > > > > > > > case:
> > > > > > > > > > > 
> > > > > > > > > > > - write page #0; persist
> > > > > > > > > > > - overwrite page #0
> > > > > > > > > > > - fsync
> > > > > > > > > > >       - write data page #0 OPU into device's cache
> > > > > > > > > > >       - write inode page into device's cache
> > > > > > > > > > >       - issue flush
> > > > > > > > > > 
> > > > > > > > > > Well, we have preflush for node writes, so I don't think this is the case.
> > > > > > > > > > 
> > > > > > > > > >       fio.op_flags |= REQ_PREFLUSH | REQ_FUA;
> > > > > > > > > 
> > > > > > > > > This is only used for atomic write case, right?
> > > > > > > > > 
> > > > > > > > > I mean the common case which is called from f2fs_issue_flush() in
> > > > > > > > > f2fs_do_sync_file().
> > > > > > > > 
> > > > > > > > How about adding PREFLUSH when writing node blocks aligned to the above set?
> > > > > > > 
> > > > > > > You mean implementation like v1 as below?
> > > > > > > 
> > > > > > > https://lore.kernel.org/linux-f2fs-devel/20200120100045.70210-1-yuchao0@huawei.com/
> > > > > > 
> > > > > > Yea, I think so. :P
> > > > > 
> > > > > I prefer v2, we may have several schemes to improve performance with v2, e.g.
> > > > > - use inplace IO to avoid newly added preflush
> > > > > - use flush_merge option to avoid redundant preflush
> > > > > - if lower device supports barrier IO, we can avoid newly added preflush
> > > > 
> > > > Doesn't v2 give one more flush than v1? Why do you want to take worse one and
> > > 
> > > FUA implies an extra preflush command or similar mechanism in lower device to keep data
> > > in bio being persistent before this command's completion.
> > > 
> > > Also if lower device doesn't support FUA natively, block layer turns it into an empty
> > > PREFLUSH command.
> > > 
> > > So, it's hard to say which one will win the benchmark game, maybe we need some
> > > performance data before making the choice, but you know, it depends on device's
> > > character.
> > 
> > I was looking at # of bios.
> > 
> > > 
> > > > try to improve back? Not clear the benefit on v2.
> > > 
> > > Well, if user suffer and complain performance regression with v1, any plan to improve it?
> > > 
> > > I just thought about plan B/C/D for no matter v1 or v2.
> > 
> > I assumed you wanted v2 since it might be used for B/C/D improvements. But, it
> > seems it wasn't. My point is to save one bio, but piggyback the flag to the
> > device driver.
> 
> I doubt the conclusion...but it needs to get some data to prove it.
> 
> I think the right way is merging v1 now to fix the bug firstly, and let me do
> the comparison on them a little bit later to see whether we need another
> implementation... thoughts?

Chao, could you please post v1 with an updated description?

> 
> Thanks,
> 
> > 
> > > 
> > > Thanks,
> > > 
> > > > 
> > > > > 
> > > > > Thanks,
> > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > Thanks,
> > > > > > > 
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > And please see do_checkpoint(), we call f2fs_flush_device_cache() and
> > > > > > > > > commit_checkpoint() separately to keep persistence order of CP datas.
> > > > > > > > > 
> > > > > > > > > See commit 46706d5917f4 ("f2fs: flush cp pack except cp pack 2 page at first")
> > > > > > > > > for details.
> > > > > > > > > 
> > > > > > > > > Thanks,
> > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > If SPO is triggered during flush command, inode page can be persisted
> > > > > > > > > > > before data page #0, so that after recovery, inode page can be recovered
> > > > > > > > > > > with new physical block address of data page #0, however there may
> > > > > > > > > > > contains dummy data in new physical block address.
> > > > > > > > > > > 
> > > > > > > > > > > Then what user will see is: after overwrite & fsync + SPO, old data in
> > > > > > > > > > > file was corrupted, if any user do care about such case, we can suggest
> > > > > > > > > > > user to use STRICT fsync mode, in this mode, we will force to trigger
> > > > > > > > > > > preflush command to persist data in device cache in prior to node
> > > > > > > > > > > writeback, it avoids potential data corruption during fsync().
> > > > > > > > > > > 
> > > > > > > > > > > Signed-off-by: Chao Yu <yuchao0@huawei.com>
> > > > > > > > > > > ---
> > > > > > > > > > > v2:
> > > > > > > > > > > - fix this by adding additional preflush command rather than using
> > > > > > > > > > > atomic write flow.
> > > > > > > > > > >       fs/f2fs/file.c | 14 ++++++++++++++
> > > > > > > > > > >       1 file changed, 14 insertions(+)
> > > > > > > > > > > 
> > > > > > > > > > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > > > > > > > > > > index 7d5311d54f63..238ca2a733ac 100644
> > > > > > > > > > > --- a/fs/f2fs/file.c
> > > > > > > > > > > +++ b/fs/f2fs/file.c
> > > > > > > > > > > @@ -301,6 +301,20 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end,
> > > > > > > > > > >       				f2fs_exist_written_data(sbi, ino, UPDATE_INO))
> > > > > > > > > > >       			goto flush_out;
> > > > > > > > > > >       		goto out;
> > > > > > > > > > > +	} else {
> > > > > > > > > > > +		/*
> > > > > > > > > > > +		 * for OPU case, during fsync(), node can be persisted before
> > > > > > > > > > > +		 * data when lower device doesn't support write barrier, result
> > > > > > > > > > > +		 * in data corruption after SPO.
> > > > > > > > > > > +		 * So for strict fsync mode, force to trigger preflush to keep
> > > > > > > > > > > +		 * data/node write order to avoid potential data corruption.
> > > > > > > > > > > +		 */
> > > > > > > > > > > +		if (F2FS_OPTION(sbi).fsync_mode == FSYNC_MODE_STRICT &&
> > > > > > > > > > > +								!atomic) {
> > > > > > > > > > > +			ret = f2fs_issue_flush(sbi, inode->i_ino);
> > > > > > > > > > > +			if (ret)
> > > > > > > > > > > +				goto out;
> > > > > > > > > > > +		}
> > > > > > > > > > >       	}
> > > > > > > > > > >       go_write:
> > > > > > > > > > >       	/*
> > > > > > > > > > > -- 
> > > > > > > > > > > 2.29.2

^ permalink raw reply	[flat|nested] 28+ messages in thread
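
The ordering problem debated in this thread can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not f2fs or block-layer code: `struct dev`, `run_fsync()` and the three scheme names are hypothetical, and the device is assumed to be free to persist any subset of its volatile cache when SPO hits. The model only checks whether the node block can reach media while the data block is still not durable. It says nothing about performance or the number of bios.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not kernel code: one data block and one node (inode) block. */
enum blk { DATA, NODE, NBLK };

/* Hypothetical names for the three flows discussed in the thread. */
enum scheme {
	PLAIN_FSYNC,       /* data write, node write, then one flush */
	EXTRA_PREFLUSH,    /* v2 idea: separate flush between data and node */
	NODE_PREFLUSH_FUA  /* v1 idea: node write carries PREFLUSH | FUA */
};

struct dev {
	bool cached[NBLK];  /* sitting in the volatile write cache */
	bool durable[NBLK]; /* on stable media */
	bool hazard;        /* an SPO point could persist node without data */
};

/*
 * Assumption: on sudden power-off the device may have persisted any subset
 * of its cache. The bad outcome is therefore reachable whenever the node
 * block may be on media while the data block is not guaranteed to be.
 */
static void check_spo(struct dev *d)
{
	if ((d->cached[NODE] || d->durable[NODE]) && !d->durable[DATA])
		d->hazard = true;
}

/* Move everything in the volatile cache to stable media. */
static void dev_flush(struct dev *d)
{
	for (int b = 0; b < NBLK; b++) {
		if (d->cached[b]) {
			d->durable[b] = true;
			d->cached[b] = false;
		}
	}
	check_spo(d);
}

static void dev_write(struct dev *d, enum blk b, bool preflush, bool fua)
{
	assert(b < NBLK);
	if (preflush)
		dev_flush(d);         /* drain the cache before this write */
	if (fua)
		d->durable[b] = true; /* durable once the write completes */
	else
		d->cached[b] = true;
	check_spo(d);
}

/* Replay one overwrite + fsync under the given scheme. */
static struct dev run_fsync(enum scheme s)
{
	struct dev d = {0};

	dev_write(&d, DATA, false, false); /* OPU data page into the cache */
	switch (s) {
	case PLAIN_FSYNC:
		dev_write(&d, NODE, false, false);
		dev_flush(&d);
		break;
	case EXTRA_PREFLUSH:
		dev_flush(&d);             /* persist data before the node */
		dev_write(&d, NODE, false, false);
		dev_flush(&d);
		break;
	case NODE_PREFLUSH_FUA:
		dev_write(&d, NODE, true, true);
		break;
	}
	return d;
}
```

In this model `run_fsync(PLAIN_FSYNC).hazard` is set, because the node block sits in the cache next to non-durable data, while both `EXTRA_PREFLUSH` and `NODE_PREFLUSH_FUA` finish with the flag clear. That matches the thread: both versions fix the ordering, and the open question is only their cost.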

end of thread, other threads:[~2021-07-19 19:15 UTC | newest]

Thread overview: 28+ messages
2021-06-01 10:10 [PATCH v2 RFC] f2fs: fix to force keeping write barrier for strict fsync mode Chao Yu
2021-06-01 10:10 ` [f2fs-dev] " Chao Yu
2021-06-03 16:00 ` Chao Yu
2021-06-03 16:00   ` [f2fs-dev] " Chao Yu
2021-06-07 23:32   ` Chao Yu
2021-06-07 23:32     ` Chao Yu
2021-07-01 17:10 ` Jaegeuk Kim
2021-07-01 17:10   ` [f2fs-dev] " Jaegeuk Kim
2021-07-01 23:04   ` Chao Yu
2021-07-01 23:04     ` [f2fs-dev] " Chao Yu
2021-07-02  1:32     ` Jaegeuk Kim
2021-07-02  1:32       ` [f2fs-dev] " Jaegeuk Kim
2021-07-02 15:49       ` Chao Yu
2021-07-02 15:49         ` [f2fs-dev] " Chao Yu
2021-07-07 17:48         ` Jaegeuk Kim
2021-07-07 17:48           ` [f2fs-dev] " Jaegeuk Kim
2021-07-13  9:23           ` Chao Yu
2021-07-13  9:23             ` [f2fs-dev] " Chao Yu
2021-07-13 23:34             ` Jaegeuk Kim
2021-07-13 23:34               ` [f2fs-dev] " Jaegeuk Kim
2021-07-14  1:15               ` Chao Yu
2021-07-14  1:15                 ` [f2fs-dev] " Chao Yu
2021-07-14  2:19                 ` Jaegeuk Kim
2021-07-14  2:19                   ` [f2fs-dev] " Jaegeuk Kim
2021-07-14  2:51                   ` Chao Yu
2021-07-14  2:51                     ` [f2fs-dev] " Chao Yu
2021-07-19 18:38                     ` Jaegeuk Kim
2021-07-19 18:38                       ` [f2fs-dev] " Jaegeuk Kim
