All of lore.kernel.org
 help / color / mirror / Atom feed
From: Damien Le Moal <Damien.LeMoal@wdc.com>
To: "Javier González" <javier@javigon.com>
Cc: "jaegeuk@kernel.org" <jaegeuk@kernel.org>,
	"yuchao0@huawei.com" <yuchao0@huawei.com>,
	"linux-f2fs-devel@lists.sourceforge.net"
	<linux-f2fs-devel@lists.sourceforge.net>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Javier González" <javier.gonz@samsung.com>
Subject: Re: [PATCH] f2fs: disble physical prealloc in LSF mount
Date: Tue, 26 Nov 2019 02:06:53 +0000	[thread overview]
Message-ID: <BYAPR04MB58161C14246FA30366B69B9DE7450@BYAPR04MB5816.namprd04.prod.outlook.com> (raw)
In-Reply-To: 20191125190320.g7beal27nc5ubju7@mpHalley

On 2019/11/26 4:03, Javier González wrote:
> On 25.11.2019 00:48, Damien Le Moal wrote:
>> On 2019/11/22 18:00, Javier González wrote:
>>> From: Javier González <javier.gonz@samsung.com>
>>>
>>> Fix file system corruption when using LFS mount (e.g., in zoned
>>> devices). Seems like the fallback into buffered I/O creates an
>>> inconsistency if the application is assuming both read and write DIO. I
>>> can easily reproduce a corruption with a simple RocksDB test.
>>>
>>> Might be that the f2fs_forced_buffered_io path brings some problems too,
>>> but I have not seen other failures besides this one.
>>>
>>> Problem reproducible without a zoned block device, simply by forcing
>>> LFS mount:
>>>
>>>   $ sudo mkfs.f2fs -f -m /dev/nvme0n1
>>>   $ sudo mount /dev/nvme0n1 /mnt/f2fs
>>>   $ sudo  /opt/rocksdb/db_bench  --benchmarks=fillseq --use_existing_db=0
>>>   --use_direct_reads=true --use_direct_io_for_flush_and_compaction=true
>>>   --db=/mnt/f2fs --num=5000 --value_size=1048576 --verify_checksum=1
>>>   --block_size=65536
>>>
>>> Note that the options that cause the problem are:
>>>   --use_direct_reads=true --use_direct_io_for_flush_and_compaction=true
>>>
>>> Fixes: f9d6d0597698 ("f2fs: fix out-place-update DIO write")
>>>
>>> Signed-off-by: Javier González <javier.gonz@samsung.com>
>>> ---
>>>  fs/f2fs/data.c | 3 ---
>>>  1 file changed, 3 deletions(-)
>>>
>>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>>> index 5755e897a5f0..b045dd6ab632 100644
>>> --- a/fs/f2fs/data.c
>>> +++ b/fs/f2fs/data.c
>>> @@ -1081,9 +1081,6 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
>>>  			return err;
>>>  	}
>>>
>>> -	if (direct_io && allow_outplace_dio(inode, iocb, from))
>>> -		return 0;
>>
>> Since for LFS mode, all DIOs can end up out of place, I think that it
>> may be better to change allow_outplace_dio() to always return true in
>> the case of LFS mode. So may be something like:
>>
>> static inline int allow_outplace_dio(struct inode *inode,
>> 			struct kiocb *iocb, struct iov_iter *iter)
>> {
>> 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
>> 	int rw = iov_iter_rw(iter);
>>
>> 	return test_opt(sbi, LFS) ||
>> 	 	(rw == WRITE && !block_unaligned_IO(inode, iocb, iter));
>> }
>>
>> instead of the original:
>>
>> static inline int allow_outplace_dio(struct inode *inode,
>> 			struct kiocb *iocb, struct iov_iter *iter)
>> {
>> 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
>> 	int rw = iov_iter_rw(iter);
>>
>> 	return (test_opt(sbi, LFS) && (rw == WRITE) &&
>> 				!block_unaligned_IO(inode, iocb, iter));
>> }
>>
>> Thoughts ?
>>
> 
> I see what you mean and it makes sense. However, the problem I am seeing
> occurs when allow_outplace_dio() returns true, as this is what creates
> the inconsistency between the write being buffered and the read being
> DIO.

But if the write is switched to buffered, the DIO read should use the
buffered path too, no ? Since this is all happening under VFS, the
generic DIO read path will not ensure that the buffered writes are
flushed to disk before issuing the direct read, I think. So that would
explain your data corruption, i.e. you are reading stale data on the
device before the buffered writes make it to the media.

> 
> I did test the code you propose and, as expected, it still triggered the
> corruption.
> 
>>> -
>>>  	if (is_inode_flag_set(inode, FI_NO_PREALLOC))
>>>  		return 0;
>>>
>>>
>>
>>
>> -- 
>> Damien Le Moal
>> Western Digital Research
> 
> Thanks,
> Javier
> 


-- 
Damien Le Moal
Western Digital Research

WARNING: multiple messages have this Message-ID (diff)
From: Damien Le Moal <Damien.LeMoal@wdc.com>
To: "Javier González" <javier@javigon.com>
Cc: "jaegeuk@kernel.org" <jaegeuk@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Javier González" <javier.gonz@samsung.com>,
	"linux-f2fs-devel@lists.sourceforge.net"
	<linux-f2fs-devel@lists.sourceforge.net>
Subject: Re: [f2fs-dev] [PATCH] f2fs: disble physical prealloc in LSF mount
Date: Tue, 26 Nov 2019 02:06:53 +0000	[thread overview]
Message-ID: <BYAPR04MB58161C14246FA30366B69B9DE7450@BYAPR04MB5816.namprd04.prod.outlook.com> (raw)
In-Reply-To: 20191125190320.g7beal27nc5ubju7@mpHalley

On 2019/11/26 4:03, Javier González wrote:
> On 25.11.2019 00:48, Damien Le Moal wrote:
>> On 2019/11/22 18:00, Javier González wrote:
>>> From: Javier González <javier.gonz@samsung.com>
>>>
>>> Fix file system corruption when using LFS mount (e.g., in zoned
>>> devices). Seems like the fallback into buffered I/O creates an
>>> inconsistency if the application is assuming both read and write DIO. I
>>> can easily reproduce a corruption with a simple RocksDB test.
>>>
>>> Might be that the f2fs_forced_buffered_io path brings some problems too,
>>> but I have not seen other failures besides this one.
>>>
>>> Problem reproducible without a zoned block device, simply by forcing
>>> LFS mount:
>>>
>>>   $ sudo mkfs.f2fs -f -m /dev/nvme0n1
>>>   $ sudo mount /dev/nvme0n1 /mnt/f2fs
>>>   $ sudo  /opt/rocksdb/db_bench  --benchmarks=fillseq --use_existing_db=0
>>>   --use_direct_reads=true --use_direct_io_for_flush_and_compaction=true
>>>   --db=/mnt/f2fs --num=5000 --value_size=1048576 --verify_checksum=1
>>>   --block_size=65536
>>>
>>> Note that the options that cause the problem are:
>>>   --use_direct_reads=true --use_direct_io_for_flush_and_compaction=true
>>>
>>> Fixes: f9d6d0597698 ("f2fs: fix out-place-update DIO write")
>>>
>>> Signed-off-by: Javier González <javier.gonz@samsung.com>
>>> ---
>>>  fs/f2fs/data.c | 3 ---
>>>  1 file changed, 3 deletions(-)
>>>
>>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>>> index 5755e897a5f0..b045dd6ab632 100644
>>> --- a/fs/f2fs/data.c
>>> +++ b/fs/f2fs/data.c
>>> @@ -1081,9 +1081,6 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
>>>  			return err;
>>>  	}
>>>
>>> -	if (direct_io && allow_outplace_dio(inode, iocb, from))
>>> -		return 0;
>>
>> Since for LFS mode, all DIOs can end up out of place, I think that it
>> may be better to change allow_outplace_dio() to always return true in
>> the case of LFS mode. So may be something like:
>>
>> static inline int allow_outplace_dio(struct inode *inode,
>> 			struct kiocb *iocb, struct iov_iter *iter)
>> {
>> 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
>> 	int rw = iov_iter_rw(iter);
>>
>> 	return test_opt(sbi, LFS) ||
>> 	 	(rw == WRITE && !block_unaligned_IO(inode, iocb, iter));
>> }
>>
>> instead of the original:
>>
>> static inline int allow_outplace_dio(struct inode *inode,
>> 			struct kiocb *iocb, struct iov_iter *iter)
>> {
>> 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
>> 	int rw = iov_iter_rw(iter);
>>
>> 	return (test_opt(sbi, LFS) && (rw == WRITE) &&
>> 				!block_unaligned_IO(inode, iocb, iter));
>> }
>>
>> Thoughts ?
>>
> 
> I see what you mean and it makes sense. However, the problem I am seeing
> occurs when allow_outplace_dio() returns true, as this is what creates
> the inconsistency between the write being buffered and the read being
> DIO.

But if the write is switched to buffered, the DIO read should use the
buffered path too, no ? Since this is all happening under VFS, the
generic DIO read path will not ensure that the buffered writes are
flushed to disk before issuing the direct read, I think. So that would
explain your data corruption, i.e. you are reading stale data on the
device before the buffered writes make it to the media.

> 
> I did test the code you propose and, as expected, it still triggered the
> corruption.
> 
>>> -
>>>  	if (is_inode_flag_set(inode, FI_NO_PREALLOC))
>>>  		return 0;
>>>
>>>
>>
>>
>> -- 
>> Damien Le Moal
>> Western Digital Research
> 
> Thanks,
> Javier
> 


-- 
Damien Le Moal
Western Digital Research


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

  reply	other threads:[~2019-11-26  2:06 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-22  8:59 [PATCH] f2fs: disble physical prealloc in LSF mount Javier González
2019-11-22  8:59 ` [f2fs-dev] " Javier González
2019-11-25  0:48 ` Damien Le Moal
2019-11-25  0:48   ` [f2fs-dev] " Damien Le Moal
2019-11-25 19:03   ` Javier González
2019-11-25 19:03     ` [f2fs-dev] " Javier González
2019-11-26  2:06     ` Damien Le Moal [this message]
2019-11-26  2:06       ` Damien Le Moal
2019-11-26  3:57       ` Javier González
2019-11-26  3:57         ` [f2fs-dev] " Javier González
2019-11-26  6:19         ` Damien Le Moal
2019-11-26  6:19           ` [f2fs-dev] " Damien Le Moal
2019-11-26  6:20           ` Damien Le Moal
2019-11-26  6:20             ` [f2fs-dev] " Damien Le Moal
2019-11-26  6:43             ` Javier González
2019-11-26  6:43               ` [f2fs-dev] " Javier González

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=BYAPR04MB58161C14246FA30366B69B9DE7450@BYAPR04MB5816.namprd04.prod.outlook.com \
    --to=damien.lemoal@wdc.com \
    --cc=jaegeuk@kernel.org \
    --cc=javier.gonz@samsung.com \
    --cc=javier@javigon.com \
    --cc=linux-f2fs-devel@lists.sourceforge.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=yuchao0@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.