From: Damien Le Moal <Damien.LeMoal@wdc.com> To: "Javier González" <javier@javigon.com> Cc: "jaegeuk@kernel.org" <jaegeuk@kernel.org>, "yuchao0@huawei.com" <yuchao0@huawei.com>, "linux-f2fs-devel@lists.sourceforge.net" <linux-f2fs-devel@lists.sourceforge.net>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, "Javier González" <javier.gonz@samsung.com> Subject: Re: [PATCH] f2fs: disble physical prealloc in LSF mount Date: Tue, 26 Nov 2019 02:06:53 +0000 [thread overview] Message-ID: <BYAPR04MB58161C14246FA30366B69B9DE7450@BYAPR04MB5816.namprd04.prod.outlook.com> (raw) In-Reply-To: 20191125190320.g7beal27nc5ubju7@mpHalley On 2019/11/26 4:03, Javier González wrote: > On 25.11.2019 00:48, Damien Le Moal wrote: >> On 2019/11/22 18:00, Javier González wrote: >>> From: Javier González <javier.gonz@samsung.com> >>> >>> Fix file system corruption when using LFS mount (e.g., in zoned >>> devices). Seems like the fallback into buffered I/O creates an >>> inconsistency if the application is assuming both read and write DIO. I >>> can easily reproduce a corruption with a simple RocksDB test. >>> >>> Might be that the f2fs_forced_buffered_io path brings some problems too, >>> but I have not seen other failures besides this one. >>> >>> Problem reproducible without a zoned block device, simply by forcing >>> LFS mount: >>> >>> $ sudo mkfs.f2fs -f -m /dev/nvme0n1 >>> $ sudo mount /dev/nvme0n1 /mnt/f2fs >>> $ sudo /opt/rocksdb/db_bench --benchmarks=fillseq --use_existing_db=0 >>> --use_direct_reads=true --use_direct_io_for_flush_and_compaction=true >>> --db=/mnt/f2fs --num=5000 --value_size=1048576 --verify_checksum=1 >>> --block_size=65536 >>> >>> Note that the options that cause the problem are: >>> --use_direct_reads=true --use_direct_io_for_flush_and_compaction=true >>> >>> Fixes: f9d6d0597698 ("f2fs: fix out-place-update DIO write") >>> >>> Signed-off-by: Javier González <javier.gonz@samsung.com> >>> --- >>> fs/f2fs/data.c | 3 --- >>> 1 file changed, 3 deletions(-) >>> >>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c >>> index 5755e897a5f0..b045dd6ab632 100644 >>> --- a/fs/f2fs/data.c >>> +++ b/fs/f2fs/data.c >>> @@ -1081,9 +1081,6 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from) >>> return err; >>> } >>> >>> - if (direct_io && allow_outplace_dio(inode, iocb, from)) >>> - return 0; >> >> Since for LFS mode, all DIOs can end up out of place, I think that it >> may be better to change allow_outplace_dio() to always return true in >> the case of LFS mode. So may be something like: >> >> static inline int allow_outplace_dio(struct inode *inode, >> struct kiocb *iocb, struct iov_iter *iter) >> { >> struct f2fs_sb_info *sbi = F2FS_I_SB(inode); >> int rw = iov_iter_rw(iter); >> >> return test_opt(sbi, LFS) || >> (rw == WRITE && !block_unaligned_IO(inode, iocb, iter)); >> } >> >> instead of the original: >> >> static inline int allow_outplace_dio(struct inode *inode, >> struct kiocb *iocb, struct iov_iter *iter) >> { >> struct f2fs_sb_info *sbi = F2FS_I_SB(inode); >> int rw = iov_iter_rw(iter); >> >> return (test_opt(sbi, LFS) && (rw == WRITE) && >> !block_unaligned_IO(inode, iocb, iter)); >> } >> >> Thoughts ? >> > > I see what you mean and it makes sense. However, the problem I am seeing > occurs when allow_outplace_dio() returns true, as this is what creates > the inconsistency between the write being buffered and the read being > DIO. But if the write is switched to buffered, the DIO read should use the buffered path too, no ? Since this is all happening under VFS, the generic DIO read path will not ensure that the buffered writes are flushed to disk before issuing the direct read, I think. So that would explain your data corruption, i.e. you are reading stale data on the device before the buffered writes make it to the media. > > I did test the code you propose and, as expected, it still triggered the > corruption. > >>> - >>> if (is_inode_flag_set(inode, FI_NO_PREALLOC)) >>> return 0; >>> >>> >> >> >> -- >> Damien Le Moal >> Western Digital Research > > Thanks, > Javier > -- Damien Le Moal Western Digital Research
WARNING: multiple messages have this Message-ID (diff)
From: Damien Le Moal <Damien.LeMoal@wdc.com> To: "Javier González" <javier@javigon.com> Cc: "jaegeuk@kernel.org" <jaegeuk@kernel.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, "Javier González" <javier.gonz@samsung.com>, "linux-f2fs-devel@lists.sourceforge.net" <linux-f2fs-devel@lists.sourceforge.net> Subject: Re: [f2fs-dev] [PATCH] f2fs: disble physical prealloc in LSF mount Date: Tue, 26 Nov 2019 02:06:53 +0000 [thread overview] Message-ID: <BYAPR04MB58161C14246FA30366B69B9DE7450@BYAPR04MB5816.namprd04.prod.outlook.com> (raw) In-Reply-To: 20191125190320.g7beal27nc5ubju7@mpHalley On 2019/11/26 4:03, Javier González wrote: > On 25.11.2019 00:48, Damien Le Moal wrote: >> On 2019/11/22 18:00, Javier González wrote: >>> From: Javier González <javier.gonz@samsung.com> >>> >>> Fix file system corruption when using LFS mount (e.g., in zoned >>> devices). Seems like the fallback into buffered I/O creates an >>> inconsistency if the application is assuming both read and write DIO. I >>> can easily reproduce a corruption with a simple RocksDB test. >>> >>> Might be that the f2fs_forced_buffered_io path brings some problems too, >>> but I have not seen other failures besides this one. >>> >>> Problem reproducible without a zoned block device, simply by forcing >>> LFS mount: >>> >>> $ sudo mkfs.f2fs -f -m /dev/nvme0n1 >>> $ sudo mount /dev/nvme0n1 /mnt/f2fs >>> $ sudo /opt/rocksdb/db_bench --benchmarks=fillseq --use_existing_db=0 >>> --use_direct_reads=true --use_direct_io_for_flush_and_compaction=true >>> --db=/mnt/f2fs --num=5000 --value_size=1048576 --verify_checksum=1 >>> --block_size=65536 >>> >>> Note that the options that cause the problem are: >>> --use_direct_reads=true --use_direct_io_for_flush_and_compaction=true >>> >>> Fixes: f9d6d0597698 ("f2fs: fix out-place-update DIO write") >>> >>> Signed-off-by: Javier González <javier.gonz@samsung.com> >>> --- >>> fs/f2fs/data.c | 3 --- >>> 1 file changed, 3 deletions(-) >>> >>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c >>> index 5755e897a5f0..b045dd6ab632 100644 >>> --- a/fs/f2fs/data.c >>> +++ b/fs/f2fs/data.c >>> @@ -1081,9 +1081,6 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from) >>> return err; >>> } >>> >>> - if (direct_io && allow_outplace_dio(inode, iocb, from)) >>> - return 0; >> >> Since for LFS mode, all DIOs can end up out of place, I think that it >> may be better to change allow_outplace_dio() to always return true in >> the case of LFS mode. So may be something like: >> >> static inline int allow_outplace_dio(struct inode *inode, >> struct kiocb *iocb, struct iov_iter *iter) >> { >> struct f2fs_sb_info *sbi = F2FS_I_SB(inode); >> int rw = iov_iter_rw(iter); >> >> return test_opt(sbi, LFS) || >> (rw == WRITE && !block_unaligned_IO(inode, iocb, iter)); >> } >> >> instead of the original: >> >> static inline int allow_outplace_dio(struct inode *inode, >> struct kiocb *iocb, struct iov_iter *iter) >> { >> struct f2fs_sb_info *sbi = F2FS_I_SB(inode); >> int rw = iov_iter_rw(iter); >> >> return (test_opt(sbi, LFS) && (rw == WRITE) && >> !block_unaligned_IO(inode, iocb, iter)); >> } >> >> Thoughts ? >> > > I see what you mean and it makes sense. However, the problem I am seeing > occurs when allow_outplace_dio() returns true, as this is what creates > the inconsistency between the write being buffered and the read being > DIO. But if the write is switched to buffered, the DIO read should use the buffered path too, no ? Since this is all happening under VFS, the generic DIO read path will not ensure that the buffered writes are flushed to disk before issuing the direct read, I think. So that would explain your data corruption, i.e. you are reading stale data on the device before the buffered writes make it to the media. > > I did test the code you propose and, as expected, it still triggered the > corruption. > >>> - >>> if (is_inode_flag_set(inode, FI_NO_PREALLOC)) >>> return 0; >>> >>> >> >> >> -- >> Damien Le Moal >> Western Digital Research > > Thanks, > Javier > -- Damien Le Moal Western Digital Research _______________________________________________ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
next prev parent reply other threads:[~2019-11-26 2:06 UTC|newest] Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top 2019-11-22 8:59 [PATCH] f2fs: disble physical prealloc in LSF mount Javier González 2019-11-22 8:59 ` [f2fs-dev] " Javier González 2019-11-25 0:48 ` Damien Le Moal 2019-11-25 0:48 ` [f2fs-dev] " Damien Le Moal 2019-11-25 19:03 ` Javier González 2019-11-25 19:03 ` [f2fs-dev] " Javier González 2019-11-26 2:06 ` Damien Le Moal [this message] 2019-11-26 2:06 ` Damien Le Moal 2019-11-26 3:57 ` Javier González 2019-11-26 3:57 ` [f2fs-dev] " Javier González 2019-11-26 6:19 ` Damien Le Moal 2019-11-26 6:19 ` [f2fs-dev] " Damien Le Moal 2019-11-26 6:20 ` Damien Le Moal 2019-11-26 6:20 ` [f2fs-dev] " Damien Le Moal 2019-11-26 6:43 ` Javier González 2019-11-26 6:43 ` [f2fs-dev] " Javier González
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=BYAPR04MB58161C14246FA30366B69B9DE7450@BYAPR04MB5816.namprd04.prod.outlook.com \ --to=damien.lemoal@wdc.com \ --cc=jaegeuk@kernel.org \ --cc=javier.gonz@samsung.com \ --cc=javier@javigon.com \ --cc=linux-f2fs-devel@lists.sourceforge.net \ --cc=linux-kernel@vger.kernel.org \ --cc=yuchao0@huawei.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.