* [f2fs-dev] Can F2FS roll forward after fdatasync()?
From: Hongwei @ 2020-05-28 15:34 UTC
To: linux-f2fs-devel

Hi F2FS experts,

As written in f2fs_do_sync_file():
"Both of fdatasync() and fsync() are able to be recovered from sudden-power-off."

Please consider this workflow:
1. Start atomic write
2. Multiple file writes
3. Commit atomic write
4. fdatasync()
5. Power loss

In step 4, fdatasync() doesn't wait for node writeback, so we may lose node blocks after the power loss.

If the data blocks are persisted but the node blocks aren't, can the recovery program recover the transaction?

Thanks!

Hongwei

_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
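For readers who want to reproduce steps 1 to 4 from user space, here is a minimal sketch rather than a reference implementation. It assumes the ioctl numbers from the kernel's include/uapi/linux/f2fs.h (`_IO(0xf5, 1)` and `_IO(0xf5, 2)`) and the common ioctl encoding used on x86 and ARM. The helper name `atomic_update` is hypothetical. On a filesystem that rejects the ioctl, the sketch falls back to plain writes followed by fdatasync().

```python
import fcntl
import os
import tempfile

# Assumption: values of F2FS_IOC_START_ATOMIC_WRITE / F2FS_IOC_COMMIT_ATOMIC_WRITE,
# i.e. _IO(0xf5, 1) and _IO(0xf5, 2). The arithmetic below matches the ioctl
# encoding on x86 and ARM; some other architectures encode _IO() differently.
F2FS_IOCTL_MAGIC = 0xF5
F2FS_IOC_START_ATOMIC_WRITE = (F2FS_IOCTL_MAGIC << 8) | 1
F2FS_IOC_COMMIT_ATOMIC_WRITE = (F2FS_IOCTL_MAGIC << 8) | 2


def atomic_update(path, chunks):
    """Run steps 1-4 on `path`; return True if the atomic-write ioctls were accepted."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            fcntl.ioctl(fd, F2FS_IOC_START_ATOMIC_WRITE)   # step 1
            atomic = True
        except OSError:
            atomic = False                                  # not F2FS, or not permitted
        offset = 0
        for chunk in chunks:                                # step 2
            os.pwrite(fd, chunk, offset)
            offset += len(chunk)
        if atomic:
            fcntl.ioctl(fd, F2FS_IOC_COMMIT_ATOMIC_WRITE)  # step 3
        os.fdatasync(fd)                                    # step 4
        return atomic
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        used = atomic_update(os.path.join(tmp, "db"), [b"page-1", b"page-2"])
        print("atomic write used:", used)
```

Step 5 (power loss) cannot be produced from this script; testing it needs a fault-injection setup or a real power cut.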
* Re: [f2fs-dev] Can F2FS roll forward after fdatasync()?
From: Jaegeuk Kim @ 2020-05-28 17:26 UTC
To: Hongwei; Cc: linux-f2fs-devel

On 05/28, Hongwei wrote:
> Hi F2FS experts,
> As written in f2fs_do_sync_file():
> "Both of fdatasync() and fsync() are able to be recovered from sudden-power-off."
>
> Please consider this workflow:
> 1. Start atomic write
> 2. Multiple file writes
> 3. Commit atomic write
> 4. fdatasync()
> 5. Power loss
>
> In step 4, fdatasync() doesn't wait for node writeback, so we may lose node blocks after the power loss.
>
> If the data blocks are persisted but the node blocks aren't, can the recovery program recover the transaction?

#3 will guarantee the blocks written by #2. So, if nothing is written between #3 and #4, I think we have nothing to recover.
Does this make sense to you?
* Re: [f2fs-dev] Can F2FS roll forward after fdatasync()?
From: Hongwei @ 2020-05-29 13:02 UTC
To: Jaegeuk Kim; Cc: linux-f2fs-devel

Hi,

> #3 will guarantee the blocks written by #2. So, if nothing is written between #3
> and #4, I think we have nothing to recover.
> Does this make sense to you?

Thanks for your reply. Please consider this:

f2fs_do_sync_file() doesn't wait for node writeback if atomic==1, so the node may still be under writeback after #3.
The fdatasync() in #4 doesn't wait for node writeback either.
Because the node writeback BIO is flagged with PREFLUSH and FUA, it may take a long time to complete.
So when the power failure in #5 happens, the node block may not have been persisted yet.

If I am correct about this, can the recovery program recover the transaction?
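The question can be made concrete with a toy model. This is only an illustration of the scenario, not F2FS's actual recovery code. The names `Disk` and `roll_forward` are hypothetical, and the model assumes recovery can only reach data blocks through persisted node blocks that map file offsets to block addresses.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    # Persisted data blocks: block address -> payload.
    data: dict = field(default_factory=dict)
    # Persisted node blocks: each maps file offset -> block address.
    nodes: list = field(default_factory=list)


def roll_forward(disk):
    """Rebuild file contents using only what the persisted node blocks point to."""
    recovered = {}
    for node in disk.nodes:
        for offset, addr in node.items():
            if addr in disk.data:
                recovered[offset] = disk.data[addr]
    return recovered


if __name__ == "__main__":
    # Data blocks reached the device, the node block did not.
    disk = Disk(data={100: b"A", 101: b"B"})
    print(roll_forward(disk))   # {}

    # Same data blocks, and this time the node block was persisted too.
    disk.nodes.append({0: 100, 1: 101})
    print(roll_forward(disk))   # {0: b'A', 1: b'B'}
```

In this model the data blocks are on the device in both cases, but without the node block nothing says which file offsets they belong to.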
* Re: [f2fs-dev] Can F2FS roll forward after fdatasync()?
From: Jaegeuk Kim @ 2020-06-03 17:18 UTC
To: Hongwei; Cc: linux-f2fs-devel

Hi Hongwei,

On 05/29, Hongwei wrote:
> Thanks for your reply. Please consider this:
> f2fs_do_sync_file() doesn't wait for node writeback if atomic==1, so the node may still be under writeback after #3.
> The fdatasync() in #4 doesn't wait for node writeback either.
> Because the node writeback BIO is flagged with PREFLUSH and FUA, it may take a long time to complete.
> So when the power failure in #5 happens, the node block may not have been persisted yet.
> If I am correct about this, can the recovery program recover the transaction?

I see. That could be an issue, but is there a real use case for this? I mean, given atomic writes by SQLite, the next transaction will also be serialized with another atomic write, so we could bypass waiting for node writes.

Thanks,
* Re: [f2fs-dev] Can F2FS roll forward after fdatasync()?
From: Hongwei Qin @ 2020-06-04 13:24 UTC
To: Jaegeuk Kim; Cc: linux-f2fs-devel

Hi Jaegeuk,

On Thu, Jun 4, 2020 at 1:19 AM Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> I see. That could be an issue, but is there a real use case for this? I mean,
> given atomic writes by SQLite, the next transaction will also be serialized with
> another atomic write, so we could bypass waiting for node writes.

Thanks for your reply. I think the use case is from SQLite.
I'm writing an SQLite test program and I need to decide whether to use fdatasync() or fsync() after the F2FS transaction to ensure durability.
E.g., if SQLite receives an INSERT, it needs to make sure the data is persistent before the SQL handler returns.
My guess is that in this case SQLite needs to use fsync().

This leads me to wonder whether we can optimize F2FS so that fdatasync() can be used instead of fsync() in this case.
My concern is that under the current implementation the data may still be volatile after #4 (data BIOs are not flagged with FUA, so waiting for data page writeback can't guarantee persistence).
Therefore, if we add the FUA flag to data BIOs, maybe we can at least guarantee that the data blocks are durable after fdatasync()?

If all of my understanding is correct, can F2FS roll forward the transaction if all of its data blocks are persisted but the node blocks are missing? (My guess is no, because in that case we don't know the file offsets of the data blocks.)

Or maybe this just doesn't happen in reality?
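The INSERT case above can be sketched from the application side. This is a minimal sketch under assumptions, not a statement about what SQLite does on F2FS: `PRAGMA synchronous = FULL` is taken to be the "sync after every commit" setting under discussion, and the helper names `open_durable` and `insert_row` are hypothetical. Whether the underlying call is fsync() or fdatasync(), and whether F2FS atomic writes are used at all, is decided when the SQLite library is built, so the SQLite bundled with Python may behave differently from the build discussed in this thread.

```python
import os
import sqlite3
import tempfile


def open_durable(path):
    """Open a database and ask SQLite to sync on every commit."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA synchronous = FULL")
    return conn


def insert_row(conn, value):
    """Insert one row; the commit returns only after SQLite has issued its sync."""
    with conn:  # commits on success, rolls back on exception
        conn.execute("CREATE TABLE IF NOT EXISTS t (v TEXT)")
        conn.execute("INSERT INTO t (v) VALUES (?)", (value,))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conn = open_durable(os.path.join(tmp, "test.db"))
        insert_row(conn, "hello")
        print(conn.execute("PRAGMA synchronous").fetchone()[0])  # 2 means FULL
        conn.close()
```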
[parent not found: <CAKvRR0QjB8u-MnG7om5skFAg_y68vb5b2jjL-VdMOFhHcKqc2g@mail.gmail.com>]
[parent not found: <20200604161332.GA187121@google.com>]
* Re: [f2fs-dev] Can F2FS roll forward after fdatasync()?
From: Hongwei Qin @ 2020-06-05 4:19 UTC
To: Jaegeuk Kim; Cc: linux-f2fs-devel

> It's a bit unclear to me. Do you want to add fsync/fdatasync 1) in SQLite, or
> 2) in your test program after the transaction?
>
> 1) You'd better build SQLite without atomic write.
> 2) I think you don't need to do that, since SQLite should guarantee the
> durability.

It's (1). I want to use fsync/fdatasync in SQLite.
In fact, this is already done in SQLite. If the database runs in FULL_SYNC mode with atomic_write enabled, SQLite adds an fsync() after the transaction to enforce durability.
When compiling SQLite, if we add -DHAVE_FDATASYNC to the Makefile cflags, SQLite uses fdatasync() instead of fsync().
So I think that for a strict persistence guarantee we can still use the atomic write feature with fsync(). This avoids the SQLite journaling overhead.

> Yes, it can hurt durability a bit logically, but I haven't seen any real problem
> in the field. The reason is, let's say a power cut happens during the last transactions.
> Then we can actually ask a question like "do we need to recover the last-moment
> transaction? Moreover, does the user even notice?". I may say no, if we can keep the
> order of all the transactions.

Indeed, it's not that necessary.

Thanks for your help and the discussion. :)

On Fri, Jun 5, 2020 at 12:13 AM Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> On 06/04, Hongwei Qin wrote:
> > Thanks for your reply. I think the use case is from SQLite.
> > I'm writing an SQLite test program and I need to decide whether to use
> > fdatasync() or fsync() after the F2FS transaction to ensure
> > durability.
> > E.g., if SQLite receives an INSERT, it needs to make sure the data is
> > persistent before the SQL handler returns.
> > My guess is that in this case SQLite needs to use fsync().
>
> It's a bit unclear to me. Do you want to add fsync/fdatasync 1) in SQLite, or
> 2) in your test program after the transaction?
>
> 1) You'd better build SQLite without atomic write.
> 2) I think you don't need to do that, since SQLite should guarantee the
> durability.
>
> > This leads me to wonder whether we can optimize F2FS so that
> > fdatasync() can be used instead of fsync() in this case.
> > My concern is that under the current implementation the data may still be
> > volatile after #4 (data BIOs are not flagged with FUA, so waiting for
> > data page writeback can't guarantee persistence).
> > Therefore, if we add the FUA flag to data BIOs, maybe we can at least
> > guarantee that the data blocks are durable after fdatasync()?
> >
> > If all of my understanding is correct, can F2FS roll forward the
> > transaction if all of its data blocks are persisted but the node
> > blocks are missing? (My guess is no, because in that case we don't know
> > the file offsets of the data blocks.)
> >
> > Or maybe this just doesn't happen in reality?
>
> So, if you want to guarantee all of them very strictly, you can build SQLite
> without atomic write, and set the mount option fsync_mode=posix in f2fs.
> Instead, if you want to improve the performance a bit, you can build SQLite with
> atomic write and use fsync_mode=nobarrier in f2fs.
> Yes, it can hurt durability a bit logically, but I haven't seen any real problem
> in the field. The reason is, let's say a power cut happens during the last transactions.
> Then we can actually ask a question like "do we need to recover the last-moment
> transaction? Moreover, does the user even notice?". I may say no, if we can keep the
> order of all the transactions.
>
> Thanks,
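To check which of the two configurations above a system is running, one can look at the mount options. The following is a small sketch that assumes F2FS reports the setting in /proc/mounts as `fsync_mode=<value>`. The function name `f2fs_fsync_modes`, the sample device and the sample mount point are all hypothetical.

```python
def f2fs_fsync_modes(mounts_text):
    """Map each F2FS mount point to its fsync_mode option (None if not listed)."""
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] != "f2fs":
            continue
        mode = None
        for opt in fields[3].split(","):
            if opt.startswith("fsync_mode="):
                mode = opt.split("=", 1)[1]
        modes[fields[1]] = mode
    return modes


if __name__ == "__main__":
    sample = "/dev/sda1 /data f2fs rw,relatime,fsync_mode=posix 0 0\n"
    print(f2fs_fsync_modes(sample))  # {'/data': 'posix'}
```

On a live system the input would be the contents of /proc/mounts, for example `f2fs_fsync_modes(open("/proc/mounts").read())`.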
end of thread (2020-06-05 4:20 UTC)

Thread overview: 6+ messages
2020-05-28 15:34 [f2fs-dev] Can F2FS roll forward after fdatasync()? Hongwei
2020-05-28 17:26 ` Jaegeuk Kim
2020-05-29 13:02   ` Hongwei
2020-06-03 17:18     ` Jaegeuk Kim
2020-06-04 13:24       ` Hongwei Qin
     [not found]       ` <CAKvRR0QjB8u-MnG7om5skFAg_y68vb5b2jjL-VdMOFhHcKqc2g@mail.gmail.com>
     [not found]         ` <20200604161332.GA187121@google.com>
2020-06-05 4:19           ` Hongwei Qin