* [f2fs-dev] Can F2FS roll forward after fdatasync()?
From: Hongwei @ 2020-05-28 15:34 UTC
  To: linux-f2fs-devel

Hi F2FS experts,
As written in f2fs_do_sync_file():
"Both of fdatasync() and fsync() are able to be recovered from sudden-power-off."

Please consider this workflow:
1. Start atomic write
2. Multiple file writes
3. Commit atomic write
4. fdatasync()
5. Power loss.

In the 4th step, fdatasync() doesn't wait for node writeback,
so we may lose node blocks after a power loss.

If the data blocks are persisted but node blocks aren't, can the recovery program recover the transaction?
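For reference, the five-step workflow above can be sketched in C. This is a minimal sketch under stated assumptions. The ioctl numbers are assumed to match the kernel's F2FS_IOC_START_ATOMIC_WRITE and F2FS_IOC_COMMIT_ATOMIC_WRITE definitions, so check them against your kernel headers. The helper name atomic_update_then_fdatasync() is hypothetical. On a file that is not on F2FS, the first ioctl is simply rejected, typically with ENOTTY.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Assumed F2FS atomic-write ioctl numbers; verify against your kernel's
 * F2FS headers. Only defined here if nothing else has provided them. */
#ifndef F2FS_IOCTL_MAGIC
#define F2FS_IOCTL_MAGIC             0xf5
#define F2FS_IOC_START_ATOMIC_WRITE  _IO(F2FS_IOCTL_MAGIC, 1)
#define F2FS_IOC_COMMIT_ATOMIC_WRITE _IO(F2FS_IOCTL_MAGIC, 2)
#endif

/* Hypothetical helper covering steps 1-4 of the workflow.
 * Returns 0 on success or a negative errno value. */
static int atomic_update_then_fdatasync(int fd, const char *const bufs[],
                                        const size_t lens[], size_t n)
{
    /* Step 1: start the atomic write (rejected on non-F2FS files). */
    if (ioctl(fd, F2FS_IOC_START_ATOMIC_WRITE) < 0)
        return -errno;

    /* Step 2: multiple writes, handling short writes and EINTR. */
    for (size_t i = 0; i < n; i++) {
        size_t off = 0;
        while (off < lens[i]) {
            ssize_t w = write(fd, bufs[i] + off, lens[i] - off);
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -errno;  /* cleanup of the open atomic write is left to the caller */
            }
            off += (size_t)w;
        }
    }

    /* Step 3: commit the atomic write. */
    if (ioctl(fd, F2FS_IOC_COMMIT_ATOMIC_WRITE) < 0)
        return -errno;

    /* Step 4: fdatasync(). Whether the node blocks written for the commit
     * are durable at this point is exactly what this thread asks. */
    if (fdatasync(fd) < 0)
        return -errno;
    return 0;  /* step 5, a power loss, can happen any time after this */
}
```

On a non-F2FS file the helper returns a negative errno from step 1 and leaves the file untouched.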

Thanks!

Hongwei
_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


* Re: [f2fs-dev] Can F2FS roll forward after fdatasync()?
From: Jaegeuk Kim @ 2020-05-28 17:26 UTC
  To: Hongwei; +Cc: linux-f2fs-devel

On 05/28, Hongwei wrote:
> Hi F2FS experts,
> As written in f2fs_do_sync_file():
> "Both of fdatasync() and fsync() are able to be recovered from sudden-power-off."
> 
> Please consider this workflow:
> 1. Start atomic write
> 2. Multiple file writes
> 3. Commit atomic write
> 4. fdatasync()
> 5. Power loss.
> 
> In the 4th step, fdatasync() doesn't wait for node writeback,
> so we may lose node blocks after a power loss.
> 
> If the data blocks are persisted but node blocks aren't, can the recovery program recover the transaction?

#3 guarantees the blocks written by #2. So, if there are no writes between #3
and #4, I think there is nothing to recover.
Does this make sense to you?


* Re: [f2fs-dev] Can F2FS roll forward after fdatasync()?
From: Hongwei @ 2020-05-29 13:02 UTC
  To: Jaegeuk Kim; +Cc: linux-f2fs-devel

Hi,
>On 05/28, Hongwei wrote:
>> Hi F2FS experts,
>> As written in f2fs_do_sync_file():
>> "Both of fdatasync() and fsync() are able to be recovered from sudden-power-off."
>>
>> Please consider this workflow:
>> 1. Start atomic write
>> 2. Multiple file writes
>> 3. Commit atomic write
>> 4. fdatasync()
>> 5. Power loss.
>>
>> In the 4th step, fdatasync() doesn't wait for node writeback,
>> so we may lose node blocks after a power loss.
>>
>> If the data blocks are persisted but node blocks aren't, can the recovery program recover the transaction?
>
>#3 guarantees the blocks written by #2. So, if there are no writes between #3
>and #4, I think there is nothing to recover.
>Does this make sense to you?

Thanks for your reply. Please consider this:
f2fs_do_sync_file() doesn't wait for node writeback if atomic==1, so it is possible that after #3 the node blocks are still under writeback.
#4 fdatasync() doesn't wait for node writeback either.
Since the node writeback BIO is flagged with PREFLUSH and FUA, it may take a long time to complete.
Therefore, when the #5 power failure happens, is it possible that the node blocks are not persisted?
If I am correct about this, can the recovery program recover the transaction?


* Re: [f2fs-dev] Can F2FS roll forward after fdatasync()?
  2020-05-29 13:02   ` Hongwei
From: Jaegeuk Kim @ 2020-06-03 17:18 UTC
  To: Hongwei; +Cc: linux-f2fs-devel

Hi Hongwei,

On 05/29, Hongwei wrote:
> Hi,
> >On 05/28, Hongwei wrote:
> >> Hi F2FS experts,
> >> As written in f2fs_do_sync_file():
> >> "Both of fdatasync() and fsync() are able to be recovered from sudden-power-off."
> >>
> >> Please consider this workflow:
> >> 1. Start atomic write
> >> 2. Multiple file writes
> >> 3. Commit atomic write
> >> 4. fdatasync()
> >> 5. Power loss.
> >>
> >> In the 4th step, fdatasync() doesn't wait for node writeback,
> >> so we may lose node blocks after a power loss.
> >>
> >> If the data blocks are persisted but node blocks aren't, can the recovery program recover the transaction?
> >
> >#3 guarantees the blocks written by #2. So, if there are no writes between #3
> >and #4, I think there is nothing to recover.
> >Does this make sense to you?
> 
> Thanks for your reply. Please consider this:
> f2fs_do_sync_file() doesn't wait for node writeback if atomic==1, so it is possible that after #3 the node blocks are still under writeback.
> #4 fdatasync() doesn't wait for node writeback either.
> Since the node writeback BIO is flagged with PREFLUSH and FUA, it may take a long time to complete.
> Therefore, when the #5 power failure happens, is it possible that the node blocks are not persisted?
> If I am correct about this, can the recovery program recover the transaction?

I see. That can be an issue, though. Is there a real use case for this? I mean,
given atomic writes by SQLite, the next transaction will also be serialized with
another atomic write, which is why we could skip waiting for node writes.

Thanks,


* Re: [f2fs-dev] Can F2FS roll forward after fdatasync()?
From: Hongwei Qin @ 2020-06-04 13:24 UTC
  To: Jaegeuk Kim; +Cc: linux-f2fs-devel

Hi Jaegeuk,

On Thu, Jun 4, 2020 at 1:19 AM Jaegeuk Kim <jaegeuk@kernel.org> wrote:
>
> Hi Hongwei,
>
> On 05/29, Hongwei wrote:
> > Hi,
> > >On 05/28, Hongwei wrote:
> > >> Hi F2FS experts,
> > >> As written in f2fs_do_sync_file():
> > >> "Both of fdatasync() and fsync() are able to be recovered from sudden-power-off."
> > >>
> > >> Please consider this workflow:
> > >> 1. Start atomic write
> > >> 2. Multiple file writes
> > >> 3. Commit atomic write
> > >> 4. fdatasync()
> > >> 5. Power loss.
> > >>
> > >> In the 4th step, fdatasync() doesn't wait for node writeback,
> > >> so we may lose node blocks after a power loss.
> > >>
> > >> If the data blocks are persisted but node blocks aren't, can the recovery program recover the transaction?
> > >
> > >#3 guarantees the blocks written by #2. So, if there are no writes between #3
> > >and #4, I think there is nothing to recover.
> > >Does this make sense to you?
> >
> > Thanks for your reply. Please consider this:
> > f2fs_do_sync_file() doesn't wait for node writeback if atomic==1, so it is possible that after #3 the node blocks are still under writeback.
> > #4 fdatasync() doesn't wait for node writeback either.
> > Since the node writeback BIO is flagged with PREFLUSH and FUA, it may take a long time to complete.
> > Therefore, when the #5 power failure happens, is it possible that the node blocks are not persisted?
> > If I am correct about this, can the recovery program recover the transaction?
>
> I see. That can be an issue, though. Is there a real use case for this? I mean,
> given atomic writes by SQLite, the next transaction will also be serialized with
> another atomic write, which is why we could skip waiting for node writes.
>

Thanks for your reply. I think the use case is from SQLite.
I'm writing an SQLite test program and I need to decide whether to use
fdatasync() or fsync() after the F2FS transaction to ensure
durability.
E.g., if SQLite receives an INSERT, it needs to ensure the data's
persistence before returning to the SQL handler.
My guess is that in this case, SQLite needs to use fsync().

This further leads me to wonder whether we can optimize F2FS so
that in this case we can use fdatasync() instead of fsync().
My concern is that under the current implementation, it is possible that
after #4 the data is still volatile (data BIOs are not flagged with
FUA, so waiting for data page writeback can't guarantee their
persistence).
Therefore, if we add the FUA flag to data BIOs, maybe we can at least
guarantee that the data blocks are durable after fdatasync()?

If my understanding is correct, can F2FS roll forward the
transaction if all its data blocks are persisted but its node
blocks are missing? (My guess is no, because in that case we don't know
the file offsets of the data blocks.)

Or, maybe this just doesn't happen in reality?
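To make the roll-forward question concrete, here is a toy model of recovery. It is an assumption-laden simplification, not F2FS's on-disk format or its recovery code. Each node record maps a file offset to a data block address. Recovery replays persisted records in log order, stops at the first missing one (a broken chain), and applies only the records up to the last fsync mark. Data blocks are deliberately not an input, which is why a persisted data block whose node block was lost cannot be placed back into the file. All names (toy_dnode, toy_file, toy_roll_forward) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NBLOCKS 8

/* Toy node record (hypothetical, not the F2FS on-disk layout): it is the
 * only place that says which file offset a data block belongs to. */
struct toy_dnode {
    unsigned file_off;   /* logical block index within the file */
    unsigned blkaddr;    /* address where that block's data was written */
    bool fsync_mark;     /* last node record written by an fsync */
    bool persisted;      /* did this node block reach stable media? */
};

struct toy_file {
    unsigned map[TOY_NBLOCKS];  /* file offset -> block address, 0 = hole */
};

/* Replay node records in log order. Stop at the first record that was not
 * persisted (the chain is broken there), and apply only records up to the
 * last fsync mark seen before that point. Data blocks are not an input:
 * without its node record, a persisted data block has no known offset. */
static size_t toy_roll_forward(struct toy_file *f,
                               const struct toy_dnode *log, size_t n)
{
    size_t valid = 0, upto = 0;

    while (valid < n && log[valid].persisted) {
        if (log[valid].fsync_mark)
            upto = valid + 1;
        valid++;
    }
    for (size_t i = 0; i < upto; i++)
        if (log[i].file_off < TOY_NBLOCKS)
            f->map[log[i].file_off] = log[i].blkaddr;
    return upto;  /* number of node records replayed */
}
```

In this model, if the fsync-marked node record is lost, nothing from that fsync is replayed, even though its data block may sit on disk.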


* Re: [f2fs-dev] Can F2FS roll forward after fdatasync()?
From: Hongwei Qin @ 2020-06-05  4:19 UTC
  To: Jaegeuk Kim; +Cc: linux-f2fs-devel

> It's a bit unclear to me. Do you want to add fsync/fdatasync 1) in SQLite, or
> 2) in your test program after the transaction?
>
> 1) You'd better build SQLite without atomic write.
> 2) I think you don't need to do that, since SQLite should guarantee the
>    durability.
>

It's (1). I want to use fsync/fdatasync in SQLite.
In fact, this is already done in SQLite. If the database is run in
FULL_SYNC mode with atomic_write enabled, SQLite will add an fsync()
after the transaction to enforce durability. When compiling
SQLite, if we add -DHAVE_FDATASYNC to the Makefile CFLAGS, SQLite will
use fdatasync() instead of fsync().
So I think that for a strict persistence guarantee, we can still use the
atomic write feature with fsync(). This avoids SQLite's journaling
overhead.
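The compile-time switch described above can be sketched as follows. The helper durable_sync() is hypothetical and only mirrors the idea; SQLite's own code is organized differently. With -DHAVE_FDATASYNC the data-only fdatasync() is used, otherwise fsync().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: pick the sync call at compile time, in the spirit
 * of building SQLite with -DHAVE_FDATASYNC. Returns 0 or -errno. */
static int durable_sync(int fd)
{
#ifdef HAVE_FDATASYNC
    /* Flushes file data plus only the metadata needed to read it back
     * (e.g. file size), skipping things like timestamps. */
    int rc = fdatasync(fd);
#else
    /* Flushes file data and all of the inode's metadata. */
    int rc = fsync(fd);
#endif
    return rc < 0 ? -errno : 0;
}
```

Adding -DHAVE_FDATASYNC to CFLAGS selects the fdatasync() branch. Everything else in the call site stays the same.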

> Yes, it can hurt durability a bit logically, but I haven't seen any real problem
> from the field. The reason is, let's say a power cut happens during the last transactions.
> Then, we can actually ask a question like "do we need to recover the last-moment
> transaction? Moreover, does the user notice?". I may say no, if we can keep the
> order of all the transactions.

Indeed, it's not that necessary.
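The acceptance criterion in the quoted reply can be stated as a small check: losing the last-moment transaction is tolerable as long as transaction order is kept. This is a hypothetical illustration, not F2FS or SQLite code. Given which committed transactions, in commit order, reached stable storage, the surviving state keeps order only if it is a prefix.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: durable[i] says whether the i-th committed
 * transaction (in commit order) survived the power cut. The outcome keeps
 * transaction order only if the survivors are a prefix: T0..Tk present,
 * everything after Tk missing. */
static bool survivors_form_prefix(const bool durable[], size_t n)
{
    size_t i = 0;
    while (i < n && durable[i])   /* walk the surviving prefix */
        i++;
    while (i < n && !durable[i])  /* the rest must all be lost */
        i++;
    return i == n;
}
```

For example, {true, true, false} passes because only the last transaction is lost. {true, false, true} fails because a later transaction survived while an earlier one did not.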

Thanks for your help and the discussion. :)


On Fri, Jun 5, 2020 at 12:13 AM Jaegeuk Kim <jaegeuk@kernel.org> wrote:
>
> On 06/04, Hongwei Qin wrote:
> > Hi Jaegeuk (if I may),
> >
> > On Thu, Jun 4, 2020 at 1:19 AM Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > >
> > > Hi Hongwei,
> > >
> > > On 05/29, Hongwei wrote:
> > > > Hi,
> > > > >On 05/28, Hongwei wrote:
> > > > >> Hi F2FS experts,
> > > > >> As written in f2fs_do_sync_file():
> > > > >> "Both of fdatasync() and fsync() are able to be recovered from sudden-power-off."
> > > > >>
> > > > >> Please consider this workflow:
> > > > >> 1. Start atomic write
> > > > >> 2. Multiple file writes
> > > > >> 3. Commit atomic write
> > > > >> 4. fdatasync()
> > > > >> 5. Power loss.
> > > > >>
> > > > >> In the 4th step, fdatasync() doesn't wait for node writeback,
> > > > >> so we may lose node blocks after a power loss.
> > > > >>
> > > > >> If the data blocks are persisted but node blocks aren't, can the recovery program recover the transaction?
> > > > >
> > > > >#3 guarantees the blocks written by #2. So, if there are no writes between #3
> > > > >and #4, I think there is nothing to recover.
> > > > >Does this make sense to you?
> > > >
> > > > Thanks for your reply. Please consider this:
> > > > f2fs_do_sync_file() doesn't wait for node writeback if atomic==1, so it is possible that after #3 the node blocks are still under writeback.
> > > > #4 fdatasync() doesn't wait for node writeback either.
> > > > Since the node writeback BIO is flagged with PREFLUSH and FUA, it may take a long time to complete.
> > > > Therefore, when the #5 power failure happens, is it possible that the node blocks are not persisted?
> > > > If I am correct about this, can the recovery program recover the transaction?
> > >
> > > I see. That can be an issue, though. Is there a real use case for this? I mean,
> > > given atomic writes by SQLite, the next transaction will also be serialized with
> > > another atomic write, which is why we could skip waiting for node writes.
> > >
> >
> > Thanks for your reply. I think the use case is from SQLite.
> > I'm writing an SQLite test program and I need to decide whether to use
> > fdatasync() or fsync() after the F2FS transaction to ensure
> > durability.
> > E.g., if SQLite receives an INSERT, it needs to ensure the data's
> > persistence before returning to the SQL handler.
> > My guess is that in this case, SQLite needs to use fsync().
>
> It's a bit unclear to me. Do you want to add fsync/fdatasync 1) in SQLite, or
> 2) in your test program after the transaction?
>
> 1) You'd better build SQLite without atomic write.
> 2) I think you don't need to do that, since SQLite should guarantee the
>    durability.
>
> >
> > This further leads me to wonder whether we can optimize F2FS so
> > that in this case we can use fdatasync() instead of fsync().
> > My concern is that under the current implementation, it is possible that
> > after #4 the data is still volatile (data BIOs are not flagged with
> > FUA, so waiting for data page writeback can't guarantee their
> > persistence).
> > Therefore, if we add the FUA flag to data BIOs, maybe we can at least
> > guarantee that the data blocks are durable after fdatasync()?
> >
> > If my understanding is correct, can F2FS roll forward the
> > transaction if all its data blocks are persisted but its node
> > blocks are missing? (My guess is no, because in that case we don't know
> > the file offsets of the data blocks.)
> >
> > Or, maybe this just doesn't happen in reality?
>
> So, if you want to guarantee all of them very strictly, you can build SQLite
> without atomic write and mount f2fs with fsync_mode=posix.
> Instead, if you want to improve the performance a bit, you can build SQLite with
> atomic write and mount f2fs with fsync_mode=nobarrier.
> Yes, it can hurt durability a bit logically, but I haven't seen any real problem
> from the field. The reason is, let's say a power cut happens during the last transactions.
> Then, we can actually ask a question like "do we need to recover the last-moment
> transaction? Moreover, does the user notice?". I may say no, if we can keep the
> order of all the transactions.
>
> Thanks,
>

Thread overview: 6+ messages
2020-05-28 15:34 [f2fs-dev] Can F2FS roll forward after fdatasync()? Hongwei
2020-05-28 17:26 ` Jaegeuk Kim
2020-05-29 13:02   ` Hongwei
2020-06-03 17:18     ` Jaegeuk Kim
2020-06-04 13:24       ` Hongwei Qin
     [not found]       ` <CAKvRR0QjB8u-MnG7om5skFAg_y68vb5b2jjL-VdMOFhHcKqc2g@mail.gmail.com>
     [not found]         ` <20200604161332.GA187121@google.com>
2020-06-05  4:19           ` Hongwei Qin
