* Inconsistent behavior of fsync in btrfs
@ 2018-04-25  2:35 Jayashree Mohan
  2018-04-25  3:08 ` Chris Murphy
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Jayashree Mohan @ 2018-04-25  2:35 UTC (permalink / raw)
  To: linux-btrfs, fstests; +Cc: Vijaychidambaram Velayudhan Pillai

Hi,

While investigating crash consistency bugs on btrfs, we came across
workloads that demonstrate inconsistent behavior of fsync.

Consider the following workload where fsync on the directory did not persist it.

Workload 1:

mkdir A
sync
rename (A, B)
creat B/foo
fsync B/foo
fsync B
---crash---

In this case, both the directory B and the file B/foo are missing.
What's more worrying is the state after recovery. On recovery from the
crash, we expect the directory contents to be:

Dir A : should not exist
Dir B :
    foo

But instead, what we see is that:
Dir A :
    foo
Dir B : doesn't exist


This state would be acceptable if we had created the file foo in dir A
and then renamed the directory; in that case it would mean the rename
did not persist. What we see here, however, is that a file created in
directory B appears in A, which is incorrect.

However, if we do not persist the initial creation of directory A, i.e.,

Workload 2:

mkdir A
rename (A, B)
creat B/foo
fsync B/foo
fsync B
---crash---

then both the directory B and its entry foo are persisted.

Is this something to do with the directory entry A being already
present in the FS/subvolume tree and then the changes to the directory
inode going into the fsync log?

We do not clearly understand the reason for such inconsistent
behavior, but it does seem incorrect.
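
For anyone who wants to replay Workloads 1 and 2 outside CrashMonkey,
the call sequence can be sketched in Python roughly as below. This is
only a sketch under assumptions: the helper names (fsync_path, creat,
workload_1_or_2, demo) are made up for illustration, the files live in
a temporary directory rather than on a btrfs scratch device, and no
crash is injected. It shows the syscall order and the state we expect
to survive, not the bug itself.

```python
import os
import tempfile


def fsync_path(path):
    """fsync a file or directory by path (hypothetical helper)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def creat(path):
    """Create an empty file, roughly like creat(2), and close it."""
    os.close(os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o644))


def workload_1_or_2(root, sync_after_mkdir):
    """Workload 1 when sync_after_mkdir is True, Workload 2 otherwise."""
    a = os.path.join(root, "A")
    b = os.path.join(root, "B")
    os.mkdir(a)
    if sync_after_mkdir:
        os.sync()                       # persists the creation of A
    os.rename(a, b)
    creat(os.path.join(b, "foo"))
    fsync_path(os.path.join(b, "foo"))
    fsync_path(b)
    # A crash-testing harness would cut power at this point.
    return sorted(os.listdir(root)), sorted(os.listdir(b))


def demo(sync_after_mkdir=True):
    """Run the workload in a throwaway directory and report what is visible."""
    with tempfile.TemporaryDirectory() as root:
        return workload_1_or_2(root, sync_after_mkdir)
```

The value returned by demo() is the pre-crash view: only B at the top
level, containing foo. That is the state we expect to find after
recovery in both workloads.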

Consider another case where we found inconsistent behavior in the way
fsync is handled.

Workload 3:

mkdir A
mkdir B
creat A/foo
link (A/foo, B/foo)
fsync A/foo
fsync B/foo
---crash---

In this case, file A/foo is persisted but, in spite of an explicit
fsync on B/foo, B/foo goes missing.

Workload 4:

mkdir A
mkdir B
creat A/foo
link (A/foo, B/foo)
fsync B/foo
fsync A/foo
---crash---

Note that the only difference between workloads 3 and 4 is the order
of the fsyncs on A/foo and B/foo. In this case, the file B/foo is
persisted, but A/foo is missing.

Our interpretation of the above workloads is that the second fsync
behaves like a no-op and, in either case, only the file that is
fsynced first gets persisted. If we insert a sleep(45) between the two
fsyncs in the workloads above, we see both A/foo and B/foo being
persisted.

No matter how many more links we create and fsync, only the first
fsync persists the file. For example:

Workload 5:

mkdir A
mkdir B
mkdir C
creat A/foo
link (A/foo, B/foo)
link (A/foo, C/foo)
fsync B/foo
fsync A/foo
fsync C/foo
---crash---

Only file B/foo gets persisted, and both A/foo and C/foo are missing.

This seems like inconsistent behavior, as only the first fsync persists
the file while the others don't seem to. Do you agree that this is
indeed incorrect and needs fixing?
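
Workloads 3 to 5 differ only in the number of links and the order of
the fsyncs, so they can be sketched with one parameterised Python
function. As before, this is an illustration under assumptions: the
names link_workload and demo are hypothetical, it runs in a temporary
directory, and no crash is injected. It also returns the inode numbers
to make explicit that every path names the same inode.

```python
import os
import tempfile


def fsync_path(path):
    """fsync a file or directory by path (hypothetical helper)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def link_workload(root, dirs, fsync_order):
    """Create dirs[0]/foo, hard-link it into the other dirs, fsync in order."""
    for d in dirs:
        os.mkdir(os.path.join(root, d))
    first = os.path.join(root, dirs[0], "foo")
    os.close(os.open(first, os.O_CREAT | os.O_WRONLY, 0o644))
    for d in dirs[1:]:
        os.link(first, os.path.join(root, d, "foo"))
    for d in fsync_order:
        fsync_path(os.path.join(root, d, "foo"))
    # A crash-testing harness would cut power at this point.
    return {d: os.stat(os.path.join(root, d, "foo")).st_ino for d in dirs}


def demo():
    """Workload 5: links in A, B and C, fsynced in the order B, A, C."""
    with tempfile.TemporaryDirectory() as root:
        inodes = link_workload(root, ["A", "B", "C"], ["B", "A", "C"])
        nlink = os.stat(os.path.join(root, "A", "foo")).st_nlink
        return inodes, nlink
```

Workload 3 corresponds to link_workload(root, ["A", "B"], ["A", "B"])
and Workload 4 to link_workload(root, ["A", "B"], ["B", "A"]).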

All the above tests pass on ext4 and xfs.

Please let us know what you think about this inconsistency.


Thanks,
Jayashree Mohan


* Re: Inconsistent behavior of fsync in btrfs
  2018-04-25  2:35 Inconsistent behavior of fsync in btrfs Jayashree Mohan
@ 2018-04-25  3:08 ` Chris Murphy
       [not found] ` <CAJCQCtT7S9Qb-m3-7EXJPwtMT9nuUJjDykHi915KU+fc4fB-aQ@mail.gmail.com>
  2018-04-26 16:28 ` Chris Mason
  2 siblings, 0 replies; 18+ messages in thread
From: Chris Murphy @ 2018-04-25  3:08 UTC (permalink / raw)
  To: fstests, linux-btrfs

I don't have an answer to your question, but I'm curious exactly how you
simulate a crash. For my own really rudimentary testing I've been doing
crazy things like:

# grub-mkconfig -o /boot/efi && echo b > /proc/sysrq-trigger

And seeing what makes it to disk - or not. And I'm finding that some
non-deterministic results are possible even in a VM, which is a bit
confusing. I'm sure with real hardware I'd find even more inconsistency.



-- 
Chris Murphy


* Re: Inconsistent behavior of fsync in btrfs
       [not found] ` <CAJCQCtT7S9Qb-m3-7EXJPwtMT9nuUJjDykHi915KU+fc4fB-aQ@mail.gmail.com>
@ 2018-04-25  3:10   ` Vijaychidambaram Velayudhan Pillai
  2018-04-25  3:16   ` Vijaychidambaram Velayudhan Pillai
  1 sibling, 0 replies; 18+ messages in thread
From: Vijaychidambaram Velayudhan Pillai @ 2018-04-25  3:10 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Jayashree Mohan, linux-btrfs, fstests

Hi Chris,

We are using software we developed called CrashMonkey [1]. It
simulates the state on storage after a crash (taking into account
FLUSH and FUA flags). Talk slides on how it works can be found here
[2].

It is similar to dm-log-writes if you have used that in the past.

[1] https://github.com/utsaslab/crashmonkey
[2] http://www.cs.utexas.edu/~vijay/papers/hotstorage17-crashmonkey-slides.pdf

Thanks,
Vijay Chidambaram


* Re: Inconsistent behavior of fsync in btrfs
       [not found] ` <CAJCQCtT7S9Qb-m3-7EXJPwtMT9nuUJjDykHi915KU+fc4fB-aQ@mail.gmail.com>
  2018-04-25  3:10   ` Vijaychidambaram Velayudhan Pillai
@ 2018-04-25  3:16   ` Vijaychidambaram Velayudhan Pillai
  2018-04-25 12:36     ` Ashlie Martinez
  1 sibling, 1 reply; 18+ messages in thread
From: Vijaychidambaram Velayudhan Pillai @ 2018-04-25  3:16 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Jayashree Mohan, linux-btrfs, fstests

Hi Chris,

On Tue, Apr 24, 2018 at 10:07 PM, Chris Murphy <lists@colorremedies.com> wrote:
> I don't have answer to your question, but I'm curious exactly how you
> simulate a crash? For my own really rudimentary testing I've been doing
> crazy things like:
>
> # grub-mkconfig -o /boot/efi && echo b > /proc/sysrq-trigger
>
> And seeing what makes it to disk - or not. And I'm finding a some
> non-determinstic results are possible even in a VM which is a bit confusing.
> I'm sure with real hardware I'd find even more inconsistency.

We are using software we developed called CrashMonkey [1]. It
simulates the state on storage after a crash (taking into account
FLUSH and FUA flags). Talk slides on how it works can be found here
[2].

It is similar to dm-log-writes if you have used that in the past.

[1] https://github.com/utsaslab/crashmonkey
[2] http://www.cs.utexas.edu/~vijay/papers/hotstorage17-crashmonkey-slides.pdf

Thanks,
Vijay Chidambaram


* Re: Inconsistent behavior of fsync in btrfs
  2018-04-25  3:16   ` Vijaychidambaram Velayudhan Pillai
@ 2018-04-25 12:36     ` Ashlie Martinez
  2018-04-25 13:53       ` Ashlie Martinez
  0 siblings, 1 reply; 18+ messages in thread
From: Ashlie Martinez @ 2018-04-25 12:36 UTC (permalink / raw)
  To: Vijaychidambaram Velayudhan Pillai
  Cc: Chris Murphy, Jayashree Mohan, linux-btrfs, fstests

I don't really know all that much about the btrfs code, but I was
digging around to try to find the cause of this. Given that
inserting a sleep between the fsyncs gives correct behavior, do you
think the issue could be related to how btrfs determines whether or
not to log a change to an inode? I found some code in
btrfs_log_inode_parent() (part of the fsync path for both files and
directories) that appears to check if the inode being fsync-ed is
already in the log and, if it is, returns BTRFS_NO_LOG_SYNC [1]. Since
in both cases we saw issues where the same inode either changed
parent directories (rename) or was present in multiple directories
(hard link), it seems plausible that this could be the problem. Do you
have any thoughts on this?

[1] https://elixir.bootlin.com/linux/v4.16-rc7/source/fs/btrfs/tree-log.c#L5563
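
To make the hypothesis concrete, here is a toy model in Python. It is
emphatically not btrfs code and does not reproduce
btrfs_log_inode_parent(): the class ToyFsyncLog, its methods, and the
replay() helper are all invented for illustration. It only shows how a
log that is keyed by inode, and that returns early when the inode is
already logged, would persist just the first name that is fsynced. The
commit() step is a further assumption: it models the guess that a
transaction commit happens during the sleep, after which the inode is
logged afresh.

```python
class ToyFsyncLog:
    """Toy model of the hypothesis above; not the real btrfs tree log."""

    def __init__(self):
        self.logged_inodes = set()
        self.persisted_names = []

    def fsync(self, inode, name):
        if inode in self.logged_inodes:
            # Hypothesised early return: the new name is never recorded.
            return "NO_LOG_SYNC"
        self.logged_inodes.add(inode)
        self.persisted_names.append(name)
        return "LOGGED"

    def commit(self):
        """Assumed effect of a transaction commit: the log starts over."""
        self.logged_inodes.clear()


def replay(names, inode=257, commit_between=False):
    """fsync the same inode (arbitrary number) under each name in turn."""
    log = ToyFsyncLog()
    for i, name in enumerate(names):
        if commit_between and i > 0:
            log.commit()
        log.fsync(inode, name)
    return log.persisted_names
```

Under this model, replay(["B/foo", "A/foo", "C/foo"]) keeps only B/foo,
which matches what was reported for Workload 5, and replaying with
commit_between=True keeps every name, which matches the sleep(45)
observation. Whether the real code behaves this way is exactly the
open question.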

Thanks,
Ashlie


* Re: Inconsistent behavior of fsync in btrfs
  2018-04-25 12:36     ` Ashlie Martinez
@ 2018-04-25 13:53       ` Ashlie Martinez
  0 siblings, 0 replies; 18+ messages in thread
From: Ashlie Martinez @ 2018-04-25 13:53 UTC (permalink / raw)
  To: Vijaychidambaram Velayudhan Pillai
  Cc: Chris Murphy, Jayashree Mohan, linux-btrfs, fstests

Sending inline just to make sure I get a response (sorry if this is spam):

I don't really know all that much about the btrfs code, but I was
digging around to try to find the cause of this. Given that
inserting a sleep between the fsyncs gives correct behavior, do you
think the issue could be related to how btrfs determines whether or
not to log a change to an inode? I found some code in
btrfs_log_inode_parent() (part of the fsync path for both files and
directories) that appears to check if the inode being fsync-ed is
already in the log and, if it is, returns BTRFS_NO_LOG_SYNC [1]. Since
in both cases we saw issues where the same inode either changed
parent directories (rename) or was present in multiple directories
(hard link), it seems plausible that this could be the problem. Do you
have any thoughts on this?

[1] https://elixir.bootlin.com/linux/v4.16-rc7/source/fs/btrfs/tree-log.c#L5563

Thanks,
Ashlie


* Re: Inconsistent behavior of fsync in btrfs
  2018-04-25  2:35 Inconsistent behavior of fsync in btrfs Jayashree Mohan
  2018-04-25  3:08 ` Chris Murphy
       [not found] ` <CAJCQCtT7S9Qb-m3-7EXJPwtMT9nuUJjDykHi915KU+fc4fB-aQ@mail.gmail.com>
@ 2018-04-26 16:28 ` Chris Mason
  2018-04-27  0:59   ` Jayashree Mohan
  2 siblings, 1 reply; 18+ messages in thread
From: Chris Mason @ 2018-04-26 16:28 UTC (permalink / raw)
  To: Jayashree Mohan; +Cc: linux-btrfs, fstests, Vijaychidambaram Velayudhan Pillai

On 24 Apr 2018, at 20:35, Jayashree Mohan wrote:

> Hi,
>
> While investigating crash consistency bugs on btrfs, we came across
> workloads that demonstrate inconsistent behavior of fsync.
>
> Consider the following workload where fsync on the directory did not 
> persist it.

> Only file B/foo gets persisted, and both A/foo and C/foo are missing.
>
> This seems like inconsistent behavior as only the first fsync persists
> the file, while all others don't seem to. Do you agree if this is
> indeed incorrect and needs fixing?
>
> All the above tests pass on ext4 and xfs.
>
> Please let us know what you feel about such inconsistency.
>

The btrfs fsync log is more fine grained than xfs/ext, but fsync(any 
file) should be enough to persist that file in its current directory.

I'll get these reproduced and see if we can nail down why we're not 
logging the location properly.

-chris


* Re: Inconsistent behavior of fsync in btrfs
  2018-04-26 16:28 ` Chris Mason
@ 2018-04-27  0:59   ` Jayashree Mohan
  2018-04-27 15:26     ` Chris Mason
  2018-04-27 16:07     ` David Sterba
  0 siblings, 2 replies; 18+ messages in thread
From: Jayashree Mohan @ 2018-04-27  0:59 UTC (permalink / raw)
  To: Chris Mason
  Cc: linux-btrfs, fstests, Vijaychidambaram Velayudhan Pillai, Filipe Manana

Hi Chris,

Thanks for the response. We are using a tool we developed called
CrashMonkey[1] to run crash consistency tests and generate the bug
reports above. We'd be happy to guide you through setting up
CrashMonkey and getting these bugs reproduced. However, if you want to
reproduce them with your current setup (xfstests + dm-flakey), I have
appended the workload scripts to the end of this mail, which might
make your task simpler.

Interestingly we seem to have found another bug that breaks rename
atomicity and results in a previously fsynced file missing.

Workload:
1. mkdir A
2. creat A/bar (*)
3. fsync A/bar
4. mkdir B
5. creat B/bar
6. rename B/bar A/bar
7. creat A/foo
8. fsync A/foo
9. fsync A
--- crash---

When we recover from the crash, we see that file A/bar is missing.
If the rename did not persist, we expect to see the A/bar(*) created in
step 2 above; if the rename did persist, we still expect a file
A/bar to be present. A previously fsynced file going missing is a
concern, especially when you also fsync the directory that contains
it. It appears as if the rename was not atomic and ended up
losing the files. What do you think about this scenario?
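
The nine steps above can be sketched in Python as follows. This is a
sketch under assumptions, not our actual test: the function and key
names are hypothetical, it runs in a temporary directory, and no crash
is injected. It records the inode numbers of the two bar files so that,
after a real crash and recovery, one could tell whether the surviving
A/bar is the file from step 2 or the one renamed in step 6.

```python
import os
import tempfile


def fsync_path(path):
    """fsync a file or directory by path (hypothetical helper)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def creat(path):
    """Create an empty file, roughly like creat(2), and close it."""
    os.close(os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o644))


def rename_workload(root):
    a = os.path.join(root, "A")
    b = os.path.join(root, "B")
    os.mkdir(a)                                          # step 1
    creat(os.path.join(a, "bar"))                        # step 2 (*)
    fsync_path(os.path.join(a, "bar"))                   # step 3
    old_ino = os.stat(os.path.join(a, "bar")).st_ino
    os.mkdir(b)                                          # step 4
    creat(os.path.join(b, "bar"))                        # step 5
    new_ino = os.stat(os.path.join(b, "bar")).st_ino
    os.rename(os.path.join(b, "bar"), os.path.join(a, "bar"))  # step 6
    creat(os.path.join(a, "foo"))                        # step 7
    fsync_path(os.path.join(a, "foo"))                   # step 8
    fsync_path(a)                                        # step 9
    # A crash-testing harness would cut power at this point.
    return {
        "old_ino": old_ino,
        "new_ino": new_ino,
        "a_bar_ino": os.stat(os.path.join(a, "bar")).st_ino,
        "a_entries": sorted(os.listdir(a)),
        "b_entries": sorted(os.listdir(b)),
    }


def demo():
    with tempfile.TemporaryDirectory() as root:
        return rename_workload(root)
```

After recovery we would accept A/bar having either old_ino (rename not
persisted) or new_ino (rename persisted); what we observe on btrfs is
that A/bar is absent altogether.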

As mentioned above, here is the xfstests code to reproduce the bugs
we reported:

_init_flakey
_mount_flakey

-----------------------------------------------------------------------
Workload 1 : File foo is missing

touch  $SCRATCH_MNT/foo
mkdir  $SCRATCH_MNT/a
touch  $SCRATCH_MNT/a/foo
$XFS_IO_PROG  -c  "fsync"   $SCRATCH_MNT/a/foo
$XFS_IO_PROG  -c  "fsync"   $SCRATCH_MNT

-----------------------------------------------------------------------
Workload 2 : Directory b and file b/foo are missing - instead foo
appears in directory a

mkdir  $SCRATCH_MNT/a
sync
mv  $SCRATCH_MNT/a   $SCRATCH_MNT/b
touch $SCRATCH_MNT/b/foo
$XFS_IO_PROG  -c  "fsync"   $SCRATCH_MNT/b/foo
$XFS_IO_PROG  -c  "fsync"   $SCRATCH_MNT/b


-----------------------------------------------------------------------
Workload 3 : File a/foo is missing

mkdir  $SCRATCH_MNT/a
mkdir  $SCRATCH_MNT/b
touch $SCRATCH_MNT/a/foo
ln  $SCRATCH_MNT/a/foo   $SCRATCH_MNT/b/foo
$XFS_IO_PROG  -c  "fsync"   $SCRATCH_MNT/b/foo
$XFS_IO_PROG  -c  "fsync"   $SCRATCH_MNT/a/foo

-----------------------------------------------------------------------
_flakey_drop_and_remount
_unmount_flakey

Thanks for your time!

[1] https://github.com/utsaslab/crashmonkey

Thanks,
Jayashree Mohan




* Re: Inconsistent behavior of fsync in btrfs
  2018-04-27  0:59   ` Jayashree Mohan
@ 2018-04-27 15:26     ` Chris Mason
  2018-04-27 16:07     ` David Sterba
  1 sibling, 0 replies; 18+ messages in thread
From: Chris Mason @ 2018-04-27 15:26 UTC (permalink / raw)
  To: Jayashree Mohan
  Cc: linux-btrfs, fstests, Vijaychidambaram Velayudhan Pillai, Filipe Manana

On 26 Apr 2018, at 18:59, Jayashree Mohan wrote:

> Hi Chris,
>
> Thanks for the response. We are using a tool we developed called
> CrashMonkey[1] to run crash consistency tests and generate the bug
> reports above. We'd be happy to guide you through setting up
> CrashMonkey and getting these bugs reproduced. However, if you want to
> be able to reproduce them with your current setup (xfstest +
> dm-flakey), I have the workload scripts attached to the end of the
> mail which might make your task simpler.
>
> Interestingly we seem to have found another bug that breaks rename
> atomicity and results in a previously fsynced file missing.
>
> Workload:
> 1. mkdir A
> 2. creat A/bar (*)
> 3. fsync A/bar
> 4. mkdir B
> 5. creat B/bar
> 6. rename B/bar A/bar
> 7. creat A/foo
> 8. fsync A/foo
> 9. fsync A

The original workloads have been easy to reproduce/fix so far.  I want 
to make sure my patches aren't slowing things down and I'll send them 
out ~Monday.

This one looks similar but I'll double check the rename handling and 
make sure I've got all the cases.

-chris


* Re: Inconsistent behavior of fsync in btrfs
  2018-04-27  0:59   ` Jayashree Mohan
  2018-04-27 15:26     ` Chris Mason
@ 2018-04-27 16:07     ` David Sterba
  2018-04-27 17:33       ` Chris Mason
  1 sibling, 1 reply; 18+ messages in thread
From: David Sterba @ 2018-04-27 16:07 UTC (permalink / raw)
  To: Jayashree Mohan
  Cc: Chris Mason, linux-btrfs, fstests,
	Vijaychidambaram Velayudhan Pillai, Filipe Manana

On Thu, Apr 26, 2018 at 07:59:23PM -0500, Jayashree Mohan wrote:
> Thanks for the response. We are using a tool we developed called
> CrashMonkey[1] to run crash consistency tests and generate the bug
> reports above. We'd be happy to guide you through setting up
> CrashMonkey and getting these bugs reproduced. However, if you want to
> be able to reproduce them with your current setup (xfstest +
> dm-flakey), I have the workload scripts attached to the end of the
> mail which might make your task simpler.
> 
> Interestingly we seem to have found another bug that breaks rename
> atomicity and results in a previously fsynced file missing.
> 
> Workload:
> 1. mkdir A
> 2. creat A/bar (*)
> 3. fsync A/bar
> 4. mkdir B
> 5. creat B/bar
> 6. rename B/bar A/bar
> 7. creat A/foo
> 8. fsync A/foo
> 9. fsync A
> --- crash---
> 
> When we recover from the crash, we see that file A/bar goes missing.
> If the rename did not persist, we expect to see A/bar(*) created in
> step 2 above, or if the rename indeed persisted, we still expect file
> A/bar to be present.

I'm no fsync expert and the lack of standard or well defined behaviour
(mentioned elsewhere) leads me to question, on what do you base your
expectations? Not only for this report, but in general during your
testing.

Comparing various filesystems will show that at best it's implementation
defined and everybody has their own reasons for doing it one way or
another, or request fsync at particular time etc.

We have a manual page in section 5 that contains general topics of
btrfs, so documenting the fsync specifics would be good.


* Re: Inconsistent behavior of fsync in btrfs
  2018-04-27 16:07     ` David Sterba
@ 2018-04-27 17:33       ` Chris Mason
  2018-04-27 20:53         ` Theodore Y. Ts'o
  0 siblings, 1 reply; 18+ messages in thread
From: Chris Mason @ 2018-04-27 17:33 UTC (permalink / raw)
  To: David Sterba
  Cc: Jayashree Mohan, linux-btrfs, fstests,
	Vijaychidambaram Velayudhan Pillai, Filipe Manana

On 27 Apr 2018, at 10:07, David Sterba wrote:

> On Thu, Apr 26, 2018 at 07:59:23PM -0500, Jayashree Mohan wrote:
>> Thanks for the response. We are using a tool we developed called
>> CrashMonkey[1] to run crash consistency tests and generate the bug
>> reports above. We'd be happy to guide you through setting up
>> CrashMonkey and getting these bugs reproduced. However, if you want 
>> to
>> be able to reproduce them with your current setup (xfstest +
>> dm-flakey), I have the workload scripts attached to the end of the
>> mail which might make your task simpler.
>>
>> Interestingly we seem to have found another bug that breaks rename
>> atomicity and results in a previously fsynced file missing.
>>
>> Workload:
>> 1. mkdir A
>> 2. creat A/bar (*)
>> 3. fsync A/bar
>> 4. mkdir B
>> 5. creat B/bar
>> 6. rename B/bar A/bar
>> 7. creat A/foo
>> 8. fsync A/foo
>> 9. fsync A
>> --- crash---
>>
>> When we recover from the crash, we see that file A/bar goes missing.
>> If the rename did not persist, we expect to see A/bar(*) created in
>> step 2 above, or if the rename indeed persisted, we still expect file
>> A/bar to be present.
>
> I'm no fsync expert and the lack of standard or well defined behaviour
> (mentioned elsewhere) leads me to question, on what do you base your
> expectations? Not only for this report, but in general during your
> testing.
>
> Comparing various filesystems will show that at best it's 
> implementation
> defined and everybody has their own reasons for doing it one way or
> another, or request fsync at particular time etc.
>
> We have a manual page in section 5 that contains general topics of
> btrfs, so documenting the fsync specifics would be good.

My goal for the fsync tree log was to make it just do the right thing 
most of the time.  We mostly got there, thanks to a ton of fixes and 
test cases from Filipe.

fsync(some file) -- all the names for this file will exist, without 
having to fsync the directory.

fsync(some dir) -- ugh, don't fsync the directory.  But if you do, all 
the files/subdirs will exist, and unlinks will be done and renames will 
be included.  This is slow and may require a full FS commit, which is 
why we don't want dirs fsunk.

-chris


* Re: Inconsistent behavior of fsync in btrfs
  2018-04-27 17:33       ` Chris Mason
@ 2018-04-27 20:53         ` Theodore Y. Ts'o
  2018-04-27 23:24           ` Chris Murphy
  2018-04-27 23:44           ` Jayashree Mohan
  0 siblings, 2 replies; 18+ messages in thread
From: Theodore Y. Ts'o @ 2018-04-27 20:53 UTC (permalink / raw)
  To: Chris Mason
  Cc: David Sterba, Jayashree Mohan, linux-btrfs, fstests,
	Vijaychidambaram Velayudhan Pillai, Filipe Manana

On Fri, Apr 27, 2018 at 11:33:29AM -0600, Chris Mason wrote:
> My goal for the fsync tree log was to make it just do the right thing most
> of the time.  We mostly got there, thanks to a ton of fixes and test cases
> from Filipe.
> 
> fsync(some file) -- all the names for this file will exist, without having
> to fsync the directory.
> 
> fsync(some dir) -- ugh, don't fsync the directory.  But if you do, all the
> files/subdirs will exist, and unlinks will be done and renames will be
> included.  This is slow and may require a full FS commit, which is why we
> don't want dirs fsunk.

What ext4 does is this:

fsync(some file) -- for a newly created file, the filename that it was
	created under will exist.  If the file has a hard-link added,
	the hard link is not guaranteed to be written to disk

fsync(some dir) -- all changes to file names in the directory will
	exist after the crash.  It does *not* guarantee that any data
	changes to any of files in the directories will persist after
	a crash.

It seems to me that it would be desirable if all of the major file
systems have roughly the same minimum guarantee for fsync(2), so that
application writers don't have to make file-system specific
assumptions.  In general the goal ought to be "the right thing" should
happen.

The reason why ext4 doesn't sync all possible hard link names is that
(a) that's not a common requirement for most applications, and (b) it's
too hard to find all of the directories which might contain a hard
link to a particular file.  But otherwise, the semantics seem to
largely match up with what Chris has suggested for btrfs.
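
A short Python sketch of how an application could cope with the hard-link case, assuming Linux. It relies only on the fsync(some dir) behaviour described above: since fsync on the file is not guaranteed to persist an added hard link, the sketch fsyncs the directory that holds the new name. The helper names `fsync_dir` and `link_durably` are hypothetical.

```python
import os
import tempfile


def fsync_dir(path: str) -> None:
    """fsync a directory so changes to the names inside it are persisted."""
    fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def link_durably(src: str, dst: str) -> None:
    """Add hard link dst -> src, then fsync the directory containing dst."""
    os.link(src, dst)
    fsync_dir(os.path.dirname(os.path.abspath(dst)))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "A"))
        os.mkdir(os.path.join(root, "B"))
        src = os.path.join(root, "A", "foo")
        with open(src, "wb") as f:
            f.write(b"data")
            f.flush()
            os.fsync(f.fileno())
        link_durably(src, os.path.join(root, "B", "foo"))
        print(os.stat(src).st_nlink)
```

Only the directory of the new name is synced here, which avoids the problem of finding every directory that might contain a link to the file.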

	      	      	   	    - Ted


* Re: Inconsistent behavior of fsync in btrfs
  2018-04-27 20:53         ` Theodore Y. Ts'o
@ 2018-04-27 23:24           ` Chris Murphy
  2018-04-27 23:44           ` Jayashree Mohan
  1 sibling, 0 replies; 18+ messages in thread
From: Chris Murphy @ 2018-04-27 23:24 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Chris Mason, David Sterba, Jayashree Mohan, linux-btrfs, fstests,
	Vijaychidambaram Velayudhan Pillai, Filipe Manana

On Fri, Apr 27, 2018 at 2:53 PM, Theodore Y. Ts'o <tytso@mit.edu> wrote:

> It seems to me that it would be desirable if all of the major file
> systems have roughly the same minimum guarantee for fsync(2), so that
> application writers don't have to make file-system specific
> assumptions.  In general the goal ought to be "the right thing" should
> happen.

Yes please.

I'd also like to know about sync() differences, as I've found, through
testing rather than code review, that file systems differ in sync()
behavior too.


-- 
Chris Murphy


* Re: Inconsistent behavior of fsync in btrfs
  2018-04-27 20:53         ` Theodore Y. Ts'o
  2018-04-27 23:24           ` Chris Murphy
@ 2018-04-27 23:44           ` Jayashree Mohan
  2018-04-29 20:55             ` Vijay Chidambaram
  1 sibling, 1 reply; 18+ messages in thread
From: Jayashree Mohan @ 2018-04-27 23:44 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Chris Mason, David Sterba, linux-btrfs, fstests,
	Vijaychidambaram Velayudhan Pillai, Filipe Manana

Thanks Chris and Ted for putting down the expected fsync behaviour of
btrfs and ext4 clearly. This sort of information is not documented
anywhere; it would be really useful if all major filesystems
explicitly stated what fsync behavior to expect. Filesystems
definitely seem to provide more guarantees than POSIX, and it would be of
great help to researchers and developers if you all documented what
guarantees we can expect from each filesystem.



* Re: Inconsistent behavior of fsync in btrfs
  2018-04-27 23:44           ` Jayashree Mohan
@ 2018-04-29 20:55             ` Vijay Chidambaram
  2018-04-29 22:16               ` Theodore Y. Ts'o
  0 siblings, 1 reply; 18+ messages in thread
From: Vijay Chidambaram @ 2018-04-29 20:55 UTC (permalink / raw)
  To: Jayashree Mohan
  Cc: Theodore Y. Ts'o, Chris Mason, David Sterba, linux-btrfs,
	fstests, Filipe Manana

> On Fri, Apr 27, 2018 at 3:53 PM, Theodore Y. Ts'o <tytso@mit.edu> wrote:
>> On Fri, Apr 27, 2018 at 11:33:29AM -0600, Chris Mason wrote:
>>> My goal for the fsync tree log was to make it just do the right thing most
>>> of the time.  We mostly got there, thanks to a ton of fixes and test cases
>>> from Filipe.
>>>
>>> fsync(some file) -- all the names for this file will exist, without having
>>> to fsync the directory.
>>>
>>> fsync(some dir) -- ugh, don't fsync the directory.  But if you do, all the
>>> files/subdirs will exist, and unlinks will be done and renames will be
>>> included.  This is slow and may require a full FS commit, which is why we
>>> don't want dirs fsunk.
>>
>> What ext4 does is this:
>>
>> fsync(some file) -- for a newly created file, the filename that it was
>>         created under will exist.  If the file has a hard-link added,
>>         the hard link is not guaranteed to be written to disk
>>
>> fsync(some dir) -- all changes to file names in the directory will
>>         exist after the crash.  It does *not* guarantee that any data
>>         changes to any of files in the directories will persist after
>>         a crash.

In the spirit of clarifying fsync behavior, we have one more case
where we'd like to find out what should be expected.

Consider this:

Mkdir A
Creat A/bar
Fsync A/bar
Rename A to B
Fsync B/bar
-- Crash --

A/bar has been fsynced previously, so it's not a newly created file.
After the crash, in ext4 and btrfs, can we expect directory B and
B/bar to exist?

I know this is not POSIX compliant, but from prior comments, it seems
like both ext4 and btrfs would like to persist directory entries upon
fsync of newly created files. So we were wondering if this extended to
this case.
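
For anyone who wants to replay this, the workload can be written out as the system calls it issues. This is only a sketch, assuming Linux and Python's `os` module; `run_workload` is a hypothetical name, and the script cannot inject a crash itself. A crash-testing harness would have to cut power or drop writes at the marked point and then inspect the recovered filesystem.

```python
import os
import tempfile


def run_workload(root: str) -> list:
    """Issue the workload's calls under root; return root's final listing."""
    dir_a = os.path.join(root, "A")
    dir_b = os.path.join(root, "B")

    os.mkdir(dir_a)                                        # Mkdir A
    fd = os.open(os.path.join(dir_a, "bar"),
                 os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)  # Creat A/bar
    try:
        os.fsync(fd)                                       # Fsync A/bar
    finally:
        os.close(fd)

    os.rename(dir_a, dir_b)                                # Rename A to B

    fd = os.open(os.path.join(dir_b, "bar"), os.O_WRONLY)
    try:
        os.fsync(fd)                                       # Fsync B/bar
    finally:
        os.close(fd)

    # -- Crash -- would be injected here by an external harness.
    return sorted(os.listdir(root))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(run_workload(root))
```

Without a crash the listing only shows the in-memory view; the question above is about what is on disk at the marked point.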


* Re: Inconsistent behavior of fsync in btrfs
  2018-04-29 20:55             ` Vijay Chidambaram
@ 2018-04-29 22:16               ` Theodore Y. Ts'o
  2018-04-29 23:21                 ` Vijay Chidambaram
  2018-04-30 14:30                 ` Chris Mason
  0 siblings, 2 replies; 18+ messages in thread
From: Theodore Y. Ts'o @ 2018-04-29 22:16 UTC (permalink / raw)
  To: Vijay Chidambaram
  Cc: Jayashree Mohan, Chris Mason, David Sterba, linux-btrfs, fstests,
	Filipe Manana

On Sun, Apr 29, 2018 at 03:55:39PM -0500, Vijay Chidambaram wrote:
> In the spirit of clarifying fsync behavior, we have one more case
> where we'd like to find out what should be expected.
> 
> Consider this:
> 
> Mkdir A
> Creat A/bar
> Fsync A/bar
> Rename A to B
> Fsync B/bar
> -- Crash --
> 
> A/bar has been fsynced previously, so it's not a newly created file.
> After the crash, in ext4 and btrfs, can we expect directory B and
> B/bar to exist?

For ext4, no.  The POSIX semantics apply: bar will *either* be in A,
or in B.

If you modify the file bar such that the mod time has been updated,
then fsync(2) --- but not necessarily fdatasync(2) --- will cause the
inode modifications to be committed, and this will cause the
updates to directory B from the rename to be committed as a
side-effect.
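
A small Python sketch of the fsync(2) versus fdatasync(2) distinction drawn here, assuming Linux; `overwrite_and_sync` and its `metadata_too` flag are hypothetical names. An in-place overwrite updates the modification time without changing the file size. fdatasync(2) is not required to flush a bare timestamp change, while fsync(2) flushes the inode as well. Whether the rename is carried along is the ext4 side effect described above, which the code cannot guarantee.

```python
import os
import tempfile


def overwrite_and_sync(path: str, data: bytes, metadata_too: bool = True) -> None:
    """Overwrite the start of an existing file in place, then sync it."""
    fd = os.open(path, os.O_WRONLY)
    try:
        # In-place write at offset 0: mtime changes, size does not
        # (as long as data is no longer than the existing file).
        os.pwrite(fd, data, 0)
        if metadata_too:
            os.fsync(fd)      # data plus inode metadata such as mtime
        else:
            os.fdatasync(fd)  # data, and only metadata needed to read it back
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "bar")
        with open(path, "wb") as f:
            f.write(b"aaaa")
        overwrite_and_sync(path, b"bb")
        with open(path, "rb") as f:
            print(f.read())
```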

Note though that there are plenty of people who consider this to be a
performance bug, and not a feature, and there have been papers
proposed by your fellow academics that if implemented, would change
this to no longer be true.

In general with these sorts of things it would be useful to reason
about this in the context of real world applications and why they want
such guarantees.  These guarantees can cost performance hits, and so
there is a cost/benefit tradeoff involved.  So my preference is to
negotiate with application writers, and ask *why* they want such
guarantees, and to explore whether there are better ways of achieving
their high level goals before we legislate this to be an iron-clad
commitment which might make application A happy, but performance-seeking
user B unhappy.

> I know this is not POSIX compliant, but from prior comments, it seems
> like both ext4 and btrfs would like to persist directory entries upon
> fsync of newly created files. So we were wondering if this extended to
> this case.

We had real world examples of users/applications who suffered data
loss when the directory entries for newly created files were not
persisted.  It was on the basis of these complaints that we made this
commitment, since it seemed more important than the relatively minor
performance hit.

Cheers,

					- Ted



* Re: Inconsistent behavior of fsync in btrfs
  2018-04-29 22:16               ` Theodore Y. Ts'o
@ 2018-04-29 23:21                 ` Vijay Chidambaram
  2018-04-30 14:30                 ` Chris Mason
  1 sibling, 0 replies; 18+ messages in thread
From: Vijay Chidambaram @ 2018-04-29 23:21 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Jayashree Mohan, Chris Mason, David Sterba, linux-btrfs, fstests,
	Filipe Manana

On Sun, Apr 29, 2018 at 5:16 PM, Theodore Y. Ts'o <tytso@mit.edu> wrote:
> On Sun, Apr 29, 2018 at 03:55:39PM -0500, Vijay Chidambaram wrote:
>> In the spirit of clarifying fsync behavior, we have one more case
>> where we'd like to find out what should be expected.
>>
>> Consider this:
>>
>> Mkdir A
>> Creat A/bar
>> Fsync A/bar
>> Rename A to B
>> Fsync B/bar
>> -- Crash --
>>
>> A/bar has been fsynced previously, so it's not a newly created file.
>> After the crash, in ext4 and btrfs, can we expect directory B and
>> B/bar to exist?
>
> For ext4, no.  The POSIX semantics apply: bar will *either* be in A,
> or in B.

Thank you for confirming this.

> If you modify the file bar such that the mod time has been updated,
> then fsync(2) --- but not necessarily fdatasync(2) --- will cause the
> inode modifications to be committed, and this will cause the
> updates to directory B from the rename to be committed as a
> side-effect.
>
> Note though that there are plenty of people who consider this to be a
> performance bug, and not a feature, and there have been papers
> proposed by your fellow academics that if implemented, would change
> this to no longer be true.
>
> In general with these sorts of things it would be useful to reason
> about this in the context of real world applications and why they want
> such guarantees.  These guarantees can cost performance hits, and so
> there is a cost/benefit tradeoff involved.  So my preference is to
> negotiate with application writers, and ask *why* they want such
> guarantees, and to explore whether there are better ways of achieving
> their high level goals before we legislate this to be an iron-clad
> commitment which might make application A happy, but performance-seeking
> user B unhappy.

I definitely agree that there is a performance trade-off for each
guarantee given by the file system. We are not suggesting any
particular set of guarantees, only trying to find out which ones are
supported by different file systems. I agree that application
use-cases seem like a good way to motivate the guarantees given by
file systems.


* Re: Inconsistent behavior of fsync in btrfs
  2018-04-29 22:16               ` Theodore Y. Ts'o
  2018-04-29 23:21                 ` Vijay Chidambaram
@ 2018-04-30 14:30                 ` Chris Mason
  1 sibling, 0 replies; 18+ messages in thread
From: Chris Mason @ 2018-04-30 14:30 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: Vijay Chidambaram, Jayashree Mohan, David Sterba, linux-btrfs,
	fstests, Filipe Manana



On 29 Apr 2018, at 18:16, Theodore Y. Ts'o wrote:

> On Sun, Apr 29, 2018 at 03:55:39PM -0500, Vijay Chidambaram wrote:
>> In the spirit of clarifying fsync behavior, we have one more case
>> where we'd like to find out what should be expected.
>>
>> Consider this:
>>
>> Mkdir A
>> Creat A/bar
>> Fsync A/bar
>> Rename A to B
>> Fsync B/bar
>> -- Crash --
>>
>> A/bar has been fsynced previously, so it's not a newly created file.
>> After the crash, in ext4 and btrfs, can we expect directory B and
>> B/bar to exist?
>
> For ext4, no.  The POSIX semantics apply: bar will *either* be in A,
> or in B.

Same for btrfs.  If the rename for B goes down, it'll be a side effect 
of other decisions and not on purpose.  I'd actually like for the rename 
to be on disk in the normal case, but we won't always be able to catch 
it.
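
For an application that needs the directory rename itself to be durable, here is a Python sketch based on the fsync(some dir) behaviour described earlier in the thread, assuming Linux. The helper names `fsync_dir` and `rename_durably` are hypothetical. It fsyncs the parent directories of the old and new names, accepting the cost of a directory fsync.

```python
import os
import tempfile


def fsync_dir(path: str) -> None:
    """fsync a directory so changes to the names inside it are persisted."""
    fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def rename_durably(old: str, new: str) -> None:
    """Rename old to new, then fsync the parent directory of each name."""
    os.rename(old, new)
    parents = {
        os.path.dirname(os.path.abspath(old)),
        os.path.dirname(os.path.abspath(new)),
    }
    # Usually a single directory; two when the entry moved between parents.
    for parent in sorted(parents):
        fsync_dir(parent)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "A"))
        rename_durably(os.path.join(root, "A"), os.path.join(root, "B"))
        print(os.listdir(root))
```

This trades the extra log commit for not depending on the rename reaching disk as a side effect.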

>
> If you modify the file bar such that the mod time has been updated,
> then fsync(2) --- but not necessarily fdatasync(2) --- will cause the
> inode modifications to be committed, and this will cause the
> updates to directory B from the rename to be committed as a
> side-effect.
>
> Note though that there are plenty of people who consider this to be a
> performance bug, and not a feature, and there have been papers
> proposed by your fellow academics that if implemented, would change
> this to no longer be true.
>
> In general with these sorts of things it would be useful to reason
> about this in the context of real world applications and why they want
> such guarantees.  These guarantees can cost performance hits, and so
> there is a cost/benefit tradeoff involved.  So my preference is to
> negotiate with application writers, and ask *why* they want such
> guarantees, and to explore whether there are better ways of achieving
> their high level goals before we legislate this to be an iron-clad
> commitment which might make application A happy, but performance-seeking
> user B unhappy.
>
>> I know this is not POSIX compliant, but from prior comments, it seems
>> like both ext4 and btrfs would like to persist directory entries upon
>> fsync of newly created files. So we were wondering if this extended to
>> this case.
>
> We had real world examples of users/applications who suffered data
> loss when the directory entries for newly created files were not
> persisted.  It was on the basis of these complaints that we made this
> commitment, since it seemed more important than the relatively minor
> performance hit.
>

Agreeing with Ted and expanding a bit.  If fsync(some file) doesn't 
persist the name for that file, applications need to fsync the 
directories, which can be double the log commits.  Getting everything 
down to disk in one fsync() is much better for both the application and 
the FS.

-chris


Thread overview: 18+ messages
2018-04-25  2:35 Inconsistent behavior of fsync in btrfs Jayashree Mohan
2018-04-25  3:08 ` Chris Murphy
     [not found] ` <CAJCQCtT7S9Qb-m3-7EXJPwtMT9nuUJjDykHi915KU+fc4fB-aQ@mail.gmail.com>
2018-04-25  3:10   ` Vijaychidambaram Velayudhan Pillai
2018-04-25  3:16   ` Vijaychidambaram Velayudhan Pillai
2018-04-25 12:36     ` Ashlie Martinez
2018-04-25 13:53       ` Ashlie Martinez
2018-04-26 16:28 ` Chris Mason
2018-04-27  0:59   ` Jayashree Mohan
2018-04-27 15:26     ` Chris Mason
2018-04-27 16:07     ` David Sterba
2018-04-27 17:33       ` Chris Mason
2018-04-27 20:53         ` Theodore Y. Ts'o
2018-04-27 23:24           ` Chris Murphy
2018-04-27 23:44           ` Jayashree Mohan
2018-04-29 20:55             ` Vijay Chidambaram
2018-04-29 22:16               ` Theodore Y. Ts'o
2018-04-29 23:21                 ` Vijay Chidambaram
2018-04-30 14:30                 ` Chris Mason
