All of lore.kernel.org
 help / color / mirror / Atom feed
* [bug] GNOME loses all settings following failure to resume from suspend
@ 2022-01-05 17:34 Chris Murphy
  2022-01-05 18:04 ` Filipe Manana
  0 siblings, 1 reply; 18+ messages in thread
From: Chris Murphy @ 2022-01-05 17:34 UTC (permalink / raw)
  To: Btrfs BTRFS; +Cc: Josef Bacik

https://gitlab.gnome.org/GNOME/dconf/-/issues/73

Following a crash, instead of either the old or new dconf database
file being present, a corrupt one is present.

dconf uses g_file_set_contents() to atomically update the database
file, which effectively inhibits (one or more?) fsync's, yet somehow
in the crash/powerfail case this is resulting in a corrupt dconf
database. I don't know if by "corrupt" this is a 0 length file or some
other effect.

Thanks,

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [bug] GNOME loses all settings following failure to resume from suspend
  2022-01-05 17:34 [bug] GNOME loses all settings following failure to resume from suspend Chris Murphy
@ 2022-01-05 18:04 ` Filipe Manana
  2022-01-05 18:32   ` Filipe Manana
                     ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Filipe Manana @ 2022-01-05 18:04 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS, Josef Bacik

On Wed, Jan 05, 2022 at 10:34:10AM -0700, Chris Murphy wrote:
> https://gitlab.gnome.org/GNOME/dconf/-/issues/73
> 
> Following a crash, instead of either the old or new dconf database
> file being present, a corrupt one is present.
> 
> dconf uses g_file_set_contents() to atomically update the database
> file, which effectively inhibits (one or more?) fsync's, yet somehow
> in the crash/powerfail case this is resulting in a corrupt dconf
> database. I don't know if by "corrupt" this is a 0 length file or some
> other effect.

Looking at the issue, Sebastian Keller posted a patch to disable one
optimization in glib, that should fix it.

Looking at the code before that patch, it explicitly skips fsync after
a rename pointing out to:

https://btrfs.wiki.kernel.org/index.php/FAQ#What_are_the_crash_guarantees_of_overwrite-by-rename.3F

I'm afraid that information is wrong, perhaps it might have been true in
some very distant past, but certainly not for many years.

The wiki says, doing something like this:

   echo "oldcontent" > file

   # make sure oldcontent is on disk
   sync

   echo "newcontent" > file.tmp
   mv -f file.tmp file

   # *crash*

Will give either:

file contains "newcontent"; file.tmp does not exist
file contains "oldcontent"; file.tmp may contain "newcontent", be zero-length or not exists at all.

However that's not true, there's a chance 'file' will be empty.

During a rename with overwrite we trigger writeback (via filemap_flush())
of the file being renamed but never wait for it to complete before
returning to user space. So what can happen is:

1) We trigger writeback;

2) We join the current transaction and do the rename;

3) We return from rename to user space with success (0);

4) Writeback didn't finish yet, or it has finished but the
   ordered extent is not yet complete - i.e. btrfs_finish_ordered_io()
   did not complete yet;

5) The transaction used by the rename is committed;
   A transaction commit does not wait for any writeback or in flight
   ordered extents to complete - except if the fs is mounted with
   "-o flushoncommit";

6) Crash

After mounting the fs again 'file' is empty and 'file.tmp' does not exists.

The only for that to guarantee 'file' is not empty and has the expected
data ("newcontent"), would be to mount with -o flushoncommit.

I don't think I have a wiki account enabled, but I'll see if I get that
updated soon.

Thanks.

> 
> Thanks,
> 
> -- 
> Chris Murphy

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [bug] GNOME loses all settings following failure to resume from suspend
  2022-01-05 18:04 ` Filipe Manana
@ 2022-01-05 18:32   ` Filipe Manana
  2022-01-05 18:34   ` Hugo Mills
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 18+ messages in thread
From: Filipe Manana @ 2022-01-05 18:32 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS, Josef Bacik

On Wed, Jan 05, 2022 at 06:04:38PM +0000, Filipe Manana wrote:
> On Wed, Jan 05, 2022 at 10:34:10AM -0700, Chris Murphy wrote:
> > https://gitlab.gnome.org/GNOME/dconf/-/issues/73
> > 
> > Following a crash, instead of either the old or new dconf database
> > file being present, a corrupt one is present.
> > 
> > dconf uses g_file_set_contents() to atomically update the database
> > file, which effectively inhibits (one or more?) fsync's, yet somehow
> > in the crash/powerfail case this is resulting in a corrupt dconf
> > database. I don't know if by "corrupt" this is a 0 length file or some
> > other effect.
> 
> Looking at the issue, Sebastian Keller posted a patch to disable one
> optimization in glib, that should fix it.
> 
> Looking at the code before that patch, it explicitly skips fsync after
> a rename pointing out to:
> 
> https://btrfs.wiki.kernel.org/index.php/FAQ#What_are_the_crash_guarantees_of_overwrite-by-rename.3F
> 
> I'm afraid that information is wrong, perhaps it might have been true in
> some very distant past, but certainly not for many years.

After some digging in git history:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8d875f95da43c6a8f18f77869f2ef26e9594fecc

So that guarantee existed until August 2014.

Not anymore, and no one should rely on it (except when using "-o flushoncommit").

> 
> The wiki says, doing something like this:
> 
>    echo "oldcontent" > file
> 
>    # make sure oldcontent is on disk
>    sync
> 
>    echo "newcontent" > file.tmp
>    mv -f file.tmp file
> 
>    # *crash*
> 
> Will give either:
> 
> file contains "newcontent"; file.tmp does not exist
> file contains "oldcontent"; file.tmp may contain "newcontent", be zero-length or not exists at all.
> 
> However that's not true, there's a chance 'file' will be empty.
> 
> During a rename with overwrite we trigger writeback (via filemap_flush())
> of the file being renamed but never wait for it to complete before
> returning to user space. So what can happen is:
> 
> 1) We trigger writeback;
> 
> 2) We join the current transaction and do the rename;
> 
> 3) We return from rename to user space with success (0);
> 
> 4) Writeback didn't finish yet, or it has finished but the
>    ordered extent is not yet complete - i.e. btrfs_finish_ordered_io()
>    did not complete yet;
> 
> 5) The transaction used by the rename is committed;
>    A transaction commit does not wait for any writeback or in flight
>    ordered extents to complete - except if the fs is mounted with
>    "-o flushoncommit";
> 
> 6) Crash
> 
> After mounting the fs again 'file' is empty and 'file.tmp' does not exists.
> 
> The only for that to guarantee 'file' is not empty and has the expected
> data ("newcontent"), would be to mount with -o flushoncommit.
> 
> I don't think I have a wiki account enabled, but I'll see if I get that
> updated soon.
> 
> Thanks.
> 
> > 
> > Thanks,
> > 
> > -- 
> > Chris Murphy

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [bug] GNOME loses all settings following failure to resume from suspend
  2022-01-05 18:04 ` Filipe Manana
  2022-01-05 18:32   ` Filipe Manana
@ 2022-01-05 18:34   ` Hugo Mills
  2022-01-05 20:38     ` Filipe Manana
  2022-01-05 18:40   ` Chris Murphy
  2022-01-09 17:04   ` Remi Gauvin
  3 siblings, 1 reply; 18+ messages in thread
From: Hugo Mills @ 2022-01-05 18:34 UTC (permalink / raw)
  To: Filipe Manana; +Cc: Chris Murphy, Btrfs BTRFS, Josef Bacik

   Hi, Filipe,

On Wed, Jan 05, 2022 at 06:04:38PM +0000, Filipe Manana wrote:
> I don't think I have a wiki account enabled, but I'll see if I get that
> updated soon.

   If you can't (or don't want to), feel free to put the text you want
to replace it with here, and I'll update the wiki for you...

   Hugo.

-- 
Hugo Mills             | "There's a Martian war machine outside -- they want
hugo@... carfax.org.uk | to talk to you about a cure for the common cold."
http://carfax.org.uk/  |
PGP: E2AB1DE4          |                           Stephen Franklin, Babylon 5

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [bug] GNOME loses all settings following failure to resume from suspend
  2022-01-05 18:04 ` Filipe Manana
  2022-01-05 18:32   ` Filipe Manana
  2022-01-05 18:34   ` Hugo Mills
@ 2022-01-05 18:40   ` Chris Murphy
  2022-01-05 20:32     ` Filipe Manana
  2022-01-09 17:04   ` Remi Gauvin
  3 siblings, 1 reply; 18+ messages in thread
From: Chris Murphy @ 2022-01-05 18:40 UTC (permalink / raw)
  To: Filipe Manana; +Cc: Chris Murphy, Btrfs BTRFS, Josef Bacik, David Sterba

On Wed, Jan 5, 2022 at 11:04 AM Filipe Manana <fdmanana@kernel.org> wrote:
>
> Looking at the code before that patch, it explicitly skips fsync after
> a rename pointing out to:
>
> https://btrfs.wiki.kernel.org/index.php/FAQ#What_are_the_crash_guarantees_of_overwrite-by-rename.3F
>
> I'm afraid that information is wrong, perhaps it might have been true in
> some very distant past, but certainly not for many years.

However, the system does suspend correctly and before that there
should be "Filesystems sync" message from the kernel (this is the case
on my laptops, unconfirmed for the GNOME bug case). [1]

If the sequence is:

write to file
no fsync
"filesystem sync" (I guess that's syncfs for all mounted filesystems? no idea)
suspend
crash

Still seems like the file should be either old or new, not 0 length?


> I don't think I have a wiki account enabled, but I'll see if I get that
> updated soon.

Thanks Felipe!

[1] "Filesystems sync" message appears to come from here

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/power/main.c?h=v5.16-rc8#n62

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/power/main.c?h=v5.16-rc8#n195




-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [bug] GNOME loses all settings following failure to resume from suspend
  2022-01-05 18:40   ` Chris Murphy
@ 2022-01-05 20:32     ` Filipe Manana
  0 siblings, 0 replies; 18+ messages in thread
From: Filipe Manana @ 2022-01-05 20:32 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS, Josef Bacik, David Sterba

On Wed, Jan 5, 2022 at 6:40 PM Chris Murphy <lists@colorremedies.com> wrote:
>
> On Wed, Jan 5, 2022 at 11:04 AM Filipe Manana <fdmanana@kernel.org> wrote:
> >
> > Looking at the code before that patch, it explicitly skips fsync after
> > a rename pointing out to:
> >
> > https://btrfs.wiki.kernel.org/index.php/FAQ#What_are_the_crash_guarantees_of_overwrite-by-rename.3F
> >
> > I'm afraid that information is wrong, perhaps it might have been true in
> > some very distant past, but certainly not for many years.
>
> However, the system does suspend correctly and before that there
> should be "Filesystems sync" message from the kernel (this is the case
> on my laptops, unconfirmed for the GNOME bug case). [1]
>
> If the sequence is:
>
> write to file
> no fsync
> "filesystem sync" (I guess that's syncfs for all mounted filesystems? no idea)
> suspend
> crash
>
> Still seems like the file should be either old or new, not 0 length?

Nop, it can still be 0.

So where that message is printed, before it, ksys_sync() is called,
which will indeed call btrfs' sync_fs() callback (btrfs_sync_fs()).

However btrfs_sync_fs() will only wait for existing ordered extents to
complete, and then commit the current transaction.
Ordered extents are created when writeback is started.

In the rename path (btrfs_rename()) we trigger writeback with
filemap_flush() (mm/filemap.c), but that does not guarantee that
I/O is started for all dirty pages:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/filemap.c?h=v5.16-rc8#n462

/**
 * filemap_flush - mostly a non-blocking flush
 * @mapping: target address_space
 *
 * This is a mostly non-blocking flush.  Not suitable for data-integrity
 * purposes - I/O may not be started against all dirty pages.
 *
 * Return: %0 on success, negative error code otherwise.
 */
int filemap_flush(struct address_space *mapping)
{
return __filemap_fdatawrite(mapping, WB_SYNC_NONE);
}
EXPORT_SYMBOL(filemap_flush);

I.e., we have no guarantee that writeback was started by the time the
rename returns to user space.

Regardless of that, it was not safe... a crash can happen after the
rename and before ksys_sync() is called.

>
>
> > I don't think I have a wiki account enabled, but I'll see if I get that
> > updated soon.
>
> Thanks Felipe!

Filipe

Thanks.

>
> [1] "Filesystems sync" message appears to come from here
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/power/main.c?h=v5.16-rc8#n62
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/power/main.c?h=v5.16-rc8#n195
>
>
>
>
> --
> Chris Murphy

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [bug] GNOME loses all settings following failure to resume from suspend
  2022-01-05 18:34   ` Hugo Mills
@ 2022-01-05 20:38     ` Filipe Manana
  2022-01-05 21:31       ` Hugo Mills
  0 siblings, 1 reply; 18+ messages in thread
From: Filipe Manana @ 2022-01-05 20:38 UTC (permalink / raw)
  To: Hugo Mills, Filipe Manana, Chris Murphy, Btrfs BTRFS, Josef Bacik

On Wed, Jan 5, 2022 at 6:34 PM Hugo Mills <hugo@carfax.org.uk> wrote:
>
>    Hi, Filipe,
>
> On Wed, Jan 05, 2022 at 06:04:38PM +0000, Filipe Manana wrote:
> > I don't think I have a wiki account enabled, but I'll see if I get that
> > updated soon.
>
>    If you can't (or don't want to), feel free to put the text you want
> to replace it with here, and I'll update the wiki for you...

Hi Hugo,

That would be great.
I don't have a concrete text, but as you are a native english speaker,
a version from you would sound better :)

Perhaps just mention that as of kernel 3.17 (and maybe point to that
commit too), the behaviour is no longer guaranteed, and we can end up
getting a file of 0 bytes.
So an explicit fsync on the file is needed (just like ext4 and other
filesystems).

I asked for an account creation before seeing your reply.
Anyway, if you want to do it, go ahead.

Thanks.

>
>    Hugo.
>
> --
> Hugo Mills             | "There's a Martian war machine outside -- they want
> hugo@... carfax.org.uk | to talk to you about a cure for the common cold."
> http://carfax.org.uk/  |
> PGP: E2AB1DE4          |                           Stephen Franklin, Babylon 5

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [bug] GNOME loses all settings following failure to resume from suspend
  2022-01-05 20:38     ` Filipe Manana
@ 2022-01-05 21:31       ` Hugo Mills
  2022-01-05 21:39         ` Hugo Mills
  0 siblings, 1 reply; 18+ messages in thread
From: Hugo Mills @ 2022-01-05 21:31 UTC (permalink / raw)
  To: Filipe Manana; +Cc: Chris Murphy, Btrfs BTRFS, Josef Bacik

On Wed, Jan 05, 2022 at 08:38:37PM +0000, Filipe Manana wrote:
> On Wed, Jan 5, 2022 at 6:34 PM Hugo Mills <hugo@carfax.org.uk> wrote:
> >
> >    Hi, Filipe,
> >
> > On Wed, Jan 05, 2022 at 06:04:38PM +0000, Filipe Manana wrote:
> > > I don't think I have a wiki account enabled, but I'll see if I get that
> > > updated soon.
> >
> >    If you can't (or don't want to), feel free to put the text you want
> > to replace it with here, and I'll update the wiki for you...
> 
> Hi Hugo,
> 
> That would be great.
> I don't have a concrete text, but as you are a native english speaker,
> a version from you would sound better :)
> 
> Perhaps just mention that as of kernel 3.17 (and maybe point to that
> commit too), the behaviour is no longer guaranteed, and we can end up
> getting a file of 0 bytes.

   I'd rather not reinforce the wrong usage with an example of it. :)
Better to document the correct usage...

> So an explicit fsync on the file is needed (just like ext4 and other
> filesystems).

   At what point in the process does the fsync need to be done?
Before/after/instead of the sync?

   Hugo.

-- 
Hugo Mills             | Geek, n.:
hugo@... carfax.org.uk | Circus sideshow performer specialising in the eating
http://carfax.org.uk/  | of live animals.
PGP: E2AB1DE4          |                                                   OED

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [bug] GNOME loses all settings following failure to resume from suspend
  2022-01-05 21:31       ` Hugo Mills
@ 2022-01-05 21:39         ` Hugo Mills
  2022-01-05 21:53           ` Hugo Mills
  0 siblings, 1 reply; 18+ messages in thread
From: Hugo Mills @ 2022-01-05 21:39 UTC (permalink / raw)
  To: Filipe Manana, Chris Murphy, Btrfs BTRFS, Josef Bacik

On Wed, Jan 05, 2022 at 09:31:57PM +0000, Hugo Mills wrote:
> On Wed, Jan 05, 2022 at 08:38:37PM +0000, Filipe Manana wrote:
> > On Wed, Jan 5, 2022 at 6:34 PM Hugo Mills <hugo@carfax.org.uk> wrote:
> > >
> > >    Hi, Filipe,
> > >
> > > On Wed, Jan 05, 2022 at 06:04:38PM +0000, Filipe Manana wrote:
> > > > I don't think I have a wiki account enabled, but I'll see if I get that
> > > > updated soon.
> > >
> > >    If you can't (or don't want to), feel free to put the text you want
> > > to replace it with here, and I'll update the wiki for you...
> > 
> > Hi Hugo,
> > 
> > That would be great.
> > I don't have a concrete text, but as you are a native english speaker,
> > a version from you would sound better :)
> > 
> > Perhaps just mention that as of kernel 3.17 (and maybe point to that
> > commit too), the behaviour is no longer guaranteed, and we can end up
> > getting a file of 0 bytes.
> 
>    I'd rather not reinforce the wrong usage with an example of it. :)
> Better to document the correct usage...
> 
> > So an explicit fsync on the file is needed (just like ext4 and other
> > filesystems).
> 
>    At what point in the process does the fsync need to be done?
> Before/after/instead of the sync?

   Hmm. That doesn't make sense, of course (sorry, it's late, I've had
a hard day). I'm guessing that the fsync needs to go after the write
of the new data and before the rename. Is there any other constraint
on what needs to be done to make this work safely?

   Hugo.

-- 
Hugo Mills             | Hickory Dickory Dock,
hugo@... carfax.org.uk | Three mice ran up the clock.
http://carfax.org.uk/  | The clock struck one,
PGP: E2AB1DE4          | The other two escaped with minor injuries

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [bug] GNOME loses all settings following failure to resume from suspend
  2022-01-05 21:39         ` Hugo Mills
@ 2022-01-05 21:53           ` Hugo Mills
  2022-01-06  9:51             ` Filipe Manana
  0 siblings, 1 reply; 18+ messages in thread
From: Hugo Mills @ 2022-01-05 21:53 UTC (permalink / raw)
  To: Filipe Manana, Chris Murphy, Btrfs BTRFS, Josef Bacik

On Wed, Jan 05, 2022 at 09:39:21PM +0000, Hugo Mills wrote:
> On Wed, Jan 05, 2022 at 09:31:57PM +0000, Hugo Mills wrote:
> > On Wed, Jan 05, 2022 at 08:38:37PM +0000, Filipe Manana wrote:
> > > On Wed, Jan 5, 2022 at 6:34 PM Hugo Mills <hugo@carfax.org.uk> wrote:
> > > >
> > > >    Hi, Filipe,
> > > >
> > > > On Wed, Jan 05, 2022 at 06:04:38PM +0000, Filipe Manana wrote:
> > > > > I don't think I have a wiki account enabled, but I'll see if I get that
> > > > > updated soon.
> > > >
> > > >    If you can't (or don't want to), feel free to put the text you want
> > > > to replace it with here, and I'll update the wiki for you...
> > > 
> > > Hi Hugo,
> > > 
> > > That would be great.
> > > I don't have a concrete text, but as you are a native english speaker,
> > > a version from you would sound better :)
> > > 
> > > Perhaps just mention that as of kernel 3.17 (and maybe point to that
> > > commit too), the behaviour is no longer guaranteed, and we can end up
> > > getting a file of 0 bytes.
> > 
> >    I'd rather not reinforce the wrong usage with an example of it. :)
> > Better to document the correct usage...
> > 
> > > So an explicit fsync on the file is needed (just like ext4 and other
> > > filesystems).
> > 
> >    At what point in the process does the fsync need to be done?
> > Before/after/instead of the sync?
> 
>    Hmm. That doesn't make sense, of course (sorry, it's late, I've had
> a hard day). I'm guessing that the fsync needs to go after the write
> of the new data and before the rename. Is there any other constraint
> on what needs to be done to make this work safely?

   Right, I think I've got it. Ping me in the morning if it's not
correct.

   Hugo.

-- 
Hugo Mills             | Great films about cricket: Monster's No-Ball
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4          |

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [bug] GNOME loses all settings following failure to resume from suspend
  2022-01-05 21:53           ` Hugo Mills
@ 2022-01-06  9:51             ` Filipe Manana
  2022-01-06 10:20               ` Hugo Mills
  0 siblings, 1 reply; 18+ messages in thread
From: Filipe Manana @ 2022-01-06  9:51 UTC (permalink / raw)
  To: Hugo Mills, Filipe Manana, Chris Murphy, Btrfs BTRFS, Josef Bacik

On Wed, Jan 5, 2022 at 9:54 PM Hugo Mills <hugo@carfax.org.uk> wrote:
>
> On Wed, Jan 05, 2022 at 09:39:21PM +0000, Hugo Mills wrote:
> > On Wed, Jan 05, 2022 at 09:31:57PM +0000, Hugo Mills wrote:
> > > On Wed, Jan 05, 2022 at 08:38:37PM +0000, Filipe Manana wrote:
> > > > On Wed, Jan 5, 2022 at 6:34 PM Hugo Mills <hugo@carfax.org.uk> wrote:
> > > > >
> > > > >    Hi, Filipe,
> > > > >
> > > > > On Wed, Jan 05, 2022 at 06:04:38PM +0000, Filipe Manana wrote:
> > > > > > I don't think I have a wiki account enabled, but I'll see if I get that
> > > > > > updated soon.
> > > > >
> > > > >    If you can't (or don't want to), feel free to put the text you want
> > > > > to replace it with here, and I'll update the wiki for you...
> > > >
> > > > Hi Hugo,
> > > >
> > > > That would be great.
> > > > I don't have a concrete text, but as you are a native english speaker,
> > > > a version from you would sound better :)
> > > >
> > > > Perhaps just mention that as of kernel 3.17 (and maybe point to that
> > > > commit too), the behaviour is no longer guaranteed, and we can end up
> > > > getting a file of 0 bytes.
> > >
> > >    I'd rather not reinforce the wrong usage with an example of it. :)
> > > Better to document the correct usage...
> > >
> > > > So an explicit fsync on the file is needed (just like ext4 and other
> > > > filesystems).
> > >
> > >    At what point in the process does the fsync need to be done?
> > > Before/after/instead of the sync?
> >
> >    Hmm. That doesn't make sense, of course (sorry, it's late, I've had
> > a hard day). I'm guessing that the fsync needs to go after the write
> > of the new data and before the rename. Is there any other constraint
> > on what needs to be done to make this work safely?
>
>    Right, I think I've got it. Ping me in the morning if it's not
> correct.

Yep, that's correct Hugo.

Starting with 3.17, the example on the wiki needs an fsync on
"file.tmp" after writing to it and before renaming it over "file".
I.e. the full example should be:

****
echo "oldcontent" > file

# make sure oldcontent is on disk
sync

echo "newcontent" > file.tmp
fsync file.tmp
mv -f file.tmp file

# *crash*

Will give either

file contains "newcontent"; file.tmp does not exist
file contains "oldcontent"; file.tmp may contain "newcontent", be
zero-length or not exists at all.
****

>
>    Hugo.
>
> --
> Hugo Mills             | Great films about cricket: Monster's No-Ball
> hugo@... carfax.org.uk |
> http://carfax.org.uk/  |
> PGP: E2AB1DE4          |

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [bug] GNOME loses all settings following failure to resume from suspend
  2022-01-06  9:51             ` Filipe Manana
@ 2022-01-06 10:20               ` Hugo Mills
  2022-01-06 10:27                 ` Filipe Manana
  2022-01-06 21:07                 ` Adam Borowski
  0 siblings, 2 replies; 18+ messages in thread
From: Hugo Mills @ 2022-01-06 10:20 UTC (permalink / raw)
  To: Filipe Manana; +Cc: Chris Murphy, Btrfs BTRFS, Josef Bacik

On Thu, Jan 06, 2022 at 09:51:16AM +0000, Filipe Manana wrote:
> On Wed, Jan 5, 2022 at 9:54 PM Hugo Mills <hugo@carfax.org.uk> wrote:
> >
> > On Wed, Jan 05, 2022 at 09:39:21PM +0000, Hugo Mills wrote:
> > > On Wed, Jan 05, 2022 at 09:31:57PM +0000, Hugo Mills wrote:
> > > > On Wed, Jan 05, 2022 at 08:38:37PM +0000, Filipe Manana wrote:
> > > > > On Wed, Jan 5, 2022 at 6:34 PM Hugo Mills <hugo@carfax.org.uk> wrote:
> > > > > >
> > > > > >    Hi, Filipe,
> > > > > >
> > > > > > On Wed, Jan 05, 2022 at 06:04:38PM +0000, Filipe Manana wrote:
> > > > > > > I don't think I have a wiki account enabled, but I'll see if I get that
> > > > > > > updated soon.
> > > > > >
> > > > > >    If you can't (or don't want to), feel free to put the text you want
> > > > > > to replace it with here, and I'll update the wiki for you...
> > > > >
> > > > > Hi Hugo,
> > > > >
> > > > > That would be great.
> > > > > I don't have a concrete text, but as you are a native english speaker,
> > > > > a version from you would sound better :)
> > > > >
> > > > > Perhaps just mention that as of kernel 3.17 (and maybe point to that
> > > > > commit too), the behaviour is no longer guaranteed, and we can end up
> > > > > getting a file of 0 bytes.
> > > >
> > > >    I'd rather not reinforce the wrong usage with an example of it. :)
> > > > Better to document the correct usage...
> > > >
> > > > > So an explicit fsync on the file is needed (just like ext4 and other
> > > > > filesystems).
> > > >
> > > >    At what point in the process does the fsync need to be done?
> > > > Before/after/instead of the sync?
> > >
> > >    Hmm. That doesn't make sense, of course (sorry, it's late, I've had
> > > a hard day). I'm guessing that the fsync needs to go after the write
> > > of the new data and before the rename. Is there any other constraint
> > > on what needs to be done to make this work safely?
> >
> >    Right, I think I've got it. Ping me in the morning if it's not
> > correct.
> 
> Yep, that's correct Hugo.
> 
> Starting with 3.17, the example on the wiki needs an fsync on
> "file.tmp" after writing to it and before renaming it over "file".
> I.e. the full example should be:
> 
> ****
> echo "oldcontent" > file
> 
> # make sure oldcontent is on disk
> sync
> 
> echo "newcontent" > file.tmp
> fsync file.tmp
> mv -f file.tmp file
> 
> # *crash*
> 
> Will give either
> 
> file contains "newcontent"; file.tmp does not exist
> file contains "oldcontent"; file.tmp may contain "newcontent", be
> zero-length or not exists at all.
> ****

   Since I can't find an "fsync" command line tool, I've rewritten the
example in more general terms, rather than tying it to a specific
implementation (such as a sequence of shell commands). I note that
we've got a near-duplicate entry in the FAQ, two items down ("What are
the crash guarantees of rename?"), that should probably be removed.

Updated entry: https://btrfs.wiki.kernel.org/index.php/FAQ#What_are_the_crash_guarantees_of_overwrite-by-rename.3F
Candidate for deletion: https://btrfs.wiki.kernel.org/index.php/FAQ#What_are_the_crash_guarantees_of_rename.3F

   Hugo.

-- 
Hugo Mills             | I thought I'd discovered a new colour, but it was
hugo@... carfax.org.uk | just a pigment of my imagination.
http://carfax.org.uk/  |
PGP: E2AB1DE4          |

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [bug] GNOME loses all settings following failure to resume from suspend
  2022-01-06 10:20               ` Hugo Mills
@ 2022-01-06 10:27                 ` Filipe Manana
  2022-01-06 20:02                   ` Chris Murphy
  2022-01-06 21:07                 ` Adam Borowski
  1 sibling, 1 reply; 18+ messages in thread
From: Filipe Manana @ 2022-01-06 10:27 UTC (permalink / raw)
  To: Hugo Mills, Filipe Manana, Chris Murphy, Btrfs BTRFS, Josef Bacik

On Thu, Jan 6, 2022 at 10:21 AM Hugo Mills <hugo@carfax.org.uk> wrote:
>
> On Thu, Jan 06, 2022 at 09:51:16AM +0000, Filipe Manana wrote:
> > On Wed, Jan 5, 2022 at 9:54 PM Hugo Mills <hugo@carfax.org.uk> wrote:
> > >
> > > On Wed, Jan 05, 2022 at 09:39:21PM +0000, Hugo Mills wrote:
> > > > On Wed, Jan 05, 2022 at 09:31:57PM +0000, Hugo Mills wrote:
> > > > > On Wed, Jan 05, 2022 at 08:38:37PM +0000, Filipe Manana wrote:
> > > > > > On Wed, Jan 5, 2022 at 6:34 PM Hugo Mills <hugo@carfax.org.uk> wrote:
> > > > > > >
> > > > > > >    Hi, Filipe,
> > > > > > >
> > > > > > > On Wed, Jan 05, 2022 at 06:04:38PM +0000, Filipe Manana wrote:
> > > > > > > > I don't think I have a wiki account enabled, but I'll see if I get that
> > > > > > > > updated soon.
> > > > > > >
> > > > > > >    If you can't (or don't want to), feel free to put the text you want
> > > > > > > to replace it with here, and I'll update the wiki for you...
> > > > > >
> > > > > > Hi Hugo,
> > > > > >
> > > > > > That would be great.
> > > > > > I don't have a concrete text, but as you are a native english speaker,
> > > > > > a version from you would sound better :)
> > > > > >
> > > > > > Perhaps just mention that as of kernel 3.17 (and maybe point to that
> > > > > > commit too), the behaviour is no longer guaranteed, and we can end up
> > > > > > getting a file of 0 bytes.
> > > > >
> > > > >    I'd rather not reinforce the wrong usage with an example of it. :)
> > > > > Better to document the correct usage...
> > > > >
> > > > > > So an explicit fsync on the file is needed (just like ext4 and other
> > > > > > filesystems).
> > > > >
> > > > >    At what point in the process does the fsync need to be done?
> > > > > Before/after/instead of the sync?
> > > >
> > > >    Hmm. That doesn't make sense, of course (sorry, it's late, I've had
> > > > a hard day). I'm guessing that the fsync needs to go after the write
> > > > of the new data and before the rename. Is there any other constraint
> > > > on what needs to be done to make this work safely?
> > >
> > >    Right, I think I've got it. Ping me in the morning if it's not
> > > correct.
> >
> > Yep, that's correct Hugo.
> >
> > Starting with 3.17, the example on the wiki needs an fsync on
> > "file.tmp" after writing to it and before renaming it over "file".
> > I.e. the full example should be:
> >
> > ****
> > echo "oldcontent" > file
> >
> > # make sure oldcontent is on disk
> > sync
> >
> > echo "newcontent" > file.tmp
> > fsync file.tmp
> > mv -f file.tmp file
> >
> > # *crash*
> >
> > Will give either
> >
> > file contains "newcontent"; file.tmp does not exist
> > file contains "oldcontent"; file.tmp may contain "newcontent", be
> > zero-length or not exists at all.
> > ****
>
>    Since I can't find an "fsync" command line tool, I've rewritten the
> example in more general terms, rather than tying it to a specific
> implementation (such as a sequence of shell commands). I note that
> we've got a near-duplicate entry in the FAQ, two items down ("What are
> the crash guarantees of rename?"), that should probably be removed.
>
> Updated entry: https://btrfs.wiki.kernel.org/index.php/FAQ#What_are_the_crash_guarantees_of_overwrite-by-rename.3F
> Candidate for deletion: https://btrfs.wiki.kernel.org/index.php/FAQ#What_are_the_crash_guarantees_of_rename.3F

There's the xfs_io command from xfsprogs that can be used to trigger
an fsync like this:  xfs_io -c fsync /path/to/file
But it's probably not well known for non-developers.

Anyway, as it is, it looks perfect to me, thanks!

>
>    Hugo.
>
> --
> Hugo Mills             | I thought I'd discovered a new colour, but it was
> hugo@... carfax.org.uk | just a pigment of my imagination.
> http://carfax.org.uk/  |
> PGP: E2AB1DE4          |

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [bug] GNOME loses all settings following failure to resume from suspend
  2022-01-06 10:27                 ` Filipe Manana
@ 2022-01-06 20:02                   ` Chris Murphy
  2022-01-06 20:06                     ` Chris Murphy
  0 siblings, 1 reply; 18+ messages in thread
From: Chris Murphy @ 2022-01-06 20:02 UTC (permalink / raw)
  To: Filipe Manana; +Cc: Hugo Mills, Chris Murphy, Btrfs BTRFS, Josef Bacik

On Thu, Jan 6, 2022 at 3:28 AM Filipe Manana <fdmanana@kernel.org> wrote:
>
> On Thu, Jan 6, 2022 at 10:21 AM Hugo Mills <hugo@carfax.org.uk> wrote:
> >
> > > Yep, that's correct Hugo.
> > >
> > > Starting with 3.17, the example on the wiki needs an fsync on
> > > "file.tmp" after writing to it and before renaming it over "file".
> > > I.e. the full example should be:
> > >
> > > ****
> > > echo "oldcontent" > file
> > >
> > > # make sure oldcontent is on disk
> > > sync
> > >
> > > echo "newcontent" > file.tmp
> > > fsync file.tmp
> > > mv -f file.tmp file
> > >
> > > # *crash*
> > >
> > > Will give either
> > >
> > > file contains "newcontent"; file.tmp does not exist
> > > file contains "oldcontent"; file.tmp may contain "newcontent", be
> > > zero-length or not exists at all.
> > > ****
> >
> >    Since I can't find an "fsync" command line tool, I've rewritten the
> > example in more general terms, rather than tying it to a specific
> > implementation (such as a sequence of shell commands). I note that
> > we've got a near-duplicate entry in the FAQ, two items down ("What are
> > the crash guarantees of rename?"), that should probably be removed.
> >
> > Updated entry: https://btrfs.wiki.kernel.org/index.php/FAQ#What_are_the_crash_guarantees_of_overwrite-by-rename.3F
> > Candidate for deletion: https://btrfs.wiki.kernel.org/index.php/FAQ#What_are_the_crash_guarantees_of_rename.3F
>
> There's the xfs_io command from xfsprogs that can be used to trigger
> an fsync like this:  xfs_io -c fsync /path/to/file
> But it's probably not well known for non-developers.
>
> Anyway, as it is, it looks perfect to me, thanks!


I think it's OK to use pseudo-code. Folks will figure it out. So you
can just write it as fsync() and even if you're not using the exact
correct syntax it'll be understood.


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [bug] GNOME loses all settings following failure to resume from suspend
  2022-01-06 20:02                   ` Chris Murphy
@ 2022-01-06 20:06                     ` Chris Murphy
  2022-01-06 20:23                       ` Hugo Mills
  0 siblings, 1 reply; 18+ messages in thread
From: Chris Murphy @ 2022-01-06 20:06 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Filipe Manana, Hugo Mills, Btrfs BTRFS, Josef Bacik

On Thu, Jan 6, 2022 at 1:02 PM Chris Murphy <lists@colorremedies.com> wrote:
>
> On Thu, Jan 6, 2022 at 3:28 AM Filipe Manana <fdmanana@kernel.org> wrote:
> >
> > On Thu, Jan 6, 2022 at 10:21 AM Hugo Mills <hugo@carfax.org.uk> wrote:
> > >
> > > > Yep, that's correct Hugo.
> > > >
> > > > Starting with 3.17, the example on the wiki needs an fsync on
> > > > "file.tmp" after writing to it and before renaming it over "file".
> > > > I.e. the full example should be:
> > > >
> > > > ****
> > > > echo "oldcontent" > file
> > > >
> > > > # make sure oldcontent is on disk
> > > > sync
> > > >
> > > > echo "newcontent" > file.tmp
> > > > fsync file.tmp
> > > > mv -f file.tmp file
> > > >
> > > > # *crash*
> > > >
> > > > Will give either
> > > >
> > > > file contains "newcontent"; file.tmp does not exist
> > > > file contains "oldcontent"; file.tmp may contain "newcontent", be
> > > > zero-length or not exists at all.
> > > > ****
> > >
> > >    Since I can't find an "fsync" command line tool, I've rewritten the
> > > example in more general terms, rather than tying it to a specific
> > > implementation (such as a sequence of shell commands). I note that
> > > we've got a near-duplicate entry in the FAQ, two items down ("What are
> > > the crash guarantees of rename?"), that should probably be removed.
> > >
> > > Updated entry: https://btrfs.wiki.kernel.org/index.php/FAQ#What_are_the_crash_guarantees_of_overwrite-by-rename.3F
> > > Candidate for deletion: https://btrfs.wiki.kernel.org/index.php/FAQ#What_are_the_crash_guarantees_of_rename.3F
> >
> > There's the xfs_io command from xfsprogs that can be used to trigger
> > an fsync like this:  xfs_io -c fsync /path/to/file
> > But it's probably not well known for non-developers.
> >
> > Anyway, as it is, it looks perfect to me, thanks!
>
>
> I think it's OK to use pseudo-code. Folks will figure it out. So you
> can just write it as fsync() and even if you're not using the exact
> correct syntax it'll be understood.

Topical xxample:
https://lore.kernel.org/linux-xfs/6A65F394-C1BA-4339-AC9B-051885D12F65@corp.ovh.com/

Reminds me to ask Filipe if the example Hugo is writing up also needs
an fsync() on the enclosing directory *after* the rename?

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [bug] GNOME loses all settings following failure to resume from suspend
  2022-01-06 20:06                     ` Chris Murphy
@ 2022-01-06 20:23                       ` Hugo Mills
  0 siblings, 0 replies; 18+ messages in thread
From: Hugo Mills @ 2022-01-06 20:23 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Filipe Manana, Btrfs BTRFS, Josef Bacik

On Thu, Jan 06, 2022 at 01:06:40PM -0700, Chris Murphy wrote:
> On Thu, Jan 6, 2022 at 1:02 PM Chris Murphy <lists@colorremedies.com> wrote:
> >
> > On Thu, Jan 6, 2022 at 3:28 AM Filipe Manana <fdmanana@kernel.org> wrote:
> > >
> > > On Thu, Jan 6, 2022 at 10:21 AM Hugo Mills <hugo@carfax.org.uk> wrote:
> > > >
> > > > > Yep, that's correct Hugo.
> > > > >
> > > > > Starting with 3.17, the example on the wiki needs an fsync on
> > > > > "file.tmp" after writing to it and before renaming it over "file".
> > > > > I.e. the full example should be:
> > > > >
> > > > > ****
> > > > > echo "oldcontent" > file
> > > > >
> > > > > # make sure oldcontent is on disk
> > > > > sync
> > > > >
> > > > > echo "newcontent" > file.tmp
> > > > > fsync file.tmp
> > > > > mv -f file.tmp file
> > > > >
> > > > > # *crash*
> > > > >
> > > > > Will give either
> > > > >
> > > > > file contains "newcontent"; file.tmp does not exist
> > > > > file contains "oldcontent"; file.tmp may contain "newcontent", be
> > > > > zero-length or not exists at all.
> > > > > ****
> > > >
> > > >    Since I can't find an "fsync" command line tool, I've rewritten the
> > > > example in more general terms, rather than tying it to a specific
> > > > implementation (such as a sequence of shell commands). I note that
> > > > we've got a near-duplicate entry in the FAQ, two items down ("What are
> > > > the crash guarantees of rename?"), that should probably be removed.
> > > >
> > > > Updated entry: https://btrfs.wiki.kernel.org/index.php/FAQ#What_are_the_crash_guarantees_of_overwrite-by-rename.3F
> > > > Candidate for deletion: https://btrfs.wiki.kernel.org/index.php/FAQ#What_are_the_crash_guarantees_of_rename.3F
> > >
> > > There's the xfs_io command from xfsprogs that can be used to trigger
> > > an fsync like this:  xfs_io -c fsync /path/to/file
> > > But it's probably not well known for non-developers.
> > >
> > > Anyway, as it is, it looks perfect to me, thanks!
> >
> >
> > I think it's OK to use pseudo-code. Folks will figure it out. So you
> > can just write it as fsync() and even if you're not using the exact
> > correct syntax it'll be understood.
> 
> Topical xxample:
> https://lore.kernel.org/linux-xfs/6A65F394-C1BA-4339-AC9B-051885D12F65@corp.ovh.com/

   I did wonder whether to write it as C-ish code, but I felt that the
prose description was sufficient, and didn't presuppose any particular
implementation.

> Reminds me to ask Filipe if the example Hugo is writing up also needs
> an fsync() on the enclosing directory *after* the rename?

   As I understand it, as long as the data is fsynced properly before
the move, the semantics are safe and atomic (if you don't get the new
data afterwards, you get the old data, regardless of where the crash
is). Syncing the directory and waiting for that to complete gives you
the guarantee that you'll get the new data from that point on of the
system crashes afterwards.

   Hugo.

-- 
Hugo Mills             | I hate housework. You make the beds, you wash the
hugo@... carfax.org.uk | dishes, and six months later you have to start all
http://carfax.org.uk/  | over again.
PGP: E2AB1DE4          |                                           Joan Rivers

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [bug] GNOME loses all settings following failure to resume from suspend
  2022-01-06 10:20               ` Hugo Mills
  2022-01-06 10:27                 ` Filipe Manana
@ 2022-01-06 21:07                 ` Adam Borowski
  1 sibling, 0 replies; 18+ messages in thread
From: Adam Borowski @ 2022-01-06 21:07 UTC (permalink / raw)
  To: Hugo Mills, Filipe Manana, Chris Murphy, Btrfs BTRFS, Josef Bacik

On Thu, Jan 06, 2022 at 10:20:56AM +0000, Hugo Mills wrote:
>    Since I can't find an "fsync" command line tool

sync /the/file


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁
⢿⡄⠘⠷⠚⠋⠀ Certified airhead; got the CT scan to prove that!
⠈⠳⣄⠀⠀⠀⠀

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [bug] GNOME loses all settings following failure to resume from suspend
  2022-01-05 18:04 ` Filipe Manana
                     ` (2 preceding siblings ...)
  2022-01-05 18:40   ` Chris Murphy
@ 2022-01-09 17:04   ` Remi Gauvin
  3 siblings, 0 replies; 18+ messages in thread
From: Remi Gauvin @ 2022-01-09 17:04 UTC (permalink / raw)
  To: linux-btrfs

On 2022-01-05 1:04 p.m., Filipe Manana wrote:

> After mounting the fs again 'file' is empty and 'file.tmp' does not exists.
> 
> The only for that to guarantee 'file' is not empty and has the expected
> data ("newcontent"), would be to mount with -o flushoncommit.
> 



Was flushoncommit on the default back in kernel/tools version 4.4?  I
found an old Ubuntu man page that seemed to indicate as much.  That
would be a very radical and (for me at least.) unexpected change.



^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-01-09 17:04 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-05 17:34 [bug] GNOME loses all settings following failure to resume from suspend Chris Murphy
2022-01-05 18:04 ` Filipe Manana
2022-01-05 18:32   ` Filipe Manana
2022-01-05 18:34   ` Hugo Mills
2022-01-05 20:38     ` Filipe Manana
2022-01-05 21:31       ` Hugo Mills
2022-01-05 21:39         ` Hugo Mills
2022-01-05 21:53           ` Hugo Mills
2022-01-06  9:51             ` Filipe Manana
2022-01-06 10:20               ` Hugo Mills
2022-01-06 10:27                 ` Filipe Manana
2022-01-06 20:02                   ` Chris Murphy
2022-01-06 20:06                     ` Chris Murphy
2022-01-06 20:23                       ` Hugo Mills
2022-01-06 21:07                 ` Adam Borowski
2022-01-05 18:40   ` Chris Murphy
2022-01-05 20:32     ` Filipe Manana
2022-01-09 17:04   ` Remi Gauvin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.