All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Btrfs: fix sync fs to actually wait for all data to be persisted
@ 2013-09-22 20:55 Filipe David Borba Manana
  2013-09-23  1:30 ` Miao Xie
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Filipe David Borba Manana @ 2013-09-22 20:55 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Filipe David Borba Manana

Currently the fs sync function (super.c:btrfs_sync_fs()) doesn't
wait for delayed work to finish before returning success to the
caller. This change fixes this, ensuring that there's no data loss
if a power failure happens right after fs sync returns success to
the caller and before the next commit happens.

Steps to reproduce the data loss issue:

$ mkfs.btrfs -f /dev/sdb3
$ mount /dev/sdb3 /mnt/btrfs
$ perl -e '$d = ("\x41" x 6001); open($f,">","/mnt/btrfs/foobar"); print $f $d; close($f);' && btrfs fi sync /mnt/btrfs

Right after the btrfs fi sync command (a second or 2 for example), power
off the machine and reboot it. The file will be empty, as it can be verified
after mounting the filesystem and through btrfs-debug-tree:

$ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8

        item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
                location key (257 INODE_ITEM 0) type FILE
                namelen 6 datalen 0 name: foobar
        item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
                inode generation 7 transid 7 size 0 block group 0 mode 100644 links 1
        item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
                inode ref index 2 namelen 6 name: foobar
checksum tree key (CSUM_TREE ROOT_ITEM 0)
leaf 29429760 items 0 free space 3995 generation 7 owner 7
fs uuid 6192815c-af2a-4b75-b3db-a959ffb6166e
chunk uuid b529c44b-938c-4d3d-910a-013b4700bcae
uuid tree key (UUID_TREE ROOT_ITEM 0)

After this patch, the data loss no longer happens after a power failure and
btrfs-debug-tree shows:

$ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
	item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
		location key (257 INODE_ITEM 0) type FILE
		namelen 6 datalen 0 name: foobar
	item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
		inode generation 6 transid 6 size 6001 block group 0 mode 100644 links 1
	item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
		inode ref index 2 namelen 6 name: foobar
	item 6 key (257 EXTENT_DATA 0) itemoff 3522 itemsize 53
		extent data disk byte 12845056 nr 8192
		extent data offset 0 nr 8192 ram 8192
		extent compression 0
checksum tree key (CSUM_TREE ROOT_ITEM 0)

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
---
 fs/btrfs/super.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 6ab0df5..557e38f 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -913,6 +913,7 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
 	struct btrfs_trans_handle *trans;
 	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
 	struct btrfs_root *root = fs_info->tree_root;
+	int ret;
 
 	trace_btrfs_sync_fs(wait);
 
@@ -921,6 +922,10 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
 		return 0;
 	}
 
+	ret = btrfs_start_all_delalloc_inodes(fs_info, 0);
+	if (ret)
+		return ret;
+
 	btrfs_wait_all_ordered_extents(fs_info);
 
 	trans = btrfs_attach_transaction_barrier(root);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] Btrfs: fix sync fs to actually wait for all data to be persisted
  2013-09-22 20:55 [PATCH] Btrfs: fix sync fs to actually wait for all data to be persisted Filipe David Borba Manana
@ 2013-09-23  1:30 ` Miao Xie
  2013-09-23  9:11   ` Filipe David Manana
  2013-09-23  9:23 ` [PATCH v2] " Filipe David Borba Manana
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Miao Xie @ 2013-09-23  1:30 UTC (permalink / raw)
  To: Filipe David Borba Manana; +Cc: linux-btrfs

On 	sun, 22 Sep 2013 21:55:53 +0100, Filipe David Borba Manana wrote:
> Currently the fs sync function (super.c:btrfs_sync_fs()) doesn't
> wait for delayed work to finish before returning success to the
> caller. This change fixes this, ensuring that there's no data loss
> if a power failure happens right after fs sync returns success to
> the caller and before the next commit happens.
> 
> Steps to reproduce the data loss issue:
> 
> $ mkfs.btrfs -f /dev/sdb3
> $ mount /dev/sdb3 /mnt/btrfs
> $ perl -e '$d = ("\x41" x 6001); open($f,">","/mnt/btrfs/foobar"); print $f $d; close($f);' && btrfs fi sync /mnt/btrfs
> 
> Right after the btrfs fi sync command (a second or 2 for example), power
> off the machine and reboot it. The file will be empty, as it can be verified
> after mounting the filesystem and through btrfs-debug-tree:
> 
> $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
> 
>         item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
>                 location key (257 INODE_ITEM 0) type FILE
>                 namelen 6 datalen 0 name: foobar
>         item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
>                 inode generation 7 transid 7 size 0 block group 0 mode 100644 links 1
>         item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
>                 inode ref index 2 namelen 6 name: foobar
> checksum tree key (CSUM_TREE ROOT_ITEM 0)
> leaf 29429760 items 0 free space 3995 generation 7 owner 7
> fs uuid 6192815c-af2a-4b75-b3db-a959ffb6166e
> chunk uuid b529c44b-938c-4d3d-910a-013b4700bcae
> uuid tree key (UUID_TREE ROOT_ITEM 0)
> 
> After this patch, the data loss no longer happens after a power failure and
> btrfs-debug-tree shows:
> 
> $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
> 	item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
> 		location key (257 INODE_ITEM 0) type FILE
> 		namelen 6 datalen 0 name: foobar
> 	item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
> 		inode generation 6 transid 6 size 6001 block group 0 mode 100644 links 1
> 	item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
> 		inode ref index 2 namelen 6 name: foobar
> 	item 6 key (257 EXTENT_DATA 0) itemoff 3522 itemsize 53
> 		extent data disk byte 12845056 nr 8192
> 		extent data offset 0 nr 8192 ram 8192
> 		extent compression 0
> checksum tree key (CSUM_TREE ROOT_ITEM 0)
> 
> Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
> ---
>  fs/btrfs/super.c |    5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 6ab0df5..557e38f 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -913,6 +913,7 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
>  	struct btrfs_trans_handle *trans;
>  	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
>  	struct btrfs_root *root = fs_info->tree_root;
> +	int ret;
>  
>  	trace_btrfs_sync_fs(wait);
>  
> @@ -921,6 +922,10 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
>  		return 0;
>  	}
>  
> +	ret = btrfs_start_all_delalloc_inodes(fs_info, 0);
> +	if (ret)
> +		return ret;
> +

I don't think we should call btrfs_start_all_delalloc_inodes(), because this function is also
called by do_sync(), but do_sync() syncs the whole fs before calling it, so if we add
btrfs_start_all_delalloc_inodes() here, we will sync the fs twice, and the second one is unnecessary.

Calling writeback_inodes_sb() before btrfs_sync_fs() is better way to fix this problem.

Thanks
Miao

>  	btrfs_wait_all_ordered_extents(fs_info);
>  
>  	trans = btrfs_attach_transaction_barrier(root);
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Btrfs: fix sync fs to actually wait for all data to be persisted
  2013-09-23  1:30 ` Miao Xie
@ 2013-09-23  9:11   ` Filipe David Manana
  2013-09-23  9:51     ` Liu Bo
  0 siblings, 1 reply; 11+ messages in thread
From: Filipe David Manana @ 2013-09-23  9:11 UTC (permalink / raw)
  To: Miao Xie; +Cc: linux-btrfs

On Mon, Sep 23, 2013 at 2:30 AM, Miao Xie <miaox@cn.fujitsu.com> wrote:
>
> On      sun, 22 Sep 2013 21:55:53 +0100, Filipe David Borba Manana wrote:
> > Currently the fs sync function (super.c:btrfs_sync_fs()) doesn't
> > wait for delayed work to finish before returning success to the
> > caller. This change fixes this, ensuring that there's no data loss
> > if a power failure happens right after fs sync returns success to
> > the caller and before the next commit happens.
> >
> > Steps to reproduce the data loss issue:
> >
> > $ mkfs.btrfs -f /dev/sdb3
> > $ mount /dev/sdb3 /mnt/btrfs
> > $ perl -e '$d = ("\x41" x 6001); open($f,">","/mnt/btrfs/foobar"); print $f $d; close($f);' && btrfs fi sync /mnt/btrfs
> >
> > Right after the btrfs fi sync command (a second or 2 for example), power
> > off the machine and reboot it. The file will be empty, as it can be verified
> > after mounting the filesystem and through btrfs-debug-tree:
> >
> > $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
> >
> >         item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
> >                 location key (257 INODE_ITEM 0) type FILE
> >                 namelen 6 datalen 0 name: foobar
> >         item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
> >                 inode generation 7 transid 7 size 0 block group 0 mode 100644 links 1
> >         item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
> >                 inode ref index 2 namelen 6 name: foobar
> > checksum tree key (CSUM_TREE ROOT_ITEM 0)
> > leaf 29429760 items 0 free space 3995 generation 7 owner 7
> > fs uuid 6192815c-af2a-4b75-b3db-a959ffb6166e
> > chunk uuid b529c44b-938c-4d3d-910a-013b4700bcae
> > uuid tree key (UUID_TREE ROOT_ITEM 0)
> >
> > After this patch, the data loss no longer happens after a power failure and
> > btrfs-debug-tree shows:
> >
> > $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
> >       item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
> >               location key (257 INODE_ITEM 0) type FILE
> >               namelen 6 datalen 0 name: foobar
> >       item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
> >               inode generation 6 transid 6 size 6001 block group 0 mode 100644 links 1
> >       item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
> >               inode ref index 2 namelen 6 name: foobar
> >       item 6 key (257 EXTENT_DATA 0) itemoff 3522 itemsize 53
> >               extent data disk byte 12845056 nr 8192
> >               extent data offset 0 nr 8192 ram 8192
> >               extent compression 0
> > checksum tree key (CSUM_TREE ROOT_ITEM 0)
> >
> > Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
> > ---
> >  fs/btrfs/super.c |    5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> > index 6ab0df5..557e38f 100644
> > --- a/fs/btrfs/super.c
> > +++ b/fs/btrfs/super.c
> > @@ -913,6 +913,7 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
> >       struct btrfs_trans_handle *trans;
> >       struct btrfs_fs_info *fs_info = btrfs_sb(sb);
> >       struct btrfs_root *root = fs_info->tree_root;
> > +     int ret;
> >
> >       trace_btrfs_sync_fs(wait);
> >
> > @@ -921,6 +922,10 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
> >               return 0;
> >       }
> >
> > +     ret = btrfs_start_all_delalloc_inodes(fs_info, 0);
> > +     if (ret)
> > +             return ret;
> > +
>
> I don't think we should call btrfs_start_all_delalloc_inodes(), because this function is also
> called by do_sync(), but do_sync() syncs the whole fs before calling it, so if we add
> btrfs_start_all_delalloc_inodes() here, we will sync the fs twice, and the second one is unnecessary.

Where is that do_sync() function exactly? I'm not finding any with
that exact name in fs/btrfs/* nor fs/*
I used this approach because (besides working) it's what is done in
btrfs_commit_transaction() (via btrfs_start_delalloc_flush and
btrfs_wait_delalloc_flush). Why can it be like that in the transaction
commit and not in btrfs_sync_fs() ?

>
> Calling writeback_inodes_sb() before btrfs_sync_fs() is better way to fix this problem.

Just tested it, and it works that way too. Uploading a new patch.

Thanks for the feedback/review Miao.

>
> Thanks
> Miao
>
> >       btrfs_wait_all_ordered_extents(fs_info);
> >
> >       trans = btrfs_attach_transaction_barrier(root);
> >
>



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2] Btrfs: fix sync fs to actually wait for all data to be persisted
  2013-09-22 20:55 [PATCH] Btrfs: fix sync fs to actually wait for all data to be persisted Filipe David Borba Manana
  2013-09-23  1:30 ` Miao Xie
@ 2013-09-23  9:23 ` Filipe David Borba Manana
  2013-09-23  9:53   ` Filipe David Manana
  2013-09-23 10:28 ` [PATCH v3] " Filipe David Borba Manana
  2013-09-23 10:35 ` [PATCH v4] " Filipe David Borba Manana
  3 siblings, 1 reply; 11+ messages in thread
From: Filipe David Borba Manana @ 2013-09-23  9:23 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Filipe David Borba Manana

Currently the fs sync function (super.c:btrfs_sync_fs()) doesn't
wait for delayed work to finish before returning success to the
caller. This change fixes this, ensuring that there's no data loss
if a power failure happens right after fs sync returns success to
the caller and before the next commit happens.

Steps to reproduce the data loss issue:

$ mkfs.btrfs -f /dev/sdb3
$ mount /dev/sdb3 /mnt/btrfs
$ perl -e '$d = ("\x41" x 6001); open($f,">","/mnt/btrfs/foobar"); print $f $d; close($f);' && btrfs fi sync /mnt/btrfs

Right after the btrfs fi sync command (a second or 2 for example), power
off the machine and reboot it. The file will be empty, as it can be verified
after mounting the filesystem and through btrfs-debug-tree:

$ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
        item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
                location key (257 INODE_ITEM 0) type FILE
                namelen 6 datalen 0 name: foobar
        item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
                inode generation 7 transid 7 size 0 block group 0 mode 100644 links 1
        item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
                inode ref index 2 namelen 6 name: foobar
checksum tree key (CSUM_TREE ROOT_ITEM 0)
leaf 29429760 items 0 free space 3995 generation 7 owner 7
fs uuid 6192815c-af2a-4b75-b3db-a959ffb6166e
chunk uuid b529c44b-938c-4d3d-910a-013b4700bcae
uuid tree key (UUID_TREE ROOT_ITEM 0)

After this patch, the data loss no longer happens after a power failure and
btrfs-debug-tree shows:

$ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
	item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
		location key (257 INODE_ITEM 0) type FILE
		namelen 6 datalen 0 name: foobar
	item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
		inode generation 6 transid 6 size 6001 block group 0 mode 100644 links 1
	item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
		inode ref index 2 namelen 6 name: foobar
	item 6 key (257 EXTENT_DATA 0) itemoff 3522 itemsize 53
		extent data disk byte 12845056 nr 8192
		extent data offset 0 nr 8192 ram 8192
		extent compression 0
checksum tree key (CSUM_TREE ROOT_ITEM 0)

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
---

V2: Use writeback_inodes_sb() instead of btrfs_start_all_delalloc_inodes(), as
    suggested by Miao Xie.

 fs/btrfs/super.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 6ab0df5..38b4392 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -921,6 +921,7 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
 		return 0;
 	}
 
+	writeback_inodes_sb(sb, WB_REASON_SYNC);
 	btrfs_wait_all_ordered_extents(fs_info);
 
 	trans = btrfs_attach_transaction_barrier(root);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] Btrfs: fix sync fs to actually wait for all data to be persisted
  2013-09-23  9:11   ` Filipe David Manana
@ 2013-09-23  9:51     ` Liu Bo
  0 siblings, 0 replies; 11+ messages in thread
From: Liu Bo @ 2013-09-23  9:51 UTC (permalink / raw)
  To: Filipe David Manana; +Cc: Miao Xie, linux-btrfs

On Mon, Sep 23, 2013 at 10:11:42AM +0100, Filipe David Manana wrote:
> On Mon, Sep 23, 2013 at 2:30 AM, Miao Xie <miaox@cn.fujitsu.com> wrote:
> >
> > On      sun, 22 Sep 2013 21:55:53 +0100, Filipe David Borba Manana wrote:
> > > Currently the fs sync function (super.c:btrfs_sync_fs()) doesn't
> > > wait for delayed work to finish before returning success to the
> > > caller. This change fixes this, ensuring that there's no data loss
> > > if a power failure happens right after fs sync returns success to
> > > the caller and before the next commit happens.
> > >
> > > Steps to reproduce the data loss issue:
> > >
> > > $ mkfs.btrfs -f /dev/sdb3
> > > $ mount /dev/sdb3 /mnt/btrfs
> > > $ perl -e '$d = ("\x41" x 6001); open($f,">","/mnt/btrfs/foobar"); print $f $d; close($f);' && btrfs fi sync /mnt/btrfs
> > >
> > > Right after the btrfs fi sync command (a second or 2 for example), power
> > > off the machine and reboot it. The file will be empty, as it can be verified
> > > after mounting the filesystem and through btrfs-debug-tree:
> > >
> > > $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
> > >
> > >         item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
> > >                 location key (257 INODE_ITEM 0) type FILE
> > >                 namelen 6 datalen 0 name: foobar
> > >         item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
> > >                 inode generation 7 transid 7 size 0 block group 0 mode 100644 links 1
> > >         item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
> > >                 inode ref index 2 namelen 6 name: foobar
> > > checksum tree key (CSUM_TREE ROOT_ITEM 0)
> > > leaf 29429760 items 0 free space 3995 generation 7 owner 7
> > > fs uuid 6192815c-af2a-4b75-b3db-a959ffb6166e
> > > chunk uuid b529c44b-938c-4d3d-910a-013b4700bcae
> > > uuid tree key (UUID_TREE ROOT_ITEM 0)
> > >
> > > After this patch, the data loss no longer happens after a power failure and
> > > btrfs-debug-tree shows:
> > >
> > > $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
> > >       item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
> > >               location key (257 INODE_ITEM 0) type FILE
> > >               namelen 6 datalen 0 name: foobar
> > >       item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
> > >               inode generation 6 transid 6 size 6001 block group 0 mode 100644 links 1
> > >       item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
> > >               inode ref index 2 namelen 6 name: foobar
> > >       item 6 key (257 EXTENT_DATA 0) itemoff 3522 itemsize 53
> > >               extent data disk byte 12845056 nr 8192
> > >               extent data offset 0 nr 8192 ram 8192
> > >               extent compression 0
> > > checksum tree key (CSUM_TREE ROOT_ITEM 0)
> > >
> > > Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
> > > ---
> > >  fs/btrfs/super.c |    5 +++++
> > >  1 file changed, 5 insertions(+)
> > >
> > > diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> > > index 6ab0df5..557e38f 100644
> > > --- a/fs/btrfs/super.c
> > > +++ b/fs/btrfs/super.c
> > > @@ -913,6 +913,7 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
> > >       struct btrfs_trans_handle *trans;
> > >       struct btrfs_fs_info *fs_info = btrfs_sb(sb);
> > >       struct btrfs_root *root = fs_info->tree_root;
> > > +     int ret;
> > >
> > >       trace_btrfs_sync_fs(wait);
> > >
> > > @@ -921,6 +922,10 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
> > >               return 0;
> > >       }
> > >
> > > +     ret = btrfs_start_all_delalloc_inodes(fs_info, 0);
> > > +     if (ret)
> > > +             return ret;
> > > +
> >
> > I don't think we should call btrfs_start_all_delalloc_inodes(), because this function is also
> > called by do_sync(), but do_sync() syncs the whole fs before calling it, so if we add
> > btrfs_start_all_delalloc_inodes() here, we will sync the fs twice, and the second one is unnecessary.
> 
> Where is that do_sync() function exactly? I'm not finding any with
> that exact name in fs/btrfs/* nor fs/*

I think it should refer to sync_filesystem() :)

-liubo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] Btrfs: fix sync fs to actually wait for all data to be persisted
  2013-09-23  9:23 ` [PATCH v2] " Filipe David Borba Manana
@ 2013-09-23  9:53   ` Filipe David Manana
  2013-09-23  9:59     ` Liu Bo
  0 siblings, 1 reply; 11+ messages in thread
From: Filipe David Manana @ 2013-09-23  9:53 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Filipe David Borba Manana

On Mon, Sep 23, 2013 at 10:23 AM, Filipe David Borba Manana
<fdmanana@gmail.com> wrote:
> Currently the fs sync function (super.c:btrfs_sync_fs()) doesn't
> wait for delayed work to finish before returning success to the
> caller. This change fixes this, ensuring that there's no data loss
> if a power failure happens right after fs sync returns success to
> the caller and before the next commit happens.
>
> Steps to reproduce the data loss issue:
>
> $ mkfs.btrfs -f /dev/sdb3
> $ mount /dev/sdb3 /mnt/btrfs
> $ perl -e '$d = ("\x41" x 6001); open($f,">","/mnt/btrfs/foobar"); print $f $d; close($f);' && btrfs fi sync /mnt/btrfs
>
> Right after the btrfs fi sync command (a second or 2 for example), power
> off the machine and reboot it. The file will be empty, as it can be verified
> after mounting the filesystem and through btrfs-debug-tree:
>
> $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
>         item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
>                 location key (257 INODE_ITEM 0) type FILE
>                 namelen 6 datalen 0 name: foobar
>         item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
>                 inode generation 7 transid 7 size 0 block group 0 mode 100644 links 1
>         item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
>                 inode ref index 2 namelen 6 name: foobar
> checksum tree key (CSUM_TREE ROOT_ITEM 0)
> leaf 29429760 items 0 free space 3995 generation 7 owner 7
> fs uuid 6192815c-af2a-4b75-b3db-a959ffb6166e
> chunk uuid b529c44b-938c-4d3d-910a-013b4700bcae
> uuid tree key (UUID_TREE ROOT_ITEM 0)
>
> After this patch, the data loss no longer happens after a power failure and
> btrfs-debug-tree shows:
>
> $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
>         item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
>                 location key (257 INODE_ITEM 0) type FILE
>                 namelen 6 datalen 0 name: foobar
>         item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
>                 inode generation 6 transid 6 size 6001 block group 0 mode 100644 links 1
>         item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
>                 inode ref index 2 namelen 6 name: foobar
>         item 6 key (257 EXTENT_DATA 0) itemoff 3522 itemsize 53
>                 extent data disk byte 12845056 nr 8192
>                 extent data offset 0 nr 8192 ram 8192
>                 extent compression 0
> checksum tree key (CSUM_TREE ROOT_ITEM 0)
>
> Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
> ---
>
> V2: Use writeback_inodes_sb() instead of btrfs_start_all_delalloc_inodes(), as
>     suggested by Miao Xie.
>
>  fs/btrfs/super.c |    1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 6ab0df5..38b4392 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -921,6 +921,7 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
>                 return 0;
>         }
>
> +       writeback_inodes_sb(sb, WB_REASON_SYNC);
>         btrfs_wait_all_ordered_extents(fs_info);

Ignore this 2nd patch version please, for 2 reasons:

1) It triggers a WARN_ON because writeback_inodes_sb() requires the
sb->u_mount semaphore to be acquired before, which is not always the
case (it is when called through btrfs_kill_super, otherwise it isn't)

2) It doesn't guarantee that  inodes are actually written (see comment
of writeback_inodes_sb()), so we can return 0 (success) when the
writes actually didn't happen/succeed. Because of this,
btrfs_start_all_delalloc_inodes() is more honest.


>
>         trans = btrfs_attach_transaction_barrier(root);
> --
> 1.7.9.5
>



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] Btrfs: fix sync fs to actually wait for all data to be persisted
  2013-09-23  9:53   ` Filipe David Manana
@ 2013-09-23  9:59     ` Liu Bo
  2013-09-23 10:06       ` Filipe David Manana
  0 siblings, 1 reply; 11+ messages in thread
From: Liu Bo @ 2013-09-23  9:59 UTC (permalink / raw)
  To: Filipe David Manana; +Cc: linux-btrfs

On Mon, Sep 23, 2013 at 10:53:20AM +0100, Filipe David Manana wrote:
> On Mon, Sep 23, 2013 at 10:23 AM, Filipe David Borba Manana
> <fdmanana@gmail.com> wrote:
> > Currently the fs sync function (super.c:btrfs_sync_fs()) doesn't
> > wait for delayed work to finish before returning success to the
> > caller. This change fixes this, ensuring that there's no data loss
> > if a power failure happens right after fs sync returns success to
> > the caller and before the next commit happens.
> >
> > Steps to reproduce the data loss issue:
> >
> > $ mkfs.btrfs -f /dev/sdb3
> > $ mount /dev/sdb3 /mnt/btrfs
> > $ perl -e '$d = ("\x41" x 6001); open($f,">","/mnt/btrfs/foobar"); print $f $d; close($f);' && btrfs fi sync /mnt/btrfs
> >
> > Right after the btrfs fi sync command (a second or 2 for example), power
> > off the machine and reboot it. The file will be empty, as it can be verified
> > after mounting the filesystem and through btrfs-debug-tree:
> >
> > $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
> >         item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
> >                 location key (257 INODE_ITEM 0) type FILE
> >                 namelen 6 datalen 0 name: foobar
> >         item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
> >                 inode generation 7 transid 7 size 0 block group 0 mode 100644 links 1
> >         item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
> >                 inode ref index 2 namelen 6 name: foobar
> > checksum tree key (CSUM_TREE ROOT_ITEM 0)
> > leaf 29429760 items 0 free space 3995 generation 7 owner 7
> > fs uuid 6192815c-af2a-4b75-b3db-a959ffb6166e
> > chunk uuid b529c44b-938c-4d3d-910a-013b4700bcae
> > uuid tree key (UUID_TREE ROOT_ITEM 0)
> >
> > After this patch, the data loss no longer happens after a power failure and
> > btrfs-debug-tree shows:
> >
> > $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
> >         item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
> >                 location key (257 INODE_ITEM 0) type FILE
> >                 namelen 6 datalen 0 name: foobar
> >         item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
> >                 inode generation 6 transid 6 size 6001 block group 0 mode 100644 links 1
> >         item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
> >                 inode ref index 2 namelen 6 name: foobar
> >         item 6 key (257 EXTENT_DATA 0) itemoff 3522 itemsize 53
> >                 extent data disk byte 12845056 nr 8192
> >                 extent data offset 0 nr 8192 ram 8192
> >                 extent compression 0
> > checksum tree key (CSUM_TREE ROOT_ITEM 0)
> >
> > Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
> > ---
> >
> > V2: Use writeback_inodes_sb() instead of btrfs_start_all_delalloc_inodes(), as
> >     suggested by Miao Xie.
> >
> >  fs/btrfs/super.c |    1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> > index 6ab0df5..38b4392 100644
> > --- a/fs/btrfs/super.c
> > +++ b/fs/btrfs/super.c
> > @@ -921,6 +921,7 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
> >                 return 0;
> >         }
> >
> > +       writeback_inodes_sb(sb, WB_REASON_SYNC);
> >         btrfs_wait_all_ordered_extents(fs_info);
> 
> Ignore this 2nd patch version please, for 2 reasons:
> 
> 1) It triggers a WARN_ON because writeback_inodes_sb() requires the
> sb->u_mount semaphore to be acquired before, which is not always the
> case (it is when called through btrfs_kill_super, otherwise it isn't)
> 
> 2) It doesn't guarantee that  inodes are actually written (see comment
> of writeback_inodes_sb()), so we can return 0 (success) when the
> writes actually didn't happen/succeed. Because of this,
> btrfs_start_all_delalloc_inodes() is more honest.

What about
        case BTRFS_IOC_SYNC:
		btrfs_start_all_delalloc_inodes();
                btrfs_sync_fs(file->f_dentry->d_sb, 1);
                return 0;

This way, there is no impact on calling sync(1).

-liubo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] Btrfs: fix sync fs to actually wait for all data to be persisted
  2013-09-23  9:59     ` Liu Bo
@ 2013-09-23 10:06       ` Filipe David Manana
  0 siblings, 0 replies; 11+ messages in thread
From: Filipe David Manana @ 2013-09-23 10:06 UTC (permalink / raw)
  To: bo.li.liu; +Cc: linux-btrfs

On Mon, Sep 23, 2013 at 10:59 AM, Liu Bo <bo.li.liu@oracle.com> wrote:
> On Mon, Sep 23, 2013 at 10:53:20AM +0100, Filipe David Manana wrote:
>> On Mon, Sep 23, 2013 at 10:23 AM, Filipe David Borba Manana
>> <fdmanana@gmail.com> wrote:
>> > Currently the fs sync function (super.c:btrfs_sync_fs()) doesn't
>> > wait for delayed work to finish before returning success to the
>> > caller. This change fixes this, ensuring that there's no data loss
>> > if a power failure happens right after fs sync returns success to
>> > the caller and before the next commit happens.
>> >
>> > Steps to reproduce the data loss issue:
>> >
>> > $ mkfs.btrfs -f /dev/sdb3
>> > $ mount /dev/sdb3 /mnt/btrfs
>> > $ perl -e '$d = ("\x41" x 6001); open($f,">","/mnt/btrfs/foobar"); print $f $d; close($f);' && btrfs fi sync /mnt/btrfs
>> >
>> > Right after the btrfs fi sync command (a second or 2 for example), power
>> > off the machine and reboot it. The file will be empty, as it can be verified
>> > after mounting the filesystem and through btrfs-debug-tree:
>> >
>> > $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
>> >         item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
>> >                 location key (257 INODE_ITEM 0) type FILE
>> >                 namelen 6 datalen 0 name: foobar
>> >         item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
>> >                 inode generation 7 transid 7 size 0 block group 0 mode 100644 links 1
>> >         item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
>> >                 inode ref index 2 namelen 6 name: foobar
>> > checksum tree key (CSUM_TREE ROOT_ITEM 0)
>> > leaf 29429760 items 0 free space 3995 generation 7 owner 7
>> > fs uuid 6192815c-af2a-4b75-b3db-a959ffb6166e
>> > chunk uuid b529c44b-938c-4d3d-910a-013b4700bcae
>> > uuid tree key (UUID_TREE ROOT_ITEM 0)
>> >
>> > After this patch, the data loss no longer happens after a power failure and
>> > btrfs-debug-tree shows:
>> >
>> > $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
>> >         item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
>> >                 location key (257 INODE_ITEM 0) type FILE
>> >                 namelen 6 datalen 0 name: foobar
>> >         item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
>> >                 inode generation 6 transid 6 size 6001 block group 0 mode 100644 links 1
>> >         item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
>> >                 inode ref index 2 namelen 6 name: foobar
>> >         item 6 key (257 EXTENT_DATA 0) itemoff 3522 itemsize 53
>> >                 extent data disk byte 12845056 nr 8192
>> >                 extent data offset 0 nr 8192 ram 8192
>> >                 extent compression 0
>> > checksum tree key (CSUM_TREE ROOT_ITEM 0)
>> >
>> > Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
>> > ---
>> >
>> > V2: Use writeback_inodes_sb() instead of btrfs_start_all_delalloc_inodes(), as
>> >     suggested by Miao Xie.
>> >
>> >  fs/btrfs/super.c |    1 +
>> >  1 file changed, 1 insertion(+)
>> >
>> > diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
>> > index 6ab0df5..38b4392 100644
>> > --- a/fs/btrfs/super.c
>> > +++ b/fs/btrfs/super.c
>> > @@ -921,6 +921,7 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
>> >                 return 0;
>> >         }
>> >
>> > +       writeback_inodes_sb(sb, WB_REASON_SYNC);
>> >         btrfs_wait_all_ordered_extents(fs_info);
>>
>> Ignore this 2nd patch version please, for 2 reasons:
>>
>> 1) It triggers a WARN_ON because writeback_inodes_sb() requires the
>> sb->u_mount semaphore to be acquired before, which is not always the
>> case (it is when called through btrfs_kill_super, otherwise it isn't)
>>
>> 2) It doesn't guarantee that  inodes are actually written (see comment
>> of writeback_inodes_sb()), so we can return 0 (success) when the
>> writes actually didn't happen/succeed. Because of this,
>> btrfs_start_all_delalloc_inodes() is more honest.
>
> What about
>         case BTRFS_IOC_SYNC:
>                 btrfs_start_all_delalloc_inodes();
>                 btrfs_sync_fs(file->f_dentry->d_sb, 1);
>                 return 0;
>
> This way, there is no impact on calling sync(1).

Sounds ok. Will try it, returning error if
btrfs_start_all_delalloc_inodes() returns an error.
Thanks for the suggestion and pointing me to sync_filesystem() :)

>
> -liubo



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v3] Btrfs: fix sync fs to actually wait for all data to be persisted
  2013-09-22 20:55 [PATCH] Btrfs: fix sync fs to actually wait for all data to be persisted Filipe David Borba Manana
  2013-09-23  1:30 ` Miao Xie
  2013-09-23  9:23 ` [PATCH v2] " Filipe David Borba Manana
@ 2013-09-23 10:28 ` Filipe David Borba Manana
  2013-09-23 10:35 ` [PATCH v4] " Filipe David Borba Manana
  3 siblings, 0 replies; 11+ messages in thread
From: Filipe David Borba Manana @ 2013-09-23 10:28 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Filipe David Borba Manana

Currently the fs sync function (super.c:btrfs_sync_fs()) doesn't
wait for delayed work to finish before returning success to the
caller. This change fixes this, ensuring that there's no data loss
if a power failure happens right after fs sync returns success to
the caller and before the next commit happens.

Steps to reproduce the data loss issue:

$ mkfs.btrfs -f /dev/sdb3
$ mount /dev/sdb3 /mnt/btrfs
$ perl -e '$d = ("\x41" x 6001); open($f,">","/mnt/btrfs/foobar"); print $f $d; close($f);' && btrfs fi sync /mnt/btrfs

Right after the btrfs fi sync command (a second or 2 for example), power
off the machine and reboot it. The file will be empty, as it can be verified
after mounting the filesystem and through btrfs-debug-tree:

$ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
        item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
                location key (257 INODE_ITEM 0) type FILE
                namelen 6 datalen 0 name: foobar
        item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
                inode generation 7 transid 7 size 0 block group 0 mode 100644 links 1
        item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
                inode ref index 2 namelen 6 name: foobar
checksum tree key (CSUM_TREE ROOT_ITEM 0)
leaf 29429760 items 0 free space 3995 generation 7 owner 7
fs uuid 6192815c-af2a-4b75-b3db-a959ffb6166e
chunk uuid b529c44b-938c-4d3d-910a-013b4700bcae
uuid tree key (UUID_TREE ROOT_ITEM 0)

After this patch, the data loss no longer happens after a power failure and
btrfs-debug-tree shows:

$ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
	item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
		location key (257 INODE_ITEM 0) type FILE
		namelen 6 datalen 0 name: foobar
	item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
		inode generation 6 transid 6 size 6001 block group 0 mode 100644 links 1
	item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
		inode ref index 2 namelen 6 name: foobar
	item 6 key (257 EXTENT_DATA 0) itemoff 3522 itemsize 53
		extent data disk byte 12845056 nr 8192
		extent data offset 0 nr 8192 ram 8192
		extent compression 0
checksum tree key (CSUM_TREE ROOT_ITEM 0)

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
---

V2: Use writeback_inodes_sb() instead of btrfs_start_all_delalloc_inodes(), as
    suggested by Miao Xie.
V3: Use btrfs_start_all_delalloc_inodes() instead but outside btrfs_sync_fs(),
    in the sync IOCTL handler. Using writeback_inodes_sb() is not very honest
    because it doesn't guarantee inode data is persisted and we have no way
    to know if persistence really happened or not, returning 0 (success) always.
    Thanks Liu Bo for the suggestion.


 fs/btrfs/ioctl.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 9d46f60..8792fc8 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -4557,9 +4557,15 @@ long btrfs_ioctl(struct file *file, unsigned int
 		return btrfs_ioctl_logical_to_ino(root, argp);
 	case BTRFS_IOC_SPACE_INFO:
 		return btrfs_ioctl_space_info(root, argp);
-	case BTRFS_IOC_SYNC:
+	case BTRFS_IOC_SYNC: {
+		int ret;
+
+		ret = btrfs_start_all_delalloc_inodes(root->fs_info, 0);
+		if (ret)
+			return ret;
 		btrfs_sync_fs(file->f_dentry->d_sb, 1);
 		return 0;
+	}
 	case BTRFS_IOC_START_SYNC:
 		return btrfs_ioctl_start_sync(root, argp);
 	case BTRFS_IOC_WAIT_SYNC:
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4] Btrfs: fix sync fs to actually wait for all data to be persisted
  2013-09-22 20:55 [PATCH] Btrfs: fix sync fs to actually wait for all data to be persisted Filipe David Borba Manana
                   ` (2 preceding siblings ...)
  2013-09-23 10:28 ` [PATCH v3] " Filipe David Borba Manana
@ 2013-09-23 10:35 ` Filipe David Borba Manana
  2013-09-24  1:31   ` Miao Xie
  3 siblings, 1 reply; 11+ messages in thread
From: Filipe David Borba Manana @ 2013-09-23 10:35 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Filipe David Borba Manana

Currently the fs sync function (super.c:btrfs_sync_fs()) doesn't
wait for delayed work to finish before returning success to the
caller. This change fixes this, ensuring that there's no data loss
if a power failure happens right after fs sync returns success to
the caller and before the next commit happens.

Steps to reproduce the data loss issue:

$ mkfs.btrfs -f /dev/sdb3
$ mount /dev/sdb3 /mnt/btrfs
$ perl -e '$d = ("\x41" x 6001); open($f,">","/mnt/btrfs/foobar"); print $f $d; close($f);' && btrfs fi sync /mnt/btrfs

Right after the btrfs fi sync command (a second or 2 for example), power
off the machine and reboot it. The file will be empty, as it can be verified
after mounting the filesystem and through btrfs-debug-tree:

$ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
        item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
                location key (257 INODE_ITEM 0) type FILE
                namelen 6 datalen 0 name: foobar
        item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
                inode generation 7 transid 7 size 0 block group 0 mode 100644 links 1
        item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
                inode ref index 2 namelen 6 name: foobar
checksum tree key (CSUM_TREE ROOT_ITEM 0)
leaf 29429760 items 0 free space 3995 generation 7 owner 7
fs uuid 6192815c-af2a-4b75-b3db-a959ffb6166e
chunk uuid b529c44b-938c-4d3d-910a-013b4700bcae
uuid tree key (UUID_TREE ROOT_ITEM 0)

After this patch, the data loss no longer happens after a power failure and
btrfs-debug-tree shows:

$ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
	item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
		location key (257 INODE_ITEM 0) type FILE
		namelen 6 datalen 0 name: foobar
	item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
		inode generation 6 transid 6 size 6001 block group 0 mode 100644 links 1
	item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
		inode ref index 2 namelen 6 name: foobar
	item 6 key (257 EXTENT_DATA 0) itemoff 3522 itemsize 53
		extent data disk byte 12845056 nr 8192
		extent data offset 0 nr 8192 ram 8192
		extent compression 0
checksum tree key (CSUM_TREE ROOT_ITEM 0)

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
---

V2: Use writeback_inodes_sb() instead of btrfs_start_all_delalloc_inodes(), as
    suggested by Miao Xie.
V3: Use btrfs_start_all_delalloc_inodes() instead but outside btrfs_sync_fs(),
    in the sync IOCTL handler. Using writeback_inodes_sb() is not very honest
    because it doesn't guarantee inode data is persisted and we have no way
    to know if persistence really happened or not, returning 0 (success) always.
    Thanks Liu Bo for the suggestion.
V4: Be even more honest in the sync IOCTL handler - don't always return success
    regardless of the result of the btrfs_sync_fs() call.

 fs/btrfs/ioctl.c |   12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 9d46f60..385c58f 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -4557,9 +4557,15 @@ long btrfs_ioctl(struct file *file, unsigned int
 		return btrfs_ioctl_logical_to_ino(root, argp);
 	case BTRFS_IOC_SPACE_INFO:
 		return btrfs_ioctl_space_info(root, argp);
-	case BTRFS_IOC_SYNC:
-		btrfs_sync_fs(file->f_dentry->d_sb, 1);
-		return 0;
+	case BTRFS_IOC_SYNC: {
+		int ret;
+
+		ret = btrfs_start_all_delalloc_inodes(root->fs_info, 0);
+		if (ret)
+			return ret;
+		ret = btrfs_sync_fs(file->f_dentry->d_sb, 1);
+		return ret;
+	}
 	case BTRFS_IOC_START_SYNC:
 		return btrfs_ioctl_start_sync(root, argp);
 	case BTRFS_IOC_WAIT_SYNC:
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v4] Btrfs: fix sync fs to actually wait for all data to be persisted
  2013-09-23 10:35 ` [PATCH v4] " Filipe David Borba Manana
@ 2013-09-24  1:31   ` Miao Xie
  0 siblings, 0 replies; 11+ messages in thread
From: Miao Xie @ 2013-09-24  1:31 UTC (permalink / raw)
  To: Filipe David Borba Manana; +Cc: linux-btrfs

On 	mon, 23 Sep 2013 11:35:11 +0100, Filipe David Borba Manana wrote:
> Currently the fs sync function (super.c:btrfs_sync_fs()) doesn't
> wait for delayed work to finish before returning success to the
> caller. This change fixes this, ensuring that there's no data loss
> if a power failure happens right after fs sync returns success to
> the caller and before the next commit happens.
> 
> Steps to reproduce the data loss issue:
> 
> $ mkfs.btrfs -f /dev/sdb3
> $ mount /dev/sdb3 /mnt/btrfs
> $ perl -e '$d = ("\x41" x 6001); open($f,">","/mnt/btrfs/foobar"); print $f $d; close($f);' && btrfs fi sync /mnt/btrfs
> 
> Right after the btrfs fi sync command (a second or 2 for example), power
> off the machine and reboot it. The file will be empty, as it can be verified
> after mounting the filesystem and through btrfs-debug-tree:
> 
> $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
>         item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
>                 location key (257 INODE_ITEM 0) type FILE
>                 namelen 6 datalen 0 name: foobar
>         item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
>                 inode generation 7 transid 7 size 0 block group 0 mode 100644 links 1
>         item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
>                 inode ref index 2 namelen 6 name: foobar
> checksum tree key (CSUM_TREE ROOT_ITEM 0)
> leaf 29429760 items 0 free space 3995 generation 7 owner 7
> fs uuid 6192815c-af2a-4b75-b3db-a959ffb6166e
> chunk uuid b529c44b-938c-4d3d-910a-013b4700bcae
> uuid tree key (UUID_TREE ROOT_ITEM 0)
> 
> After this patch, the data loss no longer happens after a power failure and
> btrfs-debug-tree shows:
> 
> $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
> 	item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
> 		location key (257 INODE_ITEM 0) type FILE
> 		namelen 6 datalen 0 name: foobar
> 	item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
> 		inode generation 6 transid 6 size 6001 block group 0 mode 100644 links 1
> 	item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
> 		inode ref index 2 namelen 6 name: foobar
> 	item 6 key (257 EXTENT_DATA 0) itemoff 3522 itemsize 53
> 		extent data disk byte 12845056 nr 8192
> 		extent data offset 0 nr 8192 ram 8192
> 		extent compression 0
> checksum tree key (CSUM_TREE ROOT_ITEM 0)
> 
> Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>

Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>

> ---
> 
> V2: Use writeback_inodes_sb() instead of btrfs_start_all_delalloc_inodes(), as
>     suggested by Miao Xie.
> V3: Use btrfs_start_all_delalloc_inodes() instead but outside btrfs_sync_fs(),
>     in the sync IOCTL handler. Using writeback_inodes_sb() is not very honest
>     because it doesn't guarantee inode data is persisted and we have no way
>     to know if persistence really happened or not, returning 0 (success) always.
>     Thanks Liu Bo for the suggestion.
> V4: Be even more honest in the sync IOCTL handler - don't always return success
>     regardless of the result of the btrfs_sync_fs() call.
> 
>  fs/btrfs/ioctl.c |   12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 9d46f60..385c58f 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -4557,9 +4557,15 @@ long btrfs_ioctl(struct file *file, unsigned int
>  		return btrfs_ioctl_logical_to_ino(root, argp);
>  	case BTRFS_IOC_SPACE_INFO:
>  		return btrfs_ioctl_space_info(root, argp);
> -	case BTRFS_IOC_SYNC:
> -		btrfs_sync_fs(file->f_dentry->d_sb, 1);
> -		return 0;
> +	case BTRFS_IOC_SYNC: {
> +		int ret;
> +
> +		ret = btrfs_start_all_delalloc_inodes(root->fs_info, 0);
> +		if (ret)
> +			return ret;
> +		ret = btrfs_sync_fs(file->f_dentry->d_sb, 1);
> +		return ret;
> +	}
>  	case BTRFS_IOC_START_SYNC:
>  		return btrfs_ioctl_start_sync(root, argp);
>  	case BTRFS_IOC_WAIT_SYNC:
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2013-09-24  1:30 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-09-22 20:55 [PATCH] Btrfs: fix sync fs to actually wait for all data to be persisted Filipe David Borba Manana
2013-09-23  1:30 ` Miao Xie
2013-09-23  9:11   ` Filipe David Manana
2013-09-23  9:51     ` Liu Bo
2013-09-23  9:23 ` [PATCH v2] " Filipe David Borba Manana
2013-09-23  9:53   ` Filipe David Manana
2013-09-23  9:59     ` Liu Bo
2013-09-23 10:06       ` Filipe David Manana
2013-09-23 10:28 ` [PATCH v3] " Filipe David Borba Manana
2013-09-23 10:35 ` [PATCH v4] " Filipe David Borba Manana
2013-09-24  1:31   ` Miao Xie

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.