* [PATCH] Btrfs: fix sync fs to actually wait for all data to be persisted
@ 2013-09-22 20:55 Filipe David Borba Manana
2013-09-23 1:30 ` Miao Xie
` (3 more replies)
0 siblings, 4 replies; 11+ messages in thread
From: Filipe David Borba Manana @ 2013-09-22 20:55 UTC (permalink / raw)
To: linux-btrfs; +Cc: Filipe David Borba Manana
Currently the fs sync function (super.c:btrfs_sync_fs()) doesn't
wait for delayed work to finish before returning success to the
caller. This change fixes this, ensuring that there's no data loss
if a power failure happens right after fs sync returns success to
the caller and before the next commit happens.
Steps to reproduce the data loss issue:
$ mkfs.btrfs -f /dev/sdb3
$ mount /dev/sdb3 /mnt/btrfs
$ perl -e '$d = ("\x41" x 6001); open($f,">","/mnt/btrfs/foobar"); print $f $d; close($f);' && btrfs fi sync /mnt/btrfs
Right after the btrfs fi sync command (a second or 2 for example), power
off the machine and reboot it. The file will be empty, as it can be verified
after mounting the filesystem and through btrfs-debug-tree:
$ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
location key (257 INODE_ITEM 0) type FILE
namelen 6 datalen 0 name: foobar
item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
inode generation 7 transid 7 size 0 block group 0 mode 100644 links 1
item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
inode ref index 2 namelen 6 name: foobar
checksum tree key (CSUM_TREE ROOT_ITEM 0)
leaf 29429760 items 0 free space 3995 generation 7 owner 7
fs uuid 6192815c-af2a-4b75-b3db-a959ffb6166e
chunk uuid b529c44b-938c-4d3d-910a-013b4700bcae
uuid tree key (UUID_TREE ROOT_ITEM 0)
After this patch, the data loss no longer happens after a power failure and
btrfs-debug-tree shows:
$ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
location key (257 INODE_ITEM 0) type FILE
namelen 6 datalen 0 name: foobar
item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
inode generation 6 transid 6 size 6001 block group 0 mode 100644 links 1
item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
inode ref index 2 namelen 6 name: foobar
item 6 key (257 EXTENT_DATA 0) itemoff 3522 itemsize 53
extent data disk byte 12845056 nr 8192
extent data offset 0 nr 8192 ram 8192
extent compression 0
checksum tree key (CSUM_TREE ROOT_ITEM 0)
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
---
fs/btrfs/super.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 6ab0df5..557e38f 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -913,6 +913,7 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
struct btrfs_trans_handle *trans;
struct btrfs_fs_info *fs_info = btrfs_sb(sb);
struct btrfs_root *root = fs_info->tree_root;
+ int ret;
trace_btrfs_sync_fs(wait);
@@ -921,6 +922,10 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
return 0;
}
+ ret = btrfs_start_all_delalloc_inodes(fs_info, 0);
+ if (ret)
+ return ret;
+
btrfs_wait_all_ordered_extents(fs_info);
trans = btrfs_attach_transaction_barrier(root);
--
1.7.9.5
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] Btrfs: fix sync fs to actually wait for all data to be persisted
2013-09-22 20:55 [PATCH] Btrfs: fix sync fs to actually wait for all data to be persisted Filipe David Borba Manana
@ 2013-09-23 1:30 ` Miao Xie
2013-09-23 9:11 ` Filipe David Manana
2013-09-23 9:23 ` [PATCH v2] " Filipe David Borba Manana
` (2 subsequent siblings)
3 siblings, 1 reply; 11+ messages in thread
From: Miao Xie @ 2013-09-23 1:30 UTC (permalink / raw)
To: Filipe David Borba Manana; +Cc: linux-btrfs
On sun, 22 Sep 2013 21:55:53 +0100, Filipe David Borba Manana wrote:
> Currently the fs sync function (super.c:btrfs_sync_fs()) doesn't
> wait for delayed work to finish before returning success to the
> caller. This change fixes this, ensuring that there's no data loss
> if a power failure happens right after fs sync returns success to
> the caller and before the next commit happens.
>
> Steps to reproduce the data loss issue:
>
> $ mkfs.btrfs -f /dev/sdb3
> $ mount /dev/sdb3 /mnt/btrfs
> $ perl -e '$d = ("\x41" x 6001); open($f,">","/mnt/btrfs/foobar"); print $f $d; close($f);' && btrfs fi sync /mnt/btrfs
>
> Right after the btrfs fi sync command (a second or 2 for example), power
> off the machine and reboot it. The file will be empty, as it can be verified
> after mounting the filesystem and through btrfs-debug-tree:
>
> $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
>
> item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
> location key (257 INODE_ITEM 0) type FILE
> namelen 6 datalen 0 name: foobar
> item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
> inode generation 7 transid 7 size 0 block group 0 mode 100644 links 1
> item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
> inode ref index 2 namelen 6 name: foobar
> checksum tree key (CSUM_TREE ROOT_ITEM 0)
> leaf 29429760 items 0 free space 3995 generation 7 owner 7
> fs uuid 6192815c-af2a-4b75-b3db-a959ffb6166e
> chunk uuid b529c44b-938c-4d3d-910a-013b4700bcae
> uuid tree key (UUID_TREE ROOT_ITEM 0)
>
> After this patch, the data loss no longer happens after a power failure and
> btrfs-debug-tree shows:
>
> $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
> item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
> location key (257 INODE_ITEM 0) type FILE
> namelen 6 datalen 0 name: foobar
> item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
> inode generation 6 transid 6 size 6001 block group 0 mode 100644 links 1
> item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
> inode ref index 2 namelen 6 name: foobar
> item 6 key (257 EXTENT_DATA 0) itemoff 3522 itemsize 53
> extent data disk byte 12845056 nr 8192
> extent data offset 0 nr 8192 ram 8192
> extent compression 0
> checksum tree key (CSUM_TREE ROOT_ITEM 0)
>
> Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
> ---
> fs/btrfs/super.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 6ab0df5..557e38f 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -913,6 +913,7 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
> struct btrfs_trans_handle *trans;
> struct btrfs_fs_info *fs_info = btrfs_sb(sb);
> struct btrfs_root *root = fs_info->tree_root;
> + int ret;
>
> trace_btrfs_sync_fs(wait);
>
> @@ -921,6 +922,10 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
> return 0;
> }
>
> + ret = btrfs_start_all_delalloc_inodes(fs_info, 0);
> + if (ret)
> + return ret;
> +
I don't think we should call btrfs_start_all_delalloc_inodes(), because this function is also
called by do_sync(), but do_sync() syncs the whole fs before calling it, so if we add
btrfs_start_all_delalloc_inodes() here, we will sync the fs twice, and the second one is unnecessary.
Calling writeback_inodes_sb() before btrfs_sync_fs() is better way to fix this problem.
Thanks
Miao
> btrfs_wait_all_ordered_extents(fs_info);
>
> trans = btrfs_attach_transaction_barrier(root);
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] Btrfs: fix sync fs to actually wait for all data to be persisted
2013-09-23 1:30 ` Miao Xie
@ 2013-09-23 9:11 ` Filipe David Manana
2013-09-23 9:51 ` Liu Bo
0 siblings, 1 reply; 11+ messages in thread
From: Filipe David Manana @ 2013-09-23 9:11 UTC (permalink / raw)
To: Miao Xie; +Cc: linux-btrfs
On Mon, Sep 23, 2013 at 2:30 AM, Miao Xie <miaox@cn.fujitsu.com> wrote:
>
> On sun, 22 Sep 2013 21:55:53 +0100, Filipe David Borba Manana wrote:
> > Currently the fs sync function (super.c:btrfs_sync_fs()) doesn't
> > wait for delayed work to finish before returning success to the
> > caller. This change fixes this, ensuring that there's no data loss
> > if a power failure happens right after fs sync returns success to
> > the caller and before the next commit happens.
> >
> > Steps to reproduce the data loss issue:
> >
> > $ mkfs.btrfs -f /dev/sdb3
> > $ mount /dev/sdb3 /mnt/btrfs
> > $ perl -e '$d = ("\x41" x 6001); open($f,">","/mnt/btrfs/foobar"); print $f $d; close($f);' && btrfs fi sync /mnt/btrfs
> >
> > Right after the btrfs fi sync command (a second or 2 for example), power
> > off the machine and reboot it. The file will be empty, as it can be verified
> > after mounting the filesystem and through btrfs-debug-tree:
> >
> > $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
> >
> > item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
> > location key (257 INODE_ITEM 0) type FILE
> > namelen 6 datalen 0 name: foobar
> > item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
> > inode generation 7 transid 7 size 0 block group 0 mode 100644 links 1
> > item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
> > inode ref index 2 namelen 6 name: foobar
> > checksum tree key (CSUM_TREE ROOT_ITEM 0)
> > leaf 29429760 items 0 free space 3995 generation 7 owner 7
> > fs uuid 6192815c-af2a-4b75-b3db-a959ffb6166e
> > chunk uuid b529c44b-938c-4d3d-910a-013b4700bcae
> > uuid tree key (UUID_TREE ROOT_ITEM 0)
> >
> > After this patch, the data loss no longer happens after a power failure and
> > btrfs-debug-tree shows:
> >
> > $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
> > item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
> > location key (257 INODE_ITEM 0) type FILE
> > namelen 6 datalen 0 name: foobar
> > item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
> > inode generation 6 transid 6 size 6001 block group 0 mode 100644 links 1
> > item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
> > inode ref index 2 namelen 6 name: foobar
> > item 6 key (257 EXTENT_DATA 0) itemoff 3522 itemsize 53
> > extent data disk byte 12845056 nr 8192
> > extent data offset 0 nr 8192 ram 8192
> > extent compression 0
> > checksum tree key (CSUM_TREE ROOT_ITEM 0)
> >
> > Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
> > ---
> > fs/btrfs/super.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> > index 6ab0df5..557e38f 100644
> > --- a/fs/btrfs/super.c
> > +++ b/fs/btrfs/super.c
> > @@ -913,6 +913,7 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
> > struct btrfs_trans_handle *trans;
> > struct btrfs_fs_info *fs_info = btrfs_sb(sb);
> > struct btrfs_root *root = fs_info->tree_root;
> > + int ret;
> >
> > trace_btrfs_sync_fs(wait);
> >
> > @@ -921,6 +922,10 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
> > return 0;
> > }
> >
> > + ret = btrfs_start_all_delalloc_inodes(fs_info, 0);
> > + if (ret)
> > + return ret;
> > +
>
> I don't think we should call btrfs_start_all_delalloc_inodes(), because this function is also
> called by do_sync(), but do_sync() syncs the whole fs before calling it, so if we add
> btrfs_start_all_delalloc_inodes() here, we will sync the fs twice, and the second one is unnecessary.
Where is that do_sync() function exactly? I'm not finding any with
that exact name in fs/btrfs/* nor fs/*
I used this approach because (besides working) it's what is done in
btrfs_commit_transaction() (via btrfs_start_delalloc_flush and
btrfs_wait_delalloc_flush). Why can it be like that in the transaction
commit and not in btrfs_sync_fs() ?
>
> Calling writeback_inodes_sb() before btrfs_sync_fs() is better way to fix this problem.
Just tested it, and it works that way too. Uploading a new patch.
Thanks for the feedback/review Miao.
>
> Thanks
> Miao
>
> > btrfs_wait_all_ordered_extents(fs_info);
> >
> > trans = btrfs_attach_transaction_barrier(root);
> >
>
--
Filipe David Manana,
"Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men."
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v2] Btrfs: fix sync fs to actually wait for all data to be persisted
2013-09-22 20:55 [PATCH] Btrfs: fix sync fs to actually wait for all data to be persisted Filipe David Borba Manana
2013-09-23 1:30 ` Miao Xie
@ 2013-09-23 9:23 ` Filipe David Borba Manana
2013-09-23 9:53 ` Filipe David Manana
2013-09-23 10:28 ` [PATCH v3] " Filipe David Borba Manana
2013-09-23 10:35 ` [PATCH v4] " Filipe David Borba Manana
3 siblings, 1 reply; 11+ messages in thread
From: Filipe David Borba Manana @ 2013-09-23 9:23 UTC (permalink / raw)
To: linux-btrfs; +Cc: Filipe David Borba Manana
Currently the fs sync function (super.c:btrfs_sync_fs()) doesn't
wait for delayed work to finish before returning success to the
caller. This change fixes this, ensuring that there's no data loss
if a power failure happens right after fs sync returns success to
the caller and before the next commit happens.
Steps to reproduce the data loss issue:
$ mkfs.btrfs -f /dev/sdb3
$ mount /dev/sdb3 /mnt/btrfs
$ perl -e '$d = ("\x41" x 6001); open($f,">","/mnt/btrfs/foobar"); print $f $d; close($f);' && btrfs fi sync /mnt/btrfs
Right after the btrfs fi sync command (a second or 2 for example), power
off the machine and reboot it. The file will be empty, as it can be verified
after mounting the filesystem and through btrfs-debug-tree:
$ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
location key (257 INODE_ITEM 0) type FILE
namelen 6 datalen 0 name: foobar
item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
inode generation 7 transid 7 size 0 block group 0 mode 100644 links 1
item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
inode ref index 2 namelen 6 name: foobar
checksum tree key (CSUM_TREE ROOT_ITEM 0)
leaf 29429760 items 0 free space 3995 generation 7 owner 7
fs uuid 6192815c-af2a-4b75-b3db-a959ffb6166e
chunk uuid b529c44b-938c-4d3d-910a-013b4700bcae
uuid tree key (UUID_TREE ROOT_ITEM 0)
After this patch, the data loss no longer happens after a power failure and
btrfs-debug-tree shows:
$ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
location key (257 INODE_ITEM 0) type FILE
namelen 6 datalen 0 name: foobar
item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
inode generation 6 transid 6 size 6001 block group 0 mode 100644 links 1
item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
inode ref index 2 namelen 6 name: foobar
item 6 key (257 EXTENT_DATA 0) itemoff 3522 itemsize 53
extent data disk byte 12845056 nr 8192
extent data offset 0 nr 8192 ram 8192
extent compression 0
checksum tree key (CSUM_TREE ROOT_ITEM 0)
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
---
V2: Use writeback_inodes_sb() instead of btrfs_start_all_delalloc_inodes(), as
suggested by Miao Xie.
fs/btrfs/super.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 6ab0df5..38b4392 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -921,6 +921,7 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
return 0;
}
+ writeback_inodes_sb(sb, WB_REASON_SYNC);
btrfs_wait_all_ordered_extents(fs_info);
trans = btrfs_attach_transaction_barrier(root);
--
1.7.9.5
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] Btrfs: fix sync fs to actually wait for all data to be persisted
2013-09-23 9:11 ` Filipe David Manana
@ 2013-09-23 9:51 ` Liu Bo
0 siblings, 0 replies; 11+ messages in thread
From: Liu Bo @ 2013-09-23 9:51 UTC (permalink / raw)
To: Filipe David Manana; +Cc: Miao Xie, linux-btrfs
On Mon, Sep 23, 2013 at 10:11:42AM +0100, Filipe David Manana wrote:
> On Mon, Sep 23, 2013 at 2:30 AM, Miao Xie <miaox@cn.fujitsu.com> wrote:
> >
> > On sun, 22 Sep 2013 21:55:53 +0100, Filipe David Borba Manana wrote:
> > > Currently the fs sync function (super.c:btrfs_sync_fs()) doesn't
> > > wait for delayed work to finish before returning success to the
> > > caller. This change fixes this, ensuring that there's no data loss
> > > if a power failure happens right after fs sync returns success to
> > > the caller and before the next commit happens.
> > >
> > > Steps to reproduce the data loss issue:
> > >
> > > $ mkfs.btrfs -f /dev/sdb3
> > > $ mount /dev/sdb3 /mnt/btrfs
> > > $ perl -e '$d = ("\x41" x 6001); open($f,">","/mnt/btrfs/foobar"); print $f $d; close($f);' && btrfs fi sync /mnt/btrfs
> > >
> > > Right after the btrfs fi sync command (a second or 2 for example), power
> > > off the machine and reboot it. The file will be empty, as it can be verified
> > > after mounting the filesystem and through btrfs-debug-tree:
> > >
> > > $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
> > >
> > > item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
> > > location key (257 INODE_ITEM 0) type FILE
> > > namelen 6 datalen 0 name: foobar
> > > item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
> > > inode generation 7 transid 7 size 0 block group 0 mode 100644 links 1
> > > item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
> > > inode ref index 2 namelen 6 name: foobar
> > > checksum tree key (CSUM_TREE ROOT_ITEM 0)
> > > leaf 29429760 items 0 free space 3995 generation 7 owner 7
> > > fs uuid 6192815c-af2a-4b75-b3db-a959ffb6166e
> > > chunk uuid b529c44b-938c-4d3d-910a-013b4700bcae
> > > uuid tree key (UUID_TREE ROOT_ITEM 0)
> > >
> > > After this patch, the data loss no longer happens after a power failure and
> > > btrfs-debug-tree shows:
> > >
> > > $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
> > > item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
> > > location key (257 INODE_ITEM 0) type FILE
> > > namelen 6 datalen 0 name: foobar
> > > item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
> > > inode generation 6 transid 6 size 6001 block group 0 mode 100644 links 1
> > > item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
> > > inode ref index 2 namelen 6 name: foobar
> > > item 6 key (257 EXTENT_DATA 0) itemoff 3522 itemsize 53
> > > extent data disk byte 12845056 nr 8192
> > > extent data offset 0 nr 8192 ram 8192
> > > extent compression 0
> > > checksum tree key (CSUM_TREE ROOT_ITEM 0)
> > >
> > > Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
> > > ---
> > > fs/btrfs/super.c | 5 +++++
> > > 1 file changed, 5 insertions(+)
> > >
> > > diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> > > index 6ab0df5..557e38f 100644
> > > --- a/fs/btrfs/super.c
> > > +++ b/fs/btrfs/super.c
> > > @@ -913,6 +913,7 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
> > > struct btrfs_trans_handle *trans;
> > > struct btrfs_fs_info *fs_info = btrfs_sb(sb);
> > > struct btrfs_root *root = fs_info->tree_root;
> > > + int ret;
> > >
> > > trace_btrfs_sync_fs(wait);
> > >
> > > @@ -921,6 +922,10 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
> > > return 0;
> > > }
> > >
> > > + ret = btrfs_start_all_delalloc_inodes(fs_info, 0);
> > > + if (ret)
> > > + return ret;
> > > +
> >
> > I don't think we should call btrfs_start_all_delalloc_inodes(), because this function is also
> > called by do_sync(), but do_sync() syncs the whole fs before calling it, so if we add
> > btrfs_start_all_delalloc_inodes() here, we will sync the fs twice, and the second one is unnecessary.
>
> Where is that do_sync() function exactly? I'm not finding any with
> that exact name in fs/btrfs/* nor fs/*
I think it should refer to sync_filesystem() :)
-liubo
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] Btrfs: fix sync fs to actually wait for all data to be persisted
2013-09-23 9:23 ` [PATCH v2] " Filipe David Borba Manana
@ 2013-09-23 9:53 ` Filipe David Manana
2013-09-23 9:59 ` Liu Bo
0 siblings, 1 reply; 11+ messages in thread
From: Filipe David Manana @ 2013-09-23 9:53 UTC (permalink / raw)
To: linux-btrfs; +Cc: Filipe David Borba Manana
On Mon, Sep 23, 2013 at 10:23 AM, Filipe David Borba Manana
<fdmanana@gmail.com> wrote:
> Currently the fs sync function (super.c:btrfs_sync_fs()) doesn't
> wait for delayed work to finish before returning success to the
> caller. This change fixes this, ensuring that there's no data loss
> if a power failure happens right after fs sync returns success to
> the caller and before the next commit happens.
>
> Steps to reproduce the data loss issue:
>
> $ mkfs.btrfs -f /dev/sdb3
> $ mount /dev/sdb3 /mnt/btrfs
> $ perl -e '$d = ("\x41" x 6001); open($f,">","/mnt/btrfs/foobar"); print $f $d; close($f);' && btrfs fi sync /mnt/btrfs
>
> Right after the btrfs fi sync command (a second or 2 for example), power
> off the machine and reboot it. The file will be empty, as it can be verified
> after mounting the filesystem and through btrfs-debug-tree:
>
> $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
> item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
> location key (257 INODE_ITEM 0) type FILE
> namelen 6 datalen 0 name: foobar
> item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
> inode generation 7 transid 7 size 0 block group 0 mode 100644 links 1
> item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
> inode ref index 2 namelen 6 name: foobar
> checksum tree key (CSUM_TREE ROOT_ITEM 0)
> leaf 29429760 items 0 free space 3995 generation 7 owner 7
> fs uuid 6192815c-af2a-4b75-b3db-a959ffb6166e
> chunk uuid b529c44b-938c-4d3d-910a-013b4700bcae
> uuid tree key (UUID_TREE ROOT_ITEM 0)
>
> After this patch, the data loss no longer happens after a power failure and
> btrfs-debug-tree shows:
>
> $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
> item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
> location key (257 INODE_ITEM 0) type FILE
> namelen 6 datalen 0 name: foobar
> item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
> inode generation 6 transid 6 size 6001 block group 0 mode 100644 links 1
> item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
> inode ref index 2 namelen 6 name: foobar
> item 6 key (257 EXTENT_DATA 0) itemoff 3522 itemsize 53
> extent data disk byte 12845056 nr 8192
> extent data offset 0 nr 8192 ram 8192
> extent compression 0
> checksum tree key (CSUM_TREE ROOT_ITEM 0)
>
> Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
> ---
>
> V2: Use writeback_inodes_sb() instead of btrfs_start_all_delalloc_inodes(), as
> suggested by Miao Xie.
>
> fs/btrfs/super.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 6ab0df5..38b4392 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -921,6 +921,7 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
> return 0;
> }
>
> + writeback_inodes_sb(sb, WB_REASON_SYNC);
> btrfs_wait_all_ordered_extents(fs_info);
Ignore this 2nd patch version please, for 2 reasons:
1) It triggers a WARN_ON because writeback_inodes_sb() requires the
sb->u_mount semaphore to be acquired before, which is not always the
case (it is when called through btrfs_kill_super, otherwise it isn't)
2) It doesn't guarantee that inodes are actually written (see comment
of writeback_inodes_sb()), so we can return 0 (success) when the
writes actually didn't happen/succeed. Because of this,
btrfs_start_all_delalloc_inodes() is more honest.
>
> trans = btrfs_attach_transaction_barrier(root);
> --
> 1.7.9.5
>
--
Filipe David Manana,
"Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men."
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] Btrfs: fix sync fs to actually wait for all data to be persisted
2013-09-23 9:53 ` Filipe David Manana
@ 2013-09-23 9:59 ` Liu Bo
2013-09-23 10:06 ` Filipe David Manana
0 siblings, 1 reply; 11+ messages in thread
From: Liu Bo @ 2013-09-23 9:59 UTC (permalink / raw)
To: Filipe David Manana; +Cc: linux-btrfs
On Mon, Sep 23, 2013 at 10:53:20AM +0100, Filipe David Manana wrote:
> On Mon, Sep 23, 2013 at 10:23 AM, Filipe David Borba Manana
> <fdmanana@gmail.com> wrote:
> > Currently the fs sync function (super.c:btrfs_sync_fs()) doesn't
> > wait for delayed work to finish before returning success to the
> > caller. This change fixes this, ensuring that there's no data loss
> > if a power failure happens right after fs sync returns success to
> > the caller and before the next commit happens.
> >
> > Steps to reproduce the data loss issue:
> >
> > $ mkfs.btrfs -f /dev/sdb3
> > $ mount /dev/sdb3 /mnt/btrfs
> > $ perl -e '$d = ("\x41" x 6001); open($f,">","/mnt/btrfs/foobar"); print $f $d; close($f);' && btrfs fi sync /mnt/btrfs
> >
> > Right after the btrfs fi sync command (a second or 2 for example), power
> > off the machine and reboot it. The file will be empty, as it can be verified
> > after mounting the filesystem and through btrfs-debug-tree:
> >
> > $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
> > item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
> > location key (257 INODE_ITEM 0) type FILE
> > namelen 6 datalen 0 name: foobar
> > item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
> > inode generation 7 transid 7 size 0 block group 0 mode 100644 links 1
> > item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
> > inode ref index 2 namelen 6 name: foobar
> > checksum tree key (CSUM_TREE ROOT_ITEM 0)
> > leaf 29429760 items 0 free space 3995 generation 7 owner 7
> > fs uuid 6192815c-af2a-4b75-b3db-a959ffb6166e
> > chunk uuid b529c44b-938c-4d3d-910a-013b4700bcae
> > uuid tree key (UUID_TREE ROOT_ITEM 0)
> >
> > After this patch, the data loss no longer happens after a power failure and
> > btrfs-debug-tree shows:
> >
> > $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
> > item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
> > location key (257 INODE_ITEM 0) type FILE
> > namelen 6 datalen 0 name: foobar
> > item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
> > inode generation 6 transid 6 size 6001 block group 0 mode 100644 links 1
> > item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
> > inode ref index 2 namelen 6 name: foobar
> > item 6 key (257 EXTENT_DATA 0) itemoff 3522 itemsize 53
> > extent data disk byte 12845056 nr 8192
> > extent data offset 0 nr 8192 ram 8192
> > extent compression 0
> > checksum tree key (CSUM_TREE ROOT_ITEM 0)
> >
> > Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
> > ---
> >
> > V2: Use writeback_inodes_sb() instead of btrfs_start_all_delalloc_inodes(), as
> > suggested by Miao Xie.
> >
> > fs/btrfs/super.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> > index 6ab0df5..38b4392 100644
> > --- a/fs/btrfs/super.c
> > +++ b/fs/btrfs/super.c
> > @@ -921,6 +921,7 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
> > return 0;
> > }
> >
> > + writeback_inodes_sb(sb, WB_REASON_SYNC);
> > btrfs_wait_all_ordered_extents(fs_info);
>
> Ignore this 2nd patch version please, for 2 reasons:
>
> 1) It triggers a WARN_ON because writeback_inodes_sb() requires the
> sb->u_mount semaphore to be acquired before, which is not always the
> case (it is when called through btrfs_kill_super, otherwise it isn't)
>
> 2) It doesn't guarantee that inodes are actually written (see comment
> of writeback_inodes_sb()), so we can return 0 (success) when the
> writes actually didn't happen/succeed. Because of this,
> btrfs_start_all_delalloc_inodes() is more honest.
What about
case BTRFS_IOC_SYNC:
btrfs_start_all_delalloc_inodes();
btrfs_sync_fs(file->f_dentry->d_sb, 1);
return 0;
This way, there is no impact on calling sync(1).
-liubo
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] Btrfs: fix sync fs to actually wait for all data to be persisted
2013-09-23 9:59 ` Liu Bo
@ 2013-09-23 10:06 ` Filipe David Manana
0 siblings, 0 replies; 11+ messages in thread
From: Filipe David Manana @ 2013-09-23 10:06 UTC (permalink / raw)
To: bo.li.liu; +Cc: linux-btrfs
On Mon, Sep 23, 2013 at 10:59 AM, Liu Bo <bo.li.liu@oracle.com> wrote:
> On Mon, Sep 23, 2013 at 10:53:20AM +0100, Filipe David Manana wrote:
>> On Mon, Sep 23, 2013 at 10:23 AM, Filipe David Borba Manana
>> <fdmanana@gmail.com> wrote:
>> > Currently the fs sync function (super.c:btrfs_sync_fs()) doesn't
>> > wait for delayed work to finish before returning success to the
>> > caller. This change fixes this, ensuring that there's no data loss
>> > if a power failure happens right after fs sync returns success to
>> > the caller and before the next commit happens.
>> >
>> > Steps to reproduce the data loss issue:
>> >
>> > $ mkfs.btrfs -f /dev/sdb3
>> > $ mount /dev/sdb3 /mnt/btrfs
>> > $ perl -e '$d = ("\x41" x 6001); open($f,">","/mnt/btrfs/foobar"); print $f $d; close($f);' && btrfs fi sync /mnt/btrfs
>> >
>> > Right after the btrfs fi sync command (a second or 2 for example), power
>> > off the machine and reboot it. The file will be empty, as it can be verified
>> > after mounting the filesystem and through btrfs-debug-tree:
>> >
>> > $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
>> > item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
>> > location key (257 INODE_ITEM 0) type FILE
>> > namelen 6 datalen 0 name: foobar
>> > item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
>> > inode generation 7 transid 7 size 0 block group 0 mode 100644 links 1
>> > item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
>> > inode ref index 2 namelen 6 name: foobar
>> > checksum tree key (CSUM_TREE ROOT_ITEM 0)
>> > leaf 29429760 items 0 free space 3995 generation 7 owner 7
>> > fs uuid 6192815c-af2a-4b75-b3db-a959ffb6166e
>> > chunk uuid b529c44b-938c-4d3d-910a-013b4700bcae
>> > uuid tree key (UUID_TREE ROOT_ITEM 0)
>> >
>> > After this patch, the data loss no longer happens after a power failure and
>> > btrfs-debug-tree shows:
>> >
>> > $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
>> > item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
>> > location key (257 INODE_ITEM 0) type FILE
>> > namelen 6 datalen 0 name: foobar
>> > item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
>> > inode generation 6 transid 6 size 6001 block group 0 mode 100644 links 1
>> > item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
>> > inode ref index 2 namelen 6 name: foobar
>> > item 6 key (257 EXTENT_DATA 0) itemoff 3522 itemsize 53
>> > extent data disk byte 12845056 nr 8192
>> > extent data offset 0 nr 8192 ram 8192
>> > extent compression 0
>> > checksum tree key (CSUM_TREE ROOT_ITEM 0)
>> >
>> > Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
>> > ---
>> >
>> > V2: Use writeback_inodes_sb() instead of btrfs_start_all_delalloc_inodes(), as
>> > suggested by Miao Xie.
>> >
>> > fs/btrfs/super.c | 1 +
>> > 1 file changed, 1 insertion(+)
>> >
>> > diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
>> > index 6ab0df5..38b4392 100644
>> > --- a/fs/btrfs/super.c
>> > +++ b/fs/btrfs/super.c
>> > @@ -921,6 +921,7 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
>> > return 0;
>> > }
>> >
>> > + writeback_inodes_sb(sb, WB_REASON_SYNC);
>> > btrfs_wait_all_ordered_extents(fs_info);
>>
>> Ignore this 2nd patch version please, for 2 reasons:
>>
>> 1) It triggers a WARN_ON because writeback_inodes_sb() requires the
>> sb->u_mount semaphore to be acquired before, which is not always the
>> case (it is when called through btrfs_kill_super, otherwise it isn't)
>>
>> 2) It doesn't guarantee that inodes are actually written (see comment
>> of writeback_inodes_sb()), so we can return 0 (success) when the
>> writes actually didn't happen/succeed. Because of this,
>> btrfs_start_all_delalloc_inodes() is more honest.
>
> What about
> case BTRFS_IOC_SYNC:
> btrfs_start_all_delalloc_inodes();
> btrfs_sync_fs(file->f_dentry->d_sb, 1);
> return 0;
>
> This way, there is no impact on calling sync(1).
Sounds ok. Will try it, returning error if
btrfs_start_all_delalloc_inodes() returns an error.
Thanks for the suggestion and pointing me to sync_filesystem() :)
>
> -liubo
--
Filipe David Manana,
"Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men."
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v3] Btrfs: fix sync fs to actually wait for all data to be persisted
2013-09-22 20:55 [PATCH] Btrfs: fix sync fs to actually wait for all data to be persisted Filipe David Borba Manana
2013-09-23 1:30 ` Miao Xie
2013-09-23 9:23 ` [PATCH v2] " Filipe David Borba Manana
@ 2013-09-23 10:28 ` Filipe David Borba Manana
2013-09-23 10:35 ` [PATCH v4] " Filipe David Borba Manana
3 siblings, 0 replies; 11+ messages in thread
From: Filipe David Borba Manana @ 2013-09-23 10:28 UTC (permalink / raw)
To: linux-btrfs; +Cc: Filipe David Borba Manana
Currently the fs sync function (super.c:btrfs_sync_fs()) doesn't
wait for delayed work to finish before returning success to the
caller. This change fixes this, ensuring that there's no data loss
if a power failure happens right after fs sync returns success to
the caller and before the next commit happens.
Steps to reproduce the data loss issue:
$ mkfs.btrfs -f /dev/sdb3
$ mount /dev/sdb3 /mnt/btrfs
$ perl -e '$d = ("\x41" x 6001); open($f,">","/mnt/btrfs/foobar"); print $f $d; close($f);' && btrfs fi sync /mnt/btrfs
Right after the btrfs fi sync command (a second or 2 for example), power
off the machine and reboot it. The file will be empty, as it can be verified
after mounting the filesystem and through btrfs-debug-tree:
$ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
location key (257 INODE_ITEM 0) type FILE
namelen 6 datalen 0 name: foobar
item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
inode generation 7 transid 7 size 0 block group 0 mode 100644 links 1
item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
inode ref index 2 namelen 6 name: foobar
checksum tree key (CSUM_TREE ROOT_ITEM 0)
leaf 29429760 items 0 free space 3995 generation 7 owner 7
fs uuid 6192815c-af2a-4b75-b3db-a959ffb6166e
chunk uuid b529c44b-938c-4d3d-910a-013b4700bcae
uuid tree key (UUID_TREE ROOT_ITEM 0)
After this patch, the data loss no longer happens after a power failure and
btrfs-debug-tree shows:
$ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
location key (257 INODE_ITEM 0) type FILE
namelen 6 datalen 0 name: foobar
item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
inode generation 6 transid 6 size 6001 block group 0 mode 100644 links 1
item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
inode ref index 2 namelen 6 name: foobar
item 6 key (257 EXTENT_DATA 0) itemoff 3522 itemsize 53
extent data disk byte 12845056 nr 8192
extent data offset 0 nr 8192 ram 8192
extent compression 0
checksum tree key (CSUM_TREE ROOT_ITEM 0)
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
---
V2: Use writeback_inodes_sb() instead of btrfs_start_all_delalloc_inodes(), as
suggested by Miao Xie.
V3: Use btrfs_start_all_delalloc_inodes() instead but outside btrfs_sync_fs(),
in the sync IOCTL handler. Using writeback_inodes_sb() is not very honest
because it doesn't guarantee inode data is persisted and we have no way
to know if persistence really happened or not, returning 0 (success) always.
Thanks Liu Bo for the suggestion.
fs/btrfs/ioctl.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 9d46f60..8792fc8 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -4557,9 +4557,15 @@ long btrfs_ioctl(struct file *file, unsigned int
return btrfs_ioctl_logical_to_ino(root, argp);
case BTRFS_IOC_SPACE_INFO:
return btrfs_ioctl_space_info(root, argp);
- case BTRFS_IOC_SYNC:
+ case BTRFS_IOC_SYNC: {
+ int ret;
+
+ ret = btrfs_start_all_delalloc_inodes(root->fs_info, 0);
+ if (ret)
+ return ret;
btrfs_sync_fs(file->f_dentry->d_sb, 1);
return 0;
+ }
case BTRFS_IOC_START_SYNC:
return btrfs_ioctl_start_sync(root, argp);
case BTRFS_IOC_WAIT_SYNC:
--
1.7.9.5
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v4] Btrfs: fix sync fs to actually wait for all data to be persisted
2013-09-22 20:55 [PATCH] Btrfs: fix sync fs to actually wait for all data to be persisted Filipe David Borba Manana
` (2 preceding siblings ...)
2013-09-23 10:28 ` [PATCH v3] " Filipe David Borba Manana
@ 2013-09-23 10:35 ` Filipe David Borba Manana
2013-09-24 1:31 ` Miao Xie
3 siblings, 1 reply; 11+ messages in thread
From: Filipe David Borba Manana @ 2013-09-23 10:35 UTC (permalink / raw)
To: linux-btrfs; +Cc: Filipe David Borba Manana
Currently the fs sync function (super.c:btrfs_sync_fs()) doesn't
wait for delayed work to finish before returning success to the
caller. This change fixes this, ensuring that there's no data loss
if a power failure happens right after fs sync returns success to
the caller and before the next commit happens.
Steps to reproduce the data loss issue:
$ mkfs.btrfs -f /dev/sdb3
$ mount /dev/sdb3 /mnt/btrfs
$ perl -e '$d = ("\x41" x 6001); open($f,">","/mnt/btrfs/foobar"); print $f $d; close($f);' && btrfs fi sync /mnt/btrfs
Right after the btrfs fi sync command (a second or 2 for example), power
off the machine and reboot it. The file will be empty, as it can be verified
after mounting the filesystem and through btrfs-debug-tree:
$ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
location key (257 INODE_ITEM 0) type FILE
namelen 6 datalen 0 name: foobar
item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
inode generation 7 transid 7 size 0 block group 0 mode 100644 links 1
item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
inode ref index 2 namelen 6 name: foobar
checksum tree key (CSUM_TREE ROOT_ITEM 0)
leaf 29429760 items 0 free space 3995 generation 7 owner 7
fs uuid 6192815c-af2a-4b75-b3db-a959ffb6166e
chunk uuid b529c44b-938c-4d3d-910a-013b4700bcae
uuid tree key (UUID_TREE ROOT_ITEM 0)
After this patch, the data loss no longer happens after a power failure and
btrfs-debug-tree shows:
$ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
location key (257 INODE_ITEM 0) type FILE
namelen 6 datalen 0 name: foobar
item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
inode generation 6 transid 6 size 6001 block group 0 mode 100644 links 1
item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
inode ref index 2 namelen 6 name: foobar
item 6 key (257 EXTENT_DATA 0) itemoff 3522 itemsize 53
extent data disk byte 12845056 nr 8192
extent data offset 0 nr 8192 ram 8192
extent compression 0
checksum tree key (CSUM_TREE ROOT_ITEM 0)
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
---
V2: Use writeback_inodes_sb() instead of btrfs_start_all_delalloc_inodes(), as
suggested by Miao Xie.
V3: Use btrfs_start_all_delalloc_inodes() instead but outside btrfs_sync_fs(),
in the sync IOCTL handler. Using writeback_inodes_sb() is not very honest
because it doesn't guarantee inode data is persisted and we have no way
to know if persistence really happened or not, returning 0 (success) always.
Thanks Liu Bo for the suggestion.
V4: Be even more honest in the sync IOCTL handler - don't always return success
regardless of the result of the btrfs_sync_fs() call.
fs/btrfs/ioctl.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 9d46f60..385c58f 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -4557,9 +4557,15 @@ long btrfs_ioctl(struct file *file, unsigned int
return btrfs_ioctl_logical_to_ino(root, argp);
case BTRFS_IOC_SPACE_INFO:
return btrfs_ioctl_space_info(root, argp);
- case BTRFS_IOC_SYNC:
- btrfs_sync_fs(file->f_dentry->d_sb, 1);
- return 0;
+ case BTRFS_IOC_SYNC: {
+ int ret;
+
+ ret = btrfs_start_all_delalloc_inodes(root->fs_info, 0);
+ if (ret)
+ return ret;
+ ret = btrfs_sync_fs(file->f_dentry->d_sb, 1);
+ return ret;
+ }
case BTRFS_IOC_START_SYNC:
return btrfs_ioctl_start_sync(root, argp);
case BTRFS_IOC_WAIT_SYNC:
--
1.7.9.5
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v4] Btrfs: fix sync fs to actually wait for all data to be persisted
2013-09-23 10:35 ` [PATCH v4] " Filipe David Borba Manana
@ 2013-09-24 1:31 ` Miao Xie
0 siblings, 0 replies; 11+ messages in thread
From: Miao Xie @ 2013-09-24 1:31 UTC (permalink / raw)
To: Filipe David Borba Manana; +Cc: linux-btrfs
On mon, 23 Sep 2013 11:35:11 +0100, Filipe David Borba Manana wrote:
> Currently the fs sync function (super.c:btrfs_sync_fs()) doesn't
> wait for delayed work to finish before returning success to the
> caller. This change fixes this, ensuring that there's no data loss
> if a power failure happens right after fs sync returns success to
> the caller and before the next commit happens.
>
> Steps to reproduce the data loss issue:
>
> $ mkfs.btrfs -f /dev/sdb3
> $ mount /dev/sdb3 /mnt/btrfs
> $ perl -e '$d = ("\x41" x 6001); open($f,">","/mnt/btrfs/foobar"); print $f $d; close($f);' && btrfs fi sync /mnt/btrfs
>
> Right after the btrfs fi sync command (a second or 2 for example), power
> off the machine and reboot it. The file will be empty, as it can be verified
> after mounting the filesystem and through btrfs-debug-tree:
>
> $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
> item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
> location key (257 INODE_ITEM 0) type FILE
> namelen 6 datalen 0 name: foobar
> item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
> inode generation 7 transid 7 size 0 block group 0 mode 100644 links 1
> item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
> inode ref index 2 namelen 6 name: foobar
> checksum tree key (CSUM_TREE ROOT_ITEM 0)
> leaf 29429760 items 0 free space 3995 generation 7 owner 7
> fs uuid 6192815c-af2a-4b75-b3db-a959ffb6166e
> chunk uuid b529c44b-938c-4d3d-910a-013b4700bcae
> uuid tree key (UUID_TREE ROOT_ITEM 0)
>
> After this patch, the data loss no longer happens after a power failure and
> btrfs-debug-tree shows:
>
> $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
> item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
> location key (257 INODE_ITEM 0) type FILE
> namelen 6 datalen 0 name: foobar
> item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
> inode generation 6 transid 6 size 6001 block group 0 mode 100644 links 1
> item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
> inode ref index 2 namelen 6 name: foobar
> item 6 key (257 EXTENT_DATA 0) itemoff 3522 itemsize 53
> extent data disk byte 12845056 nr 8192
> extent data offset 0 nr 8192 ram 8192
> extent compression 0
> checksum tree key (CSUM_TREE ROOT_ITEM 0)
>
> Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
> ---
>
> V2: Use writeback_inodes_sb() instead of btrfs_start_all_delalloc_inodes(), as
> suggested by Miao Xie.
> V3: Use btrfs_start_all_delalloc_inodes() instead but outside btrfs_sync_fs(),
> in the sync IOCTL handler. Using writeback_inodes_sb() is not very honest
> because it doesn't guarantee inode data is persisted and we have no way
> to know if persistence really happened or not, returning 0 (success) always.
> Thanks Liu Bo for the suggestion.
> V4: Be even more honest in the sync IOCTL handler - don't always return success
> regardless of the result of the btrfs_sync_fs() call.
>
> fs/btrfs/ioctl.c | 12 +++++++++---
> 1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 9d46f60..385c58f 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -4557,9 +4557,15 @@ long btrfs_ioctl(struct file *file, unsigned int
> return btrfs_ioctl_logical_to_ino(root, argp);
> case BTRFS_IOC_SPACE_INFO:
> return btrfs_ioctl_space_info(root, argp);
> - case BTRFS_IOC_SYNC:
> - btrfs_sync_fs(file->f_dentry->d_sb, 1);
> - return 0;
> + case BTRFS_IOC_SYNC: {
> + int ret;
> +
> + ret = btrfs_start_all_delalloc_inodes(root->fs_info, 0);
> + if (ret)
> + return ret;
> + ret = btrfs_sync_fs(file->f_dentry->d_sb, 1);
> + return ret;
> + }
> case BTRFS_IOC_START_SYNC:
> return btrfs_ioctl_start_sync(root, argp);
> case BTRFS_IOC_WAIT_SYNC:
>
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2013-09-24 1:30 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-09-22 20:55 [PATCH] Btrfs: fix sync fs to actually wait for all data to be persisted Filipe David Borba Manana
2013-09-23 1:30 ` Miao Xie
2013-09-23 9:11 ` Filipe David Manana
2013-09-23 9:51 ` Liu Bo
2013-09-23 9:23 ` [PATCH v2] " Filipe David Borba Manana
2013-09-23 9:53 ` Filipe David Manana
2013-09-23 9:59 ` Liu Bo
2013-09-23 10:06 ` Filipe David Manana
2013-09-23 10:28 ` [PATCH v3] " Filipe David Borba Manana
2013-09-23 10:35 ` [PATCH v4] " Filipe David Borba Manana
2013-09-24 1:31 ` Miao Xie
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.