All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Btrfs: fix file corruption after cloning inline extents
@ 2015-07-14 15:34 fdmanana
  2017-03-02 22:18 ` Liu Bo
  0 siblings, 1 reply; 6+ messages in thread
From: fdmanana @ 2015-07-14 15:34 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Filipe Manana, stable

From: Filipe Manana <fdmanana@suse.com>

Using the clone ioctl (or extent_same ioctl, which calls the same extent
cloning function as well) we end up allowing copy an inline extent from
the source file into a non-zero offset of the destination file. This is
something not expected and that the btrfs code is not prepared to deal
with - all inline extents must be at a file offset equals to 0.

For example, the following excerpt of a test case for fstests triggers
a crash/BUG_ON() on a write operation after an inline extent is cloned
into a non-zero offset:

  _scratch_mkfs >>$seqres.full 2>&1
  _scratch_mount

  # Create our test files. File foo has the same 2K of data at offset 4K
  # as file bar has at its offset 0.
  $XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 4K" \
      -c "pwrite -S 0xbb 4k 2K" \
      -c "pwrite -S 0xcc 8K 4K" \
      $SCRATCH_MNT/foo | _filter_xfs_io

  # File bar consists of a single inline extent (2K size).
  $XFS_IO_PROG -f -s -c "pwrite -S 0xbb 0 2K" \
     $SCRATCH_MNT/bar | _filter_xfs_io

  # Now call the clone ioctl to clone the extent of file bar into file
  # foo at its offset 4K. This made file foo have an inline extent at
  # offset 4K, something which the btrfs code can not deal with in future
  # IO operations because all inline extents are supposed to start at an
  # offset of 0, resulting in all sorts of chaos.
  # So here we validate that clone ioctl returns an EOPNOTSUPP, which is
  # what it returns for other cases dealing with inlined extents.
  $CLONER_PROG -s 0 -d $((4 * 1024)) -l $((2 * 1024)) \
      $SCRATCH_MNT/bar $SCRATCH_MNT/foo

  # Because of the inline extent at offset 4K, the following write made
  # the kernel crash with a BUG_ON().
  $XFS_IO_PROG -c "pwrite -S 0xdd 6K 2K" $SCRATCH_MNT/foo | _filter_xfs_io

  status=0
  exit

The stack trace of the BUG_ON() triggered by the last write is:

  [152154.035903] ------------[ cut here ]------------
  [152154.036424] kernel BUG at mm/page-writeback.c:2286!
  [152154.036424] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  [152154.036424] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc acpi_cpu$
  [152154.036424] CPU: 2 PID: 17873 Comm: xfs_io Tainted: G        W       4.1.0-rc6-btrfs-next-11+ #2
  [152154.036424] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
  [152154.036424] task: ffff880429f70990 ti: ffff880429efc000 task.ti: ffff880429efc000
  [152154.036424] RIP: 0010:[<ffffffff8111a9d5>]  [<ffffffff8111a9d5>] clear_page_dirty_for_io+0x1e/0x90
  [152154.036424] RSP: 0018:ffff880429effc68  EFLAGS: 00010246
  [152154.036424] RAX: 0200000000000806 RBX: ffffea0006a6d8f0 RCX: 0000000000000001
  [152154.036424] RDX: 0000000000000000 RSI: ffffffff81155d1b RDI: ffffea0006a6d8f0
  [152154.036424] RBP: ffff880429effc78 R08: ffff8801ce389fe0 R09: 0000000000000001
  [152154.036424] R10: 0000000000002000 R11: ffffffffffffffff R12: ffff8800200dce68
  [152154.036424] R13: 0000000000000000 R14: ffff8800200dcc88 R15: ffff8803d5736d80
  [152154.036424] FS:  00007fbf119f6700(0000) GS:ffff88043d280000(0000) knlGS:0000000000000000
  [152154.036424] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [152154.036424] CR2: 0000000001bdc000 CR3: 00000003aa555000 CR4: 00000000000006e0
  [152154.036424] Stack:
  [152154.036424]  ffff8803d5736d80 0000000000000001 ffff880429effcd8 ffffffffa04e97c1
  [152154.036424]  ffff880429effd68 ffff880429effd60 0000000000000001 ffff8800200dc9c8
  [152154.036424]  0000000000000001 ffff8800200dcc88 0000000000000000 0000000000001000
  [152154.036424] Call Trace:
  [152154.036424]  [<ffffffffa04e97c1>] lock_and_cleanup_extent_if_need+0x147/0x18d [btrfs]
  [152154.036424]  [<ffffffffa04ea82c>] __btrfs_buffered_write+0x245/0x4c8 [btrfs]
  [152154.036424]  [<ffffffffa04ed14b>] ? btrfs_file_write_iter+0x150/0x3e0 [btrfs]
  [152154.036424]  [<ffffffffa04ed15a>] ? btrfs_file_write_iter+0x15f/0x3e0 [btrfs]
  [152154.036424]  [<ffffffffa04ed2c7>] btrfs_file_write_iter+0x2cc/0x3e0 [btrfs]
  [152154.036424]  [<ffffffff81165a4a>] __vfs_write+0x7c/0xa5
  [152154.036424]  [<ffffffff81165f89>] vfs_write+0xa0/0xe4
  [152154.036424]  [<ffffffff81166855>] SyS_pwrite64+0x64/0x82
  [152154.036424]  [<ffffffff81465197>] system_call_fastpath+0x12/0x6f
  [152154.036424] Code: 48 89 c7 e8 0f ff ff ff 5b 41 5c 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb e8 ae ef 00 00 49 89 c4 48 8b 03 a8 01 75 02 <0f> 0b 4d 85 e4 74 59 49 8b 3c 2$
  [152154.036424] RIP  [<ffffffff8111a9d5>] clear_page_dirty_for_io+0x1e/0x90
  [152154.036424]  RSP <ffff880429effc68>
  [152154.242621] ---[ end trace e3d3376b23a57041 ]---

Fix this by returning the error EOPNOTSUPP if an attempt to copy an
inline extent into a non-zero offset happens, just like what is done for
other scenarios that would require copying/splitting inline extents,
which were introduced by the following commits:

   00fdf13a2e9f ("Btrfs: fix a crash of clone with inline extents's split")
   3f9e3df8da3c ("btrfs: replace error code from btrfs_drop_extents")

Cc: stable@vger.kernel.org
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ioctl.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index d389815..0770c91 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3588,6 +3588,20 @@ process_slot:
 				u64 trim = 0;
 				u64 aligned_end = 0;
 
+				/*
+				 * Don't copy an inline extent into an offset
+				 * greater than zero. Having an inline extent
+				 * at such an offset results in chaos as btrfs
+				 * isn't prepared for such cases. Just skip
+				 * this case for the same reasons as commented
+				 * at btrfs_ioctl_clone().
+				 */
+				if (last_dest_end > 0) {
+					ret = -EOPNOTSUPP;
+					btrfs_end_transaction(trans, root);
+					goto out;
+				}
+
 				if (off > key.offset) {
 					skip = off - key.offset;
 					new_key.offset += skip;
-- 
2.1.3


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] Btrfs: fix file corruption after cloning inline extents
  2015-07-14 15:34 [PATCH] Btrfs: fix file corruption after cloning inline extents fdmanana
@ 2017-03-02 22:18 ` Liu Bo
  2017-03-03  0:36   ` Liu Bo
  0 siblings, 1 reply; 6+ messages in thread
From: Liu Bo @ 2017-03-02 22:18 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs, Filipe Manana

On Tue, Jul 14, 2015 at 04:34:48PM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> Using the clone ioctl (or extent_same ioctl, which calls the same extent
> cloning function as well) we end up allowing copy an inline extent from
> the source file into a non-zero offset of the destination file. This is
> something not expected and that the btrfs code is not prepared to deal
> with - all inline extents must be at a file offset equals to 0.
>

Somehow I failed to reproduce the BUG_ON with this case.

> For example, the following excerpt of a test case for fstests triggers
> a crash/BUG_ON() on a write operation after an inline extent is cloned
> into a non-zero offset:
> 
>   _scratch_mkfs >>$seqres.full 2>&1
>   _scratch_mount
> 
>   # Create our test files. File foo has the same 2K of data at offset 4K
>   # as file bar has at its offset 0.
>   $XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 4K" \
>       -c "pwrite -S 0xbb 4k 2K" \
>       -c "pwrite -S 0xcc 8K 4K" \
>       $SCRATCH_MNT/foo | _filter_xfs_io
> 
>   # File bar consists of a single inline extent (2K size).
>   $XFS_IO_PROG -f -s -c "pwrite -S 0xbb 0 2K" \
>      $SCRATCH_MNT/bar | _filter_xfs_io
> 
>   # Now call the clone ioctl to clone the extent of file bar into file
>   # foo at its offset 4K. This made file foo have an inline extent at
>   # offset 4K, something which the btrfs code can not deal with in future
>   # IO operations because all inline extents are supposed to start at an
>   # offset of 0, resulting in all sorts of chaos.
>   # So here we validate that clone ioctl returns an EOPNOTSUPP, which is
>   # what it returns for other cases dealing with inlined extents.
>   $CLONER_PROG -s 0 -d $((4 * 1024)) -l $((2 * 1024)) \
>       $SCRATCH_MNT/bar $SCRATCH_MNT/foo
> 
>   # Because of the inline extent at offset 4K, the following write made
>   # the kernel crash with a BUG_ON().
>   $XFS_IO_PROG -c "pwrite -S 0xdd 6K 2K" $SCRATCH_MNT/foo | _filter_xfs_io
>

On 4.10, after allowing to clone an inline extent to dst file's offset greater
than zero, I followed the test case manually and got these

[root@localhost trinity]# /home/btrfs-progs/btrfs-debugfs -f /mnt/btrfs/foo 
(257 0): ram 4096 disk 12648448 disk_size 4096
(257 4096): ram 2048 disk 0 disk_size 2048 -- inline
(257 8192): ram 4096 disk 12656640 disk_size 4096
file: /mnt/btrfs/foo extents 3 disk size 10240 logical size 12288 ratio 1.20

[root@localhost trinity]# xfs_io -f -c "pwrite 6k 2k" /mnt/btrfs/foo 
wrote 2048/2048 bytes at offset 6144
2 KiB, 1 ops; 0.0000 sec (12.520 MiB/sec and 6410.2564 ops/sec)

[root@localhost trinity]# sync
[root@localhost trinity]# /home/btrfs-progs/btrfs-debugfs -f /mnt/btrfs/foo 
(257 0): ram 4096 disk 12648448 disk_size 4096
(257 4096): ram 4096 disk 12582912 disk_size 4096
(257 8192): ram 4096 disk 12656640 disk_size 4096
file: /mnt/btrfs/foo extents 3 disk size 12288 logical size 12288 ratio 1.00


Looks like we now are able to cope with these inline extents?


Thanks,

-liubo


>   status=0
>   exit
> 
> The stack trace of the BUG_ON() triggered by the last write is:
> 
>   [152154.035903] ------------[ cut here ]------------
>   [152154.036424] kernel BUG at mm/page-writeback.c:2286!
>   [152154.036424] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
>   [152154.036424] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc acpi_cpu$
>   [152154.036424] CPU: 2 PID: 17873 Comm: xfs_io Tainted: G        W       4.1.0-rc6-btrfs-next-11+ #2
>   [152154.036424] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
>   [152154.036424] task: ffff880429f70990 ti: ffff880429efc000 task.ti: ffff880429efc000
>   [152154.036424] RIP: 0010:[<ffffffff8111a9d5>]  [<ffffffff8111a9d5>] clear_page_dirty_for_io+0x1e/0x90
>   [152154.036424] RSP: 0018:ffff880429effc68  EFLAGS: 00010246
>   [152154.036424] RAX: 0200000000000806 RBX: ffffea0006a6d8f0 RCX: 0000000000000001
>   [152154.036424] RDX: 0000000000000000 RSI: ffffffff81155d1b RDI: ffffea0006a6d8f0
>   [152154.036424] RBP: ffff880429effc78 R08: ffff8801ce389fe0 R09: 0000000000000001
>   [152154.036424] R10: 0000000000002000 R11: ffffffffffffffff R12: ffff8800200dce68
>   [152154.036424] R13: 0000000000000000 R14: ffff8800200dcc88 R15: ffff8803d5736d80
>   [152154.036424] FS:  00007fbf119f6700(0000) GS:ffff88043d280000(0000) knlGS:0000000000000000
>   [152154.036424] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   [152154.036424] CR2: 0000000001bdc000 CR3: 00000003aa555000 CR4: 00000000000006e0
>   [152154.036424] Stack:
>   [152154.036424]  ffff8803d5736d80 0000000000000001 ffff880429effcd8 ffffffffa04e97c1
>   [152154.036424]  ffff880429effd68 ffff880429effd60 0000000000000001 ffff8800200dc9c8
>   [152154.036424]  0000000000000001 ffff8800200dcc88 0000000000000000 0000000000001000
>   [152154.036424] Call Trace:
>   [152154.036424]  [<ffffffffa04e97c1>] lock_and_cleanup_extent_if_need+0x147/0x18d [btrfs]
>   [152154.036424]  [<ffffffffa04ea82c>] __btrfs_buffered_write+0x245/0x4c8 [btrfs]
>   [152154.036424]  [<ffffffffa04ed14b>] ? btrfs_file_write_iter+0x150/0x3e0 [btrfs]
>   [152154.036424]  [<ffffffffa04ed15a>] ? btrfs_file_write_iter+0x15f/0x3e0 [btrfs]
>   [152154.036424]  [<ffffffffa04ed2c7>] btrfs_file_write_iter+0x2cc/0x3e0 [btrfs]
>   [152154.036424]  [<ffffffff81165a4a>] __vfs_write+0x7c/0xa5
>   [152154.036424]  [<ffffffff81165f89>] vfs_write+0xa0/0xe4
>   [152154.036424]  [<ffffffff81166855>] SyS_pwrite64+0x64/0x82
>   [152154.036424]  [<ffffffff81465197>] system_call_fastpath+0x12/0x6f
>   [152154.036424] Code: 48 89 c7 e8 0f ff ff ff 5b 41 5c 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb e8 ae ef 00 00 49 89 c4 48 8b 03 a8 01 75 02 <0f> 0b 4d 85 e4 74 59 49 8b 3c 2$
>   [152154.036424] RIP  [<ffffffff8111a9d5>] clear_page_dirty_for_io+0x1e/0x90
>   [152154.036424]  RSP <ffff880429effc68>
>   [152154.242621] ---[ end trace e3d3376b23a57041 ]---
> 
> Fix this by returning the error EOPNOTSUPP if an attempt to copy an
> inline extent into a non-zero offset happens, just like what is done for
> other scenarios that would require copying/splitting inline extents,
> which were introduced by the following commits:
> 
>    00fdf13a2e9f ("Btrfs: fix a crash of clone with inline extents's split")
>    3f9e3df8da3c ("btrfs: replace error code from btrfs_drop_extents")
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
>  fs/btrfs/ioctl.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index d389815..0770c91 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -3588,6 +3588,20 @@ process_slot:
>  				u64 trim = 0;
>  				u64 aligned_end = 0;
>  
> +				/*
> +				 * Don't copy an inline extent into an offset
> +				 * greater than zero. Having an inline extent
> +				 * at such an offset results in chaos as btrfs
> +				 * isn't prepared for such cases. Just skip
> +				 * this case for the same reasons as commented
> +				 * at btrfs_ioctl_clone().
> +				 */
> +				if (last_dest_end > 0) {
> +					ret = -EOPNOTSUPP;
> +					btrfs_end_transaction(trans, root);
> +					goto out;
> +				}
> +
>  				if (off > key.offset) {
>  					skip = off - key.offset;
>  					new_key.offset += skip;
> -- 
> 2.1.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Btrfs: fix file corruption after cloning inline extents
  2017-03-02 22:18 ` Liu Bo
@ 2017-03-03  0:36   ` Liu Bo
  2017-03-03 15:36     ` Filipe Manana
  0 siblings, 1 reply; 6+ messages in thread
From: Liu Bo @ 2017-03-03  0:36 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs, Filipe Manana

On Thu, Mar 02, 2017 at 02:18:21PM -0800, Liu Bo wrote:
> On Tue, Jul 14, 2015 at 04:34:48PM +0100, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> > 
> > Using the clone ioctl (or extent_same ioctl, which calls the same extent
> > cloning function as well) we end up allowing copy an inline extent from
> > the source file into a non-zero offset of the destination file. This is
> > something not expected and that the btrfs code is not prepared to deal
> > with - all inline extents must be at a file offset equals to 0.
> >
> 
> Somehow I failed to reproduce the BUG_ON with this case.
> 
> > For example, the following excerpt of a test case for fstests triggers
> > a crash/BUG_ON() on a write operation after an inline extent is cloned
> > into a non-zero offset:
> > 
> >   _scratch_mkfs >>$seqres.full 2>&1
> >   _scratch_mount
> > 
> >   # Create our test files. File foo has the same 2K of data at offset 4K
> >   # as file bar has at its offset 0.
> >   $XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 4K" \
> >       -c "pwrite -S 0xbb 4k 2K" \
> >       -c "pwrite -S 0xcc 8K 4K" \
> >       $SCRATCH_MNT/foo | _filter_xfs_io
> > 
> >   # File bar consists of a single inline extent (2K size).
> >   $XFS_IO_PROG -f -s -c "pwrite -S 0xbb 0 2K" \
> >      $SCRATCH_MNT/bar | _filter_xfs_io
> > 
> >   # Now call the clone ioctl to clone the extent of file bar into file
> >   # foo at its offset 4K. This made file foo have an inline extent at
> >   # offset 4K, something which the btrfs code can not deal with in future
> >   # IO operations because all inline extents are supposed to start at an
> >   # offset of 0, resulting in all sorts of chaos.
> >   # So here we validate that clone ioctl returns an EOPNOTSUPP, which is
> >   # what it returns for other cases dealing with inlined extents.
> >   $CLONER_PROG -s 0 -d $((4 * 1024)) -l $((2 * 1024)) \
> >       $SCRATCH_MNT/bar $SCRATCH_MNT/foo
> > 
> >   # Because of the inline extent at offset 4K, the following write made
> >   # the kernel crash with a BUG_ON().
> >   $XFS_IO_PROG -c "pwrite -S 0xdd 6K 2K" $SCRATCH_MNT/foo | _filter_xfs_io
> >
> 
> On 4.10, after allowing to clone an inline extent to dst file's offset greater
> than zero, I followed the test case manually and got these
> 
> [root@localhost trinity]# /home/btrfs-progs/btrfs-debugfs -f /mnt/btrfs/foo 
> (257 0): ram 4096 disk 12648448 disk_size 4096
> (257 4096): ram 2048 disk 0 disk_size 2048 -- inline
> (257 8192): ram 4096 disk 12656640 disk_size 4096
> file: /mnt/btrfs/foo extents 3 disk size 10240 logical size 12288 ratio 1.20
> 
> [root@localhost trinity]# xfs_io -f -c "pwrite 6k 2k" /mnt/btrfs/foo 
> wrote 2048/2048 bytes at offset 6144
> 2 KiB, 1 ops; 0.0000 sec (12.520 MiB/sec and 6410.2564 ops/sec)
> 
> [root@localhost trinity]# sync
> [root@localhost trinity]# /home/btrfs-progs/btrfs-debugfs -f /mnt/btrfs/foo 
> (257 0): ram 4096 disk 12648448 disk_size 4096
> (257 4096): ram 4096 disk 12582912 disk_size 4096
> (257 8192): ram 4096 disk 12656640 disk_size 4096
> file: /mnt/btrfs/foo extents 3 disk size 12288 logical size 12288 ratio 1.00
> 
> 
> Looks like we now are able to cope with these inline extents?

I went back to test against v4.1 and v4.5, it turns out that we got the below
BUG_ON() in MM and -EIO when writing to the inline extent, because of the fact
that, when writing to the page that covers the inline extent, firstly it reads
page to get an uptodate page for writing, in readpage(), for inline extent,
btrfs_get_extent() always goes to search fs tree to read inline data out to page
and then tries to insert a em, -EEXIST would be returned if there is an existing
one.

However, after commit 8dff9c853410 ("Btrfs: deal with duplciates during
extent_map insertion in btrfs_get_extent"), we have that fixed, so now we can
read/write inline extent even they've been mixed with other regular extents.

But...I'm not 100% sure whether such files (with mixing inline with regular)
would have any other problems rather than read/write.  Let me know if you could
think of a corruption due to that.

Thanks,

-liubo
> 
> 
> Thanks,
> 
> -liubo
> 
> 
> >   status=0
> >   exit
> > 
> > The stack trace of the BUG_ON() triggered by the last write is:
> > 
> >   [152154.035903] ------------[ cut here ]------------
> >   [152154.036424] kernel BUG at mm/page-writeback.c:2286!
> >   [152154.036424] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> >   [152154.036424] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc acpi_cpu$
> >   [152154.036424] CPU: 2 PID: 17873 Comm: xfs_io Tainted: G        W       4.1.0-rc6-btrfs-next-11+ #2
> >   [152154.036424] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
> >   [152154.036424] task: ffff880429f70990 ti: ffff880429efc000 task.ti: ffff880429efc000
> >   [152154.036424] RIP: 0010:[<ffffffff8111a9d5>]  [<ffffffff8111a9d5>] clear_page_dirty_for_io+0x1e/0x90
> >   [152154.036424] RSP: 0018:ffff880429effc68  EFLAGS: 00010246
> >   [152154.036424] RAX: 0200000000000806 RBX: ffffea0006a6d8f0 RCX: 0000000000000001
> >   [152154.036424] RDX: 0000000000000000 RSI: ffffffff81155d1b RDI: ffffea0006a6d8f0
> >   [152154.036424] RBP: ffff880429effc78 R08: ffff8801ce389fe0 R09: 0000000000000001
> >   [152154.036424] R10: 0000000000002000 R11: ffffffffffffffff R12: ffff8800200dce68
> >   [152154.036424] R13: 0000000000000000 R14: ffff8800200dcc88 R15: ffff8803d5736d80
> >   [152154.036424] FS:  00007fbf119f6700(0000) GS:ffff88043d280000(0000) knlGS:0000000000000000
> >   [152154.036424] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >   [152154.036424] CR2: 0000000001bdc000 CR3: 00000003aa555000 CR4: 00000000000006e0
> >   [152154.036424] Stack:
> >   [152154.036424]  ffff8803d5736d80 0000000000000001 ffff880429effcd8 ffffffffa04e97c1
> >   [152154.036424]  ffff880429effd68 ffff880429effd60 0000000000000001 ffff8800200dc9c8
> >   [152154.036424]  0000000000000001 ffff8800200dcc88 0000000000000000 0000000000001000
> >   [152154.036424] Call Trace:
> >   [152154.036424]  [<ffffffffa04e97c1>] lock_and_cleanup_extent_if_need+0x147/0x18d [btrfs]
> >   [152154.036424]  [<ffffffffa04ea82c>] __btrfs_buffered_write+0x245/0x4c8 [btrfs]
> >   [152154.036424]  [<ffffffffa04ed14b>] ? btrfs_file_write_iter+0x150/0x3e0 [btrfs]
> >   [152154.036424]  [<ffffffffa04ed15a>] ? btrfs_file_write_iter+0x15f/0x3e0 [btrfs]
> >   [152154.036424]  [<ffffffffa04ed2c7>] btrfs_file_write_iter+0x2cc/0x3e0 [btrfs]
> >   [152154.036424]  [<ffffffff81165a4a>] __vfs_write+0x7c/0xa5
> >   [152154.036424]  [<ffffffff81165f89>] vfs_write+0xa0/0xe4
> >   [152154.036424]  [<ffffffff81166855>] SyS_pwrite64+0x64/0x82
> >   [152154.036424]  [<ffffffff81465197>] system_call_fastpath+0x12/0x6f
> >   [152154.036424] Code: 48 89 c7 e8 0f ff ff ff 5b 41 5c 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb e8 ae ef 00 00 49 89 c4 48 8b 03 a8 01 75 02 <0f> 0b 4d 85 e4 74 59 49 8b 3c 2$
> >   [152154.036424] RIP  [<ffffffff8111a9d5>] clear_page_dirty_for_io+0x1e/0x90
> >   [152154.036424]  RSP <ffff880429effc68>
> >   [152154.242621] ---[ end trace e3d3376b23a57041 ]---
> > 
> > Fix this by returning the error EOPNOTSUPP if an attempt to copy an
> > inline extent into a non-zero offset happens, just like what is done for
> > other scenarios that would require copying/splitting inline extents,
> > which were introduced by the following commits:
> > 
> >    00fdf13a2e9f ("Btrfs: fix a crash of clone with inline extents's split")
> >    3f9e3df8da3c ("btrfs: replace error code from btrfs_drop_extents")
> > 
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > ---
> >  fs/btrfs/ioctl.c | 14 ++++++++++++++
> >  1 file changed, 14 insertions(+)
> > 
> > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > index d389815..0770c91 100644
> > --- a/fs/btrfs/ioctl.c
> > +++ b/fs/btrfs/ioctl.c
> > @@ -3588,6 +3588,20 @@ process_slot:
> >  				u64 trim = 0;
> >  				u64 aligned_end = 0;
> >  
> > +				/*
> > +				 * Don't copy an inline extent into an offset
> > +				 * greater than zero. Having an inline extent
> > +				 * at such an offset results in chaos as btrfs
> > +				 * isn't prepared for such cases. Just skip
> > +				 * this case for the same reasons as commented
> > +				 * at btrfs_ioctl_clone().
> > +				 */
> > +				if (last_dest_end > 0) {
> > +					ret = -EOPNOTSUPP;
> > +					btrfs_end_transaction(trans, root);
> > +					goto out;
> > +				}
> > +
> >  				if (off > key.offset) {
> >  					skip = off - key.offset;
> >  					new_key.offset += skip;
> > -- 
> > 2.1.3
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Btrfs: fix file corruption after cloning inline extents
  2017-03-03  0:36   ` Liu Bo
@ 2017-03-03 15:36     ` Filipe Manana
  2017-03-03 18:48       ` Liu Bo
  0 siblings, 1 reply; 6+ messages in thread
From: Filipe Manana @ 2017-03-03 15:36 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs, Filipe Manana

On Fri, Mar 3, 2017 at 12:36 AM, Liu Bo <bo.li.liu@oracle.com> wrote:
> On Thu, Mar 02, 2017 at 02:18:21PM -0800, Liu Bo wrote:
>> On Tue, Jul 14, 2015 at 04:34:48PM +0100, fdmanana@kernel.org wrote:
>> > From: Filipe Manana <fdmanana@suse.com>
>> >
>> > Using the clone ioctl (or extent_same ioctl, which calls the same extent
>> > cloning function as well) we end up allowing copy an inline extent from
>> > the source file into a non-zero offset of the destination file. This is
>> > something not expected and that the btrfs code is not prepared to deal
>> > with - all inline extents must be at a file offset equals to 0.
>> >
>>
>> Somehow I failed to reproduce the BUG_ON with this case.
>>
>> > For example, the following excerpt of a test case for fstests triggers
>> > a crash/BUG_ON() on a write operation after an inline extent is cloned
>> > into a non-zero offset:
>> >
>> >   _scratch_mkfs >>$seqres.full 2>&1
>> >   _scratch_mount
>> >
>> >   # Create our test files. File foo has the same 2K of data at offset 4K
>> >   # as file bar has at its offset 0.
>> >   $XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 4K" \
>> >       -c "pwrite -S 0xbb 4k 2K" \
>> >       -c "pwrite -S 0xcc 8K 4K" \
>> >       $SCRATCH_MNT/foo | _filter_xfs_io
>> >
>> >   # File bar consists of a single inline extent (2K size).
>> >   $XFS_IO_PROG -f -s -c "pwrite -S 0xbb 0 2K" \
>> >      $SCRATCH_MNT/bar | _filter_xfs_io
>> >
>> >   # Now call the clone ioctl to clone the extent of file bar into file
>> >   # foo at its offset 4K. This made file foo have an inline extent at
>> >   # offset 4K, something which the btrfs code can not deal with in future
>> >   # IO operations because all inline extents are supposed to start at an
>> >   # offset of 0, resulting in all sorts of chaos.
>> >   # So here we validate that clone ioctl returns an EOPNOTSUPP, which is
>> >   # what it returns for other cases dealing with inlined extents.
>> >   $CLONER_PROG -s 0 -d $((4 * 1024)) -l $((2 * 1024)) \
>> >       $SCRATCH_MNT/bar $SCRATCH_MNT/foo
>> >
>> >   # Because of the inline extent at offset 4K, the following write made
>> >   # the kernel crash with a BUG_ON().
>> >   $XFS_IO_PROG -c "pwrite -S 0xdd 6K 2K" $SCRATCH_MNT/foo | _filter_xfs_io
>> >
>>
>> On 4.10, after allowing to clone an inline extent to dst file's offset greater
>> than zero, I followed the test case manually and got these
>>
>> [root@localhost trinity]# /home/btrfs-progs/btrfs-debugfs -f /mnt/btrfs/foo
>> (257 0): ram 4096 disk 12648448 disk_size 4096
>> (257 4096): ram 2048 disk 0 disk_size 2048 -- inline
>> (257 8192): ram 4096 disk 12656640 disk_size 4096
>> file: /mnt/btrfs/foo extents 3 disk size 10240 logical size 12288 ratio 1.20
>>
>> [root@localhost trinity]# xfs_io -f -c "pwrite 6k 2k" /mnt/btrfs/foo
>> wrote 2048/2048 bytes at offset 6144
>> 2 KiB, 1 ops; 0.0000 sec (12.520 MiB/sec and 6410.2564 ops/sec)
>>
>> [root@localhost trinity]# sync
>> [root@localhost trinity]# /home/btrfs-progs/btrfs-debugfs -f /mnt/btrfs/foo
>> (257 0): ram 4096 disk 12648448 disk_size 4096
>> (257 4096): ram 4096 disk 12582912 disk_size 4096
>> (257 8192): ram 4096 disk 12656640 disk_size 4096
>> file: /mnt/btrfs/foo extents 3 disk size 12288 logical size 12288 ratio 1.00
>>
>>
>> Looks like we now are able to cope with these inline extents?
>
> I went back to test against v4.1 and v4.5, it turns out that we got the below
> BUG_ON() in MM and -EIO when writing to the inline extent, because of the fact
> that, when writing to the page that covers the inline extent, firstly it reads
> page to get an uptodate page for writing, in readpage(), for inline extent,
> btrfs_get_extent() always goes to search fs tree to read inline data out to page
> and then tries to insert a em, -EEXIST would be returned if there is an existing
> one.
>
> However, after commit 8dff9c853410 ("Btrfs: deal with duplciates during
> extent_map insertion in btrfs_get_extent"), we have that fixed, so now we can
> read/write inline extent even they've been mixed with other regular extents.
>
> But...I'm not 100% sure whether such files (with mixing inline with regular)
> would have any other problems rather than read/write.  Let me know if you could
> think of a corruption due to that.

Without thinking too much and after doing some quick tests that passed
successfully, I'm not seeing where things can go wrong.
However it's odd to have a mix of inline and non-inline extents, since
the only cases where we create inline extents is for zero offsets and
their size is smaller than page_size. I am not entirely sure if, even
after the side effects of that commit, it would be safe to allow clone
operation to leave inline extents at the destination like this. A lot
more testing and verification should be done.

thanks

>
> Thanks,
>
> -liubo
>>
>>
>> Thanks,
>>
>> -liubo
>>
>>
>> >   status=0
>> >   exit
>> >
>> > The stack trace of the BUG_ON() triggered by the last write is:
>> >
>> >   [152154.035903] ------------[ cut here ]------------
>> >   [152154.036424] kernel BUG at mm/page-writeback.c:2286!
>> >   [152154.036424] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
>> >   [152154.036424] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc acpi_cpu$
>> >   [152154.036424] CPU: 2 PID: 17873 Comm: xfs_io Tainted: G        W       4.1.0-rc6-btrfs-next-11+ #2
>> >   [152154.036424] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
>> >   [152154.036424] task: ffff880429f70990 ti: ffff880429efc000 task.ti: ffff880429efc000
>> >   [152154.036424] RIP: 0010:[<ffffffff8111a9d5>]  [<ffffffff8111a9d5>] clear_page_dirty_for_io+0x1e/0x90
>> >   [152154.036424] RSP: 0018:ffff880429effc68  EFLAGS: 00010246
>> >   [152154.036424] RAX: 0200000000000806 RBX: ffffea0006a6d8f0 RCX: 0000000000000001
>> >   [152154.036424] RDX: 0000000000000000 RSI: ffffffff81155d1b RDI: ffffea0006a6d8f0
>> >   [152154.036424] RBP: ffff880429effc78 R08: ffff8801ce389fe0 R09: 0000000000000001
>> >   [152154.036424] R10: 0000000000002000 R11: ffffffffffffffff R12: ffff8800200dce68
>> >   [152154.036424] R13: 0000000000000000 R14: ffff8800200dcc88 R15: ffff8803d5736d80
>> >   [152154.036424] FS:  00007fbf119f6700(0000) GS:ffff88043d280000(0000) knlGS:0000000000000000
>> >   [152154.036424] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> >   [152154.036424] CR2: 0000000001bdc000 CR3: 00000003aa555000 CR4: 00000000000006e0
>> >   [152154.036424] Stack:
>> >   [152154.036424]  ffff8803d5736d80 0000000000000001 ffff880429effcd8 ffffffffa04e97c1
>> >   [152154.036424]  ffff880429effd68 ffff880429effd60 0000000000000001 ffff8800200dc9c8
>> >   [152154.036424]  0000000000000001 ffff8800200dcc88 0000000000000000 0000000000001000
>> >   [152154.036424] Call Trace:
>> >   [152154.036424]  [<ffffffffa04e97c1>] lock_and_cleanup_extent_if_need+0x147/0x18d [btrfs]
>> >   [152154.036424]  [<ffffffffa04ea82c>] __btrfs_buffered_write+0x245/0x4c8 [btrfs]
>> >   [152154.036424]  [<ffffffffa04ed14b>] ? btrfs_file_write_iter+0x150/0x3e0 [btrfs]
>> >   [152154.036424]  [<ffffffffa04ed15a>] ? btrfs_file_write_iter+0x15f/0x3e0 [btrfs]
>> >   [152154.036424]  [<ffffffffa04ed2c7>] btrfs_file_write_iter+0x2cc/0x3e0 [btrfs]
>> >   [152154.036424]  [<ffffffff81165a4a>] __vfs_write+0x7c/0xa5
>> >   [152154.036424]  [<ffffffff81165f89>] vfs_write+0xa0/0xe4
>> >   [152154.036424]  [<ffffffff81166855>] SyS_pwrite64+0x64/0x82
>> >   [152154.036424]  [<ffffffff81465197>] system_call_fastpath+0x12/0x6f
>> >   [152154.036424] Code: 48 89 c7 e8 0f ff ff ff 5b 41 5c 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb e8 ae ef 00 00 49 89 c4 48 8b 03 a8 01 75 02 <0f> 0b 4d 85 e4 74 59 49 8b 3c 2$
>> >   [152154.036424] RIP  [<ffffffff8111a9d5>] clear_page_dirty_for_io+0x1e/0x90
>> >   [152154.036424]  RSP <ffff880429effc68>
>> >   [152154.242621] ---[ end trace e3d3376b23a57041 ]---
>> >
>> > Fix this by returning the error EOPNOTSUPP if an attempt to copy an
>> > inline extent into a non-zero offset happens, just like what is done for
>> > other scenarios that would require copying/splitting inline extents,
>> > which were introduced by the following commits:
>> >
>> >    00fdf13a2e9f ("Btrfs: fix a crash of clone with inline extents's split")
>> >    3f9e3df8da3c ("btrfs: replace error code from btrfs_drop_extents")
>> >
>> > Cc: stable@vger.kernel.org
>> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
>> > ---
>> >  fs/btrfs/ioctl.c | 14 ++++++++++++++
>> >  1 file changed, 14 insertions(+)
>> >
>> > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>> > index d389815..0770c91 100644
>> > --- a/fs/btrfs/ioctl.c
>> > +++ b/fs/btrfs/ioctl.c
>> > @@ -3588,6 +3588,20 @@ process_slot:
>> >                             u64 trim = 0;
>> >                             u64 aligned_end = 0;
>> >
>> > +                           /*
>> > +                            * Don't copy an inline extent into an offset
>> > +                            * greater than zero. Having an inline extent
>> > +                            * at such an offset results in chaos as btrfs
>> > +                            * isn't prepared for such cases. Just skip
>> > +                            * this case for the same reasons as commented
>> > +                            * at btrfs_ioctl_clone().
>> > +                            */
>> > +                           if (last_dest_end > 0) {
>> > +                                   ret = -EOPNOTSUPP;
>> > +                                   btrfs_end_transaction(trans, root);
>> > +                                   goto out;
>> > +                           }
>> > +
>> >                             if (off > key.offset) {
>> >                                     skip = off - key.offset;
>> >                                     new_key.offset += skip;
>> > --
>> > 2.1.3
>> >
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> > the body of a message to majordomo@vger.kernel.org
>> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Btrfs: fix file corruption after cloning inline extents
  2017-03-03 15:36     ` Filipe Manana
@ 2017-03-03 18:48       ` Liu Bo
  2017-03-06 23:31         ` Filipe Manana
  0 siblings, 1 reply; 6+ messages in thread
From: Liu Bo @ 2017-03-03 18:48 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs, Filipe Manana

On Fri, Mar 03, 2017 at 03:36:36PM +0000, Filipe Manana wrote:
> On Fri, Mar 3, 2017 at 12:36 AM, Liu Bo <bo.li.liu@oracle.com> wrote:
> > On Thu, Mar 02, 2017 at 02:18:21PM -0800, Liu Bo wrote:
> >> On Tue, Jul 14, 2015 at 04:34:48PM +0100, fdmanana@kernel.org wrote:
> >> > From: Filipe Manana <fdmanana@suse.com>
> >> >
> >> > Using the clone ioctl (or extent_same ioctl, which calls the same extent
> >> > cloning function as well) we end up allowing copy an inline extent from
> >> > the source file into a non-zero offset of the destination file. This is
> >> > something not expected and that the btrfs code is not prepared to deal
> >> > with - all inline extents must be at a file offset equals to 0.
> >> >
> >>
> >> Somehow I failed to reproduce the BUG_ON with this case.
> >>
> >> > For example, the following excerpt of a test case for fstests triggers
> >> > a crash/BUG_ON() on a write operation after an inline extent is cloned
> >> > into a non-zero offset:
> >> >
> >> >   _scratch_mkfs >>$seqres.full 2>&1
> >> >   _scratch_mount
> >> >
> >> >   # Create our test files. File foo has the same 2K of data at offset 4K
> >> >   # as file bar has at its offset 0.
> >> >   $XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 4K" \
> >> >       -c "pwrite -S 0xbb 4k 2K" \
> >> >       -c "pwrite -S 0xcc 8K 4K" \
> >> >       $SCRATCH_MNT/foo | _filter_xfs_io
> >> >
> >> >   # File bar consists of a single inline extent (2K size).
> >> >   $XFS_IO_PROG -f -s -c "pwrite -S 0xbb 0 2K" \
> >> >      $SCRATCH_MNT/bar | _filter_xfs_io
> >> >
> >> >   # Now call the clone ioctl to clone the extent of file bar into file
> >> >   # foo at its offset 4K. This made file foo have an inline extent at
> >> >   # offset 4K, something which the btrfs code can not deal with in future
> >> >   # IO operations because all inline extents are supposed to start at an
> >> >   # offset of 0, resulting in all sorts of chaos.
> >> >   # So here we validate that clone ioctl returns an EOPNOTSUPP, which is
> >> >   # what it returns for other cases dealing with inlined extents.
> >> >   $CLONER_PROG -s 0 -d $((4 * 1024)) -l $((2 * 1024)) \
> >> >       $SCRATCH_MNT/bar $SCRATCH_MNT/foo
> >> >
> >> >   # Because of the inline extent at offset 4K, the following write made
> >> >   # the kernel crash with a BUG_ON().
> >> >   $XFS_IO_PROG -c "pwrite -S 0xdd 6K 2K" $SCRATCH_MNT/foo | _filter_xfs_io
> >> >
> >>
> >> On 4.10, after allowing to clone an inline extent to dst file's offset greater
> >> than zero, I followed the test case manually and got these
> >>
> >> [root@localhost trinity]# /home/btrfs-progs/btrfs-debugfs -f /mnt/btrfs/foo
> >> (257 0): ram 4096 disk 12648448 disk_size 4096
> >> (257 4096): ram 2048 disk 0 disk_size 2048 -- inline
> >> (257 8192): ram 4096 disk 12656640 disk_size 4096
> >> file: /mnt/btrfs/foo extents 3 disk size 10240 logical size 12288 ratio 1.20
> >>
> >> [root@localhost trinity]# xfs_io -f -c "pwrite 6k 2k" /mnt/btrfs/foo
> >> wrote 2048/2048 bytes at offset 6144
> >> 2 KiB, 1 ops; 0.0000 sec (12.520 MiB/sec and 6410.2564 ops/sec)
> >>
> >> [root@localhost trinity]# sync
> >> [root@localhost trinity]# /home/btrfs-progs/btrfs-debugfs -f /mnt/btrfs/foo
> >> (257 0): ram 4096 disk 12648448 disk_size 4096
> >> (257 4096): ram 4096 disk 12582912 disk_size 4096
> >> (257 8192): ram 4096 disk 12656640 disk_size 4096
> >> file: /mnt/btrfs/foo extents 3 disk size 12288 logical size 12288 ratio 1.00
> >>
> >>
> >> Looks like we now are able to cope with these inline extents?
> >
> > I went back to test against v4.1 and v4.5, it turns out that we got the below
> > BUG_ON() in MM and -EIO when writing to the inline extent, because of the fact
> > that, when writing to the page that covers the inline extent, firstly it reads
> > page to get an uptodate page for writing, in readpage(), for inline extent,
> > btrfs_get_extent() always goes to search fs tree to read inline data out to page
> > and then tries to insert a em, -EEXIST would be returned if there is an existing
> > one.
> >
> > However, after commit 8dff9c853410 ("Btrfs: deal with duplciates during
> > extent_map insertion in btrfs_get_extent"), we have that fixed, so now we can
> > read/write inline extent even they've been mixed with other regular extents.
> >
> > But...I'm not 100% sure whether such files (with mixing inline with regular)
> > would have any other problems rather than read/write.  Let me know if you could
> > think of a corruption due to that.
> 
> Without thinking too much and after doing some quick tests that passed
> successfully, I'm not seeing where things can go wrong.
> However it's odd to have a mix of inline and non-inline extents, since
> the only cases where we create inline extents is for zero offsets and
> their size is smaller than page_size. I am not entirely sure if, even
> after the side effects of that commit, it would be safe to allow clone
> operation to leave inline extents at the destination like this. A lot
> more testing and verification should be done.
>

I got the same odd feeling about it, as you mentioned above, offset must be zero
and it depends on the file's isize when creating inline extents, I think the
only benefit of having inline extent is to save us a read from disk, with mixed
stuff we lose that benefit however, so at least such mixed behavior is not
recommended even though btrfs can tolerate it.

I came accross this when I was debugging a last_size != new_size problem, so we
can have this mixed stuff by doing these without NO_HOLES,

xfs_io -f -c "pwrite 0 8" foo
xfs_io -c "falloc 4 8188" foo

offset 4 is rounded down to offset 0 and before allocating blocks we wait for
any dirty stuff starting at offset 0, since the isize is not yet updated, the
inline extent is created as is again.  Now we have an inline extent from 0 to 8
and a prealloc extent from 4096 to 8192, AND its isize is 8192.

This leads to another question for doing fallocate against inline extent:
- Has it broken fallocate's behavior?
(since we don't reserve space for the first block from 0 to 4096.)

Thanks,

-liubo

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Btrfs: fix file corruption after cloning inline extents
  2017-03-03 18:48       ` Liu Bo
@ 2017-03-06 23:31         ` Filipe Manana
  0 siblings, 0 replies; 6+ messages in thread
From: Filipe Manana @ 2017-03-06 23:31 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs, Filipe Manana

On Fri, Mar 3, 2017 at 6:48 PM, Liu Bo <bo.li.liu@oracle.com> wrote:
> On Fri, Mar 03, 2017 at 03:36:36PM +0000, Filipe Manana wrote:
>> On Fri, Mar 3, 2017 at 12:36 AM, Liu Bo <bo.li.liu@oracle.com> wrote:
>> > On Thu, Mar 02, 2017 at 02:18:21PM -0800, Liu Bo wrote:
>> >> On Tue, Jul 14, 2015 at 04:34:48PM +0100, fdmanana@kernel.org wrote:
>> >> > From: Filipe Manana <fdmanana@suse.com>
>> >> >
>> >> > Using the clone ioctl (or extent_same ioctl, which calls the same extent
>> >> > cloning function as well) we end up allowing copy an inline extent from
>> >> > the source file into a non-zero offset of the destination file. This is
>> >> > something not expected and that the btrfs code is not prepared to deal
>> >> > with - all inline extents must be at a file offset equals to 0.
>> >> >
>> >>
>> >> Somehow I failed to reproduce the BUG_ON with this case.
>> >>
>> >> > For example, the following excerpt of a test case for fstests triggers
>> >> > a crash/BUG_ON() on a write operation after an inline extent is cloned
>> >> > into a non-zero offset:
>> >> >
>> >> >   _scratch_mkfs >>$seqres.full 2>&1
>> >> >   _scratch_mount
>> >> >
>> >> >   # Create our test files. File foo has the same 2K of data at offset 4K
>> >> >   # as file bar has at its offset 0.
>> >> >   $XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 4K" \
>> >> >       -c "pwrite -S 0xbb 4k 2K" \
>> >> >       -c "pwrite -S 0xcc 8K 4K" \
>> >> >       $SCRATCH_MNT/foo | _filter_xfs_io
>> >> >
>> >> >   # File bar consists of a single inline extent (2K size).
>> >> >   $XFS_IO_PROG -f -s -c "pwrite -S 0xbb 0 2K" \
>> >> >      $SCRATCH_MNT/bar | _filter_xfs_io
>> >> >
>> >> >   # Now call the clone ioctl to clone the extent of file bar into file
>> >> >   # foo at its offset 4K. This made file foo have an inline extent at
>> >> >   # offset 4K, something which the btrfs code can not deal with in future
>> >> >   # IO operations because all inline extents are supposed to start at an
>> >> >   # offset of 0, resulting in all sorts of chaos.
>> >> >   # So here we validate that clone ioctl returns an EOPNOTSUPP, which is
>> >> >   # what it returns for other cases dealing with inlined extents.
>> >> >   $CLONER_PROG -s 0 -d $((4 * 1024)) -l $((2 * 1024)) \
>> >> >       $SCRATCH_MNT/bar $SCRATCH_MNT/foo
>> >> >
>> >> >   # Because of the inline extent at offset 4K, the following write made
>> >> >   # the kernel crash with a BUG_ON().
>> >> >   $XFS_IO_PROG -c "pwrite -S 0xdd 6K 2K" $SCRATCH_MNT/foo | _filter_xfs_io
>> >> >
>> >>
>> >> On 4.10, after allowing to clone an inline extent to dst file's offset greater
>> >> than zero, I followed the test case manually and got these
>> >>
>> >> [root@localhost trinity]# /home/btrfs-progs/btrfs-debugfs -f /mnt/btrfs/foo
>> >> (257 0): ram 4096 disk 12648448 disk_size 4096
>> >> (257 4096): ram 2048 disk 0 disk_size 2048 -- inline
>> >> (257 8192): ram 4096 disk 12656640 disk_size 4096
>> >> file: /mnt/btrfs/foo extents 3 disk size 10240 logical size 12288 ratio 1.20
>> >>
>> >> [root@localhost trinity]# xfs_io -f -c "pwrite 6k 2k" /mnt/btrfs/foo
>> >> wrote 2048/2048 bytes at offset 6144
>> >> 2 KiB, 1 ops; 0.0000 sec (12.520 MiB/sec and 6410.2564 ops/sec)
>> >>
>> >> [root@localhost trinity]# sync
>> >> [root@localhost trinity]# /home/btrfs-progs/btrfs-debugfs -f /mnt/btrfs/foo
>> >> (257 0): ram 4096 disk 12648448 disk_size 4096
>> >> (257 4096): ram 4096 disk 12582912 disk_size 4096
>> >> (257 8192): ram 4096 disk 12656640 disk_size 4096
>> >> file: /mnt/btrfs/foo extents 3 disk size 12288 logical size 12288 ratio 1.00
>> >>
>> >>
>> >> Looks like we now are able to cope with these inline extents?
>> >
>> > I went back to test against v4.1 and v4.5, it turns out that we got the below
>> > BUG_ON() in MM and -EIO when writing to the inline extent, because of the fact
>> > that, when writing to the page that covers the inline extent, firstly it reads
>> > page to get an uptodate page for writing, in readpage(), for inline extent,
>> > btrfs_get_extent() always goes to search fs tree to read inline data out to page
>> > and then tries to insert a em, -EEXIST would be returned if there is an existing
>> > one.
>> >
>> > However, after commit 8dff9c853410 ("Btrfs: deal with duplciates during
>> > extent_map insertion in btrfs_get_extent"), we have that fixed, so now we can
>> > read/write inline extent even they've been mixed with other regular extents.
>> >
>> > But...I'm not 100% sure whether such files (with mixing inline with regular)
>> > would have any other problems rather than read/write.  Let me know if you could
>> > think of a corruption due to that.
>>
>> Without thinking too much and after doing some quick tests that passed
>> successfully, I'm not seeing where things can go wrong.
>> However it's odd to have a mix of inline and non-inline extents, since
>> the only cases where we create inline extents is for zero offsets and
>> their size is smaller than page_size. I am not entirely sure if, even
>> after the side effects of that commit, it would be safe to allow clone
>> operation to leave inline extents at the destination like this. A lot
>> more testing and verification should be done.
>>
>
> I got the same odd feeling about it, as you mentioned above, offset must be zero
> and it depends on the file's isize when creating inline extents, I think the
> only benefit of having inline extent is to save us a read from disk, with mixed
> stuff we lose that benefit however, so at least such mixed behavior is not
> recommended even though btrfs can tolerate it.
>
> I came accross this when I was debugging a last_size != new_size problem, so we
> can have this mixed stuff by doing these without NO_HOLES,
>
> xfs_io -f -c "pwrite 0 8" foo
> xfs_io -c "falloc 4 8188" foo
>
> offset 4 is rounded down to offset 0 and before allocating blocks we wait for
> any dirty stuff starting at offset 0, since the isize is not yet updated, the
> inline extent is created as is again.  Now we have an inline extent from 0 to 8
> and a prealloc extent from 4096 to 8192, AND its isize is 8192.

That should never happen, as the inline extent should be converted to
a regular extent and padded with zeroes.
I think you either found what leads to a read corruption of compressed
inline extents followed by holes (just search the mailing list for
such case from Zygo) or at least you are very close to it.

>
> This leads to another question for doing fallocate against inline extent:
> - Has it broken fallocate's behavior?

I think so.

thanks

> (since we don't reserve space for the first block from 0 to 4096.)
>
> Thanks,
>
> -liubo

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-03-06 23:32 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-14 15:34 [PATCH] Btrfs: fix file corruption after cloning inline extents fdmanana
2017-03-02 22:18 ` Liu Bo
2017-03-03  0:36   ` Liu Bo
2017-03-03 15:36     ` Filipe Manana
2017-03-03 18:48       ` Liu Bo
2017-03-06 23:31         ` Filipe Manana

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.