All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Btrfs: let super_stripesize match with sectorsize
@ 2016-06-14 21:33 Liu Bo
  2016-06-15  3:42 ` Chandan Rajendra
  0 siblings, 1 reply; 10+ messages in thread
From: Liu Bo @ 2016-06-14 21:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Eryu Guan, David Sterba

Right now stripesize is set to 4096 while sectorsize is set to
max(4096, pagesize).  However, kernel requires super_stripesize
to match with sectorsize.

Reported-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 mkfs.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mkfs.c b/mkfs.c
index a3a3c14..8d00766 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -1482,6 +1482,8 @@ int main(int argc, char **argv)
 	}
 
 	sectorsize = max(sectorsize, (u32)sysconf(_SC_PAGESIZE));
+	stripesize = sectorsize;
+
 	saved_optind = optind;
 	dev_cnt = argc - optind;
 	if (dev_cnt == 0)
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] Btrfs: let super_stripesize match with sectorsize
  2016-06-14 21:33 [PATCH] Btrfs: let super_stripesize match with sectorsize Liu Bo
@ 2016-06-15  3:42 ` Chandan Rajendra
  2016-06-15 10:20   ` Chandan Rajendra
  0 siblings, 1 reply; 10+ messages in thread
From: Chandan Rajendra @ 2016-06-15  3:42 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs, Eryu Guan, David Sterba

On Tuesday, June 14, 2016 02:33:43 PM Liu Bo wrote:
> Right now stripesize is set to 4096 while sectorsize is set to
> max(4096, pagesize).  However, kernel requires super_stripesize
> to match with sectorsize.
> 
> Reported-by: Eryu Guan <guaneryu@gmail.com>
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> ---
>  mkfs.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/mkfs.c b/mkfs.c
> index a3a3c14..8d00766 100644
> --- a/mkfs.c
> +++ b/mkfs.c
> @@ -1482,6 +1482,8 @@ int main(int argc, char **argv)
>  	}
> 
>  	sectorsize = max(sectorsize, (u32)sysconf(_SC_PAGESIZE));
> +	stripesize = sectorsize;
> +
>  	saved_optind = optind;
>  	dev_cnt = argc - optind;
>  	if (dev_cnt == 0)

Hello Liu Bo,

We have to fix the following check in check_super() as well,

       if (btrfs_super_stripesize(sb) != 4096) {
                error("invalid stripesize %u", btrfs_super_stripesize(sb));
                goto error_out;
        }

i.e. btrfs_super_stripesize(sb) must be equal to
btrfs_super_sectorsize(sb).

However in btrfs-progs (mkfs.c to be precise) since we had stripesize
hardcoded to 4096, setting stripesize to the value of sectorsize in
mkfs.c will cause the following to occur when mkfs.btrfs is invoked for
devices with existing Btrfs filesystem instances,

NOTE: Assume we have changed the stripesize validation in btrfs-progs'
check_super() to,

        if (btrfs_super_stripesize(sb) != btrfs_super_sectorsize(sb)) {
                error("invalid stripesize %u", btrfs_super_stripesize(sb));
                goto error_out;
        }


main()
 for each device file passed as an argument,
   test_dev_for_mkfs()
     check_mounted
       check_mounted_where
         btrfs_scan_one_device
           btrfs_read_dev_super
             check_super() call will fail for existing filesystems which have
             stripesize set to 4k. All existing filesystem instances will fall
             into this category.

This error value is pushed up the call stack and this causes the device to not
get added to the fs_devices_mnt list in check_mounted_where(). Hence we would
fail to correctly check the mount status of the multi-device btrfs
filesystems.

I will try to figure out a solution to this problem.

-- 
chandan


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Btrfs: let super_stripesize match with sectorsize
  2016-06-15  3:42 ` Chandan Rajendra
@ 2016-06-15 10:20   ` Chandan Rajendra
  2016-06-16  0:09     ` Liu Bo
  0 siblings, 1 reply; 10+ messages in thread
From: Chandan Rajendra @ 2016-06-15 10:20 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs, Eryu Guan, David Sterba

On Wednesday, June 15, 2016 09:12:28 AM Chandan Rajendra wrote:
> Hello Liu Bo,
> 
> We have to fix the following check in check_super() as well,
> 
>        if (btrfs_super_stripesize(sb) != 4096) {
>                 error("invalid stripesize %u", btrfs_super_stripesize(sb));
>                 goto error_out;
>         }
> 
> i.e. btrfs_super_stripesize(sb) must be equal to
> btrfs_super_sectorsize(sb).
> 
> However in btrfs-progs (mkfs.c to be precise) since we had stripesize
> hardcoded to 4096, setting stripesize to the value of sectorsize in
> mkfs.c will cause the following to occur when mkfs.btrfs is invoked for
> devices with existing Btrfs filesystem instances,
> 
> NOTE: Assume we have changed the stripesize validation in btrfs-progs'
> check_super() to,
> 
>         if (btrfs_super_stripesize(sb) != btrfs_super_sectorsize(sb)) {
>                 error("invalid stripesize %u", btrfs_super_stripesize(sb));
>                 goto error_out;
>         }
> 
> 
> main()
>  for each device file passed as an argument,
>    test_dev_for_mkfs()
>      check_mounted
>        check_mounted_where
>          btrfs_scan_one_device
>            btrfs_read_dev_super
>              check_super() call will fail for existing filesystems which
> have stripesize set to 4k. All existing filesystem instances will fall into
> this category.
> 
> This error value is pushed up the call stack and this causes the device to
> not get added to the fs_devices_mnt list in check_mounted_where(). Hence we
> would fail to correctly check the mount status of the multi-device btrfs
> filesystems.
> 


We can end up in the following scenario,
- /dev/loop0, /dev/loop1 and /dev/loop2 are mounted as a single
  filesystem. The filesystem was created by an older version of mkfs.btrfs
  which set stripesize to 4k. 
- losetup -a
   /dev/loop0: [0030]:19477 (/root/disk-imgs/file-0.img)  
   /dev/loop1: [0030]:16577 (/root/disk-imgs/file-1.img)  
   /dev/loop2: [64770]:3423229 (/root/disk-imgs/file-2.img)  
- /etc/mtab lists only /dev/loop0
- "losetup /dev/loop4 /root/disk-imgs/file-1.img"
   The new mkfs.btrfs invoked as 'mkfs.btrfs -f /dev/loop4' succeeds even
   though /dev/loop1 has already been mounted and has
   /root/disk-imgs/file-1.img as its backing file.

So IMHO the only solution is to have the stripesize check in check_super() to
allow both '4k' and 'sectorsize' as valid values i.e.

        if ((btrfs_super_stripesize(sb) != 4096)
	    && (btrfs_super_stripesize(sb) != btrfs_super_sectorsize(sb))) {
                error("invalid stripesize %u", btrfs_super_stripesize(sb));
                goto error_out;
        }
-- 
chandan


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Btrfs: let super_stripesize match with sectorsize
  2016-06-15 10:20   ` Chandan Rajendra
@ 2016-06-16  0:09     ` Liu Bo
  2016-06-16  8:23       ` Chandan Rajendra
  0 siblings, 1 reply; 10+ messages in thread
From: Liu Bo @ 2016-06-16  0:09 UTC (permalink / raw)
  To: Chandan Rajendra; +Cc: linux-btrfs, Eryu Guan, David Sterba

On Wed, Jun 15, 2016 at 03:50:17PM +0530, Chandan Rajendra wrote:
> On Wednesday, June 15, 2016 09:12:28 AM Chandan Rajendra wrote:
> > Hello Liu Bo,
> > 
> > We have to fix the following check in check_super() as well,
> > 
> >        if (btrfs_super_stripesize(sb) != 4096) {
> >                 error("invalid stripesize %u", btrfs_super_stripesize(sb));
> >                 goto error_out;
> >         }
> > 
> > i.e. btrfs_super_stripesize(sb) must be equal to
> > btrfs_super_sectorsize(sb).
> > 
> > However in btrfs-progs (mkfs.c to be precise) since we had stripesize
> > hardcoded to 4096, setting stripesize to the value of sectorsize in
> > mkfs.c will cause the following to occur when mkfs.btrfs is invoked for
> > devices with existing Btrfs filesystem instances,
> > 
> > NOTE: Assume we have changed the stripesize validation in btrfs-progs'
> > check_super() to,
> > 
> >         if (btrfs_super_stripesize(sb) != btrfs_super_sectorsize(sb)) {
> >                 error("invalid stripesize %u", btrfs_super_stripesize(sb));
> >                 goto error_out;
> >         }
> > 
> > 
> > main()
> >  for each device file passed as an argument,
> >    test_dev_for_mkfs()
> >      check_mounted
> >        check_mounted_where
> >          btrfs_scan_one_device
> >            btrfs_read_dev_super
> >              check_super() call will fail for existing filesystems which
> > have stripesize set to 4k. All existing filesystem instances will fall into
> > this category.
> > 
> > This error value is pushed up the call stack and this causes the device to
> > not get added to the fs_devices_mnt list in check_mounted_where(). Hence we
> > would fail to correctly check the mount status of the multi-device btrfs
> > filesystems.
> > 
> 
> 
> We can end up in the following scenario,
> - /dev/loop0, /dev/loop1 and /dev/loop2 are mounted as a single
>   filesystem. The filesystem was created by an older version of mkfs.btrfs
>   which set stripesize to 4k. 
> - losetup -a
>    /dev/loop0: [0030]:19477 (/root/disk-imgs/file-0.img)  
>    /dev/loop1: [0030]:16577 (/root/disk-imgs/file-1.img)  
>    /dev/loop2: [64770]:3423229 (/root/disk-imgs/file-2.img)  
> - /etc/mtab lists only /dev/loop0
> - "losetup /dev/loop4 /root/disk-imgs/file-1.img"
>    The new mkfs.btrfs invoked as 'mkfs.btrfs -f /dev/loop4' succeeds even
>    though /dev/loop1 has already been mounted and has
>    /root/disk-imgs/file-1.img as its backing file.
> 
> So IMHO the only solution is to have the stripesize check in check_super() to
> allow both '4k' and 'sectorsize' as valid values i.e.
> 
>         if ((btrfs_super_stripesize(sb) != 4096)
> 	    && (btrfs_super_stripesize(sb) != btrfs_super_sectorsize(sb))) {
>                 error("invalid stripesize %u", btrfs_super_stripesize(sb));
>                 goto error_out;
>         }

That's a good one.

But if we go back to the original point, in the kernel side,
1. in open_ctree(), root->stripesize = btrfs_super_stripesize();

2. in find_free_extent(),
	...
	search_start = ALIGN(offset, root->stripesize);
	...
3. in btrfs_alloc_tree_block(),
	...
	ret = btrfs_reserve_extent(..., &ins, ...);
	...
	buf = btrfs_init_new_buffer(trans, root, ins.objectid, level);

4. in btrfs_init_new_buffer(),
	...
	buf = btrfs_find_create_tree_block(root, bytenr);
	...

Because 'num_bytes' we pass to find_free_extent() is aligned to
sectorsize, the free space we can find is aligned to sectorsize,
which means 'offset' in '1. find_free_extent()' is aligned to sectorsize.

If our stripesize is larger than sectorsize, say 4 * sectorsize,
for data allocation it's fine while for metadata block allocations it's
not.  It is possible that when we allocate a new metadata block, we can
end up with an existing eb returned by '4. in btrfs_init_new_buffer()'.

PS: There is something wrong around '2. in find_free_extent()',
we only do alignment on offset, but for num_bytes, we don't do that,
so we may end up with a overlap with other data extents or metadata
blocks.

So I think we can just replace this ALING with a warning since the offset
returned by searching free space tree is aligned to block_group->full_stripe_len,
which is either sectorsize or BTRFS_STRIPE_LEN * nr_stripes (for
raid56), then we can just drop the check for stripesize everywhere.

Thanks,

-liubo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Btrfs: let super_stripesize match with sectorsize
  2016-06-16  0:09     ` Liu Bo
@ 2016-06-16  8:23       ` Chandan Rajendra
  2016-06-16 17:01         ` Liu Bo
  0 siblings, 1 reply; 10+ messages in thread
From: Chandan Rajendra @ 2016-06-16  8:23 UTC (permalink / raw)
  To: bo.li.liu; +Cc: linux-btrfs, Eryu Guan, David Sterba

On Wednesday, June 15, 2016 05:09:55 PM Liu Bo wrote:
> On Wed, Jun 15, 2016 at 03:50:17PM +0530, Chandan Rajendra wrote:
> > On Wednesday, June 15, 2016 09:12:28 AM Chandan Rajendra wrote:
> > > Hello Liu Bo,
> > > 
> > > We have to fix the following check in check_super() as well,
> > > 
> > >        if (btrfs_super_stripesize(sb) != 4096) {
> > >        
> > >                 error("invalid stripesize %u",
> > >                 btrfs_super_stripesize(sb));
> > >                 goto error_out;
> > >         
> > >         }
> > > 
> > > i.e. btrfs_super_stripesize(sb) must be equal to
> > > btrfs_super_sectorsize(sb).
> > > 
> > > However in btrfs-progs (mkfs.c to be precise) since we had stripesize
> > > hardcoded to 4096, setting stripesize to the value of sectorsize in
> > > mkfs.c will cause the following to occur when mkfs.btrfs is invoked for
> > > devices with existing Btrfs filesystem instances,
> > > 
> > > NOTE: Assume we have changed the stripesize validation in btrfs-progs'
> > > check_super() to,
> > > 
> > >         if (btrfs_super_stripesize(sb) != btrfs_super_sectorsize(sb)) {
> > >         
> > >                 error("invalid stripesize %u",
> > >                 btrfs_super_stripesize(sb));
> > >                 goto error_out;
> > >         
> > >         }
> > > 
> > > main()
> > > 
> > >  for each device file passed as an argument,
> > >  
> > >    test_dev_for_mkfs()
> > >    
> > >      check_mounted
> > >      
> > >        check_mounted_where
> > >        
> > >          btrfs_scan_one_device
> > >          
> > >            btrfs_read_dev_super
> > >            
> > >              check_super() call will fail for existing filesystems which
> > > 
> > > have stripesize set to 4k. All existing filesystem instances will fall
> > > into
> > > this category.
> > > 
> > > This error value is pushed up the call stack and this causes the device
> > > to
> > > not get added to the fs_devices_mnt list in check_mounted_where(). Hence
> > > we
> > > would fail to correctly check the mount status of the multi-device btrfs
> > > filesystems.
> > 
> > We can end up in the following scenario,
> > - /dev/loop0, /dev/loop1 and /dev/loop2 are mounted as a single
> > 
> >   filesystem. The filesystem was created by an older version of mkfs.btrfs
> >   which set stripesize to 4k.
> > 
> > - losetup -a
> > 
> >    /dev/loop0: [0030]:19477 (/root/disk-imgs/file-0.img)
> >    /dev/loop1: [0030]:16577 (/root/disk-imgs/file-1.img)
> >    /dev/loop2: [64770]:3423229 (/root/disk-imgs/file-2.img)
> > 
> > - /etc/mtab lists only /dev/loop0
> > - "losetup /dev/loop4 /root/disk-imgs/file-1.img"
> > 
> >    The new mkfs.btrfs invoked as 'mkfs.btrfs -f /dev/loop4' succeeds even
> >    though /dev/loop1 has already been mounted and has
> >    /root/disk-imgs/file-1.img as its backing file.
> > 
> > So IMHO the only solution is to have the stripesize check in check_super()
> > to allow both '4k' and 'sectorsize' as valid values i.e.
> > 
> >         if ((btrfs_super_stripesize(sb) != 4096)
> > 	    
> > 	    && (btrfs_super_stripesize(sb) != btrfs_super_sectorsize(sb))) {
> > 	    
> >                 error("invalid stripesize %u",
> >                 btrfs_super_stripesize(sb));
> >                 goto error_out;
> >         
> >         }
> 
> That's a good one.
> 
> But if we go back to the original point, in the kernel side,
> 1. in open_ctree(), root->stripesize = btrfs_super_stripesize();
> 
> 2. in find_free_extent(),
> 	...
> 	search_start = ALIGN(offset, root->stripesize);
> 	...
> 3. in btrfs_alloc_tree_block(),
> 	...
> 	ret = btrfs_reserve_extent(..., &ins, ...);
> 	...
> 	buf = btrfs_init_new_buffer(trans, root, ins.objectid, level);
> 
> 4. in btrfs_init_new_buffer(),
> 	...
> 	buf = btrfs_find_create_tree_block(root, bytenr);
> 	...
> 
> Because 'num_bytes' we pass to find_free_extent() is aligned to
> sectorsize, the free space we can find is aligned to sectorsize,
> which means 'offset' in '1. find_free_extent()' is aligned to sectorsize.
> 
> If our stripesize is larger than sectorsize, say 4 * sectorsize,
> for data allocation it's fine while for metadata block allocations it's
> not.  It is possible that when we allocate a new metadata block, we can
> end up with an existing eb returned by '4. in btrfs_init_new_buffer()'.
>

Liu, I am sorry ... I am unable to visualize a scenario where the above
described scenario could happen. Can you please provide an example?

> PS: There is something wrong around '2. in find_free_extent()',
> we only do alignment on offset, but for num_bytes, we don't do that,
> so we may end up with a overlap with other data extents or metadata
> blocks.
> 
> So I think we can just replace this ALING with a warning since the offset
> returned by searching free space tree is aligned to
> block_group->full_stripe_len, which is either sectorsize or
> BTRFS_STRIPE_LEN * nr_stripes (for
> raid56), then we can just drop the check for stripesize everywhere.
> 

-- 
chandan


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Btrfs: let super_stripesize match with sectorsize
  2016-06-16  8:23       ` Chandan Rajendra
@ 2016-06-16 17:01         ` Liu Bo
  2016-06-17  5:18           ` Chandan Rajendra
  0 siblings, 1 reply; 10+ messages in thread
From: Liu Bo @ 2016-06-16 17:01 UTC (permalink / raw)
  To: Chandan Rajendra; +Cc: linux-btrfs, Eryu Guan, David Sterba

On Thu, Jun 16, 2016 at 01:53:59PM +0530, Chandan Rajendra wrote:
> On Wednesday, June 15, 2016 05:09:55 PM Liu Bo wrote:
> > On Wed, Jun 15, 2016 at 03:50:17PM +0530, Chandan Rajendra wrote:
> > > On Wednesday, June 15, 2016 09:12:28 AM Chandan Rajendra wrote:
> > > > Hello Liu Bo,
> > > > 
> > > > We have to fix the following check in check_super() as well,
> > > > 
> > > >        if (btrfs_super_stripesize(sb) != 4096) {
> > > >        
> > > >                 error("invalid stripesize %u",
> > > >                 btrfs_super_stripesize(sb));
> > > >                 goto error_out;
> > > >         
> > > >         }
> > > > 
> > > > i.e. btrfs_super_stripesize(sb) must be equal to
> > > > btrfs_super_sectorsize(sb).
> > > > 
> > > > However in btrfs-progs (mkfs.c to be precise) since we had stripesize
> > > > hardcoded to 4096, setting stripesize to the value of sectorsize in
> > > > mkfs.c will cause the following to occur when mkfs.btrfs is invoked for
> > > > devices with existing Btrfs filesystem instances,
> > > > 
> > > > NOTE: Assume we have changed the stripesize validation in btrfs-progs'
> > > > check_super() to,
> > > > 
> > > >         if (btrfs_super_stripesize(sb) != btrfs_super_sectorsize(sb)) {
> > > >         
> > > >                 error("invalid stripesize %u",
> > > >                 btrfs_super_stripesize(sb));
> > > >                 goto error_out;
> > > >         
> > > >         }
> > > > 
> > > > main()
> > > > 
> > > >  for each device file passed as an argument,
> > > >  
> > > >    test_dev_for_mkfs()
> > > >    
> > > >      check_mounted
> > > >      
> > > >        check_mounted_where
> > > >        
> > > >          btrfs_scan_one_device
> > > >          
> > > >            btrfs_read_dev_super
> > > >            
> > > >              check_super() call will fail for existing filesystems which
> > > > 
> > > > have stripesize set to 4k. All existing filesystem instances will fall
> > > > into
> > > > this category.
> > > > 
> > > > This error value is pushed up the call stack and this causes the device
> > > > to
> > > > not get added to the fs_devices_mnt list in check_mounted_where(). Hence
> > > > we
> > > > would fail to correctly check the mount status of the multi-device btrfs
> > > > filesystems.
> > > 
> > > We can end up in the following scenario,
> > > - /dev/loop0, /dev/loop1 and /dev/loop2 are mounted as a single
> > > 
> > >   filesystem. The filesystem was created by an older version of mkfs.btrfs
> > >   which set stripesize to 4k.
> > > 
> > > - losetup -a
> > > 
> > >    /dev/loop0: [0030]:19477 (/root/disk-imgs/file-0.img)
> > >    /dev/loop1: [0030]:16577 (/root/disk-imgs/file-1.img)
> > >    /dev/loop2: [64770]:3423229 (/root/disk-imgs/file-2.img)
> > > 
> > > - /etc/mtab lists only /dev/loop0
> > > - "losetup /dev/loop4 /root/disk-imgs/file-1.img"
> > > 
> > >    The new mkfs.btrfs invoked as 'mkfs.btrfs -f /dev/loop4' succeeds even
> > >    though /dev/loop1 has already been mounted and has
> > >    /root/disk-imgs/file-1.img as its backing file.
> > > 
> > > So IMHO the only solution is to have the stripesize check in check_super()
> > > to allow both '4k' and 'sectorsize' as valid values i.e.
> > > 
> > >         if ((btrfs_super_stripesize(sb) != 4096)
> > > 	    
> > > 	    && (btrfs_super_stripesize(sb) != btrfs_super_sectorsize(sb))) {
> > > 	    
> > >                 error("invalid stripesize %u",
> > >                 btrfs_super_stripesize(sb));
> > >                 goto error_out;
> > >         
> > >         }
> > 
> > That's a good one.
> > 
> > But if we go back to the original point, in the kernel side,
> > 1. in open_ctree(), root->stripesize = btrfs_super_stripesize();
> > 
> > 2. in find_free_extent(),
> > 	...
> > 	search_start = ALIGN(offset, root->stripesize);
> > 	...
> > 3. in btrfs_alloc_tree_block(),
> > 	...
> > 	ret = btrfs_reserve_extent(..., &ins, ...);
> > 	...
> > 	buf = btrfs_init_new_buffer(trans, root, ins.objectid, level);
> > 
> > 4. in btrfs_init_new_buffer(),
> > 	...
> > 	buf = btrfs_find_create_tree_block(root, bytenr);
> > 	...
> > 
> > Because 'num_bytes' we pass to find_free_extent() is aligned to
> > sectorsize, the free space we can find is aligned to sectorsize,
> > which means 'offset' in '1. find_free_extent()' is aligned to sectorsize.
> > 
> > If our stripesize is larger than sectorsize, say 4 * sectorsize,
> > for data allocation it's fine while for metadata block allocations it's
> > not.  It is possible that when we allocate a new metadata block, we can
> > end up with an existing eb returned by '4. in btrfs_init_new_buffer()'.
> >
> 
> Liu, I am sorry ... I am unable to visualize a scenario where the above
> described scenario could happen. Can you please provide an example?

Sure, imagine that sectorsize is 4k and stripesize is 16k,
and a tree root's eb has eb->start = 12599296 (12582912 + 16384, a typical bytenr in
btrfs) which is aligned to 4k, and when CoW happens on another eb,

__btrfs_cow_block()
  ->btrfs_alloc_tree_block()
    ->btrfs_reserve_extent()
      ->find_free_extent()
    ->btrfs_init_new_buffer()

btrfs_reserve_extent() can return 12599296 for the new eb even if what
it've found is (12582912 + 4096), but
 after 'search_start = ALIGN(offset, root->stripesize)', it gets to
12599296.

In btrfs_init_new_buffer, we search eb tree by bytenr=12599296 and
get tree root's eb, the following btrfs_tree_lock will scream.

The example is taken from
btrfs-progs/tests/fuzz-tests/images/superblock-stripsize-bogus.raw.xz


Thanks,

-liubo
> 
> > PS: There is something wrong around '2. in find_free_extent()',
> > we only do alignment on offset, but for num_bytes, we don't do that,
> > so we may end up with a overlap with other data extents or metadata
> > blocks.
> > 
> > So I think we can just replace this ALING with a warning since the offset
> > returned by searching free space tree is aligned to
> > block_group->full_stripe_len, which is either sectorsize or
> > BTRFS_STRIPE_LEN * nr_stripes (for
> > raid56), then we can just drop the check for stripesize everywhere.
> > 
> 
> -- 
> chandan
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Btrfs: let super_stripesize match with sectorsize
  2016-06-16 17:01         ` Liu Bo
@ 2016-06-17  5:18           ` Chandan Rajendra
  2016-06-17  6:08             ` Liu Bo
  0 siblings, 1 reply; 10+ messages in thread
From: Chandan Rajendra @ 2016-06-17  5:18 UTC (permalink / raw)
  To: bo.li.liu; +Cc: linux-btrfs, Eryu Guan, David Sterba

On Thursday, June 16, 2016 10:01:41 AM Liu Bo wrote:
> On Thu, Jun 16, 2016 at 01:53:59PM +0530, Chandan Rajendra wrote:
> > On Wednesday, June 15, 2016 05:09:55 PM Liu Bo wrote:
> > > On Wed, Jun 15, 2016 at 03:50:17PM +0530, Chandan Rajendra wrote:
> > > > On Wednesday, June 15, 2016 09:12:28 AM Chandan Rajendra wrote:
> > > > > Hello Liu Bo,
> > > > > 
> > > > > We have to fix the following check in check_super() as well,
> > > > > 
> > > > >        if (btrfs_super_stripesize(sb) != 4096) {
> > > > >        
> > > > >                 error("invalid stripesize %u",
> > > > >                 btrfs_super_stripesize(sb));
> > > > >                 goto error_out;
> > > > >         
> > > > >         }
> > > > > 
> > > > > i.e. btrfs_super_stripesize(sb) must be equal to
> > > > > btrfs_super_sectorsize(sb).
> > > > > 
> > > > > However in btrfs-progs (mkfs.c to be precise) since we had
> > > > > stripesize
> > > > > hardcoded to 4096, setting stripesize to the value of sectorsize in
> > > > > mkfs.c will cause the following to occur when mkfs.btrfs is invoked
> > > > > for
> > > > > devices with existing Btrfs filesystem instances,
> > > > > 
> > > > > NOTE: Assume we have changed the stripesize validation in
> > > > > btrfs-progs'
> > > > > check_super() to,
> > > > > 
> > > > >         if (btrfs_super_stripesize(sb) !=
> > > > >         btrfs_super_sectorsize(sb)) {
> > > > >         
> > > > >                 error("invalid stripesize %u",
> > > > >                 btrfs_super_stripesize(sb));
> > > > >                 goto error_out;
> > > > >         
> > > > >         }
> > > > > 
> > > > > main()
> > > > > 
> > > > >  for each device file passed as an argument,
> > > > >  
> > > > >    test_dev_for_mkfs()
> > > > >    
> > > > >      check_mounted
> > > > >      
> > > > >        check_mounted_where
> > > > >        
> > > > >          btrfs_scan_one_device
> > > > >          
> > > > >            btrfs_read_dev_super
> > > > >            
> > > > >              check_super() call will fail for existing filesystems
> > > > >              which
> > > > > 
> > > > > have stripesize set to 4k. All existing filesystem instances will
> > > > > fall
> > > > > into
> > > > > this category.
> > > > > 
> > > > > This error value is pushed up the call stack and this causes the
> > > > > device
> > > > > to
> > > > > not get added to the fs_devices_mnt list in check_mounted_where().
> > > > > Hence
> > > > > we
> > > > > would fail to correctly check the mount status of the multi-device
> > > > > btrfs
> > > > > filesystems.
> > > > 
> > > > We can end up in the following scenario,
> > > > - /dev/loop0, /dev/loop1 and /dev/loop2 are mounted as a single
> > > > 
> > > >   filesystem. The filesystem was created by an older version of
> > > >   mkfs.btrfs
> > > >   which set stripesize to 4k.
> > > > 
> > > > - losetup -a
> > > > 
> > > >    /dev/loop0: [0030]:19477 (/root/disk-imgs/file-0.img)
> > > >    /dev/loop1: [0030]:16577 (/root/disk-imgs/file-1.img)
> > > >    /dev/loop2: [64770]:3423229 (/root/disk-imgs/file-2.img)
> > > > 
> > > > - /etc/mtab lists only /dev/loop0
> > > > - "losetup /dev/loop4 /root/disk-imgs/file-1.img"
> > > > 
> > > >    The new mkfs.btrfs invoked as 'mkfs.btrfs -f /dev/loop4' succeeds
> > > >    even
> > > >    though /dev/loop1 has already been mounted and has
> > > >    /root/disk-imgs/file-1.img as its backing file.
> > > > 
> > > > So IMHO the only solution is to have the stripesize check in
> > > > check_super()
> > > > to allow both '4k' and 'sectorsize' as valid values i.e.
> > > > 
> > > >         if ((btrfs_super_stripesize(sb) != 4096)
> > > > 	    
> > > > 	    && (btrfs_super_stripesize(sb) != btrfs_super_sectorsize(sb))) {
> > > > 	    
> > > >                 error("invalid stripesize %u",
> > > >                 btrfs_super_stripesize(sb));
> > > >                 goto error_out;
> > > >         
> > > >         }
> > > 
> > > That's a good one.
> > > 
> > > But if we go back to the original point, in the kernel side,
> > > 1. in open_ctree(), root->stripesize = btrfs_super_stripesize();
> > > 
> > > 2. in find_free_extent(),
> > > 
> > > 	...
> > > 	search_start = ALIGN(offset, root->stripesize);
> > > 	...
> > > 
> > > 3. in btrfs_alloc_tree_block(),
> > > 
> > > 	...
> > > 	ret = btrfs_reserve_extent(..., &ins, ...);
> > > 	...
> > > 	buf = btrfs_init_new_buffer(trans, root, ins.objectid, level);
> > > 
> > > 4. in btrfs_init_new_buffer(),
> > > 
> > > 	...
> > > 	buf = btrfs_find_create_tree_block(root, bytenr);
> > > 	...
> > > 
> > > Because 'num_bytes' we pass to find_free_extent() is aligned to
> > > sectorsize, the free space we can find is aligned to sectorsize,
> > > which means 'offset' in '1. find_free_extent()' is aligned to
> > > sectorsize.
> > > 
> > > If our stripesize is larger than sectorsize, say 4 * sectorsize,
> > > for data allocation it's fine while for metadata block allocations it's
> > > not.  It is possible that when we allocate a new metadata block, we can
> > > end up with an existing eb returned by '4. in btrfs_init_new_buffer()'.
> > 
> > Liu, I am sorry ... I am unable to visualize a scenario where the above
> > described scenario could happen. Can you please provide an example?
> 
> Sure, imagine that sectorsize is 4k and stripesize is 16k,
> and a tree root's eb has eb->start = 12599296 (12582912 + 16384, a typical
> bytenr in btrfs) which is aligned to 4k, and when CoW happens on another
> eb,
> 
> __btrfs_cow_block()
>   ->btrfs_alloc_tree_block()
>     ->btrfs_reserve_extent()
>       ->find_free_extent()
>     ->btrfs_init_new_buffer()
> 
> btrfs_reserve_extent() can return 12599296 for the new eb even if what
> it've found is (12582912 + 4096), but
>  after 'search_start = ALIGN(offset, root->stripesize)', it gets to
> 12599296.
> 
> In btrfs_init_new_buffer, we search eb tree by bytenr=12599296 and
> get tree root's eb, the following btrfs_tree_lock will scream.
> 
> The example is taken from
> btrfs-progs/tests/fuzz-tests/images/superblock-stripsize-bogus.raw.xz
> 

ah, this is indeed possible when nodesize is same as sectorsize
i.e. 4k. Thanks for the explaination Liubo.

The new validation patches that I have posted
(http://news.gmane.org/find-root.php?message_id=1466095078%2d25726%2d1%2dgit%2dsend%2demail%2dchandan%40linux.vnet.ibm.com
and
http://news.gmane.org/find-root.php?message_id=1466095109%2d26044%2d1%2dgit%2dsend%2demail%2dchandan%40linux.vnet.ibm.com)
restrict the stripesize to be either sectorsize or 4096. So I think these
restrictions are good enough to make sure we don't get into the situation
explained by you.

-- 
chandan


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Btrfs: let super_stripesize match with sectorsize
  2016-06-17  5:18           ` Chandan Rajendra
@ 2016-06-17  6:08             ` Liu Bo
  2016-06-17  6:51               ` Chandan Rajendra
  0 siblings, 1 reply; 10+ messages in thread
From: Liu Bo @ 2016-06-17  6:08 UTC (permalink / raw)
  To: Chandan Rajendra; +Cc: linux-btrfs, Eryu Guan, David Sterba

On Fri, Jun 17, 2016 at 10:48:05AM +0530, Chandan Rajendra wrote:
> On Thursday, June 16, 2016 10:01:41 AM Liu Bo wrote:
> > On Thu, Jun 16, 2016 at 01:53:59PM +0530, Chandan Rajendra wrote:
> > > On Wednesday, June 15, 2016 05:09:55 PM Liu Bo wrote:
> > > > On Wed, Jun 15, 2016 at 03:50:17PM +0530, Chandan Rajendra wrote:
> > > > > On Wednesday, June 15, 2016 09:12:28 AM Chandan Rajendra wrote:
> > > > > > Hello Liu Bo,
> > > > > > 
> > > > > > We have to fix the following check in check_super() as well,
> > > > > > 
> > > > > >        if (btrfs_super_stripesize(sb) != 4096) {
> > > > > >        
> > > > > >                 error("invalid stripesize %u",
> > > > > >                 btrfs_super_stripesize(sb));
> > > > > >                 goto error_out;
> > > > > >         
> > > > > >         }
> > > > > > 
> > > > > > i.e. btrfs_super_stripesize(sb) must be equal to
> > > > > > btrfs_super_sectorsize(sb).
> > > > > > 
> > > > > > However in btrfs-progs (mkfs.c to be precise) since we had
> > > > > > stripesize
> > > > > > hardcoded to 4096, setting stripesize to the value of sectorsize in
> > > > > > mkfs.c will cause the following to occur when mkfs.btrfs is invoked
> > > > > > for
> > > > > > devices with existing Btrfs filesystem instances,
> > > > > > 
> > > > > > NOTE: Assume we have changed the stripesize validation in
> > > > > > btrfs-progs'
> > > > > > check_super() to,
> > > > > > 
> > > > > >         if (btrfs_super_stripesize(sb) !=
> > > > > >         btrfs_super_sectorsize(sb)) {
> > > > > >         
> > > > > >                 error("invalid stripesize %u",
> > > > > >                 btrfs_super_stripesize(sb));
> > > > > >                 goto error_out;
> > > > > >         
> > > > > >         }
> > > > > > 
> > > > > > main()
> > > > > > 
> > > > > >  for each device file passed as an argument,
> > > > > >  
> > > > > >    test_dev_for_mkfs()
> > > > > >    
> > > > > >      check_mounted
> > > > > >      
> > > > > >        check_mounted_where
> > > > > >        
> > > > > >          btrfs_scan_one_device
> > > > > >          
> > > > > >            btrfs_read_dev_super
> > > > > >            
> > > > > >              check_super() call will fail for existing filesystems
> > > > > >              which
> > > > > > 
> > > > > > have stripesize set to 4k. All existing filesystem instances will
> > > > > > fall
> > > > > > into
> > > > > > this category.
> > > > > > 
> > > > > > This error value is pushed up the call stack and this causes the
> > > > > > device
> > > > > > to
> > > > > > not get added to the fs_devices_mnt list in check_mounted_where().
> > > > > > Hence
> > > > > > we
> > > > > > would fail to correctly check the mount status of the multi-device
> > > > > > btrfs
> > > > > > filesystems.
> > > > > 
> > > > > We can end up in the following scenario,
> > > > > - /dev/loop0, /dev/loop1 and /dev/loop2 are mounted as a single
> > > > > 
> > > > >   filesystem. The filesystem was created by an older version of
> > > > >   mkfs.btrfs
> > > > >   which set stripesize to 4k.
> > > > > 
> > > > > - losetup -a
> > > > > 
> > > > >    /dev/loop0: [0030]:19477 (/root/disk-imgs/file-0.img)
> > > > >    /dev/loop1: [0030]:16577 (/root/disk-imgs/file-1.img)
> > > > >    /dev/loop2: [64770]:3423229 (/root/disk-imgs/file-2.img)
> > > > > 
> > > > > - /etc/mtab lists only /dev/loop0
> > > > > - "losetup /dev/loop4 /root/disk-imgs/file-1.img"
> > > > > 
> > > > >    The new mkfs.btrfs invoked as 'mkfs.btrfs -f /dev/loop4' succeeds
> > > > >    even
> > > > >    though /dev/loop1 has already been mounted and has
> > > > >    /root/disk-imgs/file-1.img as its backing file.
> > > > > 
> > > > > So IMHO the only solution is to have the stripesize check in
> > > > > check_super()
> > > > > to allow both '4k' and 'sectorsize' as valid values i.e.
> > > > > 
> > > > >         if ((btrfs_super_stripesize(sb) != 4096)
> > > > > 	    
> > > > > 	    && (btrfs_super_stripesize(sb) != btrfs_super_sectorsize(sb))) {
> > > > > 	    
> > > > >                 error("invalid stripesize %u",
> > > > >                 btrfs_super_stripesize(sb));
> > > > >                 goto error_out;
> > > > >         
> > > > >         }
> > > > 
> > > > That's a good one.
> > > > 
> > > > But if we go back to the original point, in the kernel side,
> > > > 1. in open_ctree(), root->stripesize = btrfs_super_stripesize();
> > > > 
> > > > 2. in find_free_extent(),
> > > > 
> > > > 	...
> > > > 	search_start = ALIGN(offset, root->stripesize);
> > > > 	...
> > > > 
> > > > 3. in btrfs_alloc_tree_block(),
> > > > 
> > > > 	...
> > > > 	ret = btrfs_reserve_extent(..., &ins, ...);
> > > > 	...
> > > > 	buf = btrfs_init_new_buffer(trans, root, ins.objectid, level);
> > > > 
> > > > 4. in btrfs_init_new_buffer(),
> > > > 
> > > > 	...
> > > > 	buf = btrfs_find_create_tree_block(root, bytenr);
> > > > 	...
> > > > 
> > > > Because 'num_bytes' we pass to find_free_extent() is aligned to
> > > > sectorsize, the free space we can find is aligned to sectorsize,
> > > > which means 'offset' in '1. find_free_extent()' is aligned to
> > > > sectorsize.
> > > > 
> > > > If our stripesize is larger than sectorsize, say 4 * sectorsize,
> > > > for data allocation it's fine while for metadata block allocations it's
> > > > not.  It is possible that when we allocate a new metadata block, we can
> > > > end up with an existing eb returned by '4. in btrfs_init_new_buffer()'.
> > > 
> > > Liu, I am sorry ... I am unable to visualize a scenario where the above
> > > described scenario could happen. Can you please provide an example?
> > 
> > Sure, imagine that sectorsize is 4k and stripesize is 16k,
> > and a tree root's eb has eb->start = 12599296 (12582912 + 16384, a typical
> > bytenr in btrfs) which is aligned to 4k, and when CoW happens on another
> > eb,
> > 
> > __btrfs_cow_block()
> >   ->btrfs_alloc_tree_block()
> >     ->btrfs_reserve_extent()
> >       ->find_free_extent()
> >     ->btrfs_init_new_buffer()
> > 
> > btrfs_reserve_extent() can return 12599296 for the new eb even if what
> > it've found is (12582912 + 4096), but
> >  after 'search_start = ALIGN(offset, root->stripesize)', it gets to
> > 12599296.
> > 
> > In btrfs_init_new_buffer, we search eb tree by bytenr=12599296 and
> > get tree root's eb, the following btrfs_tree_lock will scream.
> > 
> > The example is taken from
> > btrfs-progs/tests/fuzz-tests/images/superblock-stripsize-bogus.raw.xz
> > 
> 
> ah, this is indeed possible when nodesize is same as sectorsize
> i.e. 4k. Thanks for the explaination Liubo.

I don't think nodesize has to be same as sectorsize to make it possible.

> 
> The new validation patches that I have posted
> (http://news.gmane.org/find-root.php?message_id=1466095078%2d25726%2d1%2dgit%2dsend%2demail%2dchandan%40linux.vnet.ibm.com
> and
> http://news.gmane.org/find-root.php?message_id=1466095109%2d26044%2d1%2dgit%2dsend%2demail%2dchandan%40linux.vnet.ibm.com)
> restrict the stripesize to be either sectorsize or 4096. So I think these
> restrictions are good enough to make sure we don't get into the situation
> explained by you.

It's a workaround anyway, I'd rather fix the kernel to not use
stripesize and we can remove all checks against super_stripesize.

The code has evolved a lot to have free space align well to sectorsize,
so stripesize is not as necessary as when it's introduced firstly.

Thanks,

-liubo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Btrfs: let super_stripesize match with sectorsize
  2016-06-17  6:08             ` Liu Bo
@ 2016-06-17  6:51               ` Chandan Rajendra
  2016-06-17 16:30                 ` David Sterba
  0 siblings, 1 reply; 10+ messages in thread
From: Chandan Rajendra @ 2016-06-17  6:51 UTC (permalink / raw)
  To: bo.li.liu; +Cc: linux-btrfs, Eryu Guan, David Sterba, Chris Mason, Josef Bacik

On Thursday, June 16, 2016 11:08:05 PM Liu Bo wrote:
> On Fri, Jun 17, 2016 at 10:48:05AM +0530, Chandan Rajendra wrote:
> > On Thursday, June 16, 2016 10:01:41 AM Liu Bo wrote:
> > > On Thu, Jun 16, 2016 at 01:53:59PM +0530, Chandan Rajendra wrote:
> > > > On Wednesday, June 15, 2016 05:09:55 PM Liu Bo wrote:
> > > > > On Wed, Jun 15, 2016 at 03:50:17PM +0530, Chandan Rajendra wrote:
> > > > > > On Wednesday, June 15, 2016 09:12:28 AM Chandan Rajendra wrote:
> > > > > > > Hello Liu Bo,
> > > > > > > 
> > > > > > > We have to fix the following check in check_super() as well,
> > > > > > > 
> > > > > > >        if (btrfs_super_stripesize(sb) != 4096) {
> > > > > > >        
> > > > > > >                 error("invalid stripesize %u",
> > > > > > >                 btrfs_super_stripesize(sb));
> > > > > > >                 goto error_out;
> > > > > > >         
> > > > > > >         }
> > > > > > > 
> > > > > > > i.e. btrfs_super_stripesize(sb) must be equal to
> > > > > > > btrfs_super_sectorsize(sb).
> > > > > > > 
> > > > > > > However in btrfs-progs (mkfs.c to be precise) since we had
> > > > > > > stripesize
> > > > > > > hardcoded to 4096, setting stripesize to the value of sectorsize
> > > > > > > in
> > > > > > > mkfs.c will cause the following to occur when mkfs.btrfs is
> > > > > > > invoked
> > > > > > > for
> > > > > > > devices with existing Btrfs filesystem instances,
> > > > > > > 
> > > > > > > NOTE: Assume we have changed the stripesize validation in
> > > > > > > btrfs-progs'
> > > > > > > check_super() to,
> > > > > > > 
> > > > > > >         if (btrfs_super_stripesize(sb) !=
> > > > > > >         btrfs_super_sectorsize(sb)) {
> > > > > > >         
> > > > > > >                 error("invalid stripesize %u",
> > > > > > >                 btrfs_super_stripesize(sb));
> > > > > > >                 goto error_out;
> > > > > > >         
> > > > > > >         }
> > > > > > > 
> > > > > > > main()
> > > > > > > 
> > > > > > >  for each device file passed as an argument,
> > > > > > >  
> > > > > > >    test_dev_for_mkfs()
> > > > > > >    
> > > > > > >      check_mounted
> > > > > > >      
> > > > > > >        check_mounted_where
> > > > > > >        
> > > > > > >          btrfs_scan_one_device
> > > > > > >          
> > > > > > >            btrfs_read_dev_super
> > > > > > >            
> > > > > > >              check_super() call will fail for existing
> > > > > > >              filesystems
> > > > > > >              which
> > > > > > > 
> > > > > > > have stripesize set to 4k. All existing filesystem instances
> > > > > > > will
> > > > > > > fall
> > > > > > > into
> > > > > > > this category.
> > > > > > > 
> > > > > > > This error value is pushed up the call stack and this causes the
> > > > > > > device
> > > > > > > to
> > > > > > > not get added to the fs_devices_mnt list in
> > > > > > > check_mounted_where().
> > > > > > > Hence
> > > > > > > we
> > > > > > > would fail to correctly check the mount status of the
> > > > > > > multi-device
> > > > > > > btrfs
> > > > > > > filesystems.
> > > > > > 
> > > > > > We can end up in the following scenario,
> > > > > > - /dev/loop0, /dev/loop1 and /dev/loop2 are mounted as a single
> > > > > > 
> > > > > >   filesystem. The filesystem was created by an older version of
> > > > > >   mkfs.btrfs
> > > > > >   which set stripesize to 4k.
> > > > > > 
> > > > > > - losetup -a
> > > > > > 
> > > > > >    /dev/loop0: [0030]:19477 (/root/disk-imgs/file-0.img)
> > > > > >    /dev/loop1: [0030]:16577 (/root/disk-imgs/file-1.img)
> > > > > >    /dev/loop2: [64770]:3423229 (/root/disk-imgs/file-2.img)
> > > > > > 
> > > > > > - /etc/mtab lists only /dev/loop0
> > > > > > - "losetup /dev/loop4 /root/disk-imgs/file-1.img"
> > > > > > 
> > > > > >    The new mkfs.btrfs invoked as 'mkfs.btrfs -f /dev/loop4'
> > > > > >    succeeds
> > > > > >    even
> > > > > >    though /dev/loop1 has already been mounted and has
> > > > > >    /root/disk-imgs/file-1.img as its backing file.
> > > > > > 
> > > > > > So IMHO the only solution is to have the stripesize check in
> > > > > > check_super()
> > > > > > to allow both '4k' and 'sectorsize' as valid values i.e.
> > > > > > 
> > > > > >         if ((btrfs_super_stripesize(sb) != 4096)
> > > > > > 	    
> > > > > > 	    && (btrfs_super_stripesize(sb) !=
> > > > > > 	    btrfs_super_sectorsize(sb))) {
> > > > > > 	    
> > > > > >                 error("invalid stripesize %u",
> > > > > >                 btrfs_super_stripesize(sb));
> > > > > >                 goto error_out;
> > > > > >         
> > > > > >         }
> > > > > 
> > > > > That's a good one.
> > > > > 
> > > > > But if we go back to the original point, in the kernel side,
> > > > > 1. in open_ctree(), root->stripesize = btrfs_super_stripesize();
> > > > > 
> > > > > 2. in find_free_extent(),
> > > > > 
> > > > > 	...
> > > > > 	search_start = ALIGN(offset, root->stripesize);
> > > > > 	...
> > > > > 
> > > > > 3. in btrfs_alloc_tree_block(),
> > > > > 
> > > > > 	...
> > > > > 	ret = btrfs_reserve_extent(..., &ins, ...);
> > > > > 	...
> > > > > 	buf = btrfs_init_new_buffer(trans, root, ins.objectid, 
level);
> > > > > 
> > > > > 4. in btrfs_init_new_buffer(),
> > > > > 
> > > > > 	...
> > > > > 	buf = btrfs_find_create_tree_block(root, bytenr);
> > > > > 	...
> > > > > 
> > > > > Because 'num_bytes' we pass to find_free_extent() is aligned to
> > > > > sectorsize, the free space we can find is aligned to sectorsize,
> > > > > which means 'offset' in '1. find_free_extent()' is aligned to
> > > > > sectorsize.
> > > > > 
> > > > > If our stripesize is larger than sectorsize, say 4 * sectorsize,
> > > > > for data allocation it's fine while for metadata block allocations
> > > > > it's
> > > > > not.  It is possible that when we allocate a new metadata block, we
> > > > > can
> > > > > end up with an existing eb returned by '4. in
> > > > > btrfs_init_new_buffer()'.
> > > > 
> > > > Liu, I am sorry ... I am unable to visualize a scenario where the
> > > > above
> > > > described scenario could happen. Can you please provide an example?
> > > 
> > > Sure, imagine that sectorsize is 4k and stripesize is 16k,
> > > and a tree root's eb has eb->start = 12599296 (12582912 + 16384, a
> > > typical
> > > bytenr in btrfs) which is aligned to 4k, and when CoW happens on another
> > > eb,
> > > 
> > > __btrfs_cow_block()
> > > 
> > >   ->btrfs_alloc_tree_block()
> > >   
> > >     ->btrfs_reserve_extent()
> > >     
> > >       ->find_free_extent()
> > >     
> > >     ->btrfs_init_new_buffer()
> > > 
> > > btrfs_reserve_extent() can return 12599296 for the new eb even if what
> > > it've found is (12582912 + 4096), but
> > > 
> > >  after 'search_start = ALIGN(offset, root->stripesize)', it gets to
> > > 
> > > 12599296.
> > > 
> > > In btrfs_init_new_buffer, we search eb tree by bytenr=12599296 and
> > > get tree root's eb, the following btrfs_tree_lock will scream.
> > > 
> > > The example is taken from
> > > btrfs-progs/tests/fuzz-tests/images/superblock-stripsize-bogus.raw.xz
> > 
> > ah, this is indeed possible when nodesize is same as sectorsize
> > i.e. 4k. Thanks for the explaination Liubo.
> 
> I don't think nodesize has to be same as sectorsize to make it possible.
> 
> > The new validation patches that I have posted
> > (http://news.gmane.org/find-root.php?message_id=1466095078%2d25726%2d1%2dg
> > it%2dsend%2demail%2dchandan%40linux.vnet.ibm.com and
> > http://news.gmane.org/find-root.php?message_id=1466095109%2d26044%2d1%2dgi
> > t%2dsend%2demail%2dchandan%40linux.vnet.ibm.com) restrict the stripesize
> > to be either sectorsize or 4096. So I think these restrictions are good
> > enough to make sure we don't get into the situation explained by you.
> 
> It's a workaround anyway, I'd rather fix the kernel to not use
> stripesize and we can remove all checks against super_stripesize.
> 
> The code has evolved a lot to have free space align well to sectorsize,
> so stripesize is not as necessary as when it's introduced firstly.
> 

Yes, I mostly agree with what you are suggesting.

But, I am not completely sure about the following ...

static u32 find_raid56_stripe_len(u32 data_devices, u32 dev_stripe_target)
{
        /* TODO allow them to set a preferred stripe size */
        return SZ_64K;
}

Chris, Josef, David ...  Do you have any objections to replacing invocations
of find_raid56_stripe_len() with BTRFS_STRIPE_LEN? Or do we need to retain
find_raid56_stripe_len() to have the ability for having configurable
stripesize in the future?

-- 
chandan


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Btrfs: let super_stripesize match with sectorsize
  2016-06-17  6:51               ` Chandan Rajendra
@ 2016-06-17 16:30                 ` David Sterba
  0 siblings, 0 replies; 10+ messages in thread
From: David Sterba @ 2016-06-17 16:30 UTC (permalink / raw)
  To: Chandan Rajendra
  Cc: bo.li.liu, linux-btrfs, Eryu Guan, David Sterba, Chris Mason,
	Josef Bacik

On Fri, Jun 17, 2016 at 12:21:50PM +0530, Chandan Rajendra wrote:
> > > t%2dsend%2demail%2dchandan%40linux.vnet.ibm.com) restrict the stripesize
> > > to be either sectorsize or 4096. So I think these restrictions are good
> > > enough to make sure we don't get into the situation explained by you.
> > 
> > It's a workaround anyway, I'd rather fix the kernel to not use
> > stripesize and we can remove all checks against super_stripesize.
> > 
> > The code has evolved a lot to have free space align well to sectorsize,
> > so stripesize is not as necessary as when it's introduced firstly.
> > 
> 
> Yes, I mostly agree with what you are suggesting.
> 
> But, I am not completely sure about the following ...
> 
> static u32 find_raid56_stripe_len(u32 data_devices, u32 dev_stripe_target)
> {
>         /* TODO allow them to set a preferred stripe size */
>         return SZ_64K;
> }
> 
> Chris, Josef, David ...  Do you have any objections to replacing invocations
> of find_raid56_stripe_len() with BTRFS_STRIPE_LEN? Or do we need to retain
> find_raid56_stripe_len() to have the ability for having configurable
> stripesize in the future?

The stripe will be configurable some day, so please keep it for now.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2016-06-17 16:29 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-14 21:33 [PATCH] Btrfs: let super_stripesize match with sectorsize Liu Bo
2016-06-15  3:42 ` Chandan Rajendra
2016-06-15 10:20   ` Chandan Rajendra
2016-06-16  0:09     ` Liu Bo
2016-06-16  8:23       ` Chandan Rajendra
2016-06-16 17:01         ` Liu Bo
2016-06-17  5:18           ` Chandan Rajendra
2016-06-17  6:08             ` Liu Bo
2016-06-17  6:51               ` Chandan Rajendra
2016-06-17 16:30                 ` David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.