All of lore.kernel.org
 help / color / mirror / Atom feed
* Drive without DISCARD support crashes btrfs
@ 2011-05-17 15:27 David Sterba
  2011-05-17 16:00 ` [PATCH] btrfs: fix crash when no drive supports DISCARD David Sterba
  0 siblings, 1 reply; 7+ messages in thread
From: David Sterba @ 2011-05-17 15:27 UTC (permalink / raw)
  To: lidongyang; +Cc: chris.mason, linux-btrfs

Hi,

while playing with lockedp, I started to hit consistent crashes with
xfstests/013 (fsstress):

[  537.284234] ------------[ cut here ]------------
[  537.288040] kernel BUG at fs/btrfs/tree-log.c:1792!
[  537.288040] invalid opcode: 0000 [#1] SMP
[  537.288040] last sysfs file: /sys/devices/virtual/bdi/btrfs-10/uevent
[  537.288040] CPU 1
[  537.288040] Modules linked in: btrfs
[  537.288040]
[  537.288040] Pid: 12069, comm: fsstress Not tainted 2.6.39-rc7-default+ #24 Intel
[  537.288040] RIP: 0010:[<ffffffffa0063c79>]  [<ffffffffa0063c79>] walk_up_log_tree+0x2f9/0x330 [btrfs]
[  537.288040] RSP: 0018:ffff880058191c08  EFLAGS: 00010282
[  537.288040] RAX: 00000000ffffffa1 RBX: ffff88004f863910 RCX: 0000000000000001
[  537.288040] RDX: 0000000000000f0e RSI: 0000000000000001 RDI: ffff880078bd8600
[  537.288040] RBP: ffff880058191c88 R08: 0000000000000000 R09: 0000000000000001
[  537.288040] R10: 0000000000000000 R11: 0000000000006945 R12: ffff880058191cb4
[  537.288040] R13: 0000000000000000 R14: ffff880000000000 R15: ffff880058191cf8
[  537.288040] FS:  00007f469ba3d700(0000) GS:ffff88007e600000(0000) knlGS:0000000000000000
[  537.288040] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  537.288040] CR2: 00007f469af27430 CR3: 000000007ba3c000 CR4: 00000000000006e0
[  537.288040] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  537.288040] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  537.288040] Process fsstress (pid: 12069, threadinfo ffff880058190000, task ffff8800782f1140)
[  537.288040] Stack:
[  537.288040]  ffff880058191fd8 ffff88005e326f20 ffff880058191fd8 fffffffffffffffa
[  537.288040]  ffff880058191fd8 ffff880058191fd8 ffff880058191fd8 ffff880058191fd8
[  537.288040]  ffff88004eca4598 ffff8800580a6000 ffff880058191da8 0000000000000000
[  537.288040] Call Trace:
[  537.288040]  [<ffffffffa0064348>] walk_log_tree+0x108/0x2a0 [btrfs]
[  537.288040]  [<ffffffffa0064523>] free_log_tree+0x43/0xc0 [btrfs]
[  537.288040]  [<ffffffffa0061950>] ? insert_one_name+0xc0/0xc0 [btrfs]
[  537.288040]  [<ffffffffa006663f>] btrfs_free_log+0x1f/0x30 [btrfs]
[  537.288040]  [<ffffffffa002bb61>] commit_fs_roots+0x91/0x150 [btrfs]
[  537.288040]  [<ffffffffa002cc70>] ? btrfs_commit_transaction+0x340/0x7d0 [btrfs]
[  537.288040]  [<ffffffffa002cc70>] ? btrfs_commit_transaction+0x340/0x7d0 [btrfs]
[  537.288040]  [<ffffffffa002cc7b>] btrfs_commit_transaction+0x34b/0x7d0 [btrfs]
[  537.288040]  [<ffffffff810b8c80>] ? wake_up_bit+0x40/0x40
[  537.288040]  [<ffffffff811a8cb0>] ? __sync_filesystem+0x90/0x90
[  537.288040]  [<ffffffffa0003210>] btrfs_sync_fs+0x60/0xe0 [btrfs]
[  537.288040]  [<ffffffff811a8c7e>] __sync_filesystem+0x5e/0x90
[  537.288040]  [<ffffffff811a8ccf>] sync_one_sb+0x1f/0x30
[  537.288040]  [<ffffffff8117f287>] iterate_supers+0x77/0xe0
[  537.288040]  [<ffffffff811a8bd0>] sync_filesystems+0x20/0x30
[  537.616540]  [<ffffffff811a8d71>] sys_sync+0x21/0x40
[  537.616540]  [<ffffffff81e94f42>] system_call_fastpath+0x16/0x1b
[  537.616540] Code: 00 00 48 83 c4 58 b8 01 00 00 00 5b 41 5c 41 5d 41 5e 41 5f c9 c3 0f 1f 40 00 48 83 c4 58 b0 01 5b 41 5c 41 5d 41 5e 41 5f c9 c3 <0f> 0b be fc 06 00 00 48 c7 c7 c8 f4 07 a0 e8 34 08 03 e1 e9 c0
[  537.616540] RIP  [<ffffffffa0063c79>] walk_up_log_tree+0x2f9/0x330 [btrfs]
[  537.616540]  RSP <ffff880058191c08>
[  537.619306] ---[ end trace 5f9597f2f24fa95c ]---

it's applied on top of today's linus c1d10d18c5. Mount options are
compress and discard.

fsstress process is in D-state with
# cat /proc/12068/stack
[<ffffffffa002d419>] start_transaction+0x79/0x2f0 [btrfs]
[<ffffffffa002da13>] btrfs_start_transaction+0x13/0x20 [btrfs]
[<ffffffffa002e1aa>] __unlink_start_trans+0x4a/0x400 [btrfs]
[<ffffffffa0035997>] btrfs_unlink+0x37/0xe0 [btrfs]
[<ffffffff8118ab6a>] vfs_unlink+0x8a/0xf0
[<ffffffff8118ad53>] do_unlinkat+0x183/0x1c0
[<ffffffff8118bbc6>] sys_unlink+0x16/0x20
[<ffffffff81e94f42>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

shutdown is stuck, sync of btrfs does not proceed due to blocked
transaction commit due to previsou failures.

crash site:
1789                                 ret = btrfs_free_reserved_extent(root,
1790                                                 path->nodes[*level]->start,
1791                                                 path->nodes[*level]->len);
1792                                 BUG_ON(ret);

if EAX contains ret, 0xFFFFFFa1, which seems to be EOPNOTSUPP comming from
discard handling in btrfs_free_reserved_extent().

walk_up_log_tree
  btrfs_free_reserved_extent
    btrfs_discard_extent
      return -EOPNOTSUPP
    BUG_ON ret

I suspect line 1793 in btrfs_discard_extent():

1766 static int btrfs_discard_extent(struct btrfs_root *root, u64 bytenr,
1767                                 u64 num_bytes, u64 *actual_bytes)
1768 {
...
1793         if (discarded_bytes && ret == -EOPNOTSUPP)
1794                 ret = 0;
...
1800         return ret;
1801 }

discarded_bytes is initially 0, ret is EOPNOTSUPP from not-done
btrfs_issue_discard, true branch is not taken and nonzero retcode is
propagated.

I think the "discarded_bytes != 0" check should be dropped, I do not see why
it's there. Going to test now.


david

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH] btrfs: fix crash when no drive supports DISCARD
  2011-05-17 15:27 Drive without DISCARD support crashes btrfs David Sterba
@ 2011-05-17 16:00 ` David Sterba
  2011-05-17 18:49   ` Chris Mason
  2011-05-18  3:29   ` Li Dongyang
  0 siblings, 2 replies; 7+ messages in thread
From: David Sterba @ 2011-05-17 16:00 UTC (permalink / raw)
  To: chris.mason; +Cc: linux-btrfs, lidongyang, David Sterba, stable

xfstests/013 crashes when the test partition is mounted with -o discard:

walk_up_log_tree
  btrfs_free_reserved_extent
    btrfs_discard_extent
      return -EOPNOTSUPP
    BUG_ON ret

btrfs_discard_extent() should be fine when drive does not support the
DISCARD operation and filter the EOPNOTSUPP retcode, but currently it
does this only when some bytes were succesfully discarded.

Signed-off-by: David Sterba <dsterba@suse.cz>
CC: stable@kernel.org
---
 fs/btrfs/extent-tree.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 9ee6bd5..feab2ab 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1790,7 +1790,7 @@ static int btrfs_discard_extent(struct btrfs_root *root, u64 bytenr,
 		}
 		kfree(multi);
 	}
-	if (discarded_bytes && ret == -EOPNOTSUPP)
+	if (ret == -EOPNOTSUPP)
 		ret = 0;
 
 	if (actual_bytes)
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] btrfs: fix crash when no drive supports DISCARD
  2011-05-17 16:00 ` [PATCH] btrfs: fix crash when no drive supports DISCARD David Sterba
@ 2011-05-17 18:49   ` Chris Mason
  2011-05-18  3:29   ` Li Dongyang
  1 sibling, 0 replies; 7+ messages in thread
From: Chris Mason @ 2011-05-17 18:49 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs, lidongyang, stable

Excerpts from David Sterba's message of 2011-05-17 12:00:31 -0400:
> xfstests/013 crashes when the test partition is mounted with -o discard:
> 
> walk_up_log_tree
>   btrfs_free_reserved_extent
>     btrfs_discard_extent
>       return -EOPNOTSUPP
>     BUG_ON ret
> 
> btrfs_discard_extent() should be fine when drive does not support the
> DISCARD operation and filter the EOPNOTSUPP retcode, but currently it
> does this only when some bytes were succesfully discarded.

Fair enough, we aren't really doing anything with EOPNOTSUPP now, so we
can just drop it in btrfs_discard_extent.

-chris

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] btrfs: fix crash when no drive supports DISCARD
  2011-05-17 16:00 ` [PATCH] btrfs: fix crash when no drive supports DISCARD David Sterba
  2011-05-17 18:49   ` Chris Mason
@ 2011-05-18  3:29   ` Li Dongyang
  2011-05-20 11:52     ` David Sterba
  1 sibling, 1 reply; 7+ messages in thread
From: Li Dongyang @ 2011-05-18  3:29 UTC (permalink / raw)
  To: David Sterba; +Cc: chris.mason, linux-btrfs, stable

On Wednesday, May 18, 2011 12:00:31 AM David Sterba wrote:
> xfstests/013 crashes when the test partition is mounted with -o discard:
> 
> walk_up_log_tree
>   btrfs_free_reserved_extent
>     btrfs_discard_extent
>       return -EOPNOTSUPP
>     BUG_ON ret
> 
> btrfs_discard_extent() should be fine when drive does not support the
> DISCARD operation and filter the EOPNOTSUPP retcode, but currently it
> does this only when some bytes were succesfully discarded.
> 
> Signed-off-by: David Sterba <dsterba@suse.cz>
> CC: stable@kernel.org
> ---
>  fs/btrfs/extent-tree.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 9ee6bd5..feab2ab 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -1790,7 +1790,7 @@ static int btrfs_discard_extent(struct btrfs_root *root, u64 bytenr,
>  		}
>  		kfree(multi);
>  	}
> -	if (discarded_bytes && ret == -EOPNOTSUPP)
> +	if (ret == -EOPNOTSUPP)
>  		ret = 0;
>  
>  	if (actual_bytes)
> 
Thanks for the fix, I thought EOPNOTSUPP could be useful by the caller that time, and maybe we could do somthing
like remove the discard mount_opt in the fs_info so we can avoid calling it again.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] btrfs: fix crash when no drive supports DISCARD
  2011-05-18  3:29   ` Li Dongyang
@ 2011-05-20 11:52     ` David Sterba
  2011-05-20 14:15       ` Josef Bacik
  0 siblings, 1 reply; 7+ messages in thread
From: David Sterba @ 2011-05-20 11:52 UTC (permalink / raw)
  To: Li Dongyang; +Cc: David Sterba, chris.mason, linux-btrfs, stable

On Wed, May 18, 2011 at 11:29:14AM +0800, Li Dongyang wrote:
> Thanks for the fix, I thought EOPNOTSUPP could be useful by the caller
> that time, and maybe we could do somthing like remove the discard
> mount_opt in the fs_info so we can avoid calling it again.

I do not agree that discard should be removed from mount_opt, because
one may add TRIM-capable devices later, it's a whole filesystem option,
while. A disappeared discard mount option will probably cause panic
in administrator's head.

However, if a drive does not support TRIM, the btrfs_issue_discard calls
can take a shortcut and do not call up to blkdev_issue_discard (though
it does return immediatelly), caching the state after first call. But
this is matter of the lower level call (blkdev) and should not be
propagated beyond to the extent "level" (ie. btrfs_discard_extent).


david

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] btrfs: fix crash when no drive supports DISCARD
  2011-05-20 11:52     ` David Sterba
@ 2011-05-20 14:15       ` Josef Bacik
  2011-05-21  8:26         ` Li Dongyang
  0 siblings, 1 reply; 7+ messages in thread
From: Josef Bacik @ 2011-05-20 14:15 UTC (permalink / raw)
  To: Li Dongyang, David Sterba, chris.mason, linux-btrfs, stable

On 05/20/2011 07:52 AM, David Sterba wrote:
> On Wed, May 18, 2011 at 11:29:14AM +0800, Li Dongyang wrote:
>> Thanks for the fix, I thought EOPNOTSUPP could be useful by the caller
>> that time, and maybe we could do somthing like remove the discard
>> mount_opt in the fs_info so we can avoid calling it again.
> 
> I do not agree that discard should be removed from mount_opt, because
> one may add TRIM-capable devices later, it's a whole filesystem option,
> while. A disappeared discard mount option will probably cause panic
> in administrator's head.
> 
> However, if a drive does not support TRIM, the btrfs_issue_discard calls
> can take a shortcut and do not call up to blkdev_issue_discard (though
> it does return immediatelly), caching the state after first call. But
> this is matter of the lower level call (blkdev) and should not be
> propagated beyond to the extent "level" (ie. btrfs_discard_extent).
> 

There ought to just be a flag added to btrfs_device to say whether or
not it supports discard, and if we get back EOPNOTSUPP we stop putting
discards down on that device, that way if we have some devices that do
it and some that don't (like for instance if we do that tiered caching
with ssd's thing) we can make sure discard is actually done on drives
that care about it.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] btrfs: fix crash when no drive supports DISCARD
  2011-05-20 14:15       ` Josef Bacik
@ 2011-05-21  8:26         ` Li Dongyang
  0 siblings, 0 replies; 7+ messages in thread
From: Li Dongyang @ 2011-05-21  8:26 UTC (permalink / raw)
  To: Josef Bacik; +Cc: David Sterba, chris.mason, linux-btrfs, stable

On Friday, May 20, 2011 10:15:46 PM Josef Bacik wrote:
> On 05/20/2011 07:52 AM, David Sterba wrote:
> > On Wed, May 18, 2011 at 11:29:14AM +0800, Li Dongyang wrote:
> >> Thanks for the fix, I thought EOPNOTSUPP could be useful by the caller
> >> that time, and maybe we could do somthing like remove the discard
> >> mount_opt in the fs_info so we can avoid calling it again.
> > 
> > I do not agree that discard should be removed from mount_opt, because
> > one may add TRIM-capable devices later, it's a whole filesystem option,
> > while. A disappeared discard mount option will probably cause panic
> > in administrator's head.
> > 
> > However, if a drive does not support TRIM, the btrfs_issue_discard calls
> > can take a shortcut and do not call up to blkdev_issue_discard (though
> > it does return immediatelly), caching the state after first call. But
> > this is matter of the lower level call (blkdev) and should not be
> > propagated beyond to the extent "level" (ie. btrfs_discard_extent).
> > 
> 
> There ought to just be a flag added to btrfs_device to say whether or
> not it supports discard, and if we get back EOPNOTSUPP we stop putting
> discards down on that device, that way if we have some devices that do
> it and some that don't (like for instance if we do that tiered caching
> with ssd's thing) we can make sure discard is actually done on drives
> that care about it.  Thanks,
> 
> Josef
> 
This should be the best way. but if we do not export the EOPNOTSUPP to callers
of btrfs_discard_extent, we will get no errors if we run fstrim on a btrfs which
all the devices don't have trim, I think this is not what we want.

Before commit 5378e60734f5b7bfe1b43dc191aaf6131c1befe7, btrfs_discard_extent
will only return the error of btrfs_map_block,so I think we should move the BUG_ONs in
to btrfs_discard_extent from the callers as actually they are testing the result of
btrfs_map_block.

Thanks
Li Dongyang

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2011-05-21  8:26 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-17 15:27 Drive without DISCARD support crashes btrfs David Sterba
2011-05-17 16:00 ` [PATCH] btrfs: fix crash when no drive supports DISCARD David Sterba
2011-05-17 18:49   ` Chris Mason
2011-05-18  3:29   ` Li Dongyang
2011-05-20 11:52     ` David Sterba
2011-05-20 14:15       ` Josef Bacik
2011-05-21  8:26         ` Li Dongyang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.