linux-btrfs.vger.kernel.org archive mirror
* Intermittent no space errors
@ 2010-07-26 21:09 Dave Cundiff
  2010-07-27 13:19 ` Yan, Zheng 
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Cundiff @ 2010-07-26 21:09 UTC
  To: linux-btrfs

Hello,

On 2.6.35-rc5 I'm seeing some weird behavior under heavy IO loads. I
have a backup process that fires up several rsync processes. These
mirror several dozen servers to individual sub-volumes. Every day I
snapshot each sub-volume and rsync over it.

The problem I'm seeing is my rsync processes are failing randomly with
"No space left on device". This is a 6 Terabyte volume with plenty of
free space.

Mount options:
/dev/sdb on /backups type btrfs (rw,max_inline=0,compress)

[root@rsync1 ~]# btrfs filesystem df /backups/
Data: total=1.88TB, used=1.88TB
Metadata: total=43.38GB, used=32.06GB
System: total=12.00MB, used=260.00KB

[root@rsync1 ~]# df /dev/sdb
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sdb             5781249024 2087273084 3693975940  37% /backups
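
The two reports above measure different things: `btrfs filesystem df`
describes space already allocated to chunks of each type, while plain `df`
describes the device as a whole. A minimal sketch, assuming the output
format shown above and binary units (the names parse_btrfs_df and
fill_ratios are hypothetical helpers, not part of any btrfs tool), of how
full each allocated chunk type is:

```python
import re

# Output of `btrfs filesystem df /backups/`, copied from the report above.
BTRFS_DF = """\
Data: total=1.88TB, used=1.88TB
Metadata: total=43.38GB, used=32.06GB
System: total=12.00MB, used=260.00KB
"""

# Treating the units as powers of 1024 is an assumption.
UNITS = {"KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

LINE = re.compile(
    r"(\w+): total=([\d.]+)(KB|MB|GB|TB), used=([\d.]+)(KB|MB|GB|TB)"
)


def parse_btrfs_df(text):
    """Return {chunk_type: (total_bytes, used_bytes)} for each parsed line."""
    result = {}
    for match in LINE.finditer(text):
        kind, total, total_unit, used, used_unit = match.groups()
        result[kind] = (
            float(total) * UNITS[total_unit],
            float(used) * UNITS[used_unit],
        )
    return result


def fill_ratios(text):
    """Return {chunk_type: used / total} for the allocated space of each type."""
    return {
        kind: used / total
        for kind, (total, used) in parse_btrfs_df(text).items()
    }


if __name__ == "__main__":
    for kind, ratio in fill_ratios(BTRFS_DF).items():
        print(f"{kind}: {ratio:.1%} of allocated space in use")
```

At the precision shown, the allocated Data space is completely used, so
further writes would presumably need a new data chunk to be allocated even
though `df` reports the device as 37% full.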

They don't all fail at once. Normally I have 4-5 running at a time and
1 or 2 will drop out with a no space error. The rest continue on. I've
noticed it will generally occur on ones that are in the middle of
transferring a very large file. If I lighten the load to one rsync at
a time it appears to happen less frequently.

Any known issues I should be aware of?

Thanks,

-- 
Dave Cundiff
System Administrator
A2Hosting, Inc
http://www.a2hosting.com


* Re: Intermittent no space errors
  2010-07-26 21:09 Intermittent no space errors Dave Cundiff
@ 2010-07-27 13:19 ` Yan, Zheng 
  2010-07-27 20:30   ` Dave Cundiff
  0 siblings, 1 reply; 8+ messages in thread
From: Yan, Zheng @ 2010-07-27 13:19 UTC
  To: Dave Cundiff; +Cc: linux-btrfs

On Tue, Jul 27, 2010 at 5:09 AM, Dave Cundiff <syshackmin@gmail.com> wrote:
> [...]
> Any known issues I should be aware of?
>

Thank you for reporting this. I will dig in.

Yan, Zheng
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Intermittent no space errors
  2010-07-27 13:19 ` Yan, Zheng 
@ 2010-07-27 20:30   ` Dave Cundiff
  2010-07-28  0:31     ` Yan, Zheng 
  2010-07-29  8:10     ` Justin Ossevoort
  0 siblings, 2 replies; 8+ messages in thread
From: Dave Cundiff @ 2010-07-27 20:30 UTC
  To: linux-btrfs

Hello,

I installed the git repo kernel and added some debug to the ENOSPC
returns. Unfortunately it's still failing. If it helps any, it's bombing
out in btrfs_check_data_free_space() in extent-tree.c, returning on
the ENOSPC at line 2959.

Unfortunately that is the extent of my ability to debug a filesystem. :P

Thanks,

On Tue, Jul 27, 2010 at 9:19 AM, Yan, Zheng <yanzheng@21cn.com> wrote:
> On Tue, Jul 27, 2010 at 5:09 AM, Dave Cundiff <syshackmin@gmail.com> wrote:
>> [...]
>> Any known issues I should be aware of?
>>
>
> Thank you for reporting this. I will dig in.
>
> Yan, Zheng
>



-- 
Dave Cundiff
System Administrator
A2Hosting, Inc
http://www.a2hosting.com


* Re: Intermittent no space errors
  2010-07-27 20:30   ` Dave Cundiff
@ 2010-07-28  0:31     ` Yan, Zheng 
  2010-08-04  0:24       ` Simon Kirby
  2010-07-29  8:10     ` Justin Ossevoort
  1 sibling, 1 reply; 8+ messages in thread
From: Yan, Zheng @ 2010-07-28  0:31 UTC
  To: Dave Cundiff; +Cc: linux-btrfs

On Wed, Jul 28, 2010 at 4:30 AM, Dave Cundiff <syshackmin@gmail.com> wrote:
> Hello,
>
> I installed the git repo kernel and added some debug to the ENOSPC
> returns. Unfortunately its still failing. If it helps any its bombing
> out in btrfs_check_data_free_space() in extent-tree.c. Returning on
> the ENOSPC at line 2959.
>
> Unfortunately that is the extent of my ability to debug a filesystem. :P
>

This is really helpful, thank you very much.

Yan, Zheng


* Re: Intermittent no space errors
  2010-07-27 20:30   ` Dave Cundiff
  2010-07-28  0:31     ` Yan, Zheng 
@ 2010-07-29  8:10     ` Justin Ossevoort
  1 sibling, 0 replies; 8+ messages in thread
From: Justin Ossevoort @ 2010-07-29  8:10 UTC
  To: linux-btrfs

Hello,

Perhaps it would be a good idea to add a tracepoint before each return
of ENOSPC? The overhead shouldn't matter much, since it is reasonable to
assume that filesystems aren't running out of space most of the time. And
you could then use 'perf' for more insight in these cases without
recompiling the kernel.

Regards,

    justin....

On 27/07/10 22:30, Dave Cundiff wrote:
> Hello,
>
> I installed the git repo kernel and added some debug to the ENOSPC
> returns. Unfortunately its still failing. If it helps any its bombing
> out in btrfs_check_data_free_space() in extent-tree.c. Returning on
> the ENOSPC at line 2959.
>
> Unfortunately that is the extent of my ability to debug a filesystem. :P
>
> [...]



* Re: Intermittent no space errors
  2010-07-28  0:31     ` Yan, Zheng 
@ 2010-08-04  0:24       ` Simon Kirby
  2010-08-04 11:21         ` Yan, Zheng 
  0 siblings, 1 reply; 8+ messages in thread
From: Simon Kirby @ 2010-08-04  0:24 UTC
  To: Yan, Zheng ; +Cc: Dave Cundiff, linux-btrfs

On Wed, Jul 28, 2010 at 08:31:10AM +0800, Yan, Zheng  wrote:

> On Wed, Jul 28, 2010 at 4:30 AM, Dave Cundiff <syshackmin@gmail.com> wrote:
> > Hello,
> >
> > I installed the git repo kernel and added some debug to the ENOSPC
> > returns. Unfortunately its still failing. If it helps any its bombing
> > out in btrfs_check_data_free_space() in extent-tree.c. Returning on
> > the ENOSPC at line 2959.
> >
> > Unfortunately that is the extent of my ability to debug a filesystem. :P
> 
> This is really helpful, thank you very much.

We're seeing this too, since upgrading from 2.6.33.2 + merged old git btrfs
unstable HEAD to plain 2.6.35.

[sroot@backup01:.../.rmagic]# rm *
rm: cannot remove `WEEKLY_bar3d.png': No space left on device
rm: cannot remove `WEEKLY.html': No space left on device
rm: cannot remove `YEARLY_bar3d.png': No space left on device
rm: cannot remove `YEARLY.html': No space left on device
[sroot@backup01:.../.rmagic]# l
total 25
drwxr-xr-x 1 1037300 1037300   108 2010-08-03 18:19 ./
drwxr-xr-x 1 1037300 1037300    28 2009-11-07 05:57 ../
-rw-r--r-- 1 1037300 1037300  5720 2010-04-23 13:39 WEEKLY_bar3d.png
-rw-r--r-- 1 1037300 1037300 11882 2010-04-23 13:39 WEEKLY.html
-rw-r--r-- 1 1037300 1037300  2998 2010-04-23 13:39 YEARLY_bar3d.png
-rw-r--r-- 1 1037300 1037300  3016 2010-04-23 13:39 YEARLY.html
[sroot@backup01:.../.rmagic]# rm *
rm: cannot remove `WEEKLY_bar3d.png': No space left on device
rm: cannot remove `YEARLY.html': No space left on device
[sroot@backup01:.../.rmagic]# rm *
rm: cannot remove `WEEKLY_bar3d.png': No space left on device
rm: cannot remove `YEARLY.html': No space left on device
[sroot@backup01:.../.rmagic]# rm *
[sroot@backup01:.../.rmagic]#

[sroot@backup01:/root]# btrfs filesystem df /backup/bu001/vol04
Data: total=2.55TB, used=2.20TB
Metadata: total=230.13GB, used=183.83GB
System: total=12.00MB, used=548.00KB
[sroot@backup01:/root]# df -P /backup/bu001/vol04
Filesystem         1024-blocks      Used Available Capacity Mounted on
/dev/mapper/bu001-vol04 3221225472 2742785400 478440072      86% /backup/bu001/vol04
[sroot@backup01:/root]# l /dev/mapper/bu001-vol04
brw-rw---- 1 root disk 252, 10 2010-08-03 16:02 /dev/mapper/bu001-vol04
[sroot@backup01:/root]# btrfs filesystem show /dev/dm-10
failed to read /dev/sr0
Label: none  uuid: 0c55f5f4-b618-4ec2-9dbc-e3e70a901e1a
        Total devices 1 FS bytes used 2.37TB
        devid    1 size 3.00TB used 3.00TB path /dev/dm-10

Btrfs Btrfs v0.19
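
The `btrfs filesystem show` output above reports "size 3.00TB used 3.00TB"
for the device, which suggests the whole device is already allocated to
chunks. A rough sketch, assuming the devid line format shown above and
binary units (unallocated_bytes is a hypothetical helper), of how much
space remains unallocated:

```python
import re

# Treating the units as powers of 1024 is an assumption.
UNITS = {"KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

DEVID = re.compile(
    r"devid\s+(\d+)\s+size\s+([\d.]+)(KB|MB|GB|TB)"
    r"\s+used\s+([\d.]+)(KB|MB|GB|TB)\s+path\s+(\S+)"
)

# The devid line copied from the output above.
SHOW = "        devid    1 size 3.00TB used 3.00TB path /dev/dm-10\n"


def unallocated_bytes(show_output):
    """Map each device path to size minus used, i.e. space not yet in any chunk."""
    remaining = {}
    for match in DEVID.finditer(show_output):
        _devid, size, size_unit, used, used_unit, path = match.groups()
        remaining[path] = (
            float(size) * UNITS[size_unit] - float(used) * UNITS[used_unit]
        )
    return remaining


if __name__ == "__main__":
    for path, free in unallocated_bytes(SHOW).items():
        print(f"{path}: {free / UNITS['GB']:.2f}GB unallocated")
```

With nothing left unallocated at the two-decimal precision shown, no new
metadata chunk could presumably be created, so any additional metadata
would have to fit inside the 230.13GB already allocated to it.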

We're also seeing things like this in dmesg, which appear to be coming
from btrfs-cleaner cleaning some old snapshot:

Aug  3 18:40:45 backup01 kernel: ------------[ cut here ]------------
Aug  3 18:40:45 backup01 kernel: WARNING: at fs/btrfs/extent-tree.c:3441 btrfs_block_rsv_check+0x151/0x180()
Aug  3 18:40:45 backup01 kernel: Hardware name: PowerEdge 1950
Aug  3 18:40:45 backup01 kernel: Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe bnx2
Aug  3 18:40:45 backup01 kernel: Pid: 7525, comm: btrfs-cleaner Tainted: G        W   2.6.35-hw #1
Aug  3 18:40:45 backup01 kernel: Call Trace:
Aug  3 18:40:45 backup01 kernel: [<ffffffff812c56a1>] ? btrfs_block_rsv_check+0x151/0x180
Aug  3 18:40:45 backup01 kernel: [<ffffffff8104b270>] warn_slowpath_common+0x80/0xd0
Aug  3 18:40:45 backup01 kernel: [<ffffffff8104b2d5>] warn_slowpath_null+0x15/0x20
Aug  3 18:40:45 backup01 kernel: [<ffffffff812c56a1>] btrfs_block_rsv_check+0x151/0x180
Aug  3 18:40:45 backup01 kernel: [<ffffffff812d5ea1>] btrfs_should_end_transaction+0x61/0x90
Aug  3 18:40:45 backup01 kernel: [<ffffffff812c842d>] btrfs_drop_snapshot+0x21d/0x5f0
Aug  3 18:40:45 backup01 kernel: [<ffffffff81662d72>] ? schedule+0x3f2/0x750
Aug  3 18:40:45 backup01 kernel: [<ffffffff812d463a>] btrfs_clean_old_snapshots+0x12a/0x160
Aug  3 18:40:45 backup01 kernel: [<ffffffff812d0f00>] cleaner_kthread+0x160/0x190
Aug  3 18:40:45 backup01 kernel: [<ffffffff812d0da0>] ? cleaner_kthread+0x0/0x190
Aug  3 18:40:45 backup01 kernel: [<ffffffff81067ea6>] kthread+0x96/0xb0
Aug  3 18:40:45 backup01 kernel: [<ffffffff8100aca4>] kernel_thread_helper+0x4/0x10
Aug  3 18:40:45 backup01 kernel: [<ffffffff81067e10>] ? kthread+0x0/0xb0
Aug  3 18:40:45 backup01 kernel: [<ffffffff8100aca0>] ? kernel_thread_helper+0x0/0x10
Aug  3 18:40:45 backup01 kernel: ---[ end trace cffc4418e2c1f45d ]---
Aug  3 18:40:45 backup01 kernel: block_rsv size 16194207744 reserved 4497289216 freed 0 78598144
Aug  3 18:40:45 backup01 kernel: ------------[ cut here ]------------
Aug  3 18:40:45 backup01 kernel: WARNING: at fs/btrfs/extent-tree.c:3441 btrfs_block_rsv_check+0x151/0x180()
Aug  3 18:40:45 backup01 kernel: Hardware name: PowerEdge 1950
Aug  3 18:40:45 backup01 kernel: Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe bnx2
Aug  3 18:40:45 backup01 kernel: Pid: 7525, comm: btrfs-cleaner Tainted: G        W   2.6.35-hw #1
Aug  3 18:40:45 backup01 kernel: Call Trace:
Aug  3 18:40:45 backup01 kernel: [<ffffffff812f2eb0>] ? map_extent_buffer+0xb0/0xc0
Aug  3 18:40:45 backup01 kernel: [<ffffffff812c56a1>] ? btrfs_block_rsv_check+0x151/0x180
Aug  3 18:40:45 backup01 kernel: [<ffffffff8104b270>] warn_slowpath_common+0x80/0xd0
Aug  3 18:40:45 backup01 kernel: [<ffffffff8104b2d5>] warn_slowpath_null+0x15/0x20
Aug  3 18:40:45 backup01 kernel: [<ffffffff812c56a1>] btrfs_block_rsv_check+0x151/0x180
Aug  3 18:40:45 backup01 kernel: [<ffffffff812d56fa>] __btrfs_end_transaction+0x19a/0x220
Aug  3 18:40:45 backup01 kernel: [<ffffffff812d578e>] btrfs_end_transaction_throttle+0xe/0x10
Aug  3 18:40:45 backup01 kernel: [<ffffffff812c84f1>] btrfs_drop_snapshot+0x2e1/0x5f0
Aug  3 18:40:45 backup01 kernel: [<ffffffff81662d72>] ? schedule+0x3f2/0x750
Aug  3 18:40:45 backup01 kernel: [<ffffffff812d463a>] btrfs_clean_old_snapshots+0x12a/0x160
Aug  3 18:40:45 backup01 kernel: [<ffffffff812d0f00>] cleaner_kthread+0x160/0x190
Aug  3 18:40:45 backup01 kernel: [<ffffffff812d0da0>] ? cleaner_kthread+0x0/0x190
Aug  3 18:40:45 backup01 kernel: [<ffffffff81067ea6>] kthread+0x96/0xb0
Aug  3 18:40:45 backup01 kernel: [<ffffffff8100aca4>] kernel_thread_helper+0x4/0x10
Aug  3 18:40:45 backup01 kernel: [<ffffffff81067e10>] ? kthread+0x0/0xb0
Aug  3 18:40:45 backup01 kernel: [<ffffffff8100aca0>] ? kernel_thread_helper+0x0/0x10
Aug  3 18:40:45 backup01 kernel: ---[ end trace cffc4418e2c1f45e ]---
Aug  3 18:40:45 backup01 kernel: block_rsv size 16194207744 reserved 4497281024 freed 8192 78598144
Aug  3 18:44:44 backup01 kernel: ------------[ cut here ]------------
Aug  3 18:44:44 backup01 kernel: WARNING: at fs/btrfs/extent-tree.c:3441 btrfs_block_rsv_check+0x151/0x180()
Aug  3 18:44:44 backup01 kernel: Hardware name: PowerEdge 1950
Aug  3 18:44:44 backup01 kernel: Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe bnx2
Aug  3 18:44:44 backup01 kernel: Pid: 7526, comm: btrfs-transacti Tainted: G        W   2.6.35-hw #1
Aug  3 18:44:44 backup01 kernel: Call Trace:
Aug  3 18:44:44 backup01 kernel: [<ffffffff812c56a1>] ? btrfs_block_rsv_check+0x151/0x180
Aug  3 18:44:44 backup01 kernel: [<ffffffff8104b270>] warn_slowpath_common+0x80/0xd0
Aug  3 18:44:44 backup01 kernel: [<ffffffff8104b2d5>] warn_slowpath_null+0x15/0x20
Aug  3 18:44:44 backup01 kernel: [<ffffffff812c56a1>] btrfs_block_rsv_check+0x151/0x180
Aug  3 18:44:44 backup01 kernel: [<ffffffff812d56fa>] __btrfs_end_transaction+0x19a/0x220
Aug  3 18:44:44 backup01 kernel: [<ffffffff812d579b>] btrfs_end_transaction+0xb/0x10
Aug  3 18:44:44 backup01 kernel: [<ffffffff812d5d63>] btrfs_commit_transaction+0x5c3/0x6a0
Aug  3 18:44:44 backup01 kernel: [<ffffffff810683e0>] ? autoremove_wake_function+0x0/0x40
Aug  3 18:44:44 backup01 kernel: [<ffffffff812d0d90>] transaction_kthread+0x250/0x260
Aug  3 18:44:44 backup01 kernel: [<ffffffff812d0b40>] ? transaction_kthread+0x0/0x260
Aug  3 18:44:44 backup01 kernel: [<ffffffff81067ea6>] kthread+0x96/0xb0
Aug  3 18:44:44 backup01 kernel: [<ffffffff8100aca4>] kernel_thread_helper+0x4/0x10
Aug  3 18:44:44 backup01 kernel: [<ffffffff81067e10>] ? kthread+0x0/0xb0
Aug  3 18:44:44 backup01 kernel: [<ffffffff8100aca0>] ? kernel_thread_helper+0x0/0x10
Aug  3 18:44:44 backup01 kernel: ---[ end trace cffc4418e2c1f45f ]---
Aug  3 18:44:44 backup01 kernel: block_rsv size 16194207744 reserved 4522696704 freed 53190656 0
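
The three "block_rsv" lines above can be compared numerically. A minimal
sketch, assuming that "size" is the amount the reservation wants and
"reserved" is what it currently holds (an interpretation based only on the
field names, not confirmed in this thread; shortfall is a hypothetical
helper):

```python
import re

BLOCK_RSV = re.compile(r"block_rsv size (\d+) reserved (\d+) freed (\d+) (\d+)")

# The three block_rsv lines copied from the dmesg output above.
LINES = [
    "Aug  3 18:40:45 backup01 kernel: block_rsv size 16194207744 reserved 4497289216 freed 0 78598144",
    "Aug  3 18:40:45 backup01 kernel: block_rsv size 16194207744 reserved 4497281024 freed 8192 78598144",
    "Aug  3 18:44:44 backup01 kernel: block_rsv size 16194207744 reserved 4522696704 freed 53190656 0",
]


def shortfall(line):
    """Return size - reserved in bytes, or None if the line has no block_rsv record."""
    match = BLOCK_RSV.search(line)
    if match is None:
        return None
    size, reserved = int(match.group(1)), int(match.group(2))
    return size - reserved


if __name__ == "__main__":
    gib = 1024**3
    for line in LINES:
        print(f"short by {shortfall(line) / gib:.2f} GiB")
```

Under that reading, the reservation wants roughly 15.1 GiB but holds only
about 4.2 GiB in each of the three warnings.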

I rebuilt with the #if 0 changed to #if 1 on extent-tree.c:2947:

#if 1 /* I hope we never need this code again, just in case */

ha! :)  "rm" is succeeding everywhere so far, and so this path hasn't
been hit yet.  Perhaps it has to fight with the btrfs-cleaner, or
something.  Will post a follow-up later.

Simon-


* Re: Intermittent no space errors
  2010-08-04  0:24       ` Simon Kirby
@ 2010-08-04 11:21         ` Yan, Zheng 
  2010-08-09 23:25           ` Simon Kirby
  0 siblings, 1 reply; 8+ messages in thread
From: Yan, Zheng @ 2010-08-04 11:21 UTC
  To: Simon Kirby; +Cc: Dave Cundiff, linux-btrfs

On Wed, Aug 4, 2010 at 8:24 AM, Simon Kirby <sim@hostway.ca> wrote:
> We're seeing this too, since upgrading from 2.6.33.2 + merged old git btrfs
> unstable HEAD to plain 2.6.35.
> [...]
> We're also seeing things like this in dmesg, which appear to be coming
> from btrfs-cleaner cleaning some old snapshot:
>
> Aug  3 18:40:45 backup01 kernel: WARNING: at fs/btrfs/extent-tree.c:3441 btrfs_block_rsv_check+0x151/0x180()
> [...]
> Aug  3 18:44:44 backup01 kernel: block_rsv size 16194207744 reserved 4522696704 freed 53190656 0
>

These warnings appear because btrfs in 2.6.35 reserves more metadata space
for internal use than older kernels did. Your FS is too full, so btrfs
can't reserve enough metadata space.

> I rebuilt with the #if 0 changed to #if 1 on extent-tree.c:2947:
>
> #if 1 /* I hope we never need this code again, just in case */
>
> ha! :)  "rm" is succeeding everywhere so far, and so this path hasn't
> been hit yet.  Perhaps it has to fight with the btrfs-cleaner, or
> something.  Will post a follow-up later.

Yes, it has to fight with the btrfs-cleaner.

Yan, Zheng


* Re: Intermittent no space errors
  2010-08-04 11:21         ` Yan, Zheng 
@ 2010-08-09 23:25           ` Simon Kirby
  0 siblings, 0 replies; 8+ messages in thread
From: Simon Kirby @ 2010-08-09 23:25 UTC
  To: Yan, Zheng; +Cc: Dave Cundiff, linux-btrfs

On Wed, Aug 04, 2010 at 07:21:00PM +0800, Yan, Zheng  wrote:

> > We're seeing this too, since upgrading from 2.6.33.2 + merged old git btrfs
> > unstable HEAD to plain 2.6.35.
> >
> > [sroot@backup01:.../.rmagic]# rm *
> > rm: cannot remove `WEEKLY_bar3d.png': No space left on device
> > rm: cannot remove `WEEKLY.html': No space left on device
> > rm: cannot remove `YEARLY_bar3d.png': No space left on device
> > rm: cannot remove `YEARLY.html': No space left on device
>...
> > Aug  3 18:44:44 backup01 kernel: ------------[ cut here ]------------
> > Aug  3 18:44:44 backup01 kernel: WARNING: at fs/btrfs/extent-tree.c:3441 btrfs_block_rsv_check+0x151/0x180()
>...
> 
> These warning is because btrfs in 2.6.35 reserves more metadata space
> for internal use
> than older kernel. Your FS is too full, btrfs can't reserve enough
> metadata space.

Hello!

Is it possible that 2.6.33.2 btrfs has mucked up the on-disk stuff in a
way that causes 2.6.35 to be unhappy?  The file system in question was
reported to be 85% full, according to "df".

In the meantime, we've been having some other problems on 2.6.35; for
example, rsync has been trying to append a block to a file for the past
5 days.  The file system is reported as 45% full:

[sroot@backup01:/root]# df -Pt btrfs /backup/bu000/vol05/
Filesystem         1024-blocks      Used Available Capacity Mounted on
/dev/mapper/bu000-vol05 3221225472 1429529324 1791696148      45% /backup/bu000/vol05
[sroot@backup01:/root]# btrfs files df /backup/bu000/vol05
Data: total=1.57TB, used=1.31TB
Metadata: total=15.51GB, used=10.48GB
System: total=12.00MB, used=192.00KB

At some point today, the kernel also spat this out:

BUG: soft lockup - CPU#3 stuck for 61s! [rsync:21903]
Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe bnx2
CPU 3
Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe bnx2

Pid: 21903, comm: rsync Tainted: G        W   2.6.35-hw #2 0NK937/PowerEdge 1950
RIP: 0010:[<ffffffff81101a2d>]  [<ffffffff81101a2d>] iput+0x5d/0x70
RSP: 0018:ffff8802c14abb48  EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffff8802c14abb58 RCX: 0000000000000003
RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff88007c075980
RBP: ffffffff8100a84e R08: 0000000000000001 R09: 8000000000000000
R10: 0000000000000002 R11: 0000000000000000 R12: ffffffffffffff66
R13: ffffffff81af04e0 R14: 0000000000000000 R15: 7fffffffffffffff
FS:  00007fd13bbb06e0(0000) GS:ffff880001cc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000002f5a108 CR3: 00000001eb94a000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rsync (pid: 21903, threadinfo ffff8802c14aa000, task ffff880080b04b00)
Stack:
ffff88007c075888 ffff88007c0757b0 ffff8802c14abb98 ffffffff812d7439
<0> ffffffff81664cde 0000000000000001 0000000004d80000 0000000000004000
<0> ffff88042a708178 ffff88042a708000 ffff8802c14abc08 ffffffff812c599c
Call Trace:
[<ffffffff812d7439>] ? btrfs_start_one_delalloc_inode+0x129/0x160
[<ffffffff81664cde>] ? _raw_spin_lock+0xe/0x20
[<ffffffff812c599c>] ? shrink_delalloc+0x8c/0x130
[<ffffffff812c5f39>] ? btrfs_delalloc_reserve_metadata+0x189/0x190
[<ffffffff8110180e>] ? file_update_time+0x11e/0x180
[<ffffffff812c5f83>] ? btrfs_delalloc_reserve_space+0x43/0x60
[<ffffffff812e2a98>] ? btrfs_file_aio_write+0x508/0x970
[<ffffffff8100a84e>] ? apic_timer_interrupt+0xe/0x20
[<ffffffff810ec1b1>] ? do_sync_write+0xd1/0x120
[<ffffffff810fc767>] ? poll_select_copy_remaining+0xf7/0x140
[<ffffffff810ecd2b>] ? vfs_write+0xcb/0x1a0
[<ffffffff810ecef0>] ? sys_write+0x50/0x90
[<ffffffff81009f02>] ? system_call_fastpath+0x16/0x1b
Code: 00 01 00 00 48 c7 c2 a0 2c 10 81 48 8b 40 30 48 85 c0 74 12 48 8b 50 20 48 c7 c0 a0 2c 10 81 48 85 d2 48 0
Call Trace:
[<ffffffff812d7439>] ? btrfs_start_one_delalloc_inode+0x129/0x160
[<ffffffff81664cde>] ? _raw_spin_lock+0xe/0x20
[<ffffffff812c599c>] ? shrink_delalloc+0x8c/0x130
[<ffffffff812c5f39>] ? btrfs_delalloc_reserve_metadata+0x189/0x190
[<ffffffff8110180e>] ? file_update_time+0x11e/0x180
[<ffffffff812c5f83>] ? btrfs_delalloc_reserve_space+0x43/0x60
[<ffffffff812e2a98>] ? btrfs_file_aio_write+0x508/0x970
[<ffffffff8100a84e>] ? apic_timer_interrupt+0xe/0x20
[<ffffffff810ec1b1>] ? do_sync_write+0xd1/0x120
[<ffffffff810fc767>] ? poll_select_copy_remaining+0xf7/0x140
[<ffffffff810ecd2b>] ? vfs_write+0xcb/0x1a0
[<ffffffff810ecef0>] ? sys_write+0x50/0x90
[<ffffffff81009f02>] ? system_call_fastpath+0x16/0x1b

[sroot@backup01:/root]# ls -l /proc/21903/fd/1
lrwx------ 1 root root 64 2010-08-09 18:21 /proc/21903/fd/1 -> /backup/bu000/vol05/vg005_web11_backup/2010-08-04-17-00/64/54/.../customer file.mov.aYX4Js
[sroot@backup01:/root]# ls -lL /proc/21903/fd/1
-rw------- 1 root root 977797120 2010-08-04 20:39 /proc/21903/fd/1
[sroot@backup01:/root]# ps auxw|grep rsync
root     21903 73.2  0.0  12912   192 ?        R    Aug04 5177:08 rsync -aHq --numeric-ids --exclude-from=/etc/backups/backup.exclude --delete --delete-excluded /storage/vg005/web11/64/54/ /backup/bu000/vol05/vg005_web11_backup/2010-08-04-17-00/64/54

In other words, the rsync has made no progress for 5 days (or at least
the mtime hasn't changed since then).
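
The "5 days" figure can be checked from the two timestamps in the ls output. This is only a sketch: it assumes the timestamp on the /proc symlink (2010-08-09 18:21) is close to the time the commands were run, and the helper `seconds_since_modified` is a hypothetical name introduced here, not part of any tool used above.

```python
import os
import time
from datetime import datetime


def seconds_since_modified(path, now=None):
    """Seconds since the file behind path was last modified.

    os.stat follows symlinks, so passing /proc/<pid>/fd/<n> inspects the
    open file itself, much like "ls -lL" does.
    """
    if now is None:
        now = time.time()
    return now - os.stat(path).st_mtime


# Timestamps copied from the ls output above.
last_write = datetime(2010, 8, 4, 20, 39)   # mtime of the temporary file
checked_at = datetime(2010, 8, 9, 18, 21)   # assumed time of the check
stalled = checked_at - last_write
stalled_days = stalled.total_seconds() / 86400

print(f"no write for {stalled} (about {stalled_days:.1f} days)")
```

On a live system, `seconds_since_modified("/proc/21903/fd/1")` would give the same kind of measurement directly, provided the process still exists.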

"perf top" still shows output like this, suggesting that btrfs is
spending much of its time in btrfs_find_space_cluster:

     samples  pcnt function                       DSO
     _______ _____ ______________________________ _________________

     2127.00 11.9% find_next_bit                  [kernel]
     1914.00 10.7% find_next_zero_bit             [kernel]
     1580.00  8.8% schedule                       [kernel]
     1340.00  7.5% btrfs_find_space_cluster       [kernel]
     1238.00  6.9% _raw_spin_lock_irqsave         [kernel]
     1017.00  5.7% _raw_spin_lock                 [kernel]
      662.00  3.7% sched_clock_local              [kernel]
      615.00  3.4% native_read_tsc                [kernel]
      590.00  3.3% _raw_spin_lock_irq             [kernel]
      468.00  2.6% _raw_spin_unlock_irqrestore    [kernel]
      405.00  2.3% schedule_timeout               [kernel]
      338.00  1.9% native_sched_clock             [kernel]
      329.00  1.8% update_curr                    [kernel]
      323.00  1.8% lock_timer_base                [kernel]
      297.00  1.7% btrfs_start_one_delalloc_inode [kernel]
      285.00  1.6% pick_next_task_fair            [kernel]
      267.00  1.5% try_to_del_timer_sync          [kernel]
      248.00  1.4% sched_clock_cpu                [kernel]
      245.00  1.4% deflate_fast                   [kernel]
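
To put a number on that, the sketch below adds up the share of samples in btrfs_find_space_cluster and the two bitmap helpers at the top of the list. It assumes find_next_bit and find_next_zero_bit are being called from the space cluster search, which a flat profile like this one does not prove; only a subset of the rows is copied in.

```python
# A subset of the rows from the perf top output above.
PERF_TOP_ROWS = """\
     2127.00 11.9% find_next_bit                  [kernel]
     1914.00 10.7% find_next_zero_bit             [kernel]
     1580.00  8.8% schedule                       [kernel]
     1340.00  7.5% btrfs_find_space_cluster       [kernel]
      297.00  1.7% btrfs_start_one_delalloc_inode [kernel]
"""

# Functions assumed (not proven) to belong to the free-space search.
SEARCH_FUNCS = ("find_next_bit", "find_next_zero_bit", "btrfs_find_space_cluster")


def parse_rows(text):
    """Map function name -> percentage for each 'samples pcnt function DSO' row."""
    shares = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4 or not fields[1].endswith("%"):
            continue  # skip headers, rulers and blank lines
        shares[fields[2]] = float(fields[1].rstrip("%"))
    return shares


shares = parse_rows(PERF_TOP_ROWS)
search_share = round(sum(shares[name] for name in SEARCH_FUNCS), 1)

print(f"free-space search functions: {search_share}% of samples")
```

Under that assumption, roughly 30% of the sampled time goes to searching for free space.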

So, is it possible that the older btrfs code left things in a state that
wouldn't have arisen if we had started with 2.6.35 to begin with?
In the case of the 45% full file system, it seems it should be
able to allocate more space without issue.

Simon-

end of thread, other threads:[~2010-08-09 23:25 UTC | newest]

Thread overview: 8+ messages
2010-07-26 21:09 Intermittent no space errors Dave Cundiff
2010-07-27 13:19 ` Yan, Zheng 
2010-07-27 20:30   ` Dave Cundiff
2010-07-28  0:31     ` Yan, Zheng 
2010-08-04  0:24       ` Simon Kirby
2010-08-04 11:21         ` Yan, Zheng 
2010-08-09 23:25           ` Simon Kirby
2010-07-29  8:10     ` Justin Ossevoort