* Kernel error during btrfs balance
@ 2011-01-17 14:14 Erik Logtenberg
  2011-01-17 14:31 ` Erik Logtenberg
  2011-01-18  0:54 ` Yan, Zheng 
  0 siblings, 2 replies; 17+ messages in thread
From: Erik Logtenberg @ 2011-01-17 14:14 UTC
  To: linux-btrfs

Hi,

btrfs balance results in:

http://pastebin.com/v5j0809M

My system: a fully up-to-date Fedora 14 with a rawhide kernel, installed
so that btrfs balance can actually reclaim my free space:

kernel-2.6.37-2.fc15.x86_64
btrfs-progs-0.19-12.fc14.x86_64

The filesystem had 0 bytes free where there should be 45G, so on
darkling's advice I ran btrfs balance on the fs while doing heavy I/O
(re-running 5 backup jobs that had failed due to ENOSPC).
Up until the crash, btrfs balance did reclaim a couple of gigabytes of
free space, so that part of the plan worked just fine.

Thanks,

Erik.


* Re: Kernel error during btrfs balance
  2011-01-17 14:14 Kernel error during btrfs balance Erik Logtenberg
@ 2011-01-17 14:31 ` Erik Logtenberg
  2011-01-17 14:37   ` Erik Logtenberg
  2011-01-21  8:50   ` Erik Logtenberg
  2011-01-18  0:54 ` Yan, Zheng 
  1 sibling, 2 replies; 17+ messages in thread
From: Erik Logtenberg @ 2011-01-17 14:31 UTC
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1210 bytes --]

Hi,

Please find attached the error log, for future reference.

Forgot to mention:
I could still use the system after this error, so it was not a completely
fatal error in that regard. All active processes (mostly rsync) were
hanging in state D though, so I couldn't kill them anymore. The FS could
not be unmounted either, so I still had to reboot.
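
For reference, processes blocked in state D (uninterruptible sleep) can be
listed by reading /proc. The following is only a minimal sketch and was not
run on the affected machine. The function names parse_state and
uninterruptible_processes are made up for illustration, and it assumes a
Linux /proc layout in which the state letter is the first field after the
parenthesised command name in /proc/<pid>/stat.

```python
import glob


def parse_state(stat_line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the LAST ')' instead of on spaces.
    """
    head, _, tail = stat_line.rpartition(")")
    pid_text, _, comm = head.partition("(")
    state = tail.split()[0]
    return int(pid_text), comm, state


def uninterruptible_processes():
    """List (pid, comm) for every process currently in state D."""
    found = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as handle:
                pid, comm, state = parse_state(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited while reading, or unexpected format
        if state == "D":
            found.append((pid, comm))
    return sorted(found)
```

Signals, including SIGKILL, are not delivered to a process until it leaves
state D, which is why such processes cannot be killed and a reboot was
needed here.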

Thanks,

Erik.



[-- Attachment #2: 2011-01-17_btrfs_balance.txt --]
[-- Type: text/plain, Size: 4386 bytes --]

[ 5225.236087] ------------[ cut here ]------------
[ 5225.236112] kernel BUG at fs/btrfs/relocation.c:836!
[ 5225.236131] invalid opcode: 0000 [#1] SMP 
[ 5225.236151] last sysfs file: /sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map
[ 5225.236178] CPU 0 
[ 5225.236186] Modules linked in: ipt_LOG xt_limit btrfs zlib_deflate libcrc32c sha256_generic cryptd aes_x86_64 aes_generic cbc dm_crypt tun ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat bridge stp llc nfsd lockd nfs_acl auth_rpcgss exportfs nls_utf8 cifs fscache sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipv6 kvm_intel kvm dummy uinput snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm shpchp snd_timer dell_wmi snd soundcore sparse_keymap snd_page_alloc dcdbas i2c_i801 e1000e iTCO_wdt iTCO_vendor_support joydev wmi serio_raw microcode usb_storage uas raid1 pata_acpi ata_generic radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
[ 5225.236582] 
[ 5225.236591] Pid: 2934, comm: btrfs Tainted: G          I 2.6.37-2.fc15.x86_64 #1                  
[ 5225.236627] RIP: 0010:[<ffffffffa0565237>]  [<ffffffffa0565237>] build_backref_tree+0x473/0xd6d [btrfs]
[ 5225.236676] RSP: 0018:ffff8801972af9c8  EFLAGS: 00010246
[ 5225.236695] RAX: ffff8800aa9ab880 RBX: ffff8800ac38e300 RCX: 0000000000000040
[ 5225.236720] RDX: 0000000000000030 RSI: 000000254e680000 RDI: ffff88019cfc3020
[ 5225.236745] RBP: ffff8801972afaf8 R08: 0000000000008050 R09: ffff8801972af970
[ 5225.236770] R10: ffff8801972af918 R11: ffff8801972af900 R12: ffff8800aa9ab880
[ 5225.236794] R13: ffff88018b186340 R14: ffff88021fa3bea0 R15: ffff88021fa3bb40
[ 5225.236820] FS:  00007f98476bd760(0000) GS:ffff8800bfa00000(0000) knlGS:0000000000000000
[ 5225.236848] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 5225.236869] CR2: 0000000000a0eb44 CR3: 00000001972ab000 CR4: 00000000000426e0
[ 5225.236894] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 5225.236918] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 5225.236943] Process btrfs (pid: 2934, threadinfo ffff8801972ae000, task ffff88019ce7c560)
[ 5225.236971] Stack:
[ 5225.236980]  ffffea0005056ee0 ffffea0000000000 ffff8801972af9f8 ffff8801972afaa8
[ 5225.237002]  0000000000000000 ffff88007b1eae58 ffff88019cfc3008 ffff88019cfc3578
[ 5225.237002]  ffff8801bd12ffe0 ffff88019cfc3020 ffff8800ac38e340 ffff88019cfc3580
[ 5225.237002] Call Trace:
[ 5225.237002]  [<ffffffffa0565c91>] relocate_tree_blocks+0x160/0x478 [btrfs]
[ 5225.237002]  [<ffffffffa056463d>] ? add_tree_block+0x11e/0x13e [btrfs]
[ 5225.237002]  [<ffffffffa0566b45>] relocate_block_group+0x1e3/0x490 [btrfs]
[ 5225.237002]  [<ffffffff8103edb9>] ? should_resched+0xe/0x2e
[ 5225.237002]  [<ffffffffa0566f39>] btrfs_relocate_block_group+0x147/0x28a [btrfs]
[ 5225.237002]  [<ffffffffa054e52a>] btrfs_relocate_chunk.clone.40+0x61/0x4ab [btrfs]
[ 5225.237002]  [<ffffffffa05152d4>] ? btrfs_item_key+0x1e/0x20 [btrfs]
[ 5225.237002]  [<ffffffffa05152f0>] ? btrfs_item_key_to_cpu+0x1a/0x36 [btrfs]
[ 5225.237002]  [<ffffffffa054c2a8>] ? read_extent_buffer+0xc3/0xe3 [btrfs]
[ 5225.237002]  [<ffffffffa05154e6>] ? btrfs_header_nritems.clone.12+0x17/0x1c [btrfs]
[ 5225.237002]  [<ffffffffa054cff6>] ? btrfs_item_key_to_cpu+0x2a/0x46 [btrfs]
[ 5225.237002]  [<ffffffffa055045e>] btrfs_balance+0x1a3/0x1f0 [btrfs]
[ 5225.237002]  [<ffffffff8112bce5>] ? do_filp_open+0x226/0x5c8
[ 5225.237002]  [<ffffffffa0556773>] btrfs_ioctl+0x641/0x846 [btrfs]
[ 5225.237002]  [<ffffffff811f3ed1>] ? file_has_perm+0xa5/0xc7
[ 5225.237002]  [<ffffffff8112e091>] do_vfs_ioctl+0x4b1/0x4f2
[ 5225.237002]  [<ffffffff8112e128>] sys_ioctl+0x56/0x7a
[ 5225.237002]  [<ffffffff8100acc2>] system_call_fastpath+0x16/0x1b
[ 5225.237002] Code: 48 8b 45 89 49 8d 7d 10 48 8d 75 b0 49 89 44 24 18 8a 43 70 ff c0 41 88 44 24 70 e8 f7 c3 ff ff eb 17 f6 40 71 10 49 89 c4 75 02 <0f> 0b 49 8d 45 10 49 89 45 10 49 89 45 18 48 8b b5 20 ff ff ff 
[ 5225.237002] RIP  [<ffffffffa0565237>] build_backref_tree+0x473/0xd6d [btrfs]
[ 5225.237002]  RSP <ffff8801972af9c8>
[ 5225.247188] ---[ end trace a7919e7f17c0a727 ]---


* Re: Kernel error during btrfs balance
  2011-01-17 14:31 ` Erik Logtenberg
@ 2011-01-17 14:37   ` Erik Logtenberg
  2011-01-17 14:39     ` Erik Logtenberg
  2011-01-21  8:50   ` Erik Logtenberg
  1 sibling, 1 reply; 17+ messages in thread
From: Erik Logtenberg @ 2011-01-17 14:37 UTC
  To: linux-btrfs

Hi,

Additionally, I cannot mount the filesystem anymore. mount gives no
error messages but hangs in state D.
dmesg shows:
[  422.323116] btrfs: use compression
That line is expected, but nothing else happens after it.

Thanks,

Erik.



* Re: Kernel error during btrfs balance
  2011-01-17 14:37   ` Erik Logtenberg
@ 2011-01-17 14:39     ` Erik Logtenberg
  0 siblings, 0 replies; 17+ messages in thread
From: Erik Logtenberg @ 2011-01-17 14:39 UTC
  To: linux-btrfs

Hi,

Please disregard that last message, the filesystem did mount after a
period of hanging in state D. Apparently something called an "orphan"
was unlinked:

[  422.323116] btrfs: use compression
[  761.778675] btrfs: unlinked 1 orphans
[  761.841581] SELinux: initialized (dev dm-5, type btrfs), uses xattr

Thanks,

Erik.



* Re: Kernel error during btrfs balance
  2011-01-17 14:14 Kernel error during btrfs balance Erik Logtenberg
  2011-01-17 14:31 ` Erik Logtenberg
@ 2011-01-18  0:54 ` Yan, Zheng 
  2011-01-18 13:22   ` Erik Logtenberg
  1 sibling, 1 reply; 17+ messages in thread
From: Yan, Zheng @ 2011-01-18  0:54 UTC
  To: Erik Logtenberg; +Cc: linux-btrfs

On Mon, Jan 17, 2011 at 10:14 PM, Erik Logtenberg <erik@logtenberg.eu> wrote:
> Hi,
>
> btrfs balance results in:
>
> http://pastebin.com/v5j0809M
>
> My system: fully up-to-date Fedora 14 with rawhide kernel to make btrfs
> balance do useful stuff to my free space:
>
> kernel-2.6.37-2.fc15.x86_64
> btrfs-progs-0.19-12.fc14.x86_64
>
> Filesystem had 0 bytes free, should be 45G, so on darklings advice I ran
> btrfs balance on the fs, while doing heavy I/O (re-running 5 backup jobs
> that had failed due to ENOSP).
> Up until the crash, btrfs balance did retrieve a couple of Gigs free
> space though, so that part of the plan worked just fine.
>

Please try 2.6.36 kernel.


* Re: Kernel error during btrfs balance
  2011-01-18  0:54 ` Yan, Zheng 
@ 2011-01-18 13:22   ` Erik Logtenberg
  2011-01-18 13:58     ` Helmut Hullen
  2011-01-18 14:13     ` Yan, Zheng 
  0 siblings, 2 replies; 17+ messages in thread
From: Erik Logtenberg @ 2011-01-18 13:22 UTC
  To: linux-btrfs

On 01/18/2011 01:54 AM, Yan, Zheng wrote:
> On Mon, Jan 17, 2011 at 10:14 PM, Erik Logtenberg <erik@logtenberg.eu> wrote:
>> Hi,
>>
>> btrfs balance results in:
>>
>> http://pastebin.com/v5j0809M
>>
>> My system: fully up-to-date Fedora 14 with rawhide kernel to make btrfs
>> balance do useful stuff to my free space:
>>
>> kernel-2.6.37-2.fc15.x86_64
>> btrfs-progs-0.19-12.fc14.x86_64
>>
>> Filesystem had 0 bytes free, should be 45G, so on darklings advice I ran
>> btrfs balance on the fs, while doing heavy I/O (re-running 5 backup jobs
>> that had failed due to ENOSP).
>> Up until the crash, btrfs balance did retrieve a couple of Gigs free
>> space though, so that part of the plan worked just fine.
>>
> 
> Please try 2.6.36 kernel.

Thanks for your (short) advice. Could you please elaborate? I was in
fact using a 2.6.35.10-74.fc14.x86_64 kernel before, but darkling
advised me to switch to a newer kernel to reclaim free space by
balancing; the idea was that newer kernels have a better balancing
implementation that is more effective at reclaiming free space.

Now your advice is to take a small step back again, from 2.6.37 to
2.6.36 (which is still newer than the 2.6.35 I was using before). Is
that because you think that 2.6.37 may have introduced the bug that I
ran into? Do you think that 2.6.36 is still recent enough to have the
effective balancing, so that I will in fact be able to reclaim some free
space? Or is it just a shot in the dark with no reasoning whatsoever ;)

Please don't feel offended, but from your 4-word sentence I really can't
tell.

Thanks,

Erik.


* Re: Kernel error during btrfs balance
  2011-01-18 13:22   ` Erik Logtenberg
@ 2011-01-18 13:58     ` Helmut Hullen
  2011-01-18 14:13     ` Yan, Zheng 
  1 sibling, 0 replies; 17+ messages in thread
From: Helmut Hullen @ 2011-01-18 13:58 UTC
  To: linux-btrfs

Hello, Erik,

You wrote on 18.01.11:

[...]

> Thanks for your (short) advice. Could you please elaborate. I was in
> fact using a 2.6.35.10-74.fc14.x86_64 kernel before,

I had to move from 2.6.35.8 to 2.6.37-rc4 (and now 2.6.37) to get
reliable operation.

Best regards!
Helmut


* Re: Kernel error during btrfs balance
  2011-01-18 13:22   ` Erik Logtenberg
  2011-01-18 13:58     ` Helmut Hullen
@ 2011-01-18 14:13     ` Yan, Zheng 
  2011-01-18 14:29       ` Erik Logtenberg
  1 sibling, 1 reply; 17+ messages in thread
From: Yan, Zheng @ 2011-01-18 14:13 UTC
  To: Erik Logtenberg; +Cc: linux-btrfs

On Tue, Jan 18, 2011 at 9:22 PM, Erik Logtenberg <erik@logtenberg.eu> wrote:
> On 01/18/2011 01:54 AM, Yan, Zheng wrote:
>> On Mon, Jan 17, 2011 at 10:14 PM, Erik Logtenberg <erik@logtenberg.eu> wrote:
>>> Hi,
>>>
>>> btrfs balance results in:
>>>
>>> http://pastebin.com/v5j0809M
>>>
>>> My system: fully up-to-date Fedora 14 with rawhide kernel to make btrfs
>>> balance do useful stuff to my free space:
>>>
>>> kernel-2.6.37-2.fc15.x86_64
>>> btrfs-progs-0.19-12.fc14.x86_64
>>>
>>> Filesystem had 0 bytes free, should be 45G, so on darklings advice I ran
>>> btrfs balance on the fs, while doing heavy I/O (re-running 5 backup jobs
>>> that had failed due to ENOSP).
>>> Up until the crash, btrfs balance did retrieve a couple of Gigs free
>>> space though, so that part of the plan worked just fine.
>>>
>>
>> Please try 2.6.36 kernel.
>
> Thanks for your (short) advice. Could you please elaborate. I was in
> fact using a 2.6.35.10-74.fc14.x86_64 kernel before, but darkling
> adviced me to switch to a newer kernel to reclaim free space by
> balancing -- the idea was that newer kernels have better balancing
> implementation, more effective at reclaiming free space.
>
> Now your advice is to take a small step back again, from 2.6.37 to
> 2.6.36 (which is still higher than the 2.6.35 I was using before). Is
> that because you think that 2.6.37 may have introduced the bug that I
> ran into? Do you think that 2.6.36 is still recent enough to have the
> effective balancing so that I will in fact be able to reclaim some free
> space? Or is is just a shot in the dark with no reasoning whatsoever ;)
>
> Please don't feel offended, but from your 4-word sentence I really can't
> tell.
>

I am just trying to narrow down the bug, because I have never seen a bug
like this before.


* Re: Kernel error during btrfs balance
  2011-01-18 14:13     ` Yan, Zheng 
@ 2011-01-18 14:29       ` Erik Logtenberg
  0 siblings, 0 replies; 17+ messages in thread
From: Erik Logtenberg @ 2011-01-18 14:29 UTC
  To: Yan, Zheng ; +Cc: linux-btrfs

On 01/18/2011 03:13 PM, Yan, Zheng wrote:
> On Tue, Jan 18, 2011 at 9:22 PM, Erik Logtenberg <erik@logtenberg.eu> wrote:
>> On 01/18/2011 01:54 AM, Yan, Zheng wrote:
>>> On Mon, Jan 17, 2011 at 10:14 PM, Erik Logtenberg <erik@logtenberg.eu> wrote:
>>>> Hi,
>>>>
>>>> btrfs balance results in:
>>>>
>>>> http://pastebin.com/v5j0809M
>>>>
>>>> My system: fully up-to-date Fedora 14 with rawhide kernel to make btrfs
>>>> balance do useful stuff to my free space:
>>>>
>>>> kernel-2.6.37-2.fc15.x86_64
>>>> btrfs-progs-0.19-12.fc14.x86_64
>>>>
>>>> Filesystem had 0 bytes free, should be 45G, so on darklings advice I ran
>>>> btrfs balance on the fs, while doing heavy I/O (re-running 5 backup jobs
>>>> that had failed due to ENOSP).
>>>> Up until the crash, btrfs balance did retrieve a couple of Gigs free
>>>> space though, so that part of the plan worked just fine.
>>>>
>>>
>>> Please try 2.6.36 kernel.
>>
>> Thanks for your (short) advice. Could you please elaborate. I was in
>> fact using a 2.6.35.10-74.fc14.x86_64 kernel before, but darkling
>> adviced me to switch to a newer kernel to reclaim free space by
>> balancing -- the idea was that newer kernels have better balancing
>> implementation, more effective at reclaiming free space.
>>
>> Now your advice is to take a small step back again, from 2.6.37 to
>> 2.6.36 (which is still higher than the 2.6.35 I was using before). Is
>> that because you think that 2.6.37 may have introduced the bug that I
>> ran into? Do you think that 2.6.36 is still recent enough to have the
>> effective balancing so that I will in fact be able to reclaim some free
>> space? Or is is just a shot in the dark with no reasoning whatsoever ;)
>>
>> Please don't feel offended, but from your 4-word sentence I really can't
>> tell.
>>
> 
> Just try narrowing down the bug, because I never saw bug like this before.

Okay, I can try that. Please note though that I cannot reliably reproduce
the bug. At this moment I am in the middle of my second try at balancing
the FS (still on 2.6.37), this time without 8 rsyncs banging on the FS.
So far, everything is completely stable.

I could downgrade to 2.6.36 after this balance and then re-try
balancing, but if this second go doesn't crash like the first try, then
a successful rebalance on 2.6.36 won't tell us much.

Please note that it could be a combination of bugs. I first ran into an
out-of-space issue in the middle of a backup (at that time on 2.6.35),
and also noticed some minor file corruption as a result.
Then I switched over to 2.6.37 to fix the out-of-space issue (as there
should have been 45G free) using a balance. During that balance
operation I then ran into the bug that I reported in my previous email.

So it could be the 2.6.37 kernel hitting a minor FS corruption caused by
out-of-space issues with the 2.6.35 kernel. I have no idea how I could
reproduce this at all.

Thanks,

Erik.


* Re: Kernel error during btrfs balance
  2011-01-17 14:31 ` Erik Logtenberg
  2011-01-17 14:37   ` Erik Logtenberg
@ 2011-01-21  8:50   ` Erik Logtenberg
  2011-01-21  9:19     ` Yan, Zheng 
  1 sibling, 1 reply; 17+ messages in thread
From: Erik Logtenberg @ 2011-01-21  8:50 UTC
  To: linux-btrfs

Hi,

I think I hit the same bug again:

[291835.724344] ------------[ cut here ]------------
[291835.724376] kernel BUG at fs/btrfs/relocation.c:836!
[291835.724401] invalid opcode: 0000 [#1] SMP
[291835.724424] last sysfs file: /sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map
[291835.724461] CPU 0
[291835.724472] Modules linked in: uvcvideo snd_usb_audio snd_usbmidi_lib videodev v4l1_compat snd_rawmidi v4l2_compat_ioctl32 btrfs zlib_deflate libcrc32c sha256_generic cryptd aes_x86_64 aes_generic cbc dm_crypt tun ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat bridge stp llc nfsd lockd nfs_acl auth_rpcgss exportfs nls_utf8 cifs fscache sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipv6 kvm_intel kvm dummy uinput snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device e1000e snd_pcm snd_timer i2c_i801 snd shpchp iTCO_wdt iTCO_vendor_support soundcore dell_wmi sparse_keymap snd_page_alloc serio_raw joydev wmi dcdbas microcode usb_storage uas raid1 pata_acpi ata_generic radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
[291835.725002]
[291835.725013] Pid: 27386, comm: btrfs Tainted: G          I 2.6.37-2.fc15.x86_64 #1
[291835.725062] RIP: 0010:[<ffffffffa0565237>]  [<ffffffffa0565237>] build_backref_tree+0x473/0xd6d [btrfs]
[291835.725126] RSP: 0018:ffff8800373bf9c8  EFLAGS: 00010246
[291835.725152] RAX: ffff8801367d5100 RBX: ffff88020b110880 RCX: 0000000000000040
[291835.725186] RDX: 0000000000000030 RSI: 0000006dd08d3000 RDI: ffff880100069820
[291835.725219] RBP: ffff8800373bfaf8 R08: 0000000000008050 R09: ffff8800373bf980
[291835.725253] R10: ffff8800373bf918 R11: ffff88020b110880 R12: ffff8801367d5100
[291835.725254] R13: ffff88012c0a24c0 R14: ffff88021e2013f0 R15: ffff88021e201cf0
[291835.725254] FS:  00007fcb1a6cc760(0000) GS:ffff8800bfa00000(0000) knlGS:0000000000000000
[291835.725254] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[291835.725254] CR2: 0000000002feeeb8 CR3: 00000001c2943000 CR4: 00000000000426e0
[291835.725254] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[291835.725254] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[291835.725254] Process btrfs (pid: 27386, threadinfo ffff8800373be000, task ffff88022452ae40)
[291835.725254] Stack:
[291835.725254]  ffffea0004b5a470 ffffea0000000000 ffff8800373bf9f8 ffff8800373bfaa8
[291835.725254]  0000000000000000 ffff88005faafbb0 ffff880100069808 ffff880100069d78
[291835.725254]  ffff88012c0a2aa0 ffff880100069820 ffff88020b1108c0 ffff880100069d80
[291835.725254] Call Trace:
[291835.725254]  [<ffffffffa0565c91>] relocate_tree_blocks+0x160/0x478 [btrfs]
[291835.725254]  [<ffffffffa056463d>] ? add_tree_block+0x11e/0x13e [btrfs]
[291835.725254]  [<ffffffffa0566b45>] relocate_block_group+0x1e3/0x490 [btrfs]
[291835.725254]  [<ffffffff8103edb9>] ? should_resched+0xe/0x2e
[291835.725254]  [<ffffffffa0566f39>] btrfs_relocate_block_group+0x147/0x28a [btrfs]
[291835.725254]  [<ffffffffa054e52a>] btrfs_relocate_chunk.clone.40+0x61/0x4ab [btrfs]
[291835.725254]  [<ffffffffa05152d4>] ? btrfs_item_key+0x1e/0x20 [btrfs]
[291835.725254]  [<ffffffffa05152f0>] ? btrfs_item_key_to_cpu+0x1a/0x36 [btrfs]
[291835.725254]  [<ffffffffa054c2a8>] ? read_extent_buffer+0xc3/0xe3 [btrfs]
[291835.725254]  [<ffffffffa05154e6>] ? btrfs_header_nritems.clone.12+0x17/0x1c [btrfs]
[291835.725254]  [<ffffffffa054cff6>] ? btrfs_item_key_to_cpu+0x2a/0x46 [btrfs]
[291835.725254]  [<ffffffffa055045e>] btrfs_balance+0x1a3/0x1f0 [btrfs]
[291835.725254]  [<ffffffff8112bce5>] ? do_filp_open+0x226/0x5c8
[291835.725254]  [<ffffffffa0556773>] btrfs_ioctl+0x641/0x846 [btrfs]
[291835.725254]  [<ffffffff811f3ed1>] ? file_has_perm+0xa5/0xc7
[291835.725254]  [<ffffffff8112e091>] do_vfs_ioctl+0x4b1/0x4f2
[291835.725254]  [<ffffffff8112e128>] sys_ioctl+0x56/0x7a
[291835.725254]  [<ffffffff8100acc2>] system_call_fastpath+0x16/0x1b
[291835.725254] Code: 48 8b 45 89 49 8d 7d 10 48 8d 75 b0 49 89 44 24 18 8a 43 70 ff c0 41 88 44 24 70 e8 f7 c3 ff ff eb 17 f6 40 71 10 49 89 c4 75 02 <0f> 0b 49 8d 45 10 49 89 45 10 49 89 45 18 48 8b b5 20 ff ff ff
[291835.725254] RIP  [<ffffffffa0565237>] build_backref_tree+0x473/0xd6d [btrfs]
[291835.725254]  RSP <ffff8800373bf9c8>
[291835.738971] ---[ end trace a7919e7f17c0a727 ]---
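
Whether two reports show the same failure can be checked mechanically by
comparing the BUG location and the faulting function. What follows is a
rough sketch, assuming the log contains a "kernel BUG at file:line!" line
and a final "RIP  [<addr>] func+0xoff/0xsize" line as in the traces in this
thread. oops_signature is a hypothetical helper name, not part of any
kernel tooling.

```python
import re

# "kernel BUG at fs/btrfs/relocation.c:836!"
BUG_RE = re.compile(r"kernel BUG at (?P<file>\S+):(?P<line>\d+)!")
# Final summary line: "RIP  [<ffffffffa0565237>] build_backref_tree+0x473/0xd6d"
RIP_RE = re.compile(
    r"RIP\s+\[<[0-9a-f]+>\]\s+"
    r"(?P<func>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def oops_signature(log_text):
    """Return (source_file, line, function, offset), or None if not found."""
    # Collapse all whitespace so that mail-wrapped log lines still match.
    text = " ".join(log_text.split())
    bug = BUG_RE.search(text)
    rip = RIP_RE.search(text)
    if not (bug and rip):
        return None
    return bug["file"], int(bug["line"]), rip["func"], int(rip["off"], 16)
```

Applied to the trace from 2011-01-17 and to the one above, both reduce to
fs/btrfs/relocation.c line 836 in build_backref_tree at offset 0x473, which
agrees with reading the two logs by eye.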


It is really difficult to reproduce this bug. This time, I was balancing
a 300GB volume, and the balance was almost finished by the time it
crashed. It had been running for 2 days straight, and survived a complete
backup run with 5 simultaneous rsyncs running on it. Last night when the
rsyncs kicked in, it crashed within half an hour though.

I will now try downgrading to 2.6.36 as per Zheng Yan's suggestion.

Thanks,

Erik.



* Re: Kernel error during btrfs balance
  2011-01-21  8:50   ` Erik Logtenberg
@ 2011-01-21  9:19     ` Yan, Zheng 
  2011-01-26  9:04       ` Erik Logtenberg
  0 siblings, 1 reply; 17+ messages in thread
From: Yan, Zheng @ 2011-01-21  9:19 UTC
  To: Erik Logtenberg; +Cc: linux-btrfs

Please try the patch below. Thanks.

---
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index b37d723..49d6b13 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1158,6 +1158,7 @@ static int clone_backref_node(struct btrfs_trans_handle *trans,
 	new_node->bytenr = dest->node->start;
 	new_node->level = node->level;
 	new_node->lowest = node->lowest;
+	new_node->checked = 1;
 	new_node->root = dest;

 	if (!node->lowest) {
---


On Fri, Jan 21, 2011 at 4:50 PM, Erik Logtenberg <erik@logtenberg.eu> wrote:
> Hi,
>
> I hit the same bug again I think:
>
> [291835.724344] ------------[ cut here ]------------
> [291835.724376] kernel BUG at fs/btrfs/relocation.c:836!
> [291835.724401] invalid opcode: 0000 [#1] SMP
> [291835.724424] last sysfs file:
> /sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map
> [291835.724461] CPU 0
> [291835.724472] Modules linked in: uvcvideo snd_usb_audio
> snd_usbmidi_lib videodev v4l1_compat snd_rawmidi v4l2_compat_ioctl32
> btrfs zlib_deflate libcrc32c sha256_generic cryptd aes_x86_64
> aes_generic cbc dm_crypt tun ebtable_nat ebtables ipt_MASQUERADE
> iptable_nat nf_nat bridge stp llc nfsd lockd nfs_acl auth_rpcgss
> exportfs nls_utf8 cifs fscache sunrpc cpufreq_ondemand acpi_cpufreq
> freq_table mperf ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
> ip6table_filter ip6_tables ipv6 kvm_intel kvm dummy uinput
> snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_seq
> snd_seq_device e1000e snd_pcm snd_timer i2c_i801 snd shpchp iTCO_wdt
> iTCO_vendor_support soundcore dell_wmi sparse_keymap snd_page_alloc
> serio_raw joydev wmi dcdbas microcode usb_storage uas raid1 pata_acpi
> ata_generic radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last
> unloaded: scsi_wait_scan]
> [291835.725002]
> [291835.725013] Pid: 27386, comm: btrfs Tainted: G          I
> 2.6.37-2.fc15.x86_64 #1
> [291835.725062] RIP: 0010:[<ffffffffa0565237>]  [<ffffffffa0565237>]
> build_backref_tree+0x473/0xd6d [btrfs]
> [291835.725126] RSP: 0018:ffff8800373bf9c8  EFLAGS: 00010246
> [291835.725152] RAX: ffff8801367d5100 RBX: ffff88020b110880 RCX:
> 0000000000000040
> [291835.725186] RDX: 0000000000000030 RSI: 0000006dd08d3000 RDI:
> ffff880100069820
> [291835.725219] RBP: ffff8800373bfaf8 R08: 0000000000008050 R09:
> ffff8800373bf980
> [291835.725253] R10: ffff8800373bf918 R11: ffff88020b110880 R12:
> ffff8801367d5100
> [291835.725254] R13: ffff88012c0a24c0 R14: ffff88021e2013f0 R15:
> ffff88021e201cf0
> [291835.725254] FS:  00007fcb1a6cc760(0000) GS:ffff8800bfa00000(0000)
> knlGS:0000000000000000
> [291835.725254] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [291835.725254] CR2: 0000000002feeeb8 CR3: 00000001c2943000 CR4:
> 00000000000426e0
> [291835.725254] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [291835.725254] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> [291835.725254] Process btrfs (pid: 27386, threadinfo ffff8800373be000,
> task ffff88022452ae40)
> [291835.725254] Stack:
> [291835.725254]  ffffea0004b5a470 ffffea0000000000 ffff8800373bf9f8
> ffff8800373bfaa8
> [291835.725254]  0000000000000000 ffff88005faafbb0 ffff880100069808
> ffff880100069d78
> [291835.725254]  ffff88012c0a2aa0 ffff880100069820 ffff88020b1108c0
> ffff880100069d80
> [291835.725254] Call Trace:
> [291835.725254]  [<ffffffffa0565c91>] relocate_tree_blocks+0x160/0x478
> [btrfs]
> [291835.725254]  [<ffffffffa056463d>] ? add_tree_block+0x11e/0x13e [btrfs]
> [291835.725254]  [<ffffffffa0566b45>] relocate_block_group+0x1e3/0x490
> [btrfs]
> [291835.725254]  [<ffffffff8103edb9>] ? should_resched+0xe/0x2e
> [291835.725254]  [<ffffffffa0566f39>]
> btrfs_relocate_block_group+0x147/0x28a [btrfs]
> [291835.725254]  [<ffffffffa054e52a>]
> btrfs_relocate_chunk.clone.40+0x61/0x4ab [btrfs]
> [291835.725254]  [<ffffffffa05152d4>] ? btrfs_item_key+0x1e/0x20 [btrfs]
> [291835.725254]  [<ffffffffa05152f0>] ? btrfs_item_key_to_cpu+0x1a/0x36
> [btrfs]
> [291835.725254]  [<ffffffffa054c2a8>] ? read_extent_buffer+0xc3/0xe3 [btrfs]
> [291835.725254]  [<ffffffffa05154e6>] ?
> btrfs_header_nritems.clone.12+0x17/0x1c [btrfs]
> [291835.725254]  [<ffffffffa054cff6>] ? btrfs_item_key_to_cpu+0x2a/0x46
> [btrfs]
> [291835.725254]  [<ffffffffa055045e>] btrfs_balance+0x1a3/0x1f0 [btrfs]
> [291835.725254]  [<ffffffff8112bce5>] ? do_filp_open+0x226/0x5c8
> [291835.725254]  [<ffffffffa0556773>] btrfs_ioctl+0x641/0x846 [btrfs]
> [291835.725254]  [<ffffffff811f3ed1>] ? file_has_perm+0xa5/0xc7
> [291835.725254]  [<ffffffff8112e091>] do_vfs_ioctl+0x4b1/0x4f2
> [291835.725254]  [<ffffffff8112e128>] sys_ioctl+0x56/0x7a
> [291835.725254]  [<ffffffff8100acc2>] system_call_fastpath+0x16/0x1b
> [291835.725254] Code: 48 8b 45 89 49 8d 7d 10 48 8d 75 b0 49 89 44 24 18
> 8a 43 70 ff c0 41 88 44 24 70 e8 f7 c3 ff ff eb 17 f6 40 71 10 49 89 c4
> 75 02 <0f> 0b 49 8d 45 10 49 89 45 10 49 89 45 18 48 8b b5 20 ff ff ff
> [291835.725254] RIP  [<ffffffffa0565237>] build_backref_tree+0x473/0xd6d
> [btrfs]
> [291835.725254]  RSP <ffff8800373bf9c8>
> [291835.738971] ---[ end trace a7919e7f17c0a727 ]---
>
>
> It is really difficult to reproduce this bug. This time, I was balancing
> a 300GB volume, which was almost finished by the time it crashed. It had
> been running for 2 days straight, and survived a complete backup run,
> with 5 simultaneous rsyncs running on it. Last night when the rsyncs
> kicked in, it crashed within half an hour though.
>
> I will now try downgrading to 2.6.36 as per Zheng Yan's suggestion.
>
> Thanks,
>
> Erik.
>
>
> On 17-1-2011 15:31, Erik Logtenberg wrote:
>> Hi,
>>
>> Please find attached the error log, for future reference.
>>
>> Forgot to mention:
>> I could still use the system after this error, so it was not a complete
>> fatal error in that regard. All active processes (mostly rsync) were
>> hanging in state D though, so I couldn't kill them anymore. Also the FS
>> was not umountable. So I still had to reboot.
>>
>> Thanks,
>>
>> Erik.
>>
>>
>> On 01/17/2011 03:14 PM, Erik Logtenberg wrote:
>>> Hi,
>>>
>>> btrfs balance results in:
>>>
>>> http://pastebin.com/v5j0809M
>>>
>>> My system: fully up-to-date Fedora 14 with rawhide kernel to make btrfs
>>> balance do useful stuff to my free space:
>>>
>>> kernel-2.6.37-2.fc15.x86_64
>>> btrfs-progs-0.19-12.fc14.x86_64
>>>
>>> Filesystem had 0 bytes free, should be 45G, so on darkling's advice I ran
>>> btrfs balance on the fs, while doing heavy I/O (re-running 5 backup jobs
>>> that had failed due to ENOSPC).
>>> Up until the crash, btrfs balance did retrieve a couple of Gigs free
>>> space though, so that part of the plan worked just fine.
>>>
>>> Thanks,
>>>
>>> Erik.
>>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Kernel error during btrfs balance
  2011-01-21  9:19     ` Yan, Zheng 
@ 2011-01-26  9:04       ` Erik Logtenberg
  2011-01-26  9:27         ` Hugo Mills
  0 siblings, 1 reply; 17+ messages in thread
From: Erik Logtenberg @ 2011-01-26  9:04 UTC (permalink / raw)
  To: Yan, Zheng ; +Cc: linux-btrfs

Hi,

It took me a couple of days, because I needed to patch my kernel first
and then issue a rebalance, which ran for more than two days.
Nevertheless, the rebalance succeeded without any "kernel BUG"-messages,
so apparently your patch works!

I noticed that at first, the messages were like this:

[79329.526490] btrfs: found 1939 extents
[79375.950834] btrfs: found 1939 extents
[79376.083599] btrfs: relocating block group 352220872704 flags 1
[80052.940435] btrfs: found 3786 extents
[80108.439657] btrfs: found 3786 extents
[80112.325548] btrfs: relocating block group 351147130880 flags 1

Just like I saw during previous balance-runs. Then all of a sudden the
messages changed to:

[104178.827594] btrfs allocation failed flags 1, wanted 2013265920
[104178.827599] space_info has 4271198208 free, is not full
[104178.827602] space_info total=214748364800, used=210440957952,
pinned=0, reserved=36208640, may_use=3168993280, readonly=0
[104178.827606] block group 1107296256 has 5368709120 bytes, 5368582144
used 0 pinned 0 reserved
[104178.827610] entry offset 1778384896, bytes 86016, bitmap yes
[104178.827612] entry offset 1855827968, bytes 20480, bitmap no
[104178.827614] entry offset 1855852544, bytes 20480, bitmap no
[104178.827617] block group has cluster?: no
[104178.827618] 0 blocks of free space at or bigger than bytes is
[104178.827621] block group 8623489024 has 5368709120 bytes, 5368705024
used 0 pinned 0 reserved
[104178.827624] entry offset 8891924480, bytes 4096, bitmap yes
[104178.827626] block group has cluster?: no
[104178.827628] 0 blocks of free space at or bigger than bytes is
[104178.827631] block group 17213423616 has 5368709120 bytes, 5368709120
used 0 pinned 0 reserved
[104178.827634] block group has cluster?: no

And so on.

Does this indicate an error of any sort, or is this expected behaviour?

Kind regards,

Erik.


On 01/21/2011 10:19 AM, Yan, Zheng wrote:
> please try patch attached below, Thanks.
> 
> ---
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index b37d723..49d6b13 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -1158,6 +1158,7 @@ static int clone_backref_node(struct
> btrfs_trans_handle *trans,
>  	new_node->bytenr = dest->node->start;
>  	new_node->level = node->level;
>  	new_node->lowest = node->lowest;
> +	new_node->checked = 1;
>  	new_node->root = dest;
> 
>  	if (!node->lowest) {
> ---
> 
> 
> On Fri, Jan 21, 2011 at 4:50 PM, Erik Logtenberg <erik@logtenberg.eu> wrote:
>> Hi,
>>
>> I hit the same bug again I think:
>>
>> [291835.724344] ------------[ cut here ]------------
>> [291835.724376] kernel BUG at fs/btrfs/relocation.c:836!
>> [291835.724401] invalid opcode: 0000 [#1] SMP
>> [291835.724424] last sysfs file:
>> /sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map
>> [291835.724461] CPU 0
>> [291835.724472] Modules linked in: uvcvideo snd_usb_audio
>> snd_usbmidi_lib videodev v4l1_compat snd_rawmidi v4l2_compat_ioctl32
>> btrfs zlib_deflate libcrc32c sha256_generic cryptd aes_x86_64
>> aes_generic cbc dm_crypt tun ebtable_nat ebtables ipt_MASQUERADE
>> iptable_nat nf_nat bridge stp llc nfsd lockd nfs_acl auth_rpcgss
>> exportfs nls_utf8 cifs fscache sunrpc cpufreq_ondemand acpi_cpufreq
>> freq_table mperf ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
>> ip6table_filter ip6_tables ipv6 kvm_intel kvm dummy uinput
>> snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_seq
>> snd_seq_device e1000e snd_pcm snd_timer i2c_i801 snd shpchp iTCO_wdt
>> iTCO_vendor_support soundcore dell_wmi sparse_keymap snd_page_alloc
>> serio_raw joydev wmi dcdbas microcode usb_storage uas raid1 pata_acpi
>> ata_generic radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last
>> unloaded: scsi_wait_scan]
>> [291835.725002]
>> [291835.725013] Pid: 27386, comm: btrfs Tainted: G          I
>> 2.6.37-2.fc15.x86_64 #1
>> [291835.725062] RIP: 0010:[<ffffffffa0565237>]  [<ffffffffa0565237>]
>> build_backref_tree+0x473/0xd6d [btrfs]
>> [291835.725126] RSP: 0018:ffff8800373bf9c8  EFLAGS: 00010246
>> [291835.725152] RAX: ffff8801367d5100 RBX: ffff88020b110880 RCX:
>> 0000000000000040
>> [291835.725186] RDX: 0000000000000030 RSI: 0000006dd08d3000 RDI:
>> ffff880100069820
>> [291835.725219] RBP: ffff8800373bfaf8 R08: 0000000000008050 R09:
>> ffff8800373bf980
>> [291835.725253] R10: ffff8800373bf918 R11: ffff88020b110880 R12:
>> ffff8801367d5100
>> [291835.725254] R13: ffff88012c0a24c0 R14: ffff88021e2013f0 R15:
>> ffff88021e201cf0
>> [291835.725254] FS:  00007fcb1a6cc760(0000) GS:ffff8800bfa00000(0000)
>> knlGS:0000000000000000
>> [291835.725254] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> [291835.725254] CR2: 0000000002feeeb8 CR3: 00000001c2943000 CR4:
>> 00000000000426e0
>> [291835.725254] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>> 0000000000000000
>> [291835.725254] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
>> 0000000000000400
>> [291835.725254] Process btrfs (pid: 27386, threadinfo ffff8800373be000,
>> task ffff88022452ae40)
>> [291835.725254] Stack:
>> [291835.725254]  ffffea0004b5a470 ffffea0000000000 ffff8800373bf9f8
>> ffff8800373bfaa8
>> [291835.725254]  0000000000000000 ffff88005faafbb0 ffff880100069808
>> ffff880100069d78
>> [291835.725254]  ffff88012c0a2aa0 ffff880100069820 ffff88020b1108c0
>> ffff880100069d80
>> [291835.725254] Call Trace:
>> [291835.725254]  [<ffffffffa0565c91>] relocate_tree_blocks+0x160/0x478
>> [btrfs]
>> [291835.725254]  [<ffffffffa056463d>] ? add_tree_block+0x11e/0x13e [btrfs]
>> [291835.725254]  [<ffffffffa0566b45>] relocate_block_group+0x1e3/0x490
>> [btrfs]
>> [291835.725254]  [<ffffffff8103edb9>] ? should_resched+0xe/0x2e
>> [291835.725254]  [<ffffffffa0566f39>]
>> btrfs_relocate_block_group+0x147/0x28a [btrfs]
>> [291835.725254]  [<ffffffffa054e52a>]
>> btrfs_relocate_chunk.clone.40+0x61/0x4ab [btrfs]
>> [291835.725254]  [<ffffffffa05152d4>] ? btrfs_item_key+0x1e/0x20 [btrfs]
>> [291835.725254]  [<ffffffffa05152f0>] ? btrfs_item_key_to_cpu+0x1a/0x36
>> [btrfs]
>> [291835.725254]  [<ffffffffa054c2a8>] ? read_extent_buffer+0xc3/0xe3 [btrfs]
>> [291835.725254]  [<ffffffffa05154e6>] ?
>> btrfs_header_nritems.clone.12+0x17/0x1c [btrfs]
>> [291835.725254]  [<ffffffffa054cff6>] ? btrfs_item_key_to_cpu+0x2a/0x46
>> [btrfs]
>> [291835.725254]  [<ffffffffa055045e>] btrfs_balance+0x1a3/0x1f0 [btrfs]
>> [291835.725254]  [<ffffffff8112bce5>] ? do_filp_open+0x226/0x5c8
>> [291835.725254]  [<ffffffffa0556773>] btrfs_ioctl+0x641/0x846 [btrfs]
>> [291835.725254]  [<ffffffff811f3ed1>] ? file_has_perm+0xa5/0xc7
>> [291835.725254]  [<ffffffff8112e091>] do_vfs_ioctl+0x4b1/0x4f2
>> [291835.725254]  [<ffffffff8112e128>] sys_ioctl+0x56/0x7a
>> [291835.725254]  [<ffffffff8100acc2>] system_call_fastpath+0x16/0x1b
>> [291835.725254] Code: 48 8b 45 89 49 8d 7d 10 48 8d 75 b0 49 89 44 24 18
>> 8a 43 70 ff c0 41 88 44 24 70 e8 f7 c3 ff ff eb 17 f6 40 71 10 49 89 c4
>> 75 02 <0f> 0b 49 8d 45 10 49 89 45 10 49 89 45 18 48 8b b5 20 ff ff ff
>> [291835.725254] RIP  [<ffffffffa0565237>] build_backref_tree+0x473/0xd6d
>> [btrfs]
>> [291835.725254]  RSP <ffff8800373bf9c8>
>> [291835.738971] ---[ end trace a7919e7f17c0a727 ]---
>>
>>
>> It is really difficult to reproduce this bug. This time, I was balancing
>> a 300GB volume, which was almost finished by the time it crashed. It had
>> been running for 2 days straight, and survived a complete backup run,
>> with 5 simultaneous rsyncs running on it. Last night when the rsyncs
>> kicked in, it crashed within half an hour though.
>>
>> I will now try downgrading to 2.6.36 as per Zheng Yan's suggestion.
>>
>> Thanks,
>>
>> Erik.
>>
>>
>> On 17-1-2011 15:31, Erik Logtenberg wrote:
>>> Hi,
>>>
>>> Please find attached the error log, for future reference.
>>>
>>> Forgot to mention:
>>> I could still use the system after this error, so it was not a complete
>>> fatal error in that regard. All active processes (mostly rsync) were
>>> hanging in state D though, so I couldn't kill them anymore. Also the FS
>>> was not umountable. So I still had to reboot.
>>>
>>> Thanks,
>>>
>>> Erik.
>>>
>>>
>>> On 01/17/2011 03:14 PM, Erik Logtenberg wrote:
>>>> Hi,
>>>>
>>>> btrfs balance results in:
>>>>
>>>> http://pastebin.com/v5j0809M
>>>>
>>>> My system: fully up-to-date Fedora 14 with rawhide kernel to make btrfs
>>>> balance do useful stuff to my free space:
>>>>
>>>> kernel-2.6.37-2.fc15.x86_64
>>>> btrfs-progs-0.19-12.fc14.x86_64
>>>>
>>>> Filesystem had 0 bytes free, should be 45G, so on darkling's advice I ran
>>>> btrfs balance on the fs, while doing heavy I/O (re-running 5 backup jobs
>>>> that had failed due to ENOSPC).
>>>> Up until the crash, btrfs balance did retrieve a couple of Gigs free
>>>> space though, so that part of the plan worked just fine.
>>>>
>>>> Thanks,
>>>>
>>>> Erik.
>>>
>>



* Re: Kernel error during btrfs balance
  2011-01-26  9:04       ` Erik Logtenberg
@ 2011-01-26  9:27         ` Hugo Mills
  2011-01-26  9:40           ` Helmut Hullen
  2011-01-26  9:43           ` Erik Logtenberg
  0 siblings, 2 replies; 17+ messages in thread
From: Hugo Mills @ 2011-01-26  9:27 UTC (permalink / raw)
  To: Erik Logtenberg; +Cc: Yan, Zheng , linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2431 bytes --]

On Wed, Jan 26, 2011 at 10:04:02AM +0100, Erik Logtenberg wrote:
> Hi,
> 
> It took me a couple of days, because I needed to patch my kernel first
> and then issue a rebalance, which ran for more than two days.
> Nevertheless, the rebalance succeeded without any "kernel BUG"-messages,
> so apparently your patch works!
> 
> I noticed that at first, the messages were like this:
> 
> [79329.526490] btrfs: found 1939 extents
> [79375.950834] btrfs: found 1939 extents
> [79376.083599] btrfs: relocating block group 352220872704 flags 1
> [80052.940435] btrfs: found 3786 extents
> [80108.439657] btrfs: found 3786 extents
> [80112.325548] btrfs: relocating block group 351147130880 flags 1
> 
> Just like I saw during previous balance-runs. Then all of a sudden the
> messages changed to:
> 
> [104178.827594] btrfs allocation failed flags 1, wanted 2013265920
> [104178.827599] space_info has 4271198208 free, is not full
> [104178.827602] space_info total=214748364800, used=210440957952,
> pinned=0, reserved=36208640, may_use=3168993280, readonly=0
> [104178.827606] block group 1107296256 has 5368709120 bytes, 5368582144
> used 0 pinned 0 reserved
> [104178.827610] entry offset 1778384896, bytes 86016, bitmap yes
> [104178.827612] entry offset 1855827968, bytes 20480, bitmap no
> [104178.827614] entry offset 1855852544, bytes 20480, bitmap no
> [104178.827617] block group has cluster?: no
> [104178.827618] 0 blocks of free space at or bigger than bytes is
> [104178.827621] block group 8623489024 has 5368709120 bytes, 5368705024
> used 0 pinned 0 reserved
> [104178.827624] entry offset 8891924480, bytes 4096, bitmap yes
> [104178.827626] block group has cluster?: no
> [104178.827628] 0 blocks of free space at or bigger than bytes is
> [104178.827631] block group 17213423616 has 5368709120 bytes, 5368709120
> used 0 pinned 0 reserved
> [104178.827634] block group has cluster?: no
> 
> And so on.
> 
> Does this indicate an error of any sort, or is this expected behaviour?

   As far as I know, it means that you've run out of space, and not
every block group has been rewritten by the balance process.
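
The counters in the quoted log can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the formula for "free" is inferred from the numbers in this particular log (it reproduces the reported value exactly), not taken from the kernel source.

```python
import re

# Excerpt of the kernel log quoted above, wrapped as in the mail.
LOG = """\
[104178.827594] btrfs allocation failed flags 1, wanted 2013265920
[104178.827599] space_info has 4271198208 free, is not full
[104178.827602] space_info total=214748364800, used=210440957952,
pinned=0, reserved=36208640, may_use=3168993280, readonly=0
"""


def parse_space_info(log):
    """Collect the key=value counters plus the 'wanted' and 'free' figures."""
    fields = {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", log)}
    fields["wanted"] = int(re.search(r"wanted (\d+)", log).group(1))
    fields["reported_free"] = int(
        re.search(r"space_info has (\d+) free", log).group(1)
    )
    return fields


def computed_free(f):
    # Assumption inferred from this log: "free" is the total minus used,
    # pinned, reserved and read-only bytes; may_use is not subtracted.
    return f["total"] - f["used"] - f["pinned"] - f["reserved"] - f["readonly"]


info = parse_space_info(LOG)
free = computed_free(info)

print("computed free :", free)
print("reported free :", info["reported_free"])
print("wanted        :", info["wanted"])
print("free - may_use:", free - info["may_use"])
```

With these numbers the computed and reported "free" values agree, and the aggregate free space is larger than the requested allocation. Once `may_use` is also subtracted, the remainder is smaller than `wanted`. That is an observation about the figures only, not a confirmed explanation of why the allocation failed.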

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
    --- In one respect at least, the Martians are a happy people: ---    
                          they have no lawyers.                          

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 190 bytes --]


* Re: Kernel error during btrfs balance
  2011-01-26  9:27         ` Hugo Mills
@ 2011-01-26  9:40           ` Helmut Hullen
  2011-01-26  9:46             ` Erik Logtenberg
  2011-01-29 10:56             ` Chris Samuel
  2011-01-26  9:43           ` Erik Logtenberg
  1 sibling, 2 replies; 17+ messages in thread
From: Helmut Hullen @ 2011-01-26  9:40 UTC (permalink / raw)
  To: linux-btrfs

Hello, Hugo,

You wrote on 26.01.11:

>> It took me a couple of days, because I needed to patch my kernel
>> first and then issue a rebalance, which ran for more than two days.
>> Nevertheless, the rebalance succeeded without any "kernel
>> BUG"-messages, so apparently your patch works!

[...]

>    As far as I know, it means that you've run out of space, and not
> every block group has been rewritten by the balance process.

Yesterday I reported a similar problem on this mailing list, in the
thread "version".

Running kernel 2.6.37 didn't show this error, but running kernel
2.6.38-rc2 ended with errors.

Best regards!
Helmut


* Re: Kernel error during btrfs balance
  2011-01-26  9:27         ` Hugo Mills
  2011-01-26  9:40           ` Helmut Hullen
@ 2011-01-26  9:43           ` Erik Logtenberg
  1 sibling, 0 replies; 17+ messages in thread
From: Erik Logtenberg @ 2011-01-26  9:43 UTC (permalink / raw)
  To: Hugo Mills, linux-btrfs


>> [104178.827624] entry offset 8891924480, bytes 4096, bitmap yes
>> [104178.827626] block group has cluster?: no
>> [104178.827628] 0 blocks of free space at or bigger than bytes is
>> [104178.827631] block group 17213423616 has 5368709120 bytes, 5368709120
>> used 0 pinned 0 reserved
>> [104178.827634] block group has cluster?: no
>>
>> And so on.
>>
>> Does this indicate an error of any sort, or is this expected behaviour?
> 
>    As far as I know, it means that you've run out of space, and not
> every block group has been rewritten by the balance process.
> 
>    Hugo.
> 

It is a 300GB volume with 79GB free, so it is hardly out of space.
Moreover, I started the balance operation with the sole purpose of
reclaiming free space: the volume had about 40GB less free space when
the balance started, and that space was used by or reserved for metadata.
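
The per-block-group lines in the log can be summarized the same way. The following is a rough sketch with invented helper names. It only looks at the three block groups that appear in the excerpt (the log continues with "And so on."), so it says nothing about the rest of the volume.

```python
import re

# Block-group part of the kernel log quoted above, wrapped as in the mail.
LOG = """\
[104178.827606] block group 1107296256 has 5368709120 bytes, 5368582144
used 0 pinned 0 reserved
[104178.827610] entry offset 1778384896, bytes 86016, bitmap yes
[104178.827612] entry offset 1855827968, bytes 20480, bitmap no
[104178.827614] entry offset 1855852544, bytes 20480, bitmap no
[104178.827617] block group has cluster?: no
[104178.827618] 0 blocks of free space at or bigger than bytes is
[104178.827621] block group 8623489024 has 5368709120 bytes, 5368705024
used 0 pinned 0 reserved
[104178.827624] entry offset 8891924480, bytes 4096, bitmap yes
[104178.827626] block group has cluster?: no
[104178.827628] 0 blocks of free space at or bigger than bytes is
[104178.827631] block group 17213423616 has 5368709120 bytes, 5368709120
used 0 pinned 0 reserved
[104178.827634] block group has cluster?: no
"""

WANTED = 2013265920  # from the "allocation failed ... wanted" line

# Header of one block group; \s+ tolerates the mail's line wrapping.
GROUP_RE = re.compile(r"block group (\d+) has (\d+) bytes, (\d+)\s+used")
# One free-space entry listed under a block group.
ENTRY_RE = re.compile(r"entry offset \d+, bytes (\d+)")


def summarize(log):
    """Return one dict per block group with its free bytes and free entries."""
    headers = list(GROUP_RE.finditer(log))
    groups = []
    for i, match in enumerate(headers):
        # The body of a group runs until the next group header (or end of log).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(log)
        entries = [int(b) for b in ENTRY_RE.findall(log[match.end():end])]
        start, size, used = (int(x) for x in match.groups())
        groups.append({
            "start": start,
            "free": size - used,
            "entries_total": sum(entries),
            "largest_entry": max(entries, default=0),
        })
    return groups


groups = summarize(LOG)
for g in groups:
    print(g["start"], "free:", g["free"], "largest entry:", g["largest_entry"])
```

In this excerpt each 5 GiB block group is almost completely used, the listed free-space entries add up to exactly the free bytes of their group, and the largest single entry is far smaller than the `wanted` size.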

Kind regards,

Erik.


* Re: Kernel error during btrfs balance
  2011-01-26  9:40           ` Helmut Hullen
@ 2011-01-26  9:46             ` Erik Logtenberg
  2011-01-29 10:56             ` Chris Samuel
  1 sibling, 0 replies; 17+ messages in thread
From: Erik Logtenberg @ 2011-01-26  9:46 UTC (permalink / raw)
  To: helmut; +Cc: linux-btrfs


> Yesterday I reported a similar problem in this mailing list, in the  
> thread "version".
> 
> Running kernel 2.6.37 didn't show this error, but running kernel
> 2.6.38-rc2 ended with errors.
> 
> Best regards!
> Helmut

Ah, indeed, just like you I use 2.6.38-rc2. Or to be more precise:
2.6.38-0.rc2.git0.1.fc14.x86_64, which is the latest rawhide kernel,
with one additional patch: the one-liner from Zheng Yan.

Kind regards,

Erik.


* Re: Kernel error during btrfs balance
  2011-01-26  9:40           ` Helmut Hullen
  2011-01-26  9:46             ` Erik Logtenberg
@ 2011-01-29 10:56             ` Chris Samuel
  1 sibling, 0 replies; 17+ messages in thread
From: Chris Samuel @ 2011-01-29 10:56 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: Text/Plain, Size: 488 bytes --]

On Wed, 26 Jan 2011 08:40:00 PM Helmut Hullen wrote:

> Yesterday I reported a similar problem in this mailing list, in the  
> thread "version".

I think that might have been a slightly different issue, but
I'd guess there would be no harm in trying Yan Zheng's patch!

cheers,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC

This email may come with a PGP signature as a file. Do not panic.
For more info see: http://en.wikipedia.org/wiki/OpenPGP

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 482 bytes --]


Thread overview: 17+ messages
2011-01-17 14:14 Kernel error during btrfs balance Erik Logtenberg
2011-01-17 14:31 ` Erik Logtenberg
2011-01-17 14:37   ` Erik Logtenberg
2011-01-17 14:39     ` Erik Logtenberg
2011-01-21  8:50   ` Erik Logtenberg
2011-01-21  9:19     ` Yan, Zheng 
2011-01-26  9:04       ` Erik Logtenberg
2011-01-26  9:27         ` Hugo Mills
2011-01-26  9:40           ` Helmut Hullen
2011-01-26  9:46             ` Erik Logtenberg
2011-01-29 10:56             ` Chris Samuel
2011-01-26  9:43           ` Erik Logtenberg
2011-01-18  0:54 ` Yan, Zheng 
2011-01-18 13:22   ` Erik Logtenberg
2011-01-18 13:58     ` Helmut Hullen
2011-01-18 14:13     ` Yan, Zheng 
2011-01-18 14:29       ` Erik Logtenberg
