* cannot mount btrfs volume read/write (+task blocked backtrace)
@ 2015-11-07 22:20 Frédéric
  2015-11-08  6:45 ` Frédéric Grelot
  0 siblings, 1 reply; 3+ messages in thread
From: Frédéric @ 2015-11-07 22:20 UTC (permalink / raw)
  To: linux-btrfs

Hi all, 

I am trying to mount a btrfs filesystem, and I have been told on freenode/#btrfs to try the mailing list for more precise advice.
My setup is as follows: one Debian server (stretch, 4.2.0-1-amd64 #1 SMP Debian 4.2.3-2 (2015-10-14) x86_64) with 5 disks in a btrfs raid6 volume.
root@nas:~# btrfs --version
btrfs-progs v4.1.2
root@nas:~#   btrfs fi show
Label: none  uuid: 0d56cb74-65f9-4f4e-9c51-74ea286f3f79
	Total devices 5 FS bytes used 2.66TiB
	devid    3 size 931.51GiB used 320.29GiB path /dev/sda1
	devid    5 size 2.73TiB used 1.76TiB path /dev/sdb1
	devid    6 size 2.73TiB used 1.76TiB path /dev/sde1
	devid    7 size 2.73TiB used 1.76TiB path /dev/sdf1
	devid    8 size 2.73TiB used 609.50GiB path /dev/sdd1

btrfs-progs v4.1.2
root@nas:~#   btrfs fi df /raid
Data, RAID6: total=2.66TiB, used=2.66TiB
System, RAID6: total=64.00MiB, used=416.00KiB
Metadata, RAID6: total=10.00GiB, used=8.13GiB
GlobalReserve, single: total=512.00MiB, used=544.00KiB


As you can guess, device 8 is pretty new. Device 3 is to be removed. I already issued a "btrfs dev del" on it (and the data was balanced to dev 8), but had to interrupt it (I'm not proud of that, but the server had to be turned off).
I had a few hitches with this volume recently: the btrfs dev del was interrupted, and one of the SATA cables was defective (I noticed that recently; it is now replaced). I ran a scrub which did not have enough time to complete. I cannot say for sure that it is not waiting to resume...
The volume has several snapshots (about 20 or 30; I activated snapper a month ago), and I just enabled qgroups in order to monitor snapshot disk usage. I know that the qgroup quotas are not up to date yet: because of the bad cable, the latest "btrfs dev del" started to hit errors, and I could not do anything but kill the server. If I remember correctly, the disk with the bad cable was devid 5 (/dev/sdb1).
I enabled qgroups during the "btrfs dev del" process.
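For reference, the state of the interrupted scrub, of any balance left over from the dev del, and of the qgroup accounting can all be inspected without writing to the volume once it is mounted (read-only is enough). A minimal sketch; /raid is the mount point from the fi df output above, and the DRY_RUN guard is my addition so the commands can be previewed without running anything:

```shell
#!/bin/sh
# Sketch: inspect the state of interrupted operations, read-only.
# /raid is the mount point from the btrfs fi df output above.
# DRY_RUN=1 (the default here) only prints what would be run.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run btrfs scrub status /raid     # paused/aborted scrub state
run btrfs balance status /raid   # any balance pending from the dev del
run btrfs qgroup show /raid      # current (possibly stale) qgroup numbers
```

With DRY_RUN=0 the real commands run; all three are read-only queries.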

Now, I can mount the volume with no problem read-only (so my backups are up to date!).
If I try to mount it RW, I get the following errors in dmesg:
steps:
mount -o ro,recover,nospace_cache,clear_cache,skip_balance
(time is 28s in dmesg)
mount -o remount,rw,recover,nospace_cache,clear_cache,skip_balance
(time is 83s)
dmesg:
[   28.920176] BTRFS info (device sda1): enabling auto recovery
[   28.920183] BTRFS info (device sda1): force clearing of disk cache
[   28.920187] BTRFS info (device sda1): disabling disk space caching
[   28.920190] BTRFS: has skinny extents
[   29.551170] BTRFS: bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 3434, gen 0
[   83.483314] BTRFS info (device sda1): disabling disk space caching
[   83.483323] BTRFS info (device sda1): enabling auto recovery
[   83.483326] BTRFS info (device sda1): enabling auto recovery
[  360.188189] INFO: task btrfs-transacti:1100 blocked for more than 120 seconds.
[  360.188209]       Tainted: G        W       4.2.0-1-amd64 #1
[  360.188214] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  360.188221] btrfs-transacti D ffffffff8109a1c0     0  1100      2 0x00000000
[  360.188226]  ffff8800d857eec0 0000000000000046 ffff8800cb947e20 ffff88011abecd40
[  360.188229]  0000000000000246 ffff8800cb948000 ffff880119dc8000 ffff8800cb947e20
[  360.188231]  ffff88011ae83d58 ffff8800d8d72d48 ffff8800d8d72da8 ffffffff8154acaf
[  360.188233] Call Trace:
[  360.188239]  [<ffffffff8154acaf>] ? schedule+0x2f/0x70
[  360.188265]  [<ffffffffa022c0bf>] ? btrfs_commit_transaction+0x3ef/0xa90 [btrfs]
[  360.188269]  [<ffffffff810a9ad0>] ? wait_woken+0x80/0x80
[  360.188281]  [<ffffffffa0227654>] ? transaction_kthread+0x224/0x240 [btrfs]
[  360.188293]  [<ffffffffa0227430>] ? btrfs_cleanup_transaction+0x510/0x510 [btrfs]
[  360.188296]  [<ffffffff8108aa41>] ? kthread+0xc1/0xe0
[  360.188298]  [<ffffffff8108a980>] ? kthread_create_on_node+0x170/0x170
[  360.188301]  [<ffffffff8154ea1f>] ? ret_from_fork+0x3f/0x70
[  360.188303]  [<ffffffff8108a980>] ? kthread_create_on_node+0x170/0x170
[  480.188185] INFO: task btrfs-transacti:1100 blocked for more than 120 seconds.
[...]

As you can see, I get an error at boot+360s (6 minutes).
The error repeated every two minutes and stopped at boot+28 minutes.
However, the "mount" process was still active, and I stopped it (in order to try something else) more than 3 hours later. No message appeared after that one (t+28m).
I also tried to mount RO, umount, then mount RW (with the same options), but with no success: I got the same message+backtrace at boot+4 minutes.

Following advice on #btrfs, I am currently running a btrfs check --readonly, but it is taking a pretty long time. However, whether or not the check fixes the problem, the backtrace may be of interest to you...
I'll update the mailing list after the btrfs check result.

Goulou.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: cannot mount btrfs volume read/write (+task blocked backtrace)
  2015-11-07 22:20 cannot mount btrfs volume read/write (+task blocked backtrace) Frédéric
@ 2015-11-08  6:45 ` Frédéric Grelot
  2015-11-08 21:34   ` Duncan
  0 siblings, 1 reply; 3+ messages in thread
From: Frédéric Grelot @ 2015-11-08  6:45 UTC (permalink / raw)
  To: linux-btrfs

After 8 hours, "btrfs check --readonly" is still "checking quota groups". It does not have any IO activity, but uses 100% CPU.
#top
 1143 root      20   0 2266596 2.148g   1124 R 100.0 56.0 183:37.14 btrfs check --readonly /dev/sda1                                                                                                                              

I am not sure what to do: if you have no idea, since I have less than 3 TiB of data I will just manually degrade the volume by removing one of the disks, build a new btrfs filesystem on it, copy the data (since I can still open the old one RO), then manually move devices from the old volume to the new one (converting to raid1 first, then raid6).
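In case it helps anyone searching the archives, here is the fallback plan spelled out as a sketch. /dev/sdX1 and /dev/sdY1 (the disk pulled from the old volume and the first disk freed afterwards) and /mnt/new are placeholders, not my real devices, and the DRY_RUN guard is my addition so nothing runs when previewing:

```shell
#!/bin/sh
# Sketch of the manual migration plan: degrade the old volume by taking
# one disk, build a new filesystem on it, copy, then move the remaining
# disks over. /dev/sdX1, /dev/sdY1 and /mnt/new are placeholders.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run mkfs.btrfs -f /dev/sdX1              # new fs on the pulled disk
run mount -o ro /dev/sda1 /raid          # old volume, read-only
run mount /dev/sdX1 /mnt/new
run cp -a /raid/. /mnt/new/              # copy the ~3 TiB of data
run btrfs device add /dev/sdY1 /mnt/new  # add freed disks one at a time
run btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/new
run btrfs balance start -dconvert=raid6 -mconvert=raid6 /mnt/new
```

The last convert would only make sense once enough disks have been moved over.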

Goulou.

----- Original message -----
From: "Frédéric Grelot" <fredericg_99@yahoo.fr>
To: "linux-btrfs" <linux-btrfs@vger.kernel.org>
Sent: Saturday, November 7, 2015 23:20:34
Subject: cannot mount btrfs volume read/write (+task blocked backtrace)

[...]

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: cannot mount btrfs volume read/write (+task blocked backtrace)
  2015-11-08  6:45 ` Frédéric Grelot
@ 2015-11-08 21:34   ` Duncan
  0 siblings, 0 replies; 3+ messages in thread
From: Duncan @ 2015-11-08 21:34 UTC (permalink / raw)
  To: linux-btrfs

Frédéric Grelot posted on Sun, 08 Nov 2015 07:45:39 +0100 as excerpted:

> After 8 hours, "btrfs check --readonly" is still "checking quota
> groups". It does not have any IO activity, but uses 100% CPU.
> #top
>  1143 root      20   0 2266596 2.148g   1124 R 100.0 56.0 183:37.14
>  btrfs check --readonly /dev/sda1
> 
> I am not sure what to do: if you have no idea, since I have less than
> 3 TiB of data I will just manually degrade the volume by removing one
> of the disks, build a new btrfs filesystem on it, copy the data (since
> I can still open the old one RO), then manually move devices from the
> old volume to the new one (converting to raid1 first, then raid6).

Ahh... quotas.  Devs are working very hard on them, but to this point 
quotas have never really worked /correctly/, and the estimate as to when 
they might... seems to always remain a couple kernel cycles out.

So my recommendation continues to be, either you really need quotas and 
thus should use a filesystem where they are known to be mature and 
stable, or you don't, and btrfs is an option, but don't enable quotas and 
avoid the problems they cause.  Unless of course you're specifically 
working with the devs to debug and fix current functionality and don't 
mind losing a filesystem or two in the process, in which 
case, THANK YOU for helping to stabilize the feature for everyone, 
however long it might take. 
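For anyone taking this advice on an existing filesystem that still mounts read-write, turning qgroups off again is a single command. A sketch; /raid is the mount point from the original mail, and the DRY_RUN guard is my addition so it can be previewed safely:

```shell
#!/bin/sh
# Sketch: disable qgroups on a mounted (read-write) btrfs filesystem,
# per the advice above. /raid is the mount point from the original
# mail; DRY_RUN=1 (the default here) only prints the command.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run btrfs quota disable /raid
```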

As for when it might be safe to enable them, at this point, with the 
quota history we have, I'd suggest waiting at least two kernel cycles 
after all known quota issues have been fixed.  But whether that's 4.6 or 
5.6 (with roughly 5 releases per year and assuming another major 
version bump at what would be .20, in keeping with the precedent set with 
the 3.x series, it's roughly two years per major version kernel cycle, so 
5.6 would be around 1Q2018) or 6.6... who can tell, until it happens?

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2015-11-08 21:34 UTC | newest]
