* raid6, disks of different sizes, ENOSPC errors despite having plenty of space
@ 2014-04-23 21:04 Sergey Ivanyuk
  2014-04-23 22:53 ` Hugo Mills
  2014-04-24 11:33 ` Duncan
  0 siblings, 2 replies; 4+ messages in thread
From: Sergey Ivanyuk @ 2014-04-23 21:04 UTC (permalink / raw)
  To: linux-btrfs

Hi,

I have a filesystem that I've converted to raid6 from raid1, on 4 drives (I
have another copy of the data):

        Total devices 4 FS bytes used 924.64GiB
        devid    1 size 1.82TiB used 474.00GiB path /dev/sdd
        devid    2 size 465.76GiB used 465.76GiB path /dev/sda
        devid    3 size 465.76GiB used 465.76GiB path /dev/sdb
        devid    4 size 465.76GiB used 465.73GiB path /dev/sdc

Data, RAID6: total=924.00GiB, used=923.42GiB
System, RAID1: total=32.00MiB, used=208.00KiB
Metadata, RAID1: total=1.70GiB, used=1.28GiB
Metadata, DUP: total=384.00MiB, used=252.13MiB
unknown, single: total=512.00MiB, used=0.00


Recent btrfs-progs built from source, kernel 3.15.0-rc2 on armv7l. Despite
having plenty of space left on the larger drive, attempting to copy more
data onto the filesystem results in a kworker process pegged at 100% CPU
for a very long time (10s of minutes), at which point the writes proceed
for some time, and the process repeats until the eventual "No space left on
device" error. Balancing fails with the same error, even if attempting to
convert back to raid1.

I realize that this likely has something to do with the disparity between
device sizes, and per the wiki a fixed-width stripe may help, though I'm
not sure if it's possible to change the stripe width in my situation, since
I can't rebalance. Is there anything I can do to get this filesystem back
to writable state?

Also, here's a stack trace for the stuck kworker process, which appears to
be a bug since it does this for a very long time:

Exception stack(0xab4699c8 to 0xab469a10)
99c0:                   aec7c870 00000000 00000000 aec7c841 08000000 aec7c870
99e0: ab469ad0 bd51e880 00003000 00000000 0006c000 00000000 00000005 ab469a10
9a00: 80299c8c 80310098 200e0013 ffffffff
[<80011e80>] (__irq_svc) from [<80310098>] (rb_next+0x14/0x5c)
[<80310098>] (rb_next) from [<80299c8c>] (btrfs_find_space_for_alloc+0x138/0x344)
[<80299c8c>] (btrfs_find_space_for_alloc) from [<80240020>] (find_free_extent+0x378/0xabc)
[<80240020>] (find_free_extent) from [<80240840>] (btrfs_reserve_extent+0xdc/0x164)
[<80240840>] (btrfs_reserve_extent) from [<8025aef4>] (cow_file_range+0x17c/0x5bc)
[<8025aef4>] (cow_file_range) from [<8025c1e0>] (run_delalloc_range+0x34c/0x380)
[<8025c1e0>] (run_delalloc_range) from [<80274d6c>] (__extent_writepage+0x708/0x940)
[<80274d6c>] (__extent_writepage) from [<802754b4>] (extent_writepages+0x238/0x368)
[<802754b4>] (extent_writepages) from [<8009b190>] (do_writepages+0x24/0x38)
[<8009b190>] (do_writepages) from [<800ef59c>] (__writeback_single_inode+0x28/0x110)
[<800ef59c>] (__writeback_single_inode) from [<800f04c8>] (writeback_sb_inodes+0x184/0x38c)
[<800f04c8>] (writeback_sb_inodes) from [<800f0740>] (__writeback_inodes_wb+0x70/0xac)
[<800f0740>] (__writeback_inodes_wb) from [<800f0978>] (wb_writeback+0x1fc/0x20c)
[<800f0978>] (wb_writeback) from [<800f0b78>] (bdi_writeback_workfn+0x144/0x338)
[<800f0b78>] (bdi_writeback_workfn) from [<80037cfc>] (process_one_work+0x110/0x368)
[<80037cfc>] (process_one_work) from [<800383c8>] (worker_thread+0x138/0x3e8)
[<800383c8>] (worker_thread) from [<8003de90>] (kthread+0xcc/0xe8)
[<8003de90>] (kthread) from [<8000e238>] (ret_from_fork+0x14/0x3c)

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: raid6, disks of different sizes, ENOSPC errors despite having plenty of space
  2014-04-23 21:04 raid6, disks of different sizes, ENOSPC errors despite having plenty of space Sergey Ivanyuk
@ 2014-04-23 22:53 ` Hugo Mills
  2014-04-24 17:16   ` Sergey Ivanyuk
  2014-04-24 11:33 ` Duncan
  1 sibling, 1 reply; 4+ messages in thread
From: Hugo Mills @ 2014-04-23 22:53 UTC (permalink / raw)
  To: Sergey Ivanyuk; +Cc: linux-btrfs


On Wed, Apr 23, 2014 at 05:04:10PM -0400, Sergey Ivanyuk wrote:
> Hi,
> 
> I have a filesystem that I've converted to raid6 from raid1, on 4 drives (I
> have another copy of the data):
> 
>         Total devices 4 FS bytes used 924.64GiB
>         devid    1 size 1.82TiB used 474.00GiB path /dev/sdd
>         devid    2 size 465.76GiB used 465.76GiB path /dev/sda
>         devid    3 size 465.76GiB used 465.76GiB path /dev/sdb
>         devid    4 size 465.76GiB used 465.73GiB path /dev/sdc
> 
> Data, RAID6: total=924.00GiB, used=923.42GiB
> System, RAID1: total=32.00MiB, used=208.00KiB
> Metadata, RAID1: total=1.70GiB, used=1.28GiB
> Metadata, DUP: total=384.00MiB, used=252.13MiB
> unknown, single: total=512.00MiB, used=0.00
> 
> 
> Recent btrfs-progs built from source, kernel 3.15.0-rc2 on armv7l. Despite
> having plenty of space left on the larger drive, attempting to copy more
> data onto the filesystem results in a kworker process pegged at 100% CPU
> for a very long time (10s of minutes), at which point the writes proceed
> for some time, and the process repeats until the eventual "No space left on
> device" error. Balancing fails with the same error, even if attempting to
> convert back to raid1.
> 
> I realize that this likely has something to do with the disparity between
> device sizes, and per the wiki a fixed-width stripe may help, though I'm
> not sure if it's possible to change the stripe width in my situation, since
> I can't rebalance. Is there anything I can do to get this filesystem back
> to writable state?

   With those device sizes, yes, you're going to have limits on the
available data you can store -- with RAID-6, it'll be 465.76*(4-2) =
931.52 GiB (less metadata space), so your conclusion above is indeed
correct.

   We don't have the fixed-width stripe feature implemented yet, which
probably explains why you can't use it. :) You can play with an
approximation of the consequences, once the feature is there, at
http://carfax.org.uk/btrfs-usage/ . Without that feature, though,
there's not much you can do to improve the situation. What might help
in converting back to RAID-1 is adding a small device to the FS
temporarily before doing the conversion, and then removing it again
afterwards.
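   For a rough feel of where the 931.52 GiB figure comes from, here's a
toy allocator (my own Python sketch, not btrfs code) that mimics the
greedy chunk allocation the calculator above models: each raid6 chunk
is striped across every device that still has room, two stripes per
chunk being parity, and allocation stops once fewer than four devices
have free space. Sizes are rounded down to whole GiB, so it reports
930 rather than the exact 931.52:

```python
def raid6_usable_gib(device_sizes_gib, chunk_gib=1):
    """Approximate usable raid6 data space by greedily allocating
    fixed-size chunks across all devices with free space left."""
    free = list(device_sizes_gib)
    data = 0
    while True:
        # devices that can still hold one more stripe
        elig = [i for i, f in enumerate(free) if f >= chunk_gib]
        if len(elig) < 4:              # raid6 needs at least 4 devices per chunk
            break
        for i in elig:                 # one stripe on each eligible device
            free[i] -= chunk_gib
        data += (len(elig) - 2) * chunk_gib   # two stripes are parity
    return data

# One 1.82 TiB drive plus three 465.76 GiB drives, rounded to GiB:
print(raid6_usable_gib([1863, 465, 465, 465]))   # → 930
```

Once the three small drives fill up, the remaining ~1.4 TiB on the big
drive is unusable for raid6, which is exactly the wall you hit.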

> Also, here's a stack trace for the stuck kworker process, which appears to
> be a bug since it does this for a very long time:

   This is probably something different.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
     --- Computer Science is not about computers,  any more than ---     
                     astronomy is about telescopes.                      



* Re: raid6, disks of different sizes, ENOSPC errors despite having plenty of space
  2014-04-23 21:04 raid6, disks of different sizes, ENOSPC errors despite having plenty of space Sergey Ivanyuk
  2014-04-23 22:53 ` Hugo Mills
@ 2014-04-24 11:33 ` Duncan
  1 sibling, 0 replies; 4+ messages in thread
From: Duncan @ 2014-04-24 11:33 UTC (permalink / raw)
  To: linux-btrfs

Sergey Ivanyuk posted on Wed, 23 Apr 2014 17:04:10 -0400 as excerpted:

> I have a filesystem that I've converted to raid6 from raid1, on 4 drives
> (I have another copy of the data):
> 
> Total devices 4 FS bytes used 924.64GiB
> devid    1 size 1.82TiB used 474.00GiB path /dev/sdd
> devid    2 size 465.76GiB used 465.76GiB path /dev/sda
> devid    3 size 465.76GiB used 465.76GiB path /dev/sdb
> devid    4 size 465.76GiB used 465.73GiB path /dev/sdc

That tells your story right there.  Btrfs raid6 mode requires a minimum 
of four devices with unallocated space in order to allocate new chunks.  
You have four devices, but three of them are full (465.76 of 465.76 GiB 
allocated, no further space left to allocate), so as soon as you run 
out of already-allocated data or metadata space, nothing more can be 
written: allocating another raid6 chunk would require room on all four 
devices.

And here's your current allocation:

> Data, RAID6: total=924.00GiB, used=923.42GiB
> System, RAID1: total=32.00MiB, used=208.00KiB
> Metadata, RAID1: total=1.70GiB, used=1.28GiB
> Metadata, DUP: total=384.00MiB, used=252.13MiB
> unknown, single: total=512.00MiB, used=0.00

OK, only data is raid6, and that data is essentially full too (close 
enough that you might squeeze in a few small files).

Metadata is raid1 (which requires free space on two devices to allocate 
more) plus some dup, presumably left over from the original 
single-device setup, since dup is not normally possible on a 
multi-device filesystem.  The raid1 metadata has nearly half a gig 
free, but btrfs reserves about 200 MiB of that as unusable, leaving you 
roughly 200 MiB of metadata headroom.

So metadata isn't full ATM, but it's getting close.

Meanwhile, last I knew, btrfs raid5/6 support wasn't yet complete in 
any case.  Normal runtime works fine, but recovery from a lost device 
isn't fully implemented yet.  So in terms of recovery, raid6 is 
currently just an inefficient raid0: two devices' worth of capacity go 
to parity, and the parity is actually being written, but it can't yet 
be used to restore anything.  Once recovery is implemented you'll get a 
"free" upgrade to real raid6 reliability; until then you effectively 
have raid0 reliability.

That isn't a good position to be in, since with raid0 the loss of a 
single device effectively kills the entire filesystem.  So unless 
you're purely playing around and don't care about possible data loss, 
or about the speed and capacity you give up compared to normal raid0, 
raid6 isn't a good idea at this point anyway.

As Hugo says, adding another device temporarily may give you enough room 
to balance-convert the raid6 back to raid1, or to raid0 or single or 
whatever.  That's what I'd do for now.

Meanwhile, it's worth noting that with the unevenly sized devices you 
have, single mode is the only way you'll get full usage.  Raid1 mode 
shouldn't be too far off, though: you'd be able to store about 
465*3=1395 GiB, limited by the three small devices, since the large 
device is bigger than the other three put together -- leaving a bit 
over 400 GiB of it unusable.  Effectively, btrfs raid1 does 
pair-mirroring, and you'd be putting one copy of each pair on the 
larger device with the other copy alternating between the other three 
devices.  Add one more ~465 GiB device and you'd basically fill them 
all.
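The pair-mirroring behavior described above can be sketched the same 
way (a toy Python approximation of mine, not btrfs code): each raid1 
chunk is placed on the two devices with the most free space, so the 
big drive ends up holding one copy of nearly everything while the 
three small drives alternate holding the other:

```python
def raid1_usable_gib(device_sizes_gib, chunk_gib=1):
    """Approximate usable raid1 data space: each chunk is mirrored
    on the two devices with the most free space."""
    free = list(device_sizes_gib)
    data = 0
    while True:
        elig = [i for i, f in enumerate(free) if f >= chunk_gib]
        if len(elig) < 2:              # raid1 needs two devices per chunk
            break
        # put the two copies on the devices with the most free space
        elig.sort(key=lambda i: free[i], reverse=True)
        for i in elig[:2]:
            free[i] -= chunk_gib
        data += chunk_gib              # only one copy counts as data
    return data

# One 1.82 TiB drive plus three ~465 GiB drives, rounded to GiB:
print(raid1_usable_gib([1863, 465, 465, 465]))   # → 1395
```

Allocation stops when the three small drives are exhausted, matching 
the 465*3=1395 GiB figure: at that point only the big drive has free 
space, and raid1 can't place both copies on one device.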

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: raid6, disks of different sizes, ENOSPC errors despite having plenty of space
  2014-04-23 22:53 ` Hugo Mills
@ 2014-04-24 17:16   ` Sergey Ivanyuk
  0 siblings, 0 replies; 4+ messages in thread
From: Sergey Ivanyuk @ 2014-04-24 17:16 UTC (permalink / raw)
  To: Hugo Mills, Sergey Ivanyuk, linux-btrfs

Thanks, this makes sense. I freed up some space and the re-balance
back to raid1 is now running (I had to run 'btrfs balance -dusage=5'
before some free space actually became available).

Filed the other issue as https://bugzilla.kernel.org/show_bug.cgi?id=74761 .

On Wed, Apr 23, 2014 at 6:53 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
> On Wed, Apr 23, 2014 at 05:04:10PM -0400, Sergey Ivanyuk wrote:
>> Hi,
>>
>> I have a filesystem that I've converted to raid6 from raid1, on 4 drives (I
>> have another copy of the data):
>>
>>         Total devices 4 FS bytes used 924.64GiB
>>         devid    1 size 1.82TiB used 474.00GiB path /dev/sdd
>>         devid    2 size 465.76GiB used 465.76GiB path /dev/sda
>>         devid    3 size 465.76GiB used 465.76GiB path /dev/sdb
>>         devid    4 size 465.76GiB used 465.73GiB path /dev/sdc
>>
>> Data, RAID6: total=924.00GiB, used=923.42GiB
>> System, RAID1: total=32.00MiB, used=208.00KiB
>> Metadata, RAID1: total=1.70GiB, used=1.28GiB
>> Metadata, DUP: total=384.00MiB, used=252.13MiB
>> unknown, single: total=512.00MiB, used=0.00
>>
>>
>> Recent btrfs-progs built from source, kernel 3.15.0-rc2 on armv7l. Despite
>> having plenty of space left on the larger drive, attempting to copy more
>> data onto the filesystem results in a kworker process pegged at 100% CPU
>> for a very long time (10s of minutes), at which point the writes proceed
>> for some time, and the process repeats until the eventual "No space left on
>> device" error. Balancing fails with the same error, even if attempting to
>> convert back to raid1.
>>
>> I realize that this likely has something to do with the disparity between
>> device sizes, and per the wiki a fixed-width stripe may help, though I'm
>> not sure if it's possible to change the stripe width in my situation, since
>> I can't rebalance. Is there anything I can do to get this filesystem back
>> to writable state?
>
>    With those device sizes, yes, you're going to have limits on the
> available data you can store -- with RAID-6, it'll be 465.76*(4-2) =
> 931.52 GiB (less metadata space), so your conclusion above is indeed
> correct.
>
>    We don't have the fixed-width stripe feature implemented yet, which
> probably explains why you can't use it. :) You can play with an
> approximation of the consequences, once the feature is there, at
> http://carfax.org.uk/btrfs-usage/ . Without that feature, though,
> there's not much you can do to improve the situation. What might help
> in converting back to RAID-1 is adding a small device to the FS
> temporarily before doing the conversion, and then removing it again
> afterwards.
>
>> Also, here's a stack trace for the stuck kworker process, which appears to
>> be a bug since it does this for a very long time:
>
>    This is probably something different.
>
>    Hugo.
>
> --
> === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
>   PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
>      --- Computer Science is not about computers,  any more than ---
>                      astronomy is about telescopes.

