Still not production ready

* Still not production ready
@ 2015-12-13 22:35 Martin Steigerwald
  2015-12-13 23:19 ` Marc MERLIN
                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Martin Steigerwald @ 2015-12-13 22:35 UTC (permalink / raw)
  To: Btrfs BTRFS

Hi!

For me it is still not production ready. Again I ran into:

btrfs kworker thread uses up 100% of a Sandybridge core for minutes on random 
write into big file
https://bugzilla.kernel.org/show_bug.cgi?id=90401

No matter whether SLES 12 uses it as default for root, no matter whether 
Fujitsu and Facebook use it: I will not let this onto any customer machine 
without lots and lots of underprovisioning and rigorous free space monitoring. 
Actually I will renew my recommendations in my trainings to be careful with 
BTRFS.

>From my experience the monitoring would check for:

merkaba:~> btrfs fi show /home
Label: 'home'  uuid: […]
        Total devices 2 FS bytes used 156.31GiB
        devid    1 size 170.00GiB used 164.13GiB path /dev/mapper/msata-home
        devid    2 size 170.00GiB used 164.13GiB path /dev/mapper/sata-home

If "used" is same as "size" then make big fat alarm. It is not sufficient for 
it to happen. It can run for quite some time just fine without any issues, but 
I never have seen a kworker thread using 100% of one core for extended period 
of time blocking everything else on the fs without this condition being met.

In addition to that last time I tried it aborts scrub any of my BTRFS 
filesstems. Reported in another thread here that got completely ignored so 
far. I think I could go back to 4.2 kernel to make this work.

I am not going to bother to go into more detail on any on this, as I get the 
impression that my bug reports and feedback get ignored. So I spare myself the 
time to do this work for now.

Only thing I wonder now whether this all could be cause my /home is already 
more than one and a half year old. Maybe newly created filesystems are created 
in a way that prevents these issues? But it already has a nice global reserve:

merkaba:~> btrfs fi df /
Data, RAID1: total=27.98GiB, used=24.07GiB
System, RAID1: total=19.00MiB, used=16.00KiB
Metadata, RAID1: total=2.00GiB, used=536.80MiB
GlobalReserve, single: total=192.00MiB, used=0.00B

Actually when I see that this free space thing is still not fixed for good I 
wonder whether it is fixable at all. Is this an inherent issue of BTRFS or 
more generally COW filesystem design?

I think it got somewhat better. It took much longer to come into that state 
again than last time, but still, blocking like this is *no* option for a 
*production ready* filesystem.

I am seriously consider to switch to XFS for my production laptop again. Cause 
I never saw any of these free space issues with any of the XFS or Ext4 
filesystems I used in the last 10 years.

Thanks,
-- 
Martin

^ permalink raw reply	[flat|nested] 25+ messages in thread