linux-btrfs.vger.kernel.org archive mirror
* How robust is BTRFS?
@ 2020-12-03  2:53 Jens Bauer
  2020-12-03  7:59 ` Martin Steigerwald
  2020-12-03 10:59 ` Qu Wenruo
  0 siblings, 2 replies; 5+ messages in thread
From: Jens Bauer @ 2020-12-03  2:53 UTC (permalink / raw)
  To: linux-btrfs

Hi all.

The BTRFS developers deserve some credit!

This is a testimony from a BTRFS-user.
For a little more than 6 months, I had my server running on BTRFS.
My setup was several RAID-10 partitions.
As my server was located on a remote island and I was about to leave, I just added two more hard disks to make sure that the risk of failure would be minimal. Now I had four WD10JFCX drives on the EspressoBIN server running Ubuntu Bionic Beaver.

Before I left, I *had* noticed some beep-like sounds coming from one of the drives, but it seemed OK, so I didn't bother with it.

So I left, and 6 months later I noticed that one of my 'partitions' was failing, so I thought I might go back and replace the failing drive. The journey takes 6 hours.

When I arrived, I noticed more beep-like sounds than when I left half a year earlier.
But I was impressed that my server was still running.

I decided to make a backup and re-format all drives, etc.

The drives were added back in one by one, and I noticed that when I added the third drive, I again started hearing that sound I disliked so much.

After replacing the port-multiplier, I didn't notice any difference.

"The power supply!" I thought.. Though it's a 3A PSU and should easily handle four 2.5" WS10JFCX drives, it could be that the specs were possibly a little decorated, so I found myself a MeanWell IRM-60-5ST supply and used that instead.

Still the same noise.

I then investigated all the cables; lo and behold, silly me had used a cheap pigtail for the barrel connector, and the wires on the pigtail were so incredibly thin that they could not carry the current, so the voltage dropped further with each drive I added.

I re-did my power cables and then everything worked well.

...

After correcting the problem, I got curious and listed the statistics for each partition.
I had more than 100000 read/write errors PER DAY for 6 months.
That's around 18 million read/write-errors, caused by drives turning on/off "randomly".
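
For anyone curious, these per-device counters can be listed with the btrfs tool - the mount point below is only a placeholder:

  # print read/write/flush/corruption/generation error counters for
  # every device backing the filesystem mounted at /mnt
  btrfs device stats /mnt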

AND ALL MY FILES WERE INTACT.

This borders on the impossible.

I believe that no other file system would be able to survive such conditions.
-And the developers of this file system really should know what torture it's been through without failing.
Yes, all files were intact. I tested all the files that I had backed up 6 months earlier against those that were on the drives; there were no differences - they were binary identical.

Today, my EspressoBIN + JMB575 port multiplier + four WD20JFCX drives are doing well. No read/write errors have occurred since I replaced my power cable. I upgraded to Focal Fossa and the server has become very stable and usable. I will not recommend the EspressoBIN (I bought two of them and one is failing periodically); instead I'll recommend Solid-Run's products, which are top-quality and well-tested before shipping.

So this testimony will hopefully encourage others to use BTRFS.
Besides a robust file system, you get a file system that's absolutely rapid (I'm used to HFS+ on a Mac with a much faster CPU - but BTRFS is still a lot faster).
You also get really good tools for manipulating the file system, and you can add/remove drives on the fly.
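
For example, adding or removing a drive on a mounted filesystem looks roughly like this (the device and mount point names are only placeholders):

  # add a new drive to the mounted filesystem, then rebalance so the
  # existing data gets spread onto it as well
  btrfs device add /dev/sde /mnt
  btrfs balance start /mnt

  # remove a drive; btrfs migrates its data to the remaining devices first
  btrfs device remove /dev/sdb /mnt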

Thank you to everyone who worked tirelessly on BTRFS - and also thank you to those who only contributed a correction of a spelling-mistake. Everything counts!


Love
Jens


* Re: How robust is BTRFS?
  2020-12-03  2:53 How robust is BTRFS? Jens Bauer
@ 2020-12-03  7:59 ` Martin Steigerwald
  2020-12-03  8:55   ` Jens Bauer
  2020-12-03 10:59 ` Qu Wenruo
  1 sibling, 1 reply; 5+ messages in thread
From: Martin Steigerwald @ 2020-12-03  7:59 UTC (permalink / raw)
  To: linux-btrfs, Jens Bauer

Jens Bauer - 03.12.20, 03:53:11 CET:
> After correcting the problem, I got curious and listed the statistics
> for each partition. I had more than 100000 read/write errors PER DAY
> for 6 months. That's around 18 million read/write-errors, caused by
> drives turning on/off "randomly".
> 
> AND ALL MY FILES WERE INTACT.

Awesome! Really awesome!

I have been running BTRFS on a ThinkPad T520 since at least 2014. After
the initial free-space-related issues went away with Linux 4.5 or 4.6, I
have had no issues with it. In part I use BTRFS RAID 1 with an mSATA SSD
on the laptop, and it has already recovered twice from what I believe
were power-loss-related errors on that SSD. Of course that is nowhere
near the scale of errors your filesystem has experienced.

I use it on my backup drives and I use it on my server VMs.

It works for me.

Best,
-- 
Martin




* Re: How robust is BTRFS?
  2020-12-03  7:59 ` Martin Steigerwald
@ 2020-12-03  8:55   ` Jens Bauer
  0 siblings, 0 replies; 5+ messages in thread
From: Jens Bauer @ 2020-12-03  8:55 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: linux-btrfs

Hi Martin.

I'm happy to hear that you have had a good experience with BTRFS as well. =)
-One thing I forgot to mention is that I'm also quite impressed with Western Digital's drives; they seem to last, even under the horrible conditions I put them through. ;)

It seems, after all, that one does not *have* to learn from failure. =D


Love
Jens

On Thu, 03 Dec 2020 08:59:38 +0100, Martin Steigerwald wrote:
> Jens Bauer - 03.12.20, 03:53:11 CET:
>> After correcting the problem, I got curious and listed the statistics
>> for each partition. I had more than 100000 read/write errors PER DAY
>> for 6 months. That's around 18 million read/write-errors, caused by
>> drives turning on/off "randomly".
>> 
>> AND ALL MY FILES WERE INTACT.
> 
> Awesome! Really awesome!
> 
> I have been running BTRFS on a ThinkPad T520 since at least 2014. After
> the initial free-space-related issues went away with Linux 4.5 or 4.6, I
> have had no issues with it. In part I use BTRFS RAID 1 with an mSATA SSD
> on the laptop, and it has already recovered twice from what I believe
> were power-loss-related errors on that SSD. Of course that is nowhere
> near the scale of errors your filesystem has experienced.
> 
> I use it on my backup drives and I use it on my server VMs.
> 
> It works for me.
> 
> Best,
> -- 
> Martin
> 
> 


* Re: How robust is BTRFS?
  2020-12-03  2:53 How robust is BTRFS? Jens Bauer
  2020-12-03  7:59 ` Martin Steigerwald
@ 2020-12-03 10:59 ` Qu Wenruo
  2020-12-03 19:13   ` Jens Bauer
  1 sibling, 1 reply; 5+ messages in thread
From: Qu Wenruo @ 2020-12-03 10:59 UTC (permalink / raw)
  To: Jens Bauer, linux-btrfs



On 2020/12/3 10:53 AM, Jens Bauer wrote:
> Hi all.
> 
> The BTRFS developers deserve some credit!
> 
> This is a testimony from a BTRFS-user.
> For a little more than 6 months, I had my server running on BTRFS.
> My setup was several RAID-10 partitions.

You should be proud of not using RAID5/6. :)

> As my server was located on a remote island and I was about to leave, I just added two more hard disks to make sure that the risk of failure would be minimal. Now I had four WD10JFCX drives on the EspressoBIN server running Ubuntu Bionic Beaver.
> 
> Before I left, I *had* noticed some beep-like sounds coming from one of the drives, but it seemed OK, so I didn't bother with it.
> 
> So I left, and 6 months later I noticed that one of my 'partitions' was failing, so I thought I might go back and replace the failing drive. The journey takes 6 hours.
> 
> When I arrived, I noticed more beep-like sounds than when I left half a year earlier.
> But I was impressed that my server was still running.
> 
> I decided to make a backup and re-format all drives, etc.
> 
> The drives were added back in one by one, and I noticed that when I added the third drive, I again started hearing that sound I disliked so much.
> 
> After replacing the port-multiplier, I didn't notice any difference.
> 
> "The power supply!" I thought.. Though it's a 3A PSU and should easily handle four 2.5" WS10JFCX drives, it could be that the specs were possibly a little decorated, so I found myself a MeanWell IRM-60-5ST supply and used that instead.
> 
> Still the same noise.
> 
> I then investigated all the cables; lo and behold, silly me had used a cheap pigtail for the barrel connector, and the wires on the pigtail were so incredibly thin that they could not carry the current, so the voltage dropped further with each drive I added.
> 
> I re-did my power cables and then everything worked well.
> 
> ...
> 
> After correcting the problem, I got curious and listed the statistics for each partition.
> I had more than 100000 read/write errors PER DAY for 6 months.
> That's around 18 million read/write-errors, caused by drives turning on/off "randomly".
> 
> AND ALL MY FILES WERE INTACT.
> 
> This borders on the impossible.

I would say, yeah, really impressive, even to a btrfs developer.

Btrfs RAID10/RAID1 is really good by design, since it has the extra
checksums to keep everything in check. Thus, unlike regular RAID10,
which can only handle a missing device once, it knows which copy of the
data is incorrect, retries the good copy, and repairs the bad one.

Which means btrfs can even handle extreme cases like a 4-device RAID10
where each disk disappears for a while, as long as no 2 disks disappear
at the same time.
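
As a side note, a scrub exercises exactly this repair path: it reads
all data and metadata, verifies the checksums, and repairs any bad
copy from a good mirror. Roughly, with the mount point being just a
placeholder:

  btrfs scrub start /mnt    # start checking/repairing in the background
  btrfs scrub status /mnt   # show progress and an error summary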

But in your case, you really pushed btrfs failover to its limit.

In fact, I had to check the code just now to make sure that btrfs can
tolerate metadata writeback errors.
My original idea was: no, btrfs should just error out on even a single
device's metadata writeback failure, but it turns out that the barrier
and super block writeback really can tolerate multi-device errors.

Anyway, it feels great that btrfs really helped you.

Thanks,
Qu

> 
> I believe that no other file system would be able to survive such conditions.
> -And the developers of this file system really should know what torture it's been through without failing.
> Yes, all files were intact. I tested all the files that I had backed up 6 months earlier against those that were on the drives; there were no differences - they were binary identical.
> 
> Today, my EspressoBIN + JMB575 port multiplier + four WD20JFCX drives are doing well. No read/write errors have occurred since I replaced my power cable. I upgraded to Focal Fossa and the server has become very stable and usable. I will not recommend the EspressoBIN (I bought two of them and one is failing periodically); instead I'll recommend Solid-Run's products, which are top-quality and well-tested before shipping.
> 
> So this testimony will hopefully encourage others to use BTRFS.
> Besides a robust file system, you get a file system that's absolutely rapid (I'm used to HFS+ on a Mac with a much faster CPU - but BTRFS is still a lot faster).
> You also get really good tools for manipulating the file system, and you can add/remove drives on the fly.
> 
> Thank you to everyone who worked tirelessly on BTRFS - and also thank you to those who only contributed a correction of a spelling-mistake. Everything counts!
> 
> 
> Love
> Jens
> 



* Re: How robust is BTRFS?
  2020-12-03 10:59 ` Qu Wenruo
@ 2020-12-03 19:13   ` Jens Bauer
  0 siblings, 0 replies; 5+ messages in thread
From: Jens Bauer @ 2020-12-03 19:13 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

Hi Qu.

On Thu, 3 Dec 2020 18:59:35 +0800, Qu Wenruo wrote:
>> The BTRFS developers deserve some credit!
>> 
>> My setup was several RAID-10 partitions.
> 
> You should be proud of not using RAID5/6. :)

Yes, I did investigate the status of RAID5/6 before I decided which RAID type to use.
I prioritize high throughput and stability over space.
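
For reference, a four-drive filesystem with both data and metadata in RAID-10, like mine, can be created with something along these lines (the device names are only placeholders):

  # RAID10 for metadata (-m) and data (-d) across four drives
  mkfs.btrfs -m raid10 -d raid10 /dev/sda /dev/sdb /dev/sdc /dev/sdd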

>> After correcting the problem, I got curious and listed the 
>> statistics for each partition.
>> I had more than 100000 read/write errors PER DAY for 6 months.
>> That's around 18 million read/write-errors, caused by drives turning 
>> on/off "randomly".

(I remember some of the 'more than 100000 per day' figures being 119xxx, so it may easily have been more than 20 million errors.)

>> AND ALL MY FILES WERE INTACT.
>> 
>> This borders on the impossible.
> 
> I would say, yeah, really impressive, even to a btrfs developer.

I actually expected it would be. ;)
There are still things I forgot to mention in my first post:
A few of the RAID partitions were in RAID0 configuration, and files there were also intact.
(Had it been any other RAID0 implementation, I'd have lost every file on those partitions, no doubt!)
-Another thing I forgot to mention is that total usage was around 1.5TB out of 2TB, and verifying that my files were intact took days, as I did a byte-by-byte comparison.
The drives mainly store more than 170 websites, mail for 4 domains, a lot of video files on a NAS, and archives containing source code (like GCC) for local caching.
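
The comparison itself was nothing fancy; something like the following compares the full contents of two trees (the paths are only placeholders):

  # recursively compare the backup tree against the live tree and
  # report every file whose contents differ
  diff -rq /mnt/backup /mnt/live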

> Btrfs RAID10/RAID1 is really good by design, since it has the extra
> checksums to keep everything in check. Thus, unlike regular RAID10,
> which can only handle a missing device once, it knows which copy of the
> data is incorrect, retries the good copy, and repairs the bad one.

That's something I really sensed when I saw what my files survived. =)

> Which means btrfs can even handle extreme cases like a 4-device RAID10
> where each disk disappears for a while, as long as no 2 disks disappear
> at the same time.

I had the impression that it would be able to handle two disappearances at the same time, but not 3 - but if it's limited by the design, I won't argue - you know the inner workings better than I. ;)

> But in your case, you really pushed btrfs failover to its limit.

Completely unintended, but now you know how to make an extreme test-setup: Just make sure the drives don't get enough current. ;)

> Anyway, it feels great that btrfs really helped you.

My experience with BTRFS made me want to use it in every place I can.
I'm even thinking of doing silly things like iSCSI for the Mac, hosting HFS+ images on a BTRFS volume (I'm convinced it would even speed up HFS+).

> Thanks,
> Qu

I'm really the one who needs to thank you. ;)
-May everyone on this list have a wonderful Christmas. =)


Love
Jens

