On Mon, 2015-12-28 at 20:31 -0500, Sanidhya Solanki wrote:
> What is your experience like about running a production system on what
> is essentially a beta product? Crashes?

What do you mean? btrfs?
I'm not yet running it in production (there was a subthread recently
where I've explained a bit more why).
But right now I'm reorganising a lot of our data pools, and I'm
considering to set up those which serve just as replica holders with
btrfs. The RAID would likely still come from the HW controller, though.

> Would something like ZFS not be more suited to your environment?

Well, I guess that's my personal political decision... I simply think
that btrfs should and will be the next-gen Linux main filesystem.
Plus, IIRC, zfs-fuse is now unmaintained and linux-zfs is not yet part
of Debian, which rules it out anyway, as I'd be too lazy to compile it
myself (at least for work ;) ).

> Especially as not all disks will be full, and, if a disk was to fail,
> the entire disk would need to be rebuilt from parity drives (as
> opposed to ZFS only using the parity data, and not copying empty
> blocks (another feature that is planned for BTRFS))

Ah? I thought btrfs would already do that as well?

Well anyway... I did some comparisons between HW RAID and MD RAID, each
with ext4 and btrfs. I hadn't tried btrfs RAID6 back then, since it's
IMHO still too far away from being production ready.
IIRC, there were some (for us) interesting cases where MD RAID would
have been somewhat faster than HW RAID... but there were some other
major IO patterns (IIRC sequential read/write) where HW RAID was simply
magnitudes faster (no big surprise, of course).

> I do not believe it would be possible to guarantee crash or error
> recovery when using an in-place rebuild, without slowing down the
> entire rebuild to cache each block before replacing it with the new
> block.
> That would slow it down considerably, as you would have to:
>
> copy to cache
> checksum
> write in place on disk
> checksum new data
>
> verify checksums

I'm not sure what you mean by "cache"... wouldn't btrfs' CoW mean that
you "just" copy the data, and once this is done, update the metadata,
and things would either be consistent or they would not (and in case of
a crash still point to the old, not yet reshaped, data)?

A special case would of course be nodatacow'ed data... there one may
need some kind of cache or journal... (see the other thread of mine,
where I ask for checksumming with no-CoWed data =) ).

> I suppose that is the only proper way to do it anyway, but it will
> definitely be slow.

From my PoV... slowness doesn't matter *that* much here anyway, while
consistency/safety does. I mean, reshaping a RAID wouldn't be something
you'd do every month (at least not on production systems - test systems
are of course another case).
Once I'd have determined that another RAID chunk size would perform
*considerably* better than the current one, I'd reshape... and whether
that then runs for a week or two... as long as it happens online, and
as long as I can control a bit how much IO is spent on the reshape: who
cares?

Cheers,
Chris.
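PS: just to illustrate what I mean by the CoW argument above, here's a
crude userspace analogy (this is *not* btrfs internals - the file names
and the helper are made up): you write the new copy somewhere else
first, and only then flip the "metadata" pointer atomically, so a crash
anywhere before the flip still leaves the old, consistent data
reachable:

```python
import os
import tempfile

def cow_update(path, new_data):
    """CoW-style update sketch: write the new copy to a temp file in
    the same directory, fsync it, then atomically rename it over the
    old one. A crash before os.replace() leaves the old data intact."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)
    with os.fdopen(fd, "wb") as f:
        f.write(new_data)          # the "copy the data" step
        f.flush()
        os.fsync(f.fileno())       # new copy is durable on disk
    os.replace(tmp, path)          # atomic "metadata update" step

# usage:
# with open("data.bin", "wb") as f:
#     f.write(b"old")
# cow_update("data.bin", b"reshaped")
# data.bin now contains b"reshaped"; a crash before os.replace()
# would have left it at b"old".
```

No cache or journal needed for that path - which is why I'd expect the
trouble to be limited to the nodatacow case.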