* BTRFS RAM requirements, RAID 6 stability/write holes and expansion questions
@ 2016-02-05 19:36 Mackenzie Meyer
  2016-02-06  8:43 ` Duncan
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Mackenzie Meyer @ 2016-02-05 19:36 UTC (permalink / raw)
  To: linux-btrfs

Hello,

I've tried checking around on google but can't find information
regarding the RAM requirements of BTRFS and most of the topics on
stability seem quite old.

So first would be memory requirements, my goal is to use deduplication
and compression. Approximately how many GB of RAM per TB of storage
would be recommended?

RAID 6 write holes?
The BTRFS wiki states that parity might be inconsistent after a crash.
That said, the wiki page for RAID 5/6 doesn't look like it has much
recent information on there. Has this issue been addressed and if not,
are there plans to address the RAID write hole issue? What would be a
recommended workaround to resolve inconsistent parity, should an
unexpected power down happen during write operations?

RAID 6 stability?
Any articles I've tried looking for online seem to be from early 2014,
I can't find anything recent discussing the stability of RAID 5 or 6.
Are there or have there recently been any data corruption bugs which
impact RAID 6? Would you consider RAID 6 safe/stable enough for
production use?

Do you still strongly recommend backups, or has stability reached a
point where backups aren't as critical? I'm thinking from a data
consistency standpoint, not a hardware failure standpoint.

I plan to start with a small array and add disks over time. That said,
currently I have mostly 2TB disks and some 3TB disks. If I replace all
2TB disks with 3TB disks, would BTRFS then start utilizing the full
3TB capacity of each disk, or would I need to destroy and rebuild my
array to benefit from the larger disks?


Thanks!

* Re: BTRFS RAM requirements, RAID 6 stability/write holes and expansion questions
  2016-02-05 19:36 BTRFS RAM requirements, RAID 6 stability/write holes and expansion questions Mackenzie Meyer
@ 2016-02-06  8:43 ` Duncan
  2016-02-09 14:07 ` Psalle
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Duncan @ 2016-02-06  8:43 UTC (permalink / raw)
  To: linux-btrfs

Mackenzie Meyer posted on Fri, 05 Feb 2016 14:36:33 -0500 as excerpted:

> Hello,
> 
> I've tried checking around on google but can't find information
> regarding the RAM requirements of BTRFS and most of the topics on
> stability seem quite old.
> 
> So first would be memory requirements, my goal is to use deduplication
> and compression. Approximately how many GB of RAM per TB of storage
> would be recommended?

The inline dedup patches are just reaching maturity and mainlining right 
now, and aren't in a release yet.  Dedup's not my use-case so I've not 
been following it /too/ closely, but briefly, as it's shaping up there are 
going to be two backends available: an on-device backend that will be a 
bit slower but should be more efficient at deduping, and an in-memory 
backend that will be fast but less efficient.  Dedup memory usage is 
configurable, however, with a rather low default of (IIRC) some MiB, 
certainly nothing like GiB, unless of course you configure it that way 
(which you may wish to if you have the memory and choose the memory-based 
backend).
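
(For intuition only, a toy sketch of what an in-memory dedup backend's 
bookkeeping boils down to: a bounded hash-to-extent map, so memory use 
tracks the configured limit rather than the TB of storage.  Purely 
illustrative python, not the actual btrfs dedup code or its interfaces.)

import hashlib
from collections import OrderedDict

class ToyInMemoryDedup:
    def __init__(self, max_entries=16 * 1024):    # the configurable limit
        self.max_entries = max_entries
        self.seen = OrderedDict()                 # hash -> extent location

    def lookup_or_add(self, data, location):
        h = hashlib.sha256(data).digest()
        if h in self.seen:
            return self.seen[h]                   # duplicate: reuse extent
        if len(self.seen) >= self.max_entries:
            self.seen.popitem(last=False)         # evict oldest entry
        self.seen[h] = location
        return None                               # new data: write it out

dedup = ToyInMemoryDedup()
print(dedup.lookup_or_add(b"x" * 4096, ("devA", 0)))   # None (new extent)
print(dedup.lookup_or_add(b"x" * 4096, ("devB", 8)))   # ('devA', 0) reused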

The memory issue is thus neither dedup nor compression (which is on-the-
fly and uses very little additional memory).

Instead, btrfs memory issues tend to be scaling related, specifically, 
related to the number of subvolumes/snapshots, and to whether you have 
quotas activated.

I'm actually not sure what the current quota status is as of 4.4 and the 
in-development 4.5, but definitely, previous to that, quotas simply were 
not stable and had known-broken corner-cases as well as serious scaling 
issues, so the recommendation has been, you either need them or you 
don't: if you need them, use a filesystem where quotas are mature and 
stable; if not, use btrfs but leave quotas disabled as they have 
definitely caused many a user serious headaches as they simply didn't 
work well.  Unless of course you're specifically working with the devs to 
test current quota code, reporting results and potentially running 
debugging patches to get more info when there are problems, in which case 
go right ahead, as you're one of the folks that's helping to eventually 
make the feature stable and workable enough to actually depend on.

Btrfs snapshots/subvolumes are of course a relatively more stable btrfs 
feature and are regularly used by many.  But there remain scaling issues 
there as well, such that the recommended total filesystem cap on number 
of subvolumes (including snapshots, which are a special kind of subvolume 
that simply happens to share most of its data with other subvolumes) is 
no more than 1000-3000.  The number of snapshots of individual 
subvolumes, meanwhile, should be kept to 250 or so, which is actually 
pretty reasonable even if you're starting with for instance half-hourly 
auto-snapshotting (using snapper or the like), as long as a reasonable 
snapshot thinning schedule is followed as well.

(As I've posted several times: start with half-hourly snapshots, thin to 
hourly after say 12 hours, 2-hourly after 48 hours, daily after a week, 
and weekly after perhaps 13 weeks (a quarter), then keep the remaining 
weeklies for another year, so 15 months of snapshots total.  After that, 
if you haven't backed up to other media, you obviously aren't worried too 
much about losing the data anyway.  That schedule keeps you reasonably 
close to 250-300 snapshots per subvolume, thus allowing snapshotting of 
up to eight subvolumes on a similar program while still staying within 
the filesystem cap of 2000 or so snapshots/subvolumes.)
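
(To make the arithmetic concrete, a quick back-of-the-envelope sketch in 
python; the tier boundaries are just my reading of the schedule above, 
adjust them to match your own snapper/cron setup:)

def snapshots_retained(tiers):
    # tiers: (age_limit_in_hours, snapshot_interval_in_hours), oldest last
    total, start = 0, 0
    for age_limit, interval in tiers:
        total += int((age_limit - start) // interval)
        start = age_limit
    return total

schedule = [
    (12,           0.5),      # half-hourly for the first 12 hours
    (48,           1),        # hourly up to 48 hours
    (7 * 24,       2),        # 2-hourly up to a week
    (13 * 7 * 24,  24),       # daily up to ~13 weeks (a quarter)
    (456 * 24,     7 * 24),   # weekly out to ~15 months
]

print(snapshots_retained(schedule))   # ~256 snapshots per subvolume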

Because with btrfs snapshotting being so easy, what we too often see if 
people aren't warned about it, are people with 100K snapshots or the 
like.  And during normal runtime, other than perhaps some slowdown due to 
fragmentation, etc, even that works deceptively well.

Where the _problem_ occurs, however, is when you try to actually do 
filesystem maintenance on that monster!  Both btrfs balance and btrfs 
check slow down *dramatically*, to the point of practical unworkability, 
somewhere in the double-digit-K (tens of thousands) snapshots range.  
With 3000 it's noticeably slower than with only 1000, but while slow, 
it's still _usable_.  Which is why the 1000-3000 range.  Depending on 
people's pain point vs snapshotting needs, 1000 snapshots shouldn't be 
much of a problem yet, but might not be enough for people with more than 
3-4 subvolumes they want to keep snapshotted, while 3000 snapshots will 
already be hitting the pain point for many, but may be needed and still 
just fast enough to be tolerably usable given the tradeoff, for people 
with many subvolumes they want to keep snapshotted.

And based on reports, btrfs check with tens of thousands of snapshots is 
where memory usage goes thru the roof as well.  I'm not actually sure 
about balance in terms of memory usage, tho it's definitely much slower 
with that many snapshots.

It can also be noted that the problem affects snapshot deletion as well 
(unlike snapshot creation, which is effectively instantaneous, thus making 
it deceptively easy for the unaware to get into such a hole with hundreds 
of thousands of them, if they're doing scheduled snapshots but don't have 
a snapshot thinning schedule set up), since btrfs has to go thru and sort 
out which other snapshots reference the same extents and either delete the 
extents or simply reduce the reference count accordingly, and if there are 
100K snapshots to process, that can be expected to take awhile.
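
(A toy model of why deletion scales this way; real btrfs backref 
accounting is far more involved, this is just the shape of the work:)

def delete_snapshot(snapshot_extents, refcount):
    freed = 0
    for extent in snapshot_extents:      # work scales with every extent
        refcount[extent] -= 1            # the snapshot references...
        if refcount[extent] == 0:
            freed += 1                   # ...but only unshared extents
    return freed                         # are actually freed

refcount = {"e1": 3, "e2": 1, "e3": 2}   # e1/e3 still shared elsewhere
print(delete_snapshot(["e1", "e2", "e3"], refcount))   # 1 extent freed
print(refcount)                          # {'e1': 2, 'e2': 0, 'e3': 1}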

Meanwhile, that's the problem for quotas as well, as they apparently at 
least double the problem compared to snapshots without quotas.  And 
because at least until very recently (current status unknown) they've 
actually been broken and not reliable anyway, it simply hasn't been worth 
the hassle, thus the "just turn them off, or if you actually need them, 
use a filesystem where they're actually stable and reliable, not btrfs as 
that's anything but the case here" recommendation.

> RAID 6 write holes?
> The BTRFS wiki states that parity might be inconsistent after a crash.
> That said, the wiki page for RAID 5/6 doesn't look like it has much
> recent information on there. Has this issue been addressed and if not,
> are there plans to address the RAID write hole issue? What would be a
> recommended workaround to resolve inconsistent parity, should an
> unexpected power down happen during write operations?

My own use-case is raid1 (preferably N-way-mirroring raid1 like mdraid, 
but with btrfs runtime checksum verification, except that N-way-mirroring 
is still to come, with current btrfs being pair-mirroring only, 
unfortunately, so pair-mirroring I must be satisfied with, for now), so 
while I've followed the raid56 situation with academic interest, it's not 
personal interest so my detail knowledge is a bit more limited for raid56.

So I don't know for sure what the btrfs-specific status is on the raid56 
write hole.  What I _do_ know is that raid5 and 6 in general are known to 
have a write hole, that applies to parity-raid, in general.  Various 
specific implementations try to do various things to limit damage, but as 
it's a limitation of parity-raid technology in general, there's a limit 
to the degree the hole _CAN_ be worked around or plugged.  Certainly it's 
possible, but many implementations don't consider the complexity and 
performance tradeoffs to be worth it, vs. the risk for their target use-
case(s).
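
(A toy illustration of the general problem, using plain single-parity 
XOR; nothing btrfs-specific, just why a stripe updated half-way is 
dangerous once you later need the parity:)

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

d0, d1 = b"AAAA", b"BBBB"     # data strips on two devices
parity = xor(d0, d1)          # parity strip on a third device

d0 = b"CCCC"                  # crash: d0 rewritten, parity never updated

# Later the device holding d1 dies; reconstruct it from d0 + stale parity:
print(xor(d0, parity))        # not b"BBBB" -- silent corruption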

So while I don't know the btrfs specifics, other than that btrfs raid56 
modes do, like most other raid56 implementations, have a write hole to 
worry about, I do know that the problem is a general parity-raid problem, 
not btrfs-specific (tho due to btrfs' per-chunk raid vs. the more usual 
per-device raid, it is indeed quite likely that the effect of the write 
hole on btrfs is different, and may be much more likely to trigger 
problems... I simply don't know further detail in that area).

> RAID 6 stability?
> Any articles I've tried looking for online seem to be from early 2014,
> I can't find anything recent discussing the stability of RAID 5 or 6.
> Are there or have there recently been any data corruption bugs which
> impact RAID 6?

Back when btrfs raid56 mode was first nominally complete in kernel 3.19, 
about a year ago, I warned people not to consider it at all stable until 
at _least_ a year, five kernel cycles, after initial nominal completion.  
Turns out I was right, and there were several pretty serious bugs in 
raid56 mode for 3.19, 4.0 and into the early 4.1 cycle (tho I believe the 
fixes were in well before 4.1 release).

I also suggested that a further stability-recommendation requirement, 
from my point of view anyway, was at least two full kernel cycles without 
serious raid56 bugs.  Thus, while 4.4 does complete the year, the 
question now becomes one of whether there have been any serious raid56 
bugs since the last "blocker-level" bug was fixed in early 4.1.

Keeping in mind that following filesystem failure reports on the lists of 
even so-called "stable" filesystems like ext4 certainly isn't for the 
faint of heart, because there you only see the problems not the tens to 
hundreds of thousands of "no problem" installs...

I'd honestly call it an open question.

There have certainly been a couple reports of failure to recover from 
device loss as expected into 4.3 at least, but I don't know if they've 
actually been traced to raid56 mode problems, or if they're unrelated 
bugs, or maybe simply related to that previously discussed write hole...

Personally, if I had to call it right now, I'd say treat raid56 as 
borderline stable, definitely not yet to the stability level of the rest 
of btrfs in general, but also reasonably obviously beyond the initial 
"teething problems" bugs, as there's reports of bugs that _could_ be 
raid56 related, but to my knowledge at least, there's been nothing 
definitely pinned to raid56 bugs since the last blocker-level bugs were 
fixed in 4.1.

If there's time, I'd definitely prefer to give it another couple kernel 
cycles, to 4.6 or so, after which 4.4 as an LTS kernel will get 4.6's bug-
fix backports, so assuming no big raid56 bugs show up by then, once it 
gets those backports I'd probably consider 4.4-LTS as btrfs raid56 stable 
as 4.6.

So 4.4 should be healthily developing toward btrfs raid56 stable, but I'd 
still not consider raid56 mode as stable as the rest of btrfs until 4.6 
or so, that of course assuming no bad raid56 bugs appear in the mean time.


There *IS* one known caveat at this point, however.  Raid56 parity 
rebuild or balance to more/fewer devices can be ***VERY*** slow at this 
point -- but isn't for everyone.  We have multiple reports and at least 
one independent test confirmation of those reports to that effect.  We're 
talking 2 MiB/sec slow... One guy doing a raid6 reshape from 10 devices 
to 12 indicated 3% completion in three days, 1%/day so ~ 100 days to 
complete, tho he was able to continue using the filesystem for other 
things, with longer IO times, of course, while it was happening.

Again, not everyone is seeing it, but it's common enough that there's 
probably something going on there that we don't know about yet, that 
needs fixing.

FWIW, here's a gmane-archive link to the most informative recent thread 
on the problem:

http://comments.gmane.org/gmane.comp.file-systems.btrfs/52469


Tho something just occurred to me... I wonder if the problem might 
actually be the snapshot and/or quota scaling issues discussed above.  
That scaling issue is a problem we already know about, entirely unrelated 
to raid56 mode, and if those reporters have quotas and 100K or so 
snapshots, it'd pretty well explain things, because that's /exactly/ the 
sort of maintenance-time problem triggered by too many snapshots, or by 
fewer snapshots combined with active quotas.

> Would you consider RAID 6 safe/stable enough for production use?
> 
> Do you still strongly recommend backups, or has stability reached a
> point where backups aren't as critical? I'm thinking from a data
> consistency standpoint, not a hardware failure standpoint.

This actually could be your show-stopper, but not for the reason you 
think.

On this list, btrfs _in_ _general_ is still considered "stabilizING, not 
yet fully stable and mature, and not yet ready for 'production use.'"  
Let me emphasize that.  It's *not* just raid56 mode, which is as 
explained above not yet as stable as btrfs in general, but ALL OF BTRFS 
that is not yet considered fully stable, not yet ready for production 
usage, and *DEFINITELY* backups recommended!

In fact, I've actually developed a bit of a reputation on this list for 
drumming this point home in various levels of detail, depending on how 
detailed I feel like being:

The sysadmin's first rule of backups states that for any level of backup 
and the corresponding risk factor of having to use it, your data is 
either worth the hassle and resources necessary to do that (additional?) 
level of backup, or it's not.  Absolutely trivial data, internet cache, 
on Linux, probably the local packages cache, etc, is either easily 
redownloaded/recreated, or simply not worth worrying about at all, and is 
thus likely not worth even a single level of backup.  OTOH, extremely 
valuable data may be worth 101 levels of backup or more, some offsite in 
multiple locations to protect against disaster outages, etc, because the 
data is simply valuable enough that even given the extremely small risk 
of losing or finding bad all 100 previous levels of backup at the same 
time, it's still worth that 101st (or more) level of backup, just in case.

That's for *ANY* filesystem, including the most mature and stable ones.  
Put in simpler form, if you don't even have a single level of backup, you 
are by your actions, defining that data as of trivial value at best, 
since the risk of having to actually use that primary backup isn't 
trivial at all, even on the most stable and mature filesystem on proven 
stable hardware.

Of course with btrfs itself being still stabilizing, not yet fully stable 
and mature, the risk factor of actually having to use that backup is 
higher, high enough that arguably, you consider the working copy a throw-
away copy, and the first level of backup your actual primary copy, such 
that a second level of backup can actually be considered your first level 
of backup.

And of course as I said, btrfs raid56 mode, while developing in a healthy 
way with no known show-stoppers for a couple kernels, still isn't yet 
what I'd call quite as stable as btrfs in general, so that ups the risk 
factor yet again.

So seriously, either have that backup made _before_ you need it, or be 
glad when you lose the data that after all you only lost the trivial 
stuff, because your actions defined the time and resources saved by NOT 
doing the backup to be worth more than the data you were risking losing, 
and you saved it even if you did lose the data, which means you really 
CAN still be happy, because you really DID save what your actions defined 
as most important to you. =:^)

And if you don't like the way that sounds, seriously, do NOT consider 
raid56 mode at this time, and really, you should be reconsidering even 
thinking about btrfs at all, because you need stability in your 
filesystem that btrfs simply isn't ready to provide, yet.  (Tho I do 
wonder if the ext4, or for that matter, pretty much any other filesystem 
devs, would consider their filesystem _that_ stable, to be ready to 
handle those who aren't willing to make backups, who then blame the 
filesystem instead of their priority-defining actions when data is 
inevitably lost -- because on ANY filesystem and hardware, it's not if, 
it's when, that being the whole reason behind the sysadmin's first rule 
of backups and those multiple levels of backup in the first place.)

> I plan to start with a small array and add disks over time. That said,
> currently I have mostly 2TB disks and some 3TB disks. If I replace all
> 2TB disks with 3TB disks, would BTRFS then start utilizing the full 3TB
> capacity of each disk, or would I need to destroy and rebuild my array
> to benefit from the larger disks?

With raid6, btrfs needs at least four devices with unallocated space to 
allocate new raid6 chunks.  Additional devices with space available simply 
increase the width of the stripe and (to a limit) the size of the chunk.  
Allocation is width-first, using all devices with space available.

So with a mix of 2TB and 3TB devices, btrfs raid6 would allocate across 
all devices until the 2TB devices are full, after which, as long as there 
are still at least four 3TB devices available with their remaining free 
space, it'll continue to allocate additional chunks to just them.

When you add devices, they will of course have much more space available 
than the others.  So btrfs will start allocating to them too, but will 
still allocate from the old devices as well until they are entirely out 
of room.  As such, unless a balance is done to reallocate existing 
chunks, if you add one device at a time, you may come to a point where 
there's no longer unallocated space on older devices, only on new 
devices, and there's not at least four of them, so btrfs will be unable 
to allocate additional raid6 chunks, and the space on the odd new devices 
will be unusable in raid6 mode at least until you add more devices so 
there's unallocated free space on at least four of them again.
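
(A very rough model of that allocation behavior; the 1-GiB-per-device 
chunk slices and the all-devices-with-space rule are simplifications, not 
the actual kernel allocator:)

def allocate_raid6(free_gib, chunk=1):
    usable = 0
    while True:
        avail = [d for d in free_gib if free_gib[d] >= chunk]
        if len(avail) < 4:                    # can't build a raid6 stripe
            break
        for d in avail:
            free_gib[d] -= chunk
        usable += (len(avail) - 2) * chunk    # n-2 data strips per stripe
    return usable, free_gib

# e.g. four 2TB disks plus two 3TB disks (sizes in GiB, roughly):
devices = {"a": 1863, "b": 1863, "c": 1863, "d": 1863, "e": 2794, "f": 2794}
print(allocate_raid6(devices))
# -> (7452, {..., 'e': 931, 'f': 931}): the extra space on the two larger
#    disks is stranded once fewer than four devices have room left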

You can of course do a rebalance after adding new devices so existing 
stripes are rewritten broader, over the new devices as well (and 
similarly, btrfs device remove will trigger a reshape-rebalance to narrow 
the stripes and put the data that was on that removed device elsewhere) 
but as mentioned above, at least right now, some people are finding that 
operation to be *extremely* slow.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


* Re: BTRFS RAM requirements, RAID 6 stability/write holes and expansion questions
  2016-02-05 19:36 BTRFS RAM requirements, RAID 6 stability/write holes and expansion questions Mackenzie Meyer
  2016-02-06  8:43 ` Duncan
@ 2016-02-09 14:07 ` Psalle
  2016-02-09 20:39 ` Chris Murphy
  2016-02-10 10:16 ` Psalle
  3 siblings, 0 replies; 11+ messages in thread
From: Psalle @ 2016-02-09 14:07 UTC (permalink / raw)
  To: Mackenzie Meyer, linux-btrfs



On 05/02/16 20:36, Mackenzie Meyer wrote:
> Hello,
>
> I've tried checking around on google but can't find information
> regarding the RAM requirements of BTRFS and most of the topics on
> stability seem quite old.

To keep my answer short: every time I've tried (offline) deduplication 
or raid5 pools I've ended up with borked filesystems. My last attempt was 
about a year ago. Given that the pages you mention looked the same by 
then, I'd stay away from raid56 for anything but testing purposes. I 
haven't read anything recently (i.e. post-3.19 kernels) about raid5 that 
increases my confidence in it. Dedup, OTOH, I don't know. What I used 
were third-party (I think?) tools, so the fault may have rested on them 
and not btrfs (does that make sense?)

I'm building a new small raid5 pool as we speak, though, for throw-away 
data, so I hope to be favourably impressed.

Cheers.

> So first would be memory requirements, my goal is to use deduplication
> and compression. Approximately how many GB of RAM per TB of storage
> would be recommended?
>
> RAID 6 write holes?
> The BTRFS wiki states that parity might be inconsistent after a crash.
> That said, the wiki page for RAID 5/6 doesn't look like it has much
> recent information on there. Has this issue been addressed and if not,
> are there plans to address the RAID write hole issue? What would be a
> recommended workaround to resolve inconsistent parity, should an
> unexpected power down happen during write operations?
>
> RAID 6 stability?
> Any articles I've tried looking for online seem to be from early 2014,
> I can't find anything recent discussing the stability of RAID 5 or 6.
> Are there or have there recently been any data corruption bugs which
> impact RAID 6? Would you consider RAID 6 safe/stable enough for
> production use?
>
> Do you still strongly recommend backups, or has stability reached a
> point where backups aren't as critical? I'm thinking from a data
> consistency standpoint, not a hardware failure standpoint.
>
> I plan to start with a small array and add disks over time. That said,
> currently I have mostly 2TB disks and some 3TB disks. If I replace all
> 2TB disks with 3TB disks, would BTRFS then start utilizing the full
> 3TB capacity of each disk, or would I need to destroy and rebuild my
> array to benefit from the larger disks?
>
>
> Thanks!


* Re: BTRFS RAM requirements, RAID 6 stability/write holes and expansion questions
  2016-02-05 19:36 BTRFS RAM requirements, RAID 6 stability/write holes and expansion questions Mackenzie Meyer
  2016-02-06  8:43 ` Duncan
  2016-02-09 14:07 ` Psalle
@ 2016-02-09 20:39 ` Chris Murphy
  2016-02-10 13:57   ` Austin S. Hemmelgarn
  2016-02-10 10:16 ` Psalle
  3 siblings, 1 reply; 11+ messages in thread
From: Chris Murphy @ 2016-02-09 20:39 UTC (permalink / raw)
  To: Mackenzie Meyer; +Cc: Btrfs BTRFS

On Fri, Feb 5, 2016 at 12:36 PM, Mackenzie Meyer <snackmasterx@gmail.com> wrote:

>
> RAID 6 write holes?

I don't even understand the nature of the write hole on Btrfs. If
modification is still always COW, then either an fs block, a strip, or
whole stripe write happens, I'm not sure where the hole comes from. It
suggests some raid56 writes are not atomic.

If you're worried about raid56 write holes, then a.) you need a server
running this raid where power failures or crashes don't happen b.)
don't use raid56 c.) use ZFS.


> RAID 6 stability?
> Any articles I've tried looking for online seem to be from early 2014,
> I can't find anything recent discussing the stability of RAID 5 or 6.
> Are there or have there recently been any data corruption bugs which
> impact RAID 6? Would you consider RAID 6 safe/stable enough for
> production use?

It's not stable for your use case, if you have to ask others if it's
stable enough for your use case. Simple as that. Right now some raid6
users are experiencing remarkably slow balances, on the order of
weeks. If device replacement rebuild times are that long, I'd say it's
disqualifying for most any use case, just because there are
alternatives that have better fail over behavior than this. So far
there's no word from any developers about what the problem might be, or
where to gather more information. So chances are they're already aware
of it but haven't reproduced it, isolated it, or come up with a fix for
it yet.

If you're prepared to make Btrfs better in the event you have a
problem, with possibly some delay in getting that volume up and
running again (including the likelihood of having to rebuild it from a
backup), then it might be compatible with your use case.

> Do you still strongly recommend backups, or has stability reached a
> point where backups aren't as critical? I'm thinking from a data
> consistency standpoint, not a hardware failure standpoint.

You can't separate them. On completely stable hardware, stem to stern,
you'd have no backups, no Btrfs or ZFS, you'd just run linear/concat
arrays with XFS, for example. So you can't just hand wave the hardware
part away. There are bugs in the entire storage stack, there are
connectors that can become intermittent, the system could crash. All
of these affect data consistency.

Stability has not reached a point where backups aren't as critical. I
don't really even know what that means, though. Btrfs or not, you need
to be doing backups such that a 100% loss of the primary stack without
notice is not a disaster. Plan on having to use them. If you don't like
the sound of that, look elsewhere.


> I plan to start with a small array and add disks over time. That said,
> currently I have mostly 2TB disks and some 3TB disks. If I replace all
> 2TB disks with 3TB disks, would BTRFS then start utilizing the full
> 3TB capacity of each disk, or would I need to destroy and rebuild my
> array to benefit from the larger disks?

Btrfs, LVM raid, mdraid, and ZFS all let you grow arrays without having
to recreate the file system from scratch; each has a different level of
ease in doing this and in how long it will take.


-- 
Chris Murphy

* Re: BTRFS RAM requirements, RAID 6 stability/write holes and expansion questions
  2016-02-05 19:36 BTRFS RAM requirements, RAID 6 stability/write holes and expansion questions Mackenzie Meyer
                   ` (2 preceding siblings ...)
  2016-02-09 20:39 ` Chris Murphy
@ 2016-02-10 10:16 ` Psalle
  3 siblings, 0 replies; 11+ messages in thread
From: Psalle @ 2016-02-10 10:16 UTC (permalink / raw)
  To: Mackenzie Meyer, linux-btrfs

On 05/02/16 20:36, Mackenzie Meyer wrote:
> RAID 6 stability?
I'll say more: currently, btrfs is in a state of flux where, if you don't 
have a very recent kernel, upgrading to one is the first recommendation 
you're going to receive in case of problems. This means going outside the 
stable packages in most distros.

Once you're on the bleeding edge of kernels, you are obviously more likely 
to run into undiscovered bugs. I even see people here who have to patch 
the kernel with patches that aren't yet mainlined when trying to recover.

So don't use it for anything but testing.

* Re: BTRFS RAM requirements, RAID 6 stability/write holes and expansion questions
  2016-02-09 20:39 ` Chris Murphy
@ 2016-02-10 13:57   ` Austin S. Hemmelgarn
  2016-02-10 19:06     ` Chris Murphy
  0 siblings, 1 reply; 11+ messages in thread
From: Austin S. Hemmelgarn @ 2016-02-10 13:57 UTC (permalink / raw)
  To: Chris Murphy, Mackenzie Meyer; +Cc: Btrfs BTRFS

On 2016-02-09 15:39, Chris Murphy wrote:
> On Fri, Feb 5, 2016 at 12:36 PM, Mackenzie Meyer <snackmasterx@gmail.com> wrote:
>
>>
>> RAID 6 write holes?
>
> I don't even understand the nature of the write hole on Btrfs. If
> modification is still always COW, then either an fs block, a strip, or
> whole stripe write happens, I'm not sure where the hole comes from. It
> suggests some raid56 writes are not atomic.
It's an issue of torn writes in this case, not of atomicity of BTRFS. 
Disks can't atomically write more than sector size chunks, which means 
that almost all BTRFS filesystems are doing writes that disks can't 
atomically complete.  Add to that that we serialize writes to different 
devices, and it becomes trivial to lose some data if the system crashes 
while BTRFS is writing out a stripe (it shouldn't screw up existing data 
though, you'll just lose whatever you were trying to write).

One way to minimize this which would also boost performance on slow 
storage would be to avoid writing parts of the stripe that aren't 
changed (so for example, if only one disk in the stripe actually has 
changed data, only write that and the parities).
>
> If you're worried about raid56 write holes, then a.) you need a server
> running this raid where power failures or crashes don't happen b.)
> don't use raid56 c.) use ZFS.
It's not just BTRFS that has this issue though; ZFS does too, it just 
recovers more gracefully than BTRFS does.  And even with the journaled 
RAID{5,6} support that's being added in MDRAID (and by extension DM-RAID 
and therefore LVM), the same issue remains, it just moves elsewhere (in 
this case, to a torn write hitting the journal).
>
>> RAID 6 stability?
>> Any articles I've tried looking for online seem to be from early 2014,
>> I can't find anything recent discussing the stability of RAID 5 or 6.
>> Are there or have there recently been any data corruption bugs which
>> impact RAID 6? Would you consider RAID 6 safe/stable enough for
>> production use?
>
> It's not stable for your use case, if you have to ask others if it's
> stable enough for your use case. Simple as that. Right now some raid6
> users are experiencing remarkably slow balances, on the order of
> weeks. If device replacement rebuild times are that long, I'd say it's
> disqualifying for most any use case, just because there are
> alternatives that have better fail over behavior than this. So far
> there's no word from any developers what the problem might be, or
> where to gather more information. So chances are they're already aware
> of it but haven't reproduced it, or isolated it, or have a fix for it
> yet.
Double on this, we should probably put something similar on the wiki, 
and this really applies to any feature, not just raid56.
>
>> Do you still strongly recommend backups, or has stability reached a
>> point where backups aren't as critical? I'm thinking from a data
>> consistency standpoint, not a hardware failure standpoint.
>
> You can't separate them. On completely stable hardware, stem to stern,
> you'd have no backups, no Btrfs or ZFS, you'd just run linear/concat
> arrays with XFS, for example. So you can't just hand wave the hardware
> part away. There are bugs in the entire storage stack, there are
> connectors that can become intermittent, the system could crash. All
> of these affect data consistency.
I may be wrong, but I believe the intent of this question was to try and 
figure out how likely BTRFS itself is to cause crashes or data 
corruption, independent of the hardware. In other words, 'Do I need to 
worry significantly about BTRFS in planning for disaster recovery, or 
can I focus primarily on the hardware itself?' or 'Is the most likely 
failure mode going to be hardware failure, or software?'. In general, 
right now I'd say that using BTRFS in a traditional multi-device setup 
(nothing more than raid1 or possibly raid10), you've got roughly a 50% 
chance of an arbitrary crash being a software issue instead of hardware. 
Single disk, I'd say it's probably closer to 25%, and raid56 I'd say 
it's probably closer to 75%. By comparison, I'd say that with ZFS it's 
maybe a 5% chance (ZFS is developed as enterprise level software, it has 
to work, period), and with XFS on LVM raid, probably about 15% (similar 
to ZFS, XFS is supposed to be enterprise level software, the difference 
here comes from LVM, which has had some interesting issues recently due 
to incomplete testing of certain things before they got pushed upstream).
>
> Stability has not reach a point where backups aren't as critical. I
> don't really even know what that means though. No matter Btrfs or not,
> you need to be doing backups such that if the primary stack is a 100%
> loss without notice, is not a disaster. Plan on having to use it. If
> you don't like the sound of that, look elsewhere.
What you're using has an impact on how you need to do backups.  For someone 
who can afford long periods of down time for example, it may be 
perfectly fine to use something like Amazon S3 Glacier storage (which 
has a 4 hour lead time on restoration for read access) for backups. 
OTOH, if you can't afford more than a few minutes of down time and want 
to use BTRFS, you should probably have full on-line on-site backups 
which you can switch in at a moment's notice while you fix things.

* Re: BTRFS RAM requirements, RAID 6 stability/write holes and expansion questions
  2016-02-10 13:57   ` Austin S. Hemmelgarn
@ 2016-02-10 19:06     ` Chris Murphy
  2016-02-10 19:59       ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 11+ messages in thread
From: Chris Murphy @ 2016-02-10 19:06 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Chris Murphy, Mackenzie Meyer, Btrfs BTRFS

On Wed, Feb 10, 2016 at 6:57 AM, Austin S. Hemmelgarn
<ahferroin7@gmail.com> wrote:

> It's an issue of torn writes in this case, not of atomicity of BTRFS. Disks
> can't atomically write more than sector size chunks, which means that almost
> all BTRFS filesystems are doing writes that disks can't atomically complete.
> Add to that that we serialized writes to different devices, and it becomes
> trivial to lose some data if the system crashes while BTRFS is writing out a
> stripe (it shouldn't screw up existing data though, you'll just loose
> whatever you were trying to write).

I follow all of this. I still don't know how a torn write leads to a
write hole in the conventional sense though. If the write is partial,
a pointer never should have been written to that unfinished write. So
the pointer that's there after a crash should either point to the old
stripe or new stripe (which includes parity), not to the new data
strips but an old (stale) parity strip for that partial stripe write
that was interrupted. It's easy to see how conventional raid gets this
wrong, because it has no pointers to strips; those locations are known
and fixed, determined by the geometry (raid level, layout, number of
devices).
I don't know what rmw looks like on Btrfs raid56 without overwriting
the stripe - a whole new cow'd stripe, and then metadata is updated to
reflect the new location of that stripe?




> One way to minimize this which would also boost performance on slow storage
> would be to avoid writing parts of the stripe that aren't changed (so for
> example, if only one disk in the stripe actually has changed data, only
> write that and the parities).

I'm pretty sure that's part of rmw, which is not a full stripe write.
At least there appears to be some distinction in raid56.c between
them. The additional optimization that md raid has had for some time
is the ability during rmw of a single data chunk (what they call
strips, or the smallest unit in a stripe), they can actually optimize
the change down to a sector write. So they aren't even doing full
chunk/strip writes either. The parity strip though I think must be
completely rewritten.


>>
>>
>> If you're worried about raid56 write holes, then a.) you need a server
>> running this raid where power failures or crashes don't happen b.)
>> don't use raid56 c.) use ZFS.
>
> It's not just BTRFS that has this issue though, ZFS does too,

Well it's widely considered to not have the write hole. From a ZFS
conference I got this tidbit on how they closed the write hole, but I
still don't understand why they'd be pointing to a partial (torn)
write in the first place:

"key insight was realizing instead of treating a stripe as it's a
"stripe of separate blocks" you can take a block and break it up into
many sectors and have a stripe across the sectors that is of one logic
block, that eliminates the write hole because even if the write is
partial until all of those writes are complete there's not going to be
an uber block referencing any of that." –Bonwick
https://www.youtube.com/watch?v=dcV2PaMTAJ4
14:45


> What your using has impact on how you need to do backups.  For someone who
> can afford long periods of down time for example, it may be perfectly fine
> to use something like Amazon S3 Glacier storage (which has a 4 hour lead
> time on restoration for read access) for backups. OTOH, if you can't afford
> more than a few minutes of down time and want to use BTRFS, you should
> probably have full on-line on-site backups which you can switch in on a
> moments notice while you fix things.

Right or use glusterfs or ceph if you need to stay up and running
during a total brick implosion. Quite honestly, I would much rather
see Btrfs single support multiple streams per device, like XFS does
with allocation groups when used on linear/concat of multiple devices;
two to four per device.



-- 
Chris Murphy

* Re: BTRFS RAM requirements, RAID 6 stability/write holes and expansion questions
  2016-02-10 19:06     ` Chris Murphy
@ 2016-02-10 19:59       ` Austin S. Hemmelgarn
  2016-02-11 14:14         ` Goffredo Baroncelli
  0 siblings, 1 reply; 11+ messages in thread
From: Austin S. Hemmelgarn @ 2016-02-10 19:59 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Mackenzie Meyer, Btrfs BTRFS

On 2016-02-10 14:06, Chris Murphy wrote:
> On Wed, Feb 10, 2016 at 6:57 AM, Austin S. Hemmelgarn
> <ahferroin7@gmail.com> wrote:
>
>> It's an issue of torn writes in this case, not of atomicity of BTRFS. Disks
>> can't atomically write more than sector size chunks, which means that almost
>> all BTRFS filesystems are doing writes that disks can't atomically complete.
>> Add to that that we serialized writes to different devices, and it becomes
>> trivial to lose some data if the system crashes while BTRFS is writing out a
>> stripe (it shouldn't screw up existing data though, you'll just loose
>> whatever you were trying to write).
>
> I follow all of this. I still don't know how a torn write leads to a
> write hole in the conventional sense though. If the write is partial,
> a pointer never should have been written to that unfinished write. So
> the pointer that's there after a crash should either point to the old
> stripe or new stripe (which includes parity), not to the new data
> strips but an old (stale) parity strip for that partial stripe write
> that was interrupted. It's easy to see how conventional raid gets this
> wrong because it has no pointers to strips, those locations are known
> due to the geometry (raid level, layout, number of devices) and fixed.
> I don't know what rmw looks like on Btrfs raid56 without overwriting
> the stripe - a whole new cow'd stripe, and then metadata is updated to
> reflect the new location of that stripe?
>
I agree, it's not technically a write hole in the conventional sense, 
but the terminology has become commonplace for data loss in RAID{5,6} 
due to a failure somewhere in the write path, and this does fit in that 
sense.  In this case the failure is in writing out the metadata that 
references the blocks instead of in writing out the blocks themselves. 
Even though you don't lose any existing data, you still lose anything 
that you were trying to write out.
>
>
>
>> One way to minimize this which would also boost performance on slow storage
>> would be to avoid writing parts of the stripe that aren't changed (so for
>> example, if only one disk in the stripe actually has changed data, only
>> write that and the parities).
>
> I'm pretty sure that's part of rmw, which is not a full stripe write.
> At least there appears to be some distinction in raid56.c between
> them. The additional optimization that md raid has had for some time
> is the ability during rmw of a single data chunk (what they call
> strips, or the smallest unit in a stripe), they can actually optimize
> the change down to a sector write. So they aren't even doing full
> chunk/strip writes either. The parity strip though I think must be
> completely rewritten.
I actually wasn't aware that BTRFS did this (it's been a while since I 
looked at the kernel code), although I'm glad to hear it does.
>
>
>>>
>>>
>>> If you're worried about raid56 write holes, then a.) you need a server
>>> running this raid where power failures or crashes don't happen b.)
>>> don't use raid56 c.) use ZFS.
>>
>> It's not just BTRFS that has this issue though, ZFS does too,
>
> Well it's widely considered to not have the write hole. From a ZFS
> conference I got this tidbit on how they closed the write hole, but I
> still don't understand why they'd be pointing to a partial (torn)
> write in the first place:
>
> "key insight was realizing instead of treating a stripe as it's a
> "stripe of separate blocks" you can take a block and break it up into
> many sectors and have a stripe across the sectors that is of one logic
> block, that eliminates the write hole because even if the write is
> partial until all of those writes are complete there's not going to be
> an uber block referencing any of that." –Bonwick
> https://www.youtube.com/watch?v=dcV2PaMTAJ4
> 14:45
Again, a torn write to the metadata referencing the block (stripe in 
this case I believe) will result in losing anything written by the 
update to the stripe.  There is no way that _any_ system can avoid this 
issue without having the ability to truly atomically write out the 
entire metadata tree after the block (stripe) update.  Doing so would 
require a degree of tight hardware level integration that's functionally 
impossible for any general purpose system (in essence, the filesystem 
would have to be implemented in the hardware, not software).
>
>
>> What your using has impact on how you need to do backups.  For someone who
>> can afford long periods of down time for example, it may be perfectly fine
>> to use something like Amazon S3 Glacier storage (which has a 4 hour lead
>> time on restoration for read access) for backups. OTOH, if you can't afford
>> more than a few minutes of down time and want to use BTRFS, you should
>> probably have full on-line on-site backups which you can switch in on a
>> moments notice while you fix things.
>
> Right or use glusterfs or ceph if you need to stay up and running
> during a total brick implosion. Quite honestly, I would much rather
> see Btrfs single support multiple streams per device, like XFS does
> with allocation groups when used on linear/concat of multiple devices;
> two to four per
>
I'm not entirely certain that I understand what you're referring to WRT 
multiple streams per device.


* Re: BTRFS RAM requirements, RAID 6 stability/write holes and expansion questions
  2016-02-10 19:59       ` Austin S. Hemmelgarn
@ 2016-02-11 14:14         ` Goffredo Baroncelli
  2016-02-11 14:58           ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 11+ messages in thread
From: Goffredo Baroncelli @ 2016-02-11 14:14 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, Chris Murphy; +Cc: Mackenzie Meyer, Btrfs BTRFS

On 2016-02-10 20:59, Austin S. Hemmelgarn wrote:
[...]
> Again, a torn write to the metadata referencing the block (stripe in
> this case I believe) will result in loosing anything written by the
> update to the stripe. 

I think that the order matters: first the data blocks are written (in a new location, so the old data are untouched), then the metadata, from the leaves up to the upper node (again in new locations), then the superblock, which references the upper node of the tree(s).

If you interrupt the writes at any time, the filesystem can survive because the old superblock, metadata tree and data blocks are still valid until the last piece (the new superblock) is written.

And if this last step fails, the checksum shows that the new superblock is invalid and the old one is used instead.


> There is no way that _any_ system can avoid
> this issue without having the ability to truly atomically write out
> the entire metadata tree after the block (stripe) update.  

It is not necessary to atomically write the (meta)data in a COW filesystem, because the new data don't overwrite the old. The only thing that is needed is that before the last piece is written, all the previous (meta)data are already written.

For non-COW filesystems, a journal is required to avoid this kind of problem.

> Doing so
> would require a degree of tight hardware level integration that's
> functionally impossible for any general purpose system (in essence,
> the filesystem would have to be implemented in the hardware, not
> software).

To solve the raid-write-hole problem, a checksum system (of data and metadata) is sufficient. However, to protect the data with checksums, it seems that a COW filesystem is required.

The only critical thing is that the hardware must not lie about the data having reached the platter. Most of the problems reported on the ML are related to external disks used in USB enclosures, which most of the time lie about this.

GB



-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

* Re: BTRFS RAM requirements, RAID 6 stability/write holes and expansion questions
  2016-02-11 14:14         ` Goffredo Baroncelli
@ 2016-02-11 14:58           ` Austin S. Hemmelgarn
  2016-02-11 17:29             ` Chris Murphy
  0 siblings, 1 reply; 11+ messages in thread
From: Austin S. Hemmelgarn @ 2016-02-11 14:58 UTC (permalink / raw)
  To: kreijack, Chris Murphy; +Cc: Mackenzie Meyer, Btrfs BTRFS

On 2016-02-11 09:14, Goffredo Baroncelli wrote:
> On 2016-02-10 20:59, Austin S. Hemmelgarn wrote:
> [...]
>> Again, a torn write to the metadata referencing the block (stripe in
>> this case I believe) will result in loosing anything written by the
>> update to the stripe.
>
> I think that the order matters: first the data block are written (in a new location, so the old data are untouched), then the metadata, from the leafs up to the upper node (again in a new location), then the superblock which references to the upper node of the tree(s).
>
> If you interrupt the writes in any time, the filesystem can survive because the old superblock-metadata-tree and data-block are still valid until the last pieces (the new superblock) is written.
>
> And if this last step fails, the checksum shows that the super-block is invalid and the old one is taken in consideration.
You're not understanding what I'm saying.  If a write fails anywhere 
during the process of updating the metadata, up to and including the 
super-block, then you lose the data writes that triggered the metadata 
update.  This doesn't result in a broken filesystem, but it does result 
in data loss, even if it's not what most people think of as data loss.

To make a really simplified example, assume we have a single block of 
data (D) referenced by a single metadata block (M) and a single 
super-block referencing the metadata block (S).  On a COW filesystem, 
when you write to D, it allocates and writes a new block (D2) to store 
the data, then allocates and writes a new metadata block (M2) to point 
to D2, and then updates the superblock in-place to point to M2.  If the 
write to M2 fails, you lose all new data in D2 that wasn't already in 
D.  There is no way that a COW filesystem can avoid this type of data 
loss without being able to force the underlying storage to atomically 
write out all of D2, M2, and S at the same time, it's an inherent issue 
in COW semantics in general, not just filesystems.
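
To put that simplified example in code form (hypothetical structures, 
purely illustrative, not the btrfs on-disk format):

def cow_write(crash_after):
    disk = {"D": "old data", "M": "-> D", "S": "-> M"}
    steps = [("D2", "new data"),    # 1. COW the data block
             ("M2", "-> D2"),       # 2. COW the metadata block
             ("S",  "-> M2")]       # 3. update the superblock in place
    for i, (block, value) in enumerate(steps, 1):
        if i > crash_after:
            break                   # power lost here
        disk[block] = value
    return disk

after_crash = cow_write(crash_after=1)   # only D2 made it to disk
print(after_crash["S"])                  # still "-> M": the consistent
                                         # old tree wins, but D2 is lost

The filesystem stays consistent either way; the new data simply never 
becomes reachable, which is the loss I'm talking about.
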
>
>
>> There is no way that _any_ system can avoid
>> this issue without having the ability to truly atomically write out
>> the entire metadata tree after the block (stripe) update.
>
> It is not needed to atomically write the (meta)data in a COW filesystem, because the new data don't owerwrite the old one. The only thing that is needed is that before the last piece is written all the previous (mata)data are already written.\
Even when enforcing ordering, the issue I've outlined above is still 
present.  If a write fails at any point in the metadata updates 
cascading up the tree, then any new data below that point in the tree is 
lost.
>
> For not COW filesystem a journal is required to avoid this kind of problem.
To a certain extent yes, but journals have issues that COW doesn't.  A 
torn write in the journal on most traditional journaling filesystems 
will often result in a broken filesystem.
>
>> Doing so
>> would require a degree of tight hardware level integration that's
>> functionally impossible for any general purpose system (in essence,
>> the filesystem would have to be implemented in the hardware, not
>> software).
>
> To solve the raid-write-hole problem, a checksum system (of data and metadata) is sufficient. However to protect with checksum the data, it seems that a COW filesystem is required.
Either COW, or log structuring, or the ability to atomically write out 
groups of blocks.  Log structuring (like NILFS2, or LogFS, or even LFS 
from *BSD) has performance implications on traditional rotational media, 
and only recently are storage devices appearing that can actually handle 
atomic writes of groups of multiple blocks at the same time, so COW has 
been the predominant model because it works on everything, and doesn't 
have the performance issues of log-structured filesystems (if 
implemented correctly).
>
> The only critical thing, is that the hardware has to not lie about the fact that the data reached the platter. Most of the problem reported in the ML are related to external disk used in USB enclousure, which most of the time lie about this aspect.
That really depends on what you mean by 'lie about the data being on the 
platter'.  All modern hard disks have a write cache, and a decent 
percentage don't properly support flushing the write cache except by 
waiting for it to drain, many of them arbitrarily re-order writes within 
the cache, and none that I've seen have a non-volatile write cache, and 
therefore all such disks arguably lie about when the write is actually 
complete.  SSD's add yet another layer of complexity to this, because 
the good ones have either a non-volatile write cache, or have built-in 
batteries or super-capacitors to make sure they can flush the write 
cache when power is lost, so some SSD's can behave just like HDD's do 
and claim the write is complete when it hits the cache without 
technically lying, but most SSD's don't document whether they do this or 
not.

* Re: BTRFS RAM requirements, RAID 6 stability/write holes and expansion questions
  2016-02-11 14:58           ` Austin S. Hemmelgarn
@ 2016-02-11 17:29             ` Chris Murphy
  0 siblings, 0 replies; 11+ messages in thread
From: Chris Murphy @ 2016-02-11 17:29 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: kreijack, Chris Murphy, Mackenzie Meyer, Btrfs BTRFS

On Thu, Feb 11, 2016 at 7:58 AM, Austin S. Hemmelgarn
<ahferroin7@gmail.com> wrote:
> On 2016-02-11 09:14, Goffredo Baroncelli wrote:
>>
>> On 2016-02-10 20:59, Austin S. Hemmelgarn wrote:
>> [...]
>>>
>>> Again, a torn write to the metadata referencing the block (stripe in
>>> this case I believe) will result in loosing anything written by the
>>> update to the stripe.
>>
>>
>> I think that the order matters: first the data block are written (in a new
>> location, so the old data are untouched), then the metadata, from the leafs
>> up to the upper node (again in a new location), then the superblock which
>> references to the upper node of the tree(s).
>>
>> If you interrupt the writes in any time, the filesystem can survive
>> because the old superblock-metadata-tree and data-block are still valid
>> until the last pieces (the new superblock) is written.
>>
>> And if this last step fails, the checksum shows that the super-block is
>> invalid and the old one is taken in consideration.
>
> You're not understanding what I'm saying.  If a write fails anywhere during
> the process of updating the metadata, up to and including the super-block,
> then you loose the data writes that triggered the metadata update.  This
> doesn't result in a broken filesystem, but it does result in data loss, even
> if it's not what most people think of as data loss.
>
> To make a really simplified example, assume we have a single block of data
> (D) referenced by a single metadata block (M) and a single super-block
> referencing the metadata block (S).  On a COW filesystem, when you write to
> D, it allocates and writes a new block (D2) to store the data, then
> allocates and writes a new metadata block (M2) to point to D2, and then
> updates the superblock in-place to point to M2.  If the write to M2 fails,
> you loose all new data in D2 that wasn't already in D.  There is no way that
> a COW filesystem can avoid this type of data loss without being able to
> force the underlying storage to atomically write out all of D2, M2, and S at
> the same time, it's an inherent issue in COW semantics in general, not just
> filesystems.

Sure but this is not a write hole. This exact same problem happens on
a single device file system. Do you know if raid56 parity strips are
considered part of (D) or (M)? In any case I think the Btrfs write
hole is different than what you're talking about.

The concern about the parity raid write hole is that the data is written
OK, but then a device goes missing or there's an IO error on a
data-containing block, such that reconstruction from parity is required,
and if there was a bad or torn write of the parity, the reconstruction is
bad and you won't know it. That's the key thing: silent corruption during
reconstruction.

Of all the problems we're having with raid56 in the general sense, let
alone the Btrfs specific case, the raid56 write hole seems like an
astronomically minor issue. What I'm much more curious about is how
these stripes are even being COWd in the first place, and how many
more IOs we're hit with compared to the same transaction on a single
device.




>>

>> The only critical thing, is that the hardware has to not lie about the
>> fact that the data reached the platter. Most of the problem reported in the
>> ML are related to external disk used in USB enclousure, which most of the
>> time lie about this aspect.
>
> That really depends on what you mean by 'lie about the data being on the
> platter'.  All modern hard disks have a write cache, and a decent percentage
> don't properly support flushing the write cache except by waiting for it to
> drain, many of them arbitrarily re-order writes within the cache, and none
> that I've seen have a non-volatile write cache, and therefore all such disks
> arguably lie about when the write is actually complete.  SSD's add yet
> another layer of complexity to this, because the good ones have either a
> non-volatile write cache, or have built-in batteries or super-capacitors to
> make sure they can flush the write cache when power is lost, so some SSD's
> can behave just like HDD's do and claim the write is complete when it hits
> the cache without technically lying, but most SSD's don't document whether
> they do this or not.

Yeah I think the ship of knowing what happens inside these boxes is in
the process of sailing away, if not gone. On the plus side, they do
all of this faster than a HDD would. So there's a better chance of the
command queue in the write cache actually completing to stable media
than is the case for a HDD *IF* the manufacturer has done the work and
testing.

-- 
Chris Murphy
