* copies= option
@ 2014-05-04  2:16 Russell Coker
  2014-05-04  3:27 ` Duncan
  0 siblings, 1 reply; 7+ messages in thread
From: Russell Coker @ 2014-05-04  2:16 UTC (permalink / raw)
  To: Btrfs BTRFS

Are there any plans for a feature like the ZFS copies= option?

I'd like to be able to set copies= separately for data and metadata.  In most 
cases RAID-1 provides adequate data protection but I'd like to have RAID-1 and 
copies=2 for metadata so that if one disk dies and another has some bad 
sectors during recovery I'm unlikely to lose metadata.

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/

* Re: copies= option
  2014-05-04  2:16 copies= option Russell Coker
@ 2014-05-04  3:27 ` Duncan
  2014-05-04  7:27   ` Brendan Hide
       [not found]   ` <xjTK1n01B308YdQ01jTMAf>
  0 siblings, 2 replies; 7+ messages in thread
From: Duncan @ 2014-05-04  3:27 UTC (permalink / raw)
  To: linux-btrfs

Russell Coker posted on Sun, 04 May 2014 12:16:54 +1000 as excerpted:

> Are there any plans for a feature like the ZFS copies= option?
> 
> I'd like to be able to set copies= separately for data and metadata.  In
> most cases RAID-1 provides adequate data protection but I'd like to have
> RAID-1 and copies=2 for metadata so that if one disk dies and another
> has some bad sectors during recovery I'm unlikely to lose metadata.

Hugo's the guy with the better info on this one, but until he answers...

The zfs license issues mean it's not an option for me and I'm thus not 
familiar with its options in any detail, but if I understand the question 
correctly, yes.

And of course since btrfs treats data and metadata separately, it's 
extremely unlikely that any sort of copies= option wouldn't be separately 
configurable for each.

There was a discussion of a very nice multi-way-configuration schema that 
I deliberately stayed out of as both a bit above my head and far enough 
in the future that I didn't want to get my hopes up too high about it 
yet.  I already want N-way-mirroring so bad I can taste it, and this was 
that and way more... if/when it ever actually gets coded and committed to 
the mainline kernel btrfs.  As I said, Hugo should have more on it, as he 
was active in that discussion as it seemed to line up perfectly with his 
area of interest.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


* Re: copies= option
  2014-05-04  3:27 ` Duncan
@ 2014-05-04  7:27   ` Brendan Hide
       [not found]   ` <xjTK1n01B308YdQ01jTMAf>
  1 sibling, 0 replies; 7+ messages in thread
From: Brendan Hide @ 2014-05-04  7:27 UTC (permalink / raw)
  To: Duncan, linux-btrfs, Hugo Mills, Russell Coker

On 2014/05/04 05:27 AM, Duncan wrote:
> Russell Coker posted on Sun, 04 May 2014 12:16:54 +1000 as excerpted:
>
>> Are there any plans for a feature like the ZFS copies= option?
>>
>> I'd like to be able to set copies= separately for data and metadata.  In
>> most cases RAID-1 provides adequate data protection but I'd like to have
>> RAID-1 and copies=2 for metadata so that if one disk dies and another
>> has some bad sectors during recovery I'm unlikely to lose metadata.
> Hugo's the guy with the better info on this one, but until he answers...
>
> The zfs license issues mean it's not an option for me and I'm thus not
> familiar with its options in any detail, but if I understand the question
> correctly, yes.
>
> And of course since btrfs treats data and metadata separately, it's
> extremely unlikely that any sort of copies= option wouldn't be separately
> configurable for each.
>
> There was a discussion of a very nice multi-way-configuration schema that
> I deliberately stayed out of as both a bit above my head and far enough
> in the future that I didn't want to get my hopes up too high about it
> yet.  I already want N-way-mirroring so bad I can taste it, and this was
> that and way more... if/when it ever actually gets coded and committed to
> the mainline kernel btrfs.  As I said, Hugo should have more on it, as he
> was active in that discussion as it seemed to line up perfectly with his
> area of interest.
>
The simple answer is yes, this is planned. As Duncan implied, however, 
it is not on the immediate roadmap. Internally we appear to be referring 
to this feature as "N-way redundancy" or "N-way mirroring".

My understanding is that the biggest hurdle before the primary devs will 
look into N-way redundancy is to finish the Raid5/6 implementation to 
include self-healing/scrubbing support - a critical issue before it can 
be adopted further.

-- 
__________
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97


* Re: copies= option
       [not found]   ` <xjTK1n01B308YdQ01jTMAf>
@ 2014-05-04 18:12     ` Duncan
  2014-05-04 18:31       ` Hugo Mills
  2014-05-06 20:13       ` Chris Murphy
  0 siblings, 2 replies; 7+ messages in thread
From: Duncan @ 2014-05-04 18:12 UTC (permalink / raw)
  To: Brendan Hide; +Cc: linux-btrfs, Hugo Mills, Russell Coker

On Sun, 04 May 2014 09:27:10 +0200
Brendan Hide <brendan@swiftspirit.co.za> wrote:

> On 2014/05/04 05:27 AM, Duncan wrote:
> > Russell Coker posted on Sun, 04 May 2014 12:16:54 +1000 as
> > excerpted:
> >
> >> Are there any plans for a feature like the ZFS copies= option?
> >>
> >> I'd like to be able to set copies= separately for data and
> >> metadata.  In most cases RAID-1 provides adequate data protection
> >> but I'd like to have RAID-1 and copies=2 for metadata so that if
> >> one disk dies and another has some bad sectors during recovery I'm
> >> unlikely to lose metadata.
> > Hugo's the guy with the better info on this one, but until he
> > answers...
> >
> > The zfs license issues mean it's not an option for me and I'm thus
> > not familiar with its options in any detail, but if I understand
> > the question correctly, yes.
> >
> > And of course since btrfs treats data and metadata separately, it's
> > extremely unlikely that any sort of copies= option wouldn't be
> > separately configurable for each.
> >
> > There was a discussion of a very nice multi-way-configuration
> > schema that I deliberately stayed out of as both a bit above my
> > head and far enough in the future that I didn't want to get my
> > hopes up too high about it yet.  I already want N-way-mirroring so
> > bad I can taste it, and this was that and way more... if/when it
> > ever actually gets coded and committed to the mainline kernel
> > btrfs.  As I said, Hugo should have more on it, as he was active in
> > that discussion as it seemed to line up perfectly with his area of
> > interest.
> >
> The simple answer is yes, this is planned. As Duncan implied,
> however, it is not on the immediate roadmap. Internally we appear to
> be referring to this feature as "N-way redundancy" or "N-way
> mirroring".
> 
> My understanding is that the biggest hurdle before the primary devs
> will look into N-way redundancy is to finish the Raid5/6
> implementation to include self-healing/scrubbing support - a critical
> issue before it can be adopted further.

Well, there's N-way-mirroring, which /is/ on the roadmap for fairly
soon (after raid56 completion), and which is the feature I've been
heavily anticipating ever since I first looked into btrfs and realized
that raid1 didn't include it already, but what I was referring to above
was something much nicer than that.

As I said I don't understand the full details, Hugo's the one that can
properly answer there, but the general idea (I think) is the ability to
three-way specify N-copies, M-parity, S-stripe, possibly with
near/far-layout specification like md/raid's raid10, as well.  But Hugo
refers to it with three different letters, cps copies/parity/stripes,
perhaps?  That doesn't look quite correct...

But that at least has the potential to be /so/ nice, and possibly
also /so/ complicated, that I'm deliberately avoiding looking too much
at the details.  It's far enough out, and may in fact never get fully
implemented, that I don't want to spoil my enjoyment of the
(comparatively) simple N-way-mirroring when it comes.

More particularly, I really /really/ hope they don't put off a
reasonably simple and (hopefully) fast implementation of
N-way-mirroring, which I'd like to see as soon as possible after
raid56 completion.  This other thing would certainly be extremely
nice, but I'm quite fearful that it could be a case of the perfect
being the enemy of the good-enough.  Btrfs already has a long history
of features taking far longer to implement than originally predicted,
and with something that potentially complex, I'm very afraid that
could mean a 2-5 year wait before it's actually usable.

Given how long I've been waiting for the simple-compared-to-that
N-way-mirroring and how much I anticipate it, I just don't know what
I'd do if I found out that they were going to work on this "perfect"
thing instead, with N-way-mirroring as just one possible option
within it.  Given the btrfs history to date, that would very likely
mean a good five years before I could get the comparatively simple
N-way-mirroring (or even, for me, just a specific 3-way-mirroring to
complement the specific 2-way-mirroring that's already there) that's
all I'm really asking for.

So I guess you can see why I don't want to get into the details of the
more fancy solution too much, both as a means of protecting my own
sanity, and to hopefully avoid throwing the 3-way-mirroring that's my
own personal focal point off the track.  So Hugo's the one with the
details, to the extent they've been discussed at least, there.

-- 
Duncan - No HTML messages please, as they are filtered as spam.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

* Re: copies= option
  2014-05-04 18:12     ` Duncan
@ 2014-05-04 18:31       ` Hugo Mills
  2014-05-05  2:59         ` Duncan
  2014-05-06 20:13       ` Chris Murphy
  1 sibling, 1 reply; 7+ messages in thread
From: Hugo Mills @ 2014-05-04 18:31 UTC (permalink / raw)
  To: Duncan; +Cc: Brendan Hide, linux-btrfs, Russell Coker

On Sun, May 04, 2014 at 11:12:38AM -0700, Duncan wrote:
> On Sun, 04 May 2014 09:27:10 +0200
> Brendan Hide <brendan@swiftspirit.co.za> wrote:
> 
> > On 2014/05/04 05:27 AM, Duncan wrote:
> > > Russell Coker posted on Sun, 04 May 2014 12:16:54 +1000 as
> > > excerpted:
> > >
> > >> Are there any plans for a feature like the ZFS copies= option?
> > >>
> > >> I'd like to be able to set copies= separately for data and
> > >> metadata.  In most cases RAID-1 provides adequate data protection
> > >> but I'd like to have RAID-1 and copies=2 for metadata so that if
> > >> one disk dies and another has some bad sectors during recovery I'm
> > >> unlikely to lose metadata.
> > > Hugo's the guy with the better info on this one, but until he
> > > answers...
> > >
> > > The zfs license issues mean it's not an option for me and I'm thus
> > > not familiar with its options in any detail, but if I understand
> > > the question correctly, yes.
> > >
> > > And of course since btrfs treats data and metadata separately, it's
> > > extremely unlikely that any sort of copies= option wouldn't be
> > > separately configurable for each.
> > >
> > > There was a discussion of a very nice multi-way-configuration
> > > schema that I deliberately stayed out of as both a bit above my
> > > head and far enough in the future that I didn't want to get my
> > > hopes up too high about it yet.  I already want N-way-mirroring so
> > > bad I can taste it, and this was that and way more... if/when it
> > > ever actually gets coded and committed to the mainline kernel
> > > btrfs.  As I said, Hugo should have more on it, as he was active in
> > > that discussion as it seemed to line up perfectly with his area of
> > > interest.
> > >
> > The simple answer is yes, this is planned. As Duncan implied,
> > however, it is not on the immediate roadmap. Internally we appear to
> > be referring to this feature as "N-way redundancy" or "N-way
> > mirroring".
> > 
> > My understanding is that the biggest hurdle before the primary devs
> > will look into N-way redundancy is to finish the Raid5/6
> > implementation to include self-healing/scrubbing support - a critical
> > issue before it can be adopted further.
> 
> Well, there's N-way-mirroring, which /is/ on the roadmap for fairly
> soon (after raid56 completion), and which is the feature I've been
> heavily anticipating ever since I first looked into btrfs and realized
> that raid1 didn't include it already, but what I was referring to above
> was something much nicer than that.
> 
> As I said I don't understand the full details, Hugo's the one that can
> properly answer there, but the general idea (I think) is the ability to
> three-way specify N-copies, M-parity, S-stripe, possibly with
> near/far-layout specification like md/raid's raid10, as well.  But Hugo
> refers to it with three different letters, cps copies/parity/stripes,
> perhaps?  That doesn't look quite correct...

   My proposal was simply a description mechanism, not an
implementation. The description is N-copies, M-device-stripe,
P-parity-devices (NcMsPp), and (more or less comfortably) covers at
minimum all of the current and currently-proposed replication levels.
There's a couple of tweaks covering description of allocation rules
(DUP vs RAID-1).
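
   To make the description mechanism concrete, here is a minimal
Python sketch of a parser for strings of the NcMsPp shape. It is an
illustration only, not anything from btrfs or btrfs-progs. The names
parse_csp and Replication are hypothetical, and the conventions that
an omitted "s" part means no striping, an omitted "p" part means no
parity, and a literal "M" means "as many devices as are available"
are assumptions made for this example.

```python
import re
from typing import NamedTuple, Optional


class Replication(NamedTuple):
    """Hypothetical container for one NcMsPp description."""
    copies: int
    stripes: Optional[int]  # None stands for "M": as many devices as available
    parity: int


# Assumed grammar: <N>c, optionally <M>s (a number or the letter M),
# optionally <P>p. This is a guess at a reasonable syntax, not a spec.
_CSP_PATTERN = re.compile(r"(\d+)c(?:(\d+|M)s)?(?:(\d+)p)?")


def parse_csp(text: str) -> Replication:
    """Parse a string such as '2c' or '1cMs2p' into its three counts."""
    match = _CSP_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a csp-style description: {text!r}")
    copies, stripes, parity = match.groups()
    if stripes is None:
        stripe_count: Optional[int] = 1      # assumption: no "s" part, no striping
    elif stripes == "M":
        stripe_count = None                  # assumption: "M" is a wildcard
    else:
        stripe_count = int(stripes)
    return Replication(int(copies), stripe_count, int(parity or 0))
```

   Under those assumptions, parse_csp("2c") gives two copies with no
striping and no parity, and parse_csp("1cMs2p") gives one copy
striped across all available devices with two parity devices.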

   I think, as you say below, that it's going to be hard to make this
completely general in terms of application, but we've already seen
code that extends the available replication capabilities beyond the
current terminology (to RAID-6.3, ... -6.6), which we can cope with in
the proposed nomenclature -- NsP3 to NsP6. There are other things in
the pipeline, such as the N-way mirroring, which also aren't
describable in traditional RAID terms, but which the "csp" notation
will handle nicely.

   It doesn't deal with complex nested configurations (e.g. the
difference between RAID-10 and RAID-0+1), but given btrfs's more
freewheeling chunk allocation decisions, those distinctions tend to go
away.

   So: don't expect to see completely general usability of csp
notation, but do expect it to be used in the future to describe the
increasing complexity of replication strategies in btrfs. There may
even be a shift internally to csp-style description of replication;
I'd probably expect that to arrive with per-object RAID levels, since
if there's going to be a big overhaul of that area, it would make
sense to do that change at the same time.

   [It's worth noting that when I mooted extending the current
RAID-level bit-field to pack in csp-style notation, Chris was mildly
horrified at the concept. The next best implementation would be to use
the xattrs for per-object RAID for this.]

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
        --- emacs:  Eighty Megabytes And Constantly Swapping. ---        


* Re: copies= option
  2014-05-04 18:31       ` Hugo Mills
@ 2014-05-05  2:59         ` Duncan
  0 siblings, 0 replies; 7+ messages in thread
From: Duncan @ 2014-05-05  2:59 UTC (permalink / raw)
  To: linux-btrfs

Hugo Mills posted on Sun, 04 May 2014 19:31:55 +0100 as excerpted:

>  My proposal was simply a description mechanism, not an
> implementation. The description is N-copies, M-device-stripe,
> P-parity-devices (NcMsPp), and (more or less comfortably) covers at
> minimum all of the current and currently-proposed replication levels.
> There's a couple of tweaks covering description of allocation rules
> (DUP vs RAID-1).

Thanks.  That was it. =:^)

But I had interpreted the discussion as a bit more concrete in terms of 
ultimate implementation than it apparently was.  Anyway, it would indeed 
be nice to see an eventual implementation such that the above notation 
could be used with, for instance, mkfs.btrfs and btrfs balance start 
-Xconvert, but regardless, that does look to be a way off.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


* Re: copies= option
  2014-05-04 18:12     ` Duncan
  2014-05-04 18:31       ` Hugo Mills
@ 2014-05-06 20:13       ` Chris Murphy
  1 sibling, 0 replies; 7+ messages in thread
From: Chris Murphy @ 2014-05-06 20:13 UTC (permalink / raw)
  To: Duncan; +Cc: Brendan Hide, linux-btrfs, Hugo Mills, Russell Coker


> N-copies, M-device-stripe, P-parity-devices (NcMsPp)

At the expense of being the terminology nut (one who doesn't even like SNIA's chosen terminology, because it's confusing), I suggest a concerted effort either to use SNIA's terms anyway, or to push back and ask them to make changes before propagating deviant terminology.

A strip is a run of consecutive blocks on a single extent (on a single device).
Strip size is the number of blocks in a single extent (on a single device).

A stripe is a set of strips, one on each member extent (across multiple devices).
Stripe size is strip size times the number of non-parity extents.

E.g. the Btrfs default strip size is 64KiB, so a 5-disk raid5 volume (four data strips plus one parity strip per stripe) has a stripe size of 256KiB. I specify sizes in bytes rather than SNIA's blocks (sectors) because it's less ambiguous.

In other words, for M- what we care about is the strip size, which is what md/mdadm calls a chunk. We can't know the stripe size without knowing how many non-parity member devices there are.
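
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the definition "stripe size is strip size times non-parity extents"; the function name stripe_size and its parameters are made up for the example and are not part of any btrfs tool.

```python
KIB = 1024


def stripe_size(strip_size_bytes: int, devices: int, parity_devices: int) -> int:
    """Return the stripe size in bytes: strip size times non-parity members."""
    data_devices = devices - parity_devices
    if data_devices < 1:
        raise ValueError("need at least one non-parity device")
    return strip_size_bytes * data_devices


# 5-disk raid5 with a 64KiB strip: four data strips per stripe.
print(stripe_size(64 * KIB, devices=5, parity_devices=1) // KIB)  # 256
```

Note that the function cannot be called with the strip size alone, which is the point: the stripe size depends on how many non-parity member devices there are.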


Chris Murphy

