* Simplified LRC in CEPH
       [not found] <3472A07E6605974CBC9BC573F1BC02E4AE753328@CERNXCHG44.cern.ch>
@ 2014-08-01  8:00 ` Andreas Joachim Peters
  2014-08-01 12:32   ` Loic Dachary
  0 siblings, 1 reply; 6+ messages in thread
From: Andreas Joachim Peters @ 2014-08-01  8:00 UTC (permalink / raw)
  To: ceph-devel

Hi Loic et al.

I managed to prototype (and understand) LRC encoding similar to Xorbas in the ISA plug-in.

As an example take a (16,4) code (which gives nice alignment for 4k blocks):

For the 4 sub groups of the data chunks you build local parities LP1-LP4:

LP1 = 1 ^ 2 ^ 3 ^ 4

LP2 = 5 ^ 6 ^ 7 ^ 8

LP3 = 9 ^ 10 ^ 11 ^ 12

LP4 = 13 ^ 14 ^ 15 ^ 16

You do normal erasure encoding with 4 RS chunks:

RS(1..16) = (R1, R2, R3, R4)

You compute the local parity LP5 for the erasure chunks:

LP5 = R1 ^ R2 ^ R3 ^ R4

The following relation holds for Vandermonde matrices (because the first matrix row contains only 1's):

LP1 ^ LP2 ^ LP3 ^ LP4 = R1

So you need to store only 24 chunks (not 25):

(1 .. 16) (R2,R3,R4) (LP1,LP2,LP3,LP4,LP5)

Side remark: in this simplified explanation I imply R1, not LP5 as described in the Xorbas paper.
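
To make this concrete, here is a minimal Python sketch (my illustration, not the ISA plug-in code); the only assumption is the all-1's first Vandermonde row, so that R1 equals the plain XOR of the 16 data chunks:

import os

def xor(chunks):
    """XOR a list of equal-sized byte chunks together."""
    out = bytearray(len(chunks[0]))
    for c in chunks:
        for i, b in enumerate(c):
            out[i] ^= b
    return bytes(out)

k, m, l = 16, 4, 4
data = [os.urandom(4096) for _ in range(k)]    # 16 data chunks of 4k

# local parities over the 4 sub groups of k/l = 4 data chunks each
lp = [xor(data[g * 4:(g + 1) * 4]) for g in range(l)]

# the all-1's first matrix row means R1 = XOR of all data chunks (GF addition)
r1 = xor(data)
assert xor(lp) == r1   # LP1 ^ LP2 ^ LP3 ^ LP4 == R1, so R1 need not be stored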

The degree of the code is 5, i.e. you can construct a failure pattern with 5 losses where you lose data, while if you are 'lucky' the code can even repair up to 8 failures (one loss in each sub group + LP5, R2, R3, R4).

The reconstruction traffic for single failures is:

[(20 x 4) + (4 x 8)]/24 =~ 4.66 x [disk size] instead of 16 x [disk size]

There are three repair scenarios:

1) only single failures in any of the local groups using LRC repair (simple parities)
2) multiple failures which can be reconstructed with RS parities without LRC repair
3) multiple failures which can be reconstructed with RS parities after LRC repair

[ 4) reconstruction impossible  ]
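
Scenario 1 is just an XOR over the surviving group members plus the local parity. A standalone toy sketch (again my illustration, not plug-in code):

import os
from functools import reduce

def xor(chunks):
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), chunks)

k, l, size = 16, 4, 4
data = [os.urandom(4096) for _ in range(k)]
lp = [xor(data[g * size:(g + 1) * size]) for g in range(l)]

lost = 6                          # chunk '7' above, in the second sub group
g = lost // size
survivors = [data[i] for i in range(g * size, (g + 1) * size) if i != lost]
assert xor(survivors + [lp[g]]) == data[lost]   # 4 reads, no 16-chunk RS decode

# average single-failure traffic over the 24 stored chunks:
print((20 * 4 + 4 * 8) / 24.0)    # -> 4.66..., vs 16 for plain RS repair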

Having your proposed LRC layer (decoding) model in mind, there is a certain contradiction here, because there are cases where using LRC is not required: you can resolve all the failures with RS alone.

In the end I think it is sufficient if we introduce a parameter l in the EC parameter list which defines the number of sub groups in the data chunks, and always imply one local parity for the RS chunks. So you can specify an LRC easily with three simple parameters:

k=16 m=4 l=4

The Xorbas configuration would be written as k=10 m=4 l=2
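
As a rough sketch of what such a three-parameter description could validate and derive (function and field names are mine, purely illustrative):

def lrc_profile(k, m, l):
    """Illustrative (k, m, l) LRC description as proposed above."""
    if k % l != 0:
        raise ValueError("l must split the k data chunks into equal sub groups")
    # stored: k data + (m - 1) RS chunks (R1 implied) + l local parities
    # + one local parity over the RS chunks = k + m + l chunks in total
    return {"k": k, "m": m, "l": l,
            "group_size": k // l,
            "stored_chunks": k + m + l}

print(lrc_profile(16, 4, 4))   # the example above: 24 stored chunks
print(lrc_profile(10, 4, 2))   # Xorbas: 16 stored chunks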

Wouldn't that be much simpler and sufficient? What do you think?

Cheers Andreas.

* Re: Simplified LRC in CEPH
  2014-08-01  8:00 ` Simplified LRC in CEPH Andreas Joachim Peters
@ 2014-08-01 12:32   ` Loic Dachary
  2014-08-01 12:44     ` Andreas Joachim Peters
  0 siblings, 1 reply; 6+ messages in thread
From: Loic Dachary @ 2014-08-01 12:32 UTC (permalink / raw)
  To: Andreas Joachim Peters, ceph-devel


Hi Andreas,

Enlightening explanation, thank you!

On 01/08/2014 13:45, Andreas Joachim Peters wrote:
> Hi Loic et al.
> 
> I managed to prototype (and understand) LRC encoding similar to Xorbas in the ISA plug-in.
> 
> As an example take a (16,4) code (which gives nice alignment for 4k blocks):
> 
> For the 4 sub groups of the data chunks you build local parities LP1-LP4:
> 
> LP1 = 1 ^ 2 ^ 3 ^ 4
> 
> LP2 = 5 ^ 6 ^ 7 ^ 8
> 
> LP3 = 9 ^ 10 ^ 11 ^ 12
> 
> LP4 = 13 ^ 14 ^ 15 ^ 16
> 
> You do normal erasure encoding with 4 RS chunks:
> 
> RS(1..16) = (R1, R2, R3, R4)
> 
> You compute the local parity LP5 for the erasure chunks:
> 
> LP5 = R1 ^ R2 ^ R3 ^ R4
> 
> The following relation holds for Vandermonde matrices (because the first matrix row contains only 1's):
> 
> LP1 ^ LP2 ^ LP3 ^ LP4 = R1
> 
> So you need to store only 24 chunks (not 25):
> 
> (1 .. 16) (R2,R3,R4) (LP1,LP2,LP3,LP4,LP5)
> 
> Side remark: in this simplified explanation I imply R1, not LP5 as described in the Xorbas paper.

Does it make a difference or is it equivalent?

> The degree of the code is 5, i.e. you can construct a failure pattern with 5 losses where you lose data, while if you are 'lucky' the code can even repair up to 8 failures (one loss in each sub group + LP5, R2, R3, R4).
> 
> The reconstruction traffic for single failures is:
> 
> [(20 x 4) + (4 x 8)]/24 =~ 4.66 x [disk size] instead of 16 x [disk size]
> 
> There are three repair scenarios:
> 
> 1) only single failures in any of the local groups using LRC repair (simple parities)
> 2) multiple failures which can be reconstructed with RS parities without LRC repair
> 3) multiple failures which can be reconstructed with RS parities after LRC repair
> 
> [ 4) reconstruction impossible  ]
> 
> Having your proposed LRC layer (decoding) model in mind, there is a certain contradiction here, because there are cases where using LRC is not required: you can resolve all the failures with RS alone.
> 
> In the end I think it is sufficient if we introduce a parameter l in the EC parameter list which defines the number of sub groups in the data chunks, and always imply one local parity for the RS chunks. So you can specify an LRC easily with three simple parameters:
> 
> k=16 m=4 l=4
> 
> The Xorbas configuration would be written as k=10 m=4 l=2
> 
> Wouldn't that be much simpler and sufficient? What do you think?

It would, definitely. How would you control where data / parity chunks are located?

Cheers
> 
> Cheers Andreas.

-- 
Loïc Dachary, Artisan Logiciel Libre




* RE: Simplified LRC in CEPH
  2014-08-01 12:32   ` Loic Dachary
@ 2014-08-01 12:44     ` Andreas Joachim Peters
  2014-08-01 13:14       ` Loic Dachary
  0 siblings, 1 reply; 6+ messages in thread
From: Andreas Joachim Peters @ 2014-08-01 12:44 UTC (permalink / raw)
  To: Loic Dachary, ceph-devel

Hi Loic, 

> It would, definitely. How would you control where data / parity chunks are located?

I ordered the chunks after encoding in this way:

( 1 2 3 4 LP1 ) ( 5 6 7 8 LP2 ) ( 9 10 11 12 LP3 ) ( 13 14 15 16 LP4 ) ( R2 R3 R4 LP5 )

Always (k/l)+1 consecutive chunks belong location-wise together ...  and I demand that (k/l) <= m

That is probably straightforward to express in a CRUSH rule.
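
A sketch of that grouping for arbitrary (k, m, l), emitting the location groups in the order above (illustrative only, not actual placement code):

def placement_groups(k, m, l):
    size = k // l
    assert k % l == 0 and size <= m, "demand (k/l) <= m from above"
    # l groups of (k/l) data chunks plus their local parity
    groups = [[str(g * size + i + 1) for i in range(size)] + ["LP%d" % (g + 1)]
              for g in range(l)]
    # the RS group: R2..Rm plus the local parity over the RS chunks
    groups.append(["R%d" % j for j in range(2, m + 1)] + ["LP%d" % (l + 1)])
    return groups

for grp in placement_groups(16, 4, 4):
    print("( " + " ".join(grp) + " )")
# -> ( 1 2 3 4 LP1 ) ... ( 13 14 15 16 LP4 ) ( R2 R3 R4 LP5 )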

Cheers Andreas.

PS: just one correction, I wrote 'degree' but it is called the 'distance' of the code 
 
________________________________________
From: Loic Dachary [loic@dachary.org]
Sent: 01 August 2014 14:32
To: Andreas Joachim Peters; ceph-devel@vger.kernel.org
Subject: Re: Simplified LRC in CEPH

Hi Andreas,

Enlightening explanation, thank you!

On 01/08/2014 13:45, Andreas Joachim Peters wrote:
> Hi Loic et al.
>
> I managed to prototype (and understand) LRC encoding similar to Xorbas in the ISA plug-in.
>
> As an example take a (16,4) code (which gives nice alignment for 4k blocks):
>
> For the 4 sub groups of the data chunks you build local parities LP1-LP4:
>
> LP1 = 1 ^ 2 ^ 3 ^ 4
>
> LP2 = 5 ^ 6 ^ 7 ^ 8
>
> LP3 = 9 ^ 10 ^ 11 ^ 12
>
> LP4 = 13 ^ 14 ^ 15 ^ 16
>
> You do normal erasure encoding with 4 RS chunks:
>
> RS(1..16) = (R1, R2, R3, R4)
>
> You compute the local parity LP5 for the erasure chunks:
>
> LP5 = R1 ^ R2 ^ R3 ^ R4
>
> The following relation holds for Vandermonde matrices (because the first matrix row contains only 1's):
>
> LP1 ^ LP2 ^ LP3 ^ LP4 = R1
>
> So you need to store only 24 chunks (not 25):
>
> (1 .. 16) (R2,R3,R4) (LP1,LP2,LP3,LP4,LP5)
>
> Side remark: in this simplified explanation I imply R1, not LP5 as described in the Xorbas paper.

Does it make a difference or is it equivalent?

> The degree of the code is 5, i.e. you can construct a failure pattern with 5 losses where you lose data, while if you are 'lucky' the code can even repair up to 8 failures (one loss in each sub group + LP5, R2, R3, R4).
>
> The reconstruction traffic for single failures is:
>
> [(20 x 4) + (4 x 8)]/24 =~ 4.66 x [disk size] instead of 16 x [disk size]
>
> There are three repair scenarios:
>
> 1) only single failures in any of the local groups using LRC repair (simple parities)
> 2) multiple failures which can be reconstructed with RS parities without LRC repair
> 3) multiple failures which can be reconstructed with RS parities after LRC repair
>
> [ 4) reconstruction impossible  ]
>
> Having your proposed LRC layer (decoding) model in mind, there is a certain contradiction here, because there are cases where using LRC is not required: you can resolve all the failures with RS alone.
>
> In the end I think it is sufficient if we introduce a parameter l in the EC parameter list which defines the number of sub groups in the data chunks, and always imply one local parity for the RS chunks. So you can specify an LRC easily with three simple parameters:
>
> k=16 m=4 l=4
>
> The Xorbas configuration would be written as k=10 m=4 l=2
>
> Wouldn't that be much simpler and sufficient? What do you think?

It would, definitely. How would you control where data / parity chunks are located?

Cheers
>
> Cheers Andreas.

--
Loïc Dachary, Artisan Logiciel Libre



* Re: Simplified LRC in CEPH
  2014-08-01 12:44     ` Andreas Joachim Peters
@ 2014-08-01 13:14       ` Loic Dachary
  2014-08-01 14:25         ` Andreas Joachim Peters
  0 siblings, 1 reply; 6+ messages in thread
From: Loic Dachary @ 2014-08-01 13:14 UTC (permalink / raw)
  To: Andreas Joachim Peters, ceph-devel


Hi Andreas,

It probably is just what we need. Although https://github.com/ceph/ceph/pull/1921 is more flexible in terms of chunk placement, I can't think of a use case where it would actually be useful. Maybe it's just me being back from holidays, but it smells like a solution to a non-existent problem ;-) The other difference is that your proposal does not allow for nested locality, e.g. datacenter locality and rack locality within a datacenter. What do you think?

Cheers

On 01/08/2014 18:29, Andreas Joachim Peters wrote:
> Hi Loic, 
> 
>> It would, definitely. How would you control where data / parity chunks are located?
> 
> I ordered the chunks after encoding in this way:
> 
> ( 1 2 3 4 LP1 ) ( 5 6 7 8 LP2 ) ( 9 10 11 12 LP3 ) ( 13 14 15 16 LP4 ) ( R2 R3 R4 LP5 )
> 
> Always (k/l)+1 consecutive chunks belong location-wise together ...  and I demand that (k/l) <= m
> 
> That is probably straightforward to express in a CRUSH rule.
> 
> Cheers Andreas.
> 
> PS: just one correction, I wrote 'degree' but it is called the 'distance' of the code 
>  
> ________________________________________
> From: Loic Dachary [loic@dachary.org]
> Sent: 01 August 2014 14:32
> To: Andreas Joachim Peters; ceph-devel@vger.kernel.org
> Subject: Re: Simplified LRC in CEPH
> 
> Hi Andreas,
> 
> Enlightening explanation, thank you!
> 
> On 01/08/2014 13:45, Andreas Joachim Peters wrote:
>> Hi Loic et al.
>>
>> I managed to prototype (and understand) LRC encoding similar to Xorbas in the ISA plug-in.
>>
>> As an example take a (16,4) code (which gives nice alignment for 4k blocks):
>>
>> For the 4 sub groups of the data chunks you build local parities LP1-LP4:
>>
>> LP1 = 1 ^ 2 ^ 3 ^ 4
>>
>> LP2 = 5 ^ 6 ^ 7 ^ 8
>>
>> LP3 = 9 ^ 10 ^ 11 ^ 12
>>
>> LP4 = 13 ^ 14 ^ 15 ^ 16
>>
>> You do normal erasure encoding with 4 RS chunks:
>>
>> RS(1..16) = (R1, R2, R3, R4)
>>
>> You compute the local parity LP5 for the erasure chunks:
>>
>> LP5 = R1 ^ R2 ^ R3 ^ R4
>>
>> The following relation holds for Vandermonde matrices (because the first matrix row contains only 1's):
>>
>> LP1 ^ LP2 ^ LP3 ^ LP4 = R1
>>
>> So you need to store only 24 chunks (not 25):
>>
>> (1 .. 16) (R2,R3,R4) (LP1,LP2,LP3,LP4,LP5)
>>
>> Side remark: in this simplified explanation I imply R1, not LP5 as described in the Xorbas paper.
> 
> Does it make a difference or is it equivalent?
> 
>> The degree of the code is 5, i.e. you can construct a failure pattern with 5 losses where you lose data, while if you are 'lucky' the code can even repair up to 8 failures (one loss in each sub group + LP5, R2, R3, R4).
>>
>> The reconstruction traffic for single failures is:
>>
>> [(20 x 4) + (4 x 8)]/24 =~ 4.66 x [disk size] instead of 16 x [disk size]
>>
>> There are three repair scenarios:
>>
>> 1) only single failures in any of the local groups using LRC repair (simple parities)
>> 2) multiple failures which can be reconstructed with RS parities without LRC repair
>> 3) multiple failures which can be reconstructed with RS parities after LRC repair
>>
>> [ 4) reconstruction impossible  ]
>>
>> Having your proposed LRC layer (decoding) model in mind, there is a certain contradiction here, because there are cases where using LRC is not required: you can resolve all the failures with RS alone.
>>
>> In the end I think it is sufficient if we introduce a parameter l in the EC parameter list which defines the number of sub groups in the data chunks, and always imply one local parity for the RS chunks. So you can specify an LRC easily with three simple parameters:
>>
>> k=16 m=4 l=4
>>
>> The Xorbas configuration would be written as k=10 m=4 l=2
>>
>> Wouldn't that be much simpler and sufficient? What do you think?
> 
> It would, definitely. How would you control where data / parity chunks are located?
> 
> Cheers
>>
>> Cheers Andreas.
> --
> Loïc Dachary, Artisan Logiciel Libre
> 

-- 
Loïc Dachary, Artisan Logiciel Libre




* RE: Simplified LRC in CEPH
  2014-08-01 13:14       ` Loic Dachary
@ 2014-08-01 14:25         ` Andreas Joachim Peters
  2014-08-01 15:29           ` Sage Weil
  0 siblings, 1 reply; 6+ messages in thread
From: Andreas Joachim Peters @ 2014-08-01 14:25 UTC (permalink / raw)
  To: Loic Dachary, ceph-devel

Hi Loic, 
your initial scheme is more flexible. Nevertheless I think that a simplified description covers 99% of people's use cases.

You can absorb the logic of the implied parity into your generic LRC plug-in if you find a good way to describe something like 'compute but don't store'. If you then additionally offer the poor man's description (k,m,l), one covers both the simple and the complex use case!

Cheers Andreas.

________________________________________
From: ceph-devel-owner@vger.kernel.org [ceph-devel-owner@vger.kernel.org] on behalf of Loic Dachary [loic@dachary.org]
Sent: 01 August 2014 15:14
To: Andreas Joachim Peters; ceph-devel@vger.kernel.org
Subject: Re: Simplified LRC in CEPH

Hi Andreas,

It probably is just what we need. Although https://github.com/ceph/ceph/pull/1921 is more flexible in terms of chunk placement, I can't think of a use case where it would actually be useful. Maybe it's just me being back from holidays, but it smells like a solution to a non-existent problem ;-) The other difference is that your proposal does not allow for nested locality, e.g. datacenter locality and rack locality within a datacenter. What do you think?

Cheers

On 01/08/2014 18:29, Andreas Joachim Peters wrote:
> Hi Loic,
>
>> It would, definitely. How would you control where data / parity chunks are located?
>
> I ordered the chunks after encoding in this way:
>
> ( 1 2 3 4 LP1 ) ( 5 6 7 8 LP2 ) ( 9 10 11 12 LP3 ) ( 13 14 15 16 LP4 ) ( R2 R3 R4 LP5 )
>
> Always (k/l)+1 consecutive chunks belong location-wise together ...  and I demand that (k/l) <= m
>
> That is probably straightforward to express in a CRUSH rule.
>
> Cheers Andreas.
>
> PS: just one correction, I wrote 'degree' but it is called the 'distance' of the code
>
> ________________________________________
> From: Loic Dachary [loic@dachary.org]
> Sent: 01 August 2014 14:32
> To: Andreas Joachim Peters; ceph-devel@vger.kernel.org
> Subject: Re: Simplified LRC in CEPH
>
> Hi Andreas,
>
> Enlightening explanation, thank you!
>
> On 01/08/2014 13:45, Andreas Joachim Peters wrote:
>> Hi Loic et al.
>>
>> I managed to prototype (and understand) LRC encoding similar to Xorbas in the ISA plug-in.
>>
>> As an example take a (16,4) code (which gives nice alignment for 4k blocks):
>>
>> For the 4 sub groups of the data chunks you build local parities LP1-LP4:
>>
>> LP1 = 1 ^ 2 ^ 3 ^ 4
>>
>> LP2 = 5 ^ 6 ^ 7 ^ 8
>>
>> LP3 = 9 ^ 10 ^ 11 ^ 12
>>
>> LP4 = 13 ^ 14 ^ 15 ^ 16
>>
>> You do normal erasure encoding with 4 RS chunks:
>>
>> RS(1..16) = (R1, R2, R3, R4)
>>
>> You compute the local parity LP5 for the erasure chunks:
>>
>> LP5 = R1 ^ R2 ^ R3 ^ R4
>>
>> The following relation holds for Vandermonde matrices (because the first matrix row contains only 1's):
>>
>> LP1 ^ LP2 ^ LP3 ^ LP4 = R1
>>
>> So you need to store only 24 chunks (not 25):
>>
>> (1 .. 16) (R2,R3,R4) (LP1,LP2,LP3,LP4,LP5)
>>
>> Side remark: in this simplified explanation I imply R1, not LP5 as described in the Xorbas paper.
>
> Does it make a difference or is it equivalent?
>
>> The degree of the code is 5, i.e. you can construct a failure pattern with 5 losses where you lose data, while if you are 'lucky' the code can even repair up to 8 failures (one loss in each sub group + LP5, R2, R3, R4).
>>
>> The reconstruction traffic for single failures is:
>>
>> [(20 x 4) + (4 x 8)]/24 =~ 4.66 x [disk size] instead of 16 x [disk size]
>>
>> There are three repair scenarios:
>>
>> 1) only single failures in any of the local groups using LRC repair (simple parities)
>> 2) multiple failures which can be reconstructed with RS parities without LRC repair
>> 3) multiple failures which can be reconstructed with RS parities after LRC repair
>>
>> [ 4) reconstruction impossible  ]
>>
>> Having your proposed LRC layer (decoding) model in mind, there is a certain contradiction here, because there are cases where using LRC is not required: you can resolve all the failures with RS alone.
>>
>> In the end I think it is sufficient if we introduce a parameter l in the EC parameter list which defines the number of sub groups in the data chunks, and always imply one local parity for the RS chunks. So you can specify an LRC easily with three simple parameters:
>>
>> k=16 m=4 l=4
>>
>> The Xorbas configuration would be written as k=10 m=4 l=2
>>
>> Wouldn't that be much simpler and sufficient? What do you think?
>
> It would, definitely. How would you control where data / parity chunks are located?
>
> Cheers
>>
>> Cheers Andreas.
> --
> Loïc Dachary, Artisan Logiciel Libre
>

--
Loïc Dachary, Artisan Logiciel Libre



* RE: Simplified LRC in CEPH
  2014-08-01 14:25         ` Andreas Joachim Peters
@ 2014-08-01 15:29           ` Sage Weil
  0 siblings, 0 replies; 6+ messages in thread
From: Sage Weil @ 2014-08-01 15:29 UTC (permalink / raw)
  To: Andreas Joachim Peters; +Cc: Loic Dachary, ceph-devel

On Fri, 1 Aug 2014, Andreas Joachim Peters wrote:
> Hi Loic, 
> your initial scheme is more flexible. Nevertheless I think that a 
> simplified description covers 99% of people's use cases.
> 
> You can absorb the logic of the implied parity into your generic LRC plug-in 
> if you find a good way to describe something like 'compute but don't 
> store'. If you then additionally offer the poor man's description 
> (k,m,l), one covers both the simple and the complex use case!

Yeah.

I like the part of Loic's strategy where we can do a search for 
different repair scenarios and choose the one that minimizes the cost 
(in terms of IO or CPU or whatever).  If it can express the Xorbas-style 
code with an implied block, then I suspect we would want to do something 
similar internally anyway.

In any case, being able to simply say l=... would definitely be a win.

sage


> 
> Cheers Andreas.
> 
> ________________________________________
> From: ceph-devel-owner@vger.kernel.org [ceph-devel-owner@vger.kernel.org] on behalf of Loic Dachary [loic@dachary.org]
> Sent: 01 August 2014 15:14
> To: Andreas Joachim Peters; ceph-devel@vger.kernel.org
> Subject: Re: Simplified LRC in CEPH
> 
> Hi Andreas,
> 
> It probably is just what we need. Although https://github.com/ceph/ceph/pull/1921 is more flexible in terms of chunk placement, I can't think of a use case where it would actually be useful. Maybe it's just me being back from holidays, but it smells like a solution to a non-existent problem ;-) The other difference is that your proposal does not allow for nested locality, e.g. datacenter locality and rack locality within a datacenter. What do you think?
> 
> Cheers
> 
> On 01/08/2014 18:29, Andreas Joachim Peters wrote:
> > Hi Loic,
> >
> >> It would, definitely. How would you control where data / parity chunks are located?
> >
> > I ordered the chunks after encoding in this way:
> >
> > ( 1 2 3 4 LP1 ) ( 5 6 7 8 LP2 ) ( 9 10 11 12 LP3 ) ( 13 14 15 16 LP4 ) ( R2 R3 R4 LP5 )
> >
> > Always (k/l)+1 consecutive chunks belong location-wise together ...  and I demand that (k/l) <= m
> >
> > That is probably straightforward to express in a CRUSH rule.
> >
> > Cheers Andreas.
> >
> > PS: just one correction, I wrote 'degree' but it is called the 'distance' of the code
> >
> > ________________________________________
> > From: Loic Dachary [loic@dachary.org]
> > Sent: 01 August 2014 14:32
> > To: Andreas Joachim Peters; ceph-devel@vger.kernel.org
> > Subject: Re: Simplified LRC in CEPH
> >
> > Hi Andreas,
> >
> > Enlightening explanation, thank you!
> >
> > On 01/08/2014 13:45, Andreas Joachim Peters wrote:
> >> Hi Loic et al.
> >>
> >> I managed to prototype (and understand) LRC encoding similar to Xorbas in the ISA plug-in.
> >>
> >> As an example take a (16,4) code (which gives nice alignment for 4k blocks):
> >>
> >> For the 4 sub groups of the data chunks you build local parities LP1-LP4:
> >>
> >> LP1 = 1 ^ 2 ^ 3 ^ 4
> >>
> >> LP2 = 5 ^ 6 ^ 7 ^ 8
> >>
> >> LP3 = 9 ^ 10 ^ 11 ^ 12
> >>
> >> LP4 = 13 ^ 14 ^ 15 ^ 16
> >>
> >> You do normal erasure encoding with 4 RS chunks:
> >>
> >> RS(1..16) = (R1, R2, R3, R4)
> >>
> >> You compute the local parity LP5 for the erasure chunks:
> >>
> >> LP5 = R1 ^ R2 ^ R3 ^ R4
> >>
> >> The following relation holds for Vandermonde matrices (because the first matrix row contains only 1's):
> >>
> >> LP1 ^ LP2 ^ LP3 ^ LP4 = R1
> >>
> >> So you need to store only 24 chunks (not 25):
> >>
> >> (1 .. 16) (R2,R3,R4) (LP1,LP2,LP3,LP4,LP5)
> >>
> >> Side remark: in this simplified explanation I imply R1, not LP5 as described in the Xorbas paper.
> >
> > Does it make a difference or is it equivalent?
> >
> >> The degree of the code is 5, i.e. you can construct a failure pattern with 5 losses where you lose data, while if you are 'lucky' the code can even repair up to 8 failures (one loss in each sub group + LP5, R2, R3, R4).
> >>
> >> The reconstruction traffic for single failures is:
> >>
> >> [(20 x 4) + (4 x 8)]/24 =~ 4.66 x [disk size] instead of 16 x [disk size]
> >>
> >> There are three repair scenarios:
> >>
> >> 1) only single failures in any of the local groups using LRC repair (simple parities)
> >> 2) multiple failures which can be reconstructed with RS parities without LRC repair
> >> 3) multiple failures which can be reconstructed with RS parities after LRC repair
> >>
> >> [ 4) reconstruction impossible  ]
> >>
> >> Having your proposed LRC layer (decoding) model in mind, there is a certain contradiction here, because there are cases where using LRC is not required: you can resolve all the failures with RS alone.
> >>
> >> In the end I think it is sufficient if we introduce a parameter l in the EC parameter list which defines the number of sub groups in the data chunks, and always imply one local parity for the RS chunks. So you can specify an LRC easily with three simple parameters:
> >>
> >> k=16 m=4 l=4
> >>
> >> The Xorbas configuration would be written as k=10 m=4 l=2
> >>
> >> Wouldn't that be much simpler and sufficient? What do you think?
> >
> > It would, definitely. How would you control where data / parity chunks are located?
> >
> > Cheers
> >>
> >> Cheers Andreas.
> > --
> > Loïc Dachary, Artisan Logiciel Libre
> >
> 
> --
> Loïc Dachary, Artisan Logiciel Libre
> 


end of thread

Thread overview: 6+ messages
     [not found] <3472A07E6605974CBC9BC573F1BC02E4AE753328@CERNXCHG44.cern.ch>
2014-08-01  8:00 ` Simplified LRC in CEPH Andreas Joachim Peters
2014-08-01 12:32   ` Loic Dachary
2014-08-01 12:44     ` Andreas Joachim Peters
2014-08-01 13:14       ` Loic Dachary
2014-08-01 14:25         ` Andreas Joachim Peters
2014-08-01 15:29           ` Sage Weil
