All of lore.kernel.org
 help / color / mirror / Atom feed
* ISA erasure code plugin and cache
@ 2014-08-04 11:56 Loic Dachary
  2014-08-04 12:15 ` Andreas Joachim Peters
  0 siblings, 1 reply; 12+ messages in thread
From: Loic Dachary @ 2014-08-04 11:56 UTC (permalink / raw)
  To: Andreas-Joachim Peters; +Cc: Ma, Jianpeng, Ceph Development

[-- Attachment #1: Type: text/plain, Size: 2073 bytes --]

Hi,

Here is how I understand the current code:

When an OSD is missing, recovery is required and the primary OSD will collect the available chunks to do so. It will then call the decode method via ECUtil::decode which is a small wrapper around the corresponding ErasureCodeInterface::decode method.

   https://github.com/ceph/ceph/blob/master/src/osd/ECBackend.cc#L361

The ISA plugin will then use the isa_decode method to perform the work

   https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodeIsa.cc#L212

and will be repeatedly called until all objects in the PGs that were relying on the missing OSD are recovered. To avoid computing the decoding table for each object, it is stored in a LRU cache

   https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodeIsa.cc#L480

and copied in the stack if already there:

   https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodeIsa.cc#L433

Each PG has a separate instance of ErasureCodeIsa, obtained when it is created:

   https://github.com/ceph/ceph/blob/master/src/osd/PGBackend.cc#L292

It means that data members of each ErasureCodeIsa are copied as many times as there are PGs. If an OSD handles participates in 200 PG that belong to an erasure coded pool configured to use ErasureCodeIsa, the data members will be duplicated 200 times.

It is good practice to make it so that the encode/decode methods of ErasureCodeIsa are thread safe. In the jerasure plugin these methods have no side effect on the object. In the isa plugin the LRU cache storing the decode tables is modified by the decode method and guarded by a mutex:

   get https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodeIsa.cc#L281
   put https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodeIsa.cc#L310

Please correct me if I'm mistaken ;-) I've not reviewed the code yet and try to find problems, I just wanted to make sure I get the intention before doing so.

Cheers
-- 
Loïc Dachary, Artisan Logiciel Libre


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: ISA erasure code plugin and cache
  2014-08-04 11:56 ISA erasure code plugin and cache Loic Dachary
@ 2014-08-04 12:15 ` Andreas Joachim Peters
  2014-08-04 12:37   ` Loic Dachary
  0 siblings, 1 reply; 12+ messages in thread
From: Andreas Joachim Peters @ 2014-08-04 12:15 UTC (permalink / raw)
  To: Loic Dachary; +Cc: Ma, Jianpeng, Ceph Development

Hi Loic, 

the background relevant to your comments have (unfortunately) never been answered on the mailing list. 

The cache is written in a way, that it is useful for a fixed (k,m) combination and thread-safe.

So, if there is one instance of the plugin per pool, it is the right implementation. If there is (as you write) one instance for each PG in any pool, it is sort of 'stupid' because the same encoding table is stored for each PG seperatly.

So if the final statement is to create one plugin instance per PG, I will change it accordingly and shared encoding & decoding tables for a fixed (k,m), if not it can stay. 

Just need to know that ... this boils down to the fact, that encoding & decoding should not be considered 'stateless'.

Cheers Andreas.


________________________________________
From: Loic Dachary [loic@dachary.org]
Sent: 04 August 2014 13:56
To: Andreas Joachim Peters
Cc: Ma, Jianpeng; Ceph Development
Subject: ISA erasure code plugin and cache

Hi,

Here is how I understand the current code:

When an OSD is missing, recovery is required and the primary OSD will collect the available chunks to do so. It will then call the decode method via ECUtil::decode which is a small wrapper around the corresponding ErasureCodeInterface::decode method.

   https://github.com/ceph/ceph/blob/master/src/osd/ECBackend.cc#L361

The ISA plugin will then use the isa_decode method to perform the work

   https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodeIsa.cc#L212

and will be repeatedly called until all objects in the PGs that were relying on the missing OSD are recovered. To avoid computing the decoding table for each object, it is stored in a LRU cache

   https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodeIsa.cc#L480

and copied in the stack if already there:

   https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodeIsa.cc#L433

Each PG has a separate instance of ErasureCodeIsa, obtained when it is created:

   https://github.com/ceph/ceph/blob/master/src/osd/PGBackend.cc#L292

It means that data members of each ErasureCodeIsa are copied as many times as there are PGs. If an OSD handles participates in 200 PG that belong to an erasure coded pool configured to use ErasureCodeIsa, the data members will be duplicated 200 times.

It is good practice to make it so that the encode/decode methods of ErasureCodeIsa are thread safe. In the jerasure plugin these methods have no side effect on the object. In the isa plugin the LRU cache storing the decode tables is modified by the decode method and guarded by a mutex:

   get https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodeIsa.cc#L281
   put https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodeIsa.cc#L310

Please correct me if I'm mistaken ;-) I've not reviewed the code yet and try to find problems, I just wanted to make sure I get the intention before doing so.

Cheers
--
Loïc Dachary, Artisan Logiciel Libre

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: ISA erasure code plugin and cache
  2014-08-04 12:15 ` Andreas Joachim Peters
@ 2014-08-04 12:37   ` Loic Dachary
  2014-08-04 12:43     ` Andreas Joachim Peters
  2014-08-04 12:50     ` Ma, Jianpeng
  0 siblings, 2 replies; 12+ messages in thread
From: Loic Dachary @ 2014-08-04 12:37 UTC (permalink / raw)
  To: Andreas Joachim Peters; +Cc: Ma, Jianpeng, Ceph Development

[-- Attachment #1: Type: text/plain, Size: 3789 bytes --]



On 04/08/2014 14:15, Andreas Joachim Peters wrote:> Hi Loic, 
> 
> the background relevant to your comments have (unfortunately) never been answered on the mailing list. 
> 
> The cache is written in a way, that it is useful for a fixed (k,m) combination and thread-safe.
> 
> So, if there is one instance of the plugin per pool, it is the right implementation. If there is (as you write) one instance for each PG in any pool, it is sort of 'stupid' because the same encoding table is stored for each PG seperatly.

I would not called it stupid ;-) Just not as efficient as if the cache was by all PGs. Without cache the decode table has to be calculated for each object in the placement group and there are a lot of objects. The table may be duplicated hundreds of time so it has an impact on the memory footprint, but it should not have a visible impact on the decode performances. An optimisation of your implementation to save memory would be nice, but it is not critical.

How large are the decode tables ?

> So if the final statement is to create one plugin instance per PG, I will change it accordingly and shared encoding & decoding tables for a fixed (k,m), if not it can stay. 
> 
> Just need to know that ... this boils down to the fact, that encoding & decoding should not be considered 'stateless'.
> 
> Cheers Andreas.
> 
> 
> ________________________________________
> From: Loic Dachary [loic@dachary.org]
> Sent: 04 August 2014 13:56
> To: Andreas Joachim Peters
> Cc: Ma, Jianpeng; Ceph Development
> Subject: ISA erasure code plugin and cache
> 
> Hi,
> 
> Here is how I understand the current code:
> 
> When an OSD is missing, recovery is required and the primary OSD will collect the available chunks to do so. It will then call the decode method via ECUtil::decode which is a small wrapper around the corresponding ErasureCodeInterface::decode method.
> 
>    https://github.com/ceph/ceph/blob/master/src/osd/ECBackend.cc#L361
> 
> The ISA plugin will then use the isa_decode method to perform the work
> 
>    https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodeIsa.cc#L212
> 
> and will be repeatedly called until all objects in the PGs that were relying on the missing OSD are recovered. To avoid computing the decoding table for each object, it is stored in a LRU cache
> 
>    https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodeIsa.cc#L480
> 
> and copied in the stack if already there:
> 
>    https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodeIsa.cc#L433
> 
> Each PG has a separate instance of ErasureCodeIsa, obtained when it is created:
> 
>    https://github.com/ceph/ceph/blob/master/src/osd/PGBackend.cc#L292
> 
> It means that data members of each ErasureCodeIsa are copied as many times as there are PGs. If an OSD handles participates in 200 PG that belong to an erasure coded pool configured to use ErasureCodeIsa, the data members will be duplicated 200 times.
> 
> It is good practice to make it so that the encode/decode methods of ErasureCodeIsa are thread safe. In the jerasure plugin these methods have no side effect on the object. In the isa plugin the LRU cache storing the decode tables is modified by the decode method and guarded by a mutex:
> 
>    get https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodeIsa.cc#L281
>    put https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodeIsa.cc#L310
> 
> Please correct me if I'm mistaken ;-) I've not reviewed the code yet and try to find problems, I just wanted to make sure I get the intention before doing so.
> 
> Cheers
> --
> Loïc Dachary, Artisan Logiciel Libre
> 

-- 
Loïc Dachary, Artisan Logiciel Libre


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: ISA erasure code plugin and cache
  2014-08-04 12:37   ` Loic Dachary
@ 2014-08-04 12:43     ` Andreas Joachim Peters
  2014-08-04 12:50     ` Ma, Jianpeng
  1 sibling, 0 replies; 12+ messages in thread
From: Andreas Joachim Peters @ 2014-08-04 12:43 UTC (permalink / raw)
  To: Loic Dachary; +Cc: Ma, Jianpeng, Ceph Development

Hi Loic, 

the size of a table is k*(m+k) * 32 e.g. ~4k for a (10,4) codec.

The size to store all possible decoding tables of (10,4) is few MB - this matters if you have it 200 times duplicated ...

Cheers Andreas.
 

________________________________________
From: Loic Dachary [loic@dachary.org]
Sent: 04 August 2014 14:37
To: Andreas Joachim Peters
Cc: Ma, Jianpeng; Ceph Development
Subject: Re: ISA erasure code plugin and cache

On 04/08/2014 14:15, Andreas Joachim Peters wrote:> Hi Loic,
>
> the background relevant to your comments have (unfortunately) never been answered on the mailing list.
>
> The cache is written in a way, that it is useful for a fixed (k,m) combination and thread-safe.
>
> So, if there is one instance of the plugin per pool, it is the right implementation. If there is (as you write) one instance for each PG in any pool, it is sort of 'stupid' because the same encoding table is stored for each PG seperatly.

I would not called it stupid ;-) Just not as efficient as if the cache was by all PGs. Without cache the decode table has to be calculated for each object in the placement group and there are a lot of objects. The table may be duplicated hundreds of time so it has an impact on the memory footprint, but it should not have a visible impact on the decode performances. An optimisation of your implementation to save memory would be nice, but it is not critical.

How large are the decode tables ?

> So if the final statement is to create one plugin instance per PG, I will change it accordingly and shared encoding & decoding tables for a fixed (k,m), if not it can stay.
>
> Just need to know that ... this boils down to the fact, that encoding & decoding should not be considered 'stateless'.
>
> Cheers Andreas.
>
>
> ________________________________________
> From: Loic Dachary [loic@dachary.org]
> Sent: 04 August 2014 13:56
> To: Andreas Joachim Peters
> Cc: Ma, Jianpeng; Ceph Development
> Subject: ISA erasure code plugin and cache
>
> Hi,
>
> Here is how I understand the current code:
>
> When an OSD is missing, recovery is required and the primary OSD will collect the available chunks to do so. It will then call the decode method via ECUtil::decode which is a small wrapper around the corresponding ErasureCodeInterface::decode method.
>
>    https://github.com/ceph/ceph/blob/master/src/osd/ECBackend.cc#L361
>
> The ISA plugin will then use the isa_decode method to perform the work
>
>    https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodeIsa.cc#L212
>
> and will be repeatedly called until all objects in the PGs that were relying on the missing OSD are recovered. To avoid computing the decoding table for each object, it is stored in a LRU cache
>
>    https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodeIsa.cc#L480
>
> and copied in the stack if already there:
>
>    https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodeIsa.cc#L433
>
> Each PG has a separate instance of ErasureCodeIsa, obtained when it is created:
>
>    https://github.com/ceph/ceph/blob/master/src/osd/PGBackend.cc#L292
>
> It means that data members of each ErasureCodeIsa are copied as many times as there are PGs. If an OSD handles participates in 200 PG that belong to an erasure coded pool configured to use ErasureCodeIsa, the data members will be duplicated 200 times.
>
> It is good practice to make it so that the encode/decode methods of ErasureCodeIsa are thread safe. In the jerasure plugin these methods have no side effect on the object. In the isa plugin the LRU cache storing the decode tables is modified by the decode method and guarded by a mutex:
>
>    get https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodeIsa.cc#L281
>    put https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodeIsa.cc#L310
>
> Please correct me if I'm mistaken ;-) I've not reviewed the code yet and try to find problems, I just wanted to make sure I get the intention before doing so.
>
> Cheers
> --
> Loïc Dachary, Artisan Logiciel Libre
>

--
Loïc Dachary, Artisan Logiciel Libre

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: ISA erasure code plugin and cache
  2014-08-04 12:37   ` Loic Dachary
  2014-08-04 12:43     ` Andreas Joachim Peters
@ 2014-08-04 12:50     ` Ma, Jianpeng
  2014-08-04 12:56       ` Loic Dachary
  1 sibling, 1 reply; 12+ messages in thread
From: Ma, Jianpeng @ 2014-08-04 12:50 UTC (permalink / raw)
  To: Loic Dachary, Andreas Joachim Peters; +Cc: Ceph Development

> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org
> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Loic Dachary
> Sent: Monday, August 4, 2014 8:37 PM
> To: Andreas Joachim Peters
> Cc: Ma, Jianpeng; Ceph Development
> Subject: Re: ISA erasure code plugin and cache
> 
> 
> 
> On 04/08/2014 14:15, Andreas Joachim Peters wrote:> Hi Loic,
> >
> > the background relevant to your comments have (unfortunately) never been
> answered on the mailing list.
> >
> > The cache is written in a way, that it is useful for a fixed (k,m) combination
> and thread-safe.
> >
> > So, if there is one instance of the plugin per pool, it is the right
> implementation. If there is (as you write) one instance for each PG in any pool,
> it is sort of 'stupid' because the same encoding table is stored for each PG
> seperatly.
> 
> I would not called it stupid ;-) Just not as efficient as if the cache was by all PGs.
> Without cache the decode table has to be calculated for each object in the
> placement group and there are a lot of objects. The table may be duplicated
> hundreds of time so it has an impact on the memory footprint, but it should not
> have a visible impact on the decode performances. An optimisation of your
> implementation to save memory would be nice, but it is not critical.
> 
AFAIK, for a recovery pg, all the objects of pg have the same lost chunks. So only the first object miss the cache.
But the later won't.  It only need a entry to cache.
Or am I missing something?

Jianpeng Ma

> How large are the decode tables ?
> 
> > So if the final statement is to create one plugin instance per PG, I will change
> it accordingly and shared encoding & decoding tables for a fixed (k,m), if not it
> can stay.
> >
> > Just need to know that ... this boils down to the fact, that encoding &
> decoding should not be considered 'stateless'.
> >
> > Cheers Andreas.
> >
> >
> > ________________________________________
> > From: Loic Dachary [loic@dachary.org]
> > Sent: 04 August 2014 13:56
> > To: Andreas Joachim Peters
> > Cc: Ma, Jianpeng; Ceph Development
> > Subject: ISA erasure code plugin and cache
> >
> > Hi,
> >
> > Here is how I understand the current code:
> >
> > When an OSD is missing, recovery is required and the primary OSD will collect
> the available chunks to do so. It will then call the decode method via
> ECUtil::decode which is a small wrapper around the corresponding
> ErasureCodeInterface::decode method.
> >
> >    https://github.com/ceph/ceph/blob/master/src/osd/ECBackend.cc#L361
> >
> > The ISA plugin will then use the isa_decode method to perform the work
> >
> >
> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
> Isa.cc#L212
> >
> > and will be repeatedly called until all objects in the PGs that were relying on
> the missing OSD are recovered. To avoid computing the decoding table for each
> object, it is stored in a LRU cache
> >
> >
> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
> Isa.cc#L480
> >
> > and copied in the stack if already there:
> >
> >
> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
> Isa.cc#L433
> >
> > Each PG has a separate instance of ErasureCodeIsa, obtained when it is
> created:
> >
> >    https://github.com/ceph/ceph/blob/master/src/osd/PGBackend.cc#L292
> >
> > It means that data members of each ErasureCodeIsa are copied as many
> times as there are PGs. If an OSD handles participates in 200 PG that belong to
> an erasure coded pool configured to use ErasureCodeIsa, the data members
> will be duplicated 200 times.
> >
> > It is good practice to make it so that the encode/decode methods of
> ErasureCodeIsa are thread safe. In the jerasure plugin these methods have no
> side effect on the object. In the isa plugin the LRU cache storing the decode
> tables is modified by the decode method and guarded by a mutex:
> >
> >    get
> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
> Isa.cc#L281
> >    put
> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
> Isa.cc#L310
> >
> > Please correct me if I'm mistaken ;-) I've not reviewed the code yet and try to
> find problems, I just wanted to make sure I get the intention before doing so.
> >
> > Cheers
> > --
> > Loïc Dachary, Artisan Logiciel Libre
> >
> 
> --
> Loïc Dachary, Artisan Logiciel Libre

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: ISA erasure code plugin and cache
  2014-08-04 12:50     ` Ma, Jianpeng
@ 2014-08-04 12:56       ` Loic Dachary
  2014-08-04 13:19         ` Andreas Joachim Peters
  0 siblings, 1 reply; 12+ messages in thread
From: Loic Dachary @ 2014-08-04 12:56 UTC (permalink / raw)
  To: Ma, Jianpeng, Andreas Joachim Peters; +Cc: Ceph Development

[-- Attachment #1: Type: text/plain, Size: 4967 bytes --]



On 04/08/2014 14:50, Ma, Jianpeng wrote:
>> -----Original Message-----
>> From: ceph-devel-owner@vger.kernel.org
>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Loic Dachary
>> Sent: Monday, August 4, 2014 8:37 PM
>> To: Andreas Joachim Peters
>> Cc: Ma, Jianpeng; Ceph Development
>> Subject: Re: ISA erasure code plugin and cache
>>
>>
>>
>> On 04/08/2014 14:15, Andreas Joachim Peters wrote:> Hi Loic,
>>>
>>> the background relevant to your comments have (unfortunately) never been
>> answered on the mailing list.
>>>
>>> The cache is written in a way, that it is useful for a fixed (k,m) combination
>> and thread-safe.
>>>
>>> So, if there is one instance of the plugin per pool, it is the right
>> implementation. If there is (as you write) one instance for each PG in any pool,
>> it is sort of 'stupid' because the same encoding table is stored for each PG
>> seperatly.
>>
>> I would not called it stupid ;-) Just not as efficient as if the cache was by all PGs.
>> Without cache the decode table has to be calculated for each object in the
>> placement group and there are a lot of objects. The table may be duplicated
>> hundreds of time so it has an impact on the memory footprint, but it should not
>> have a visible impact on the decode performances. An optimisation of your
>> implementation to save memory would be nice, but it is not critical.
>>
> AFAIK, for a recovery pg, all the objects of pg have the same lost chunks. So only the first object miss the cache.
> But the later won't.  It only need a entry to cache.
> Or am I missing something?

It also is my understanding and that's what makes the cache so useful. Now, in the long run the cache stays and grows. Since it is a few mega bytes per PG, it will eventually has a non negligible impact on a long running OSD. But again, it's nothing critical to performances.

Cheers

> 
> Jianpeng Ma
> 
>> How large are the decode tables ?
>>
>>> So if the final statement is to create one plugin instance per PG, I will change
>> it accordingly and shared encoding & decoding tables for a fixed (k,m), if not it
>> can stay.
>>>
>>> Just need to know that ... this boils down to the fact, that encoding &
>> decoding should not be considered 'stateless'.
>>>
>>> Cheers Andreas.
>>>
>>>
>>> ________________________________________
>>> From: Loic Dachary [loic@dachary.org]
>>> Sent: 04 August 2014 13:56
>>> To: Andreas Joachim Peters
>>> Cc: Ma, Jianpeng; Ceph Development
>>> Subject: ISA erasure code plugin and cache
>>>
>>> Hi,
>>>
>>> Here is how I understand the current code:
>>>
>>> When an OSD is missing, recovery is required and the primary OSD will collect
>> the available chunks to do so. It will then call the decode method via
>> ECUtil::decode which is a small wrapper around the corresponding
>> ErasureCodeInterface::decode method.
>>>
>>>    https://github.com/ceph/ceph/blob/master/src/osd/ECBackend.cc#L361
>>>
>>> The ISA plugin will then use the isa_decode method to perform the work
>>>
>>>
>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
>> Isa.cc#L212
>>>
>>> and will be repeatedly called until all objects in the PGs that were relying on
>> the missing OSD are recovered. To avoid computing the decoding table for each
>> object, it is stored in a LRU cache
>>>
>>>
>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
>> Isa.cc#L480
>>>
>>> and copied in the stack if already there:
>>>
>>>
>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
>> Isa.cc#L433
>>>
>>> Each PG has a separate instance of ErasureCodeIsa, obtained when it is
>> created:
>>>
>>>    https://github.com/ceph/ceph/blob/master/src/osd/PGBackend.cc#L292
>>>
>>> It means that data members of each ErasureCodeIsa are copied as many
>> times as there are PGs. If an OSD handles participates in 200 PG that belong to
>> an erasure coded pool configured to use ErasureCodeIsa, the data members
>> will be duplicated 200 times.
>>>
>>> It is good practice to make it so that the encode/decode methods of
>> ErasureCodeIsa are thread safe. In the jerasure plugin these methods have no
>> side effect on the object. In the isa plugin the LRU cache storing the decode
>> tables is modified by the decode method and guarded by a mutex:
>>>
>>>    get
>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
>> Isa.cc#L281
>>>    put
>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
>> Isa.cc#L310
>>>
>>> Please correct me if I'm mistaken ;-) I've not reviewed the code yet and try to
>> find problems, I just wanted to make sure I get the intention before doing so.
>>>
>>> Cheers
>>> --
>>> Loïc Dachary, Artisan Logiciel Libre
>>>
>>
>> --
>> Loïc Dachary, Artisan Logiciel Libre
> 

-- 
Loïc Dachary, Artisan Logiciel Libre


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: ISA erasure code plugin and cache
  2014-08-04 12:56       ` Loic Dachary
@ 2014-08-04 13:19         ` Andreas Joachim Peters
  2014-08-04 14:22           ` Sage Weil
  0 siblings, 1 reply; 12+ messages in thread
From: Andreas Joachim Peters @ 2014-08-04 13:19 UTC (permalink / raw)
  To: Loic Dachary, Ma, Jianpeng; +Cc: Ceph Development

Yes, 
you are both right, it would have only the current failure situation in each cache and it stores the encoding table in each plug-in instance.
Probably the sharing of cache and encoding table would add more (unwanted) thread synchronization points. 

If this is the final design, I might remove also the lock? Nevertheless, it has no measurable performance impact ...

Cheers Andreas.


________________________________________
From: Loic Dachary [loic@dachary.org]
Sent: 04 August 2014 14:56
To: Ma, Jianpeng; Andreas Joachim Peters
Cc: Ceph Development
Subject: Re: ISA erasure code plugin and cache

On 04/08/2014 14:50, Ma, Jianpeng wrote:
>> -----Original Message-----
>> From: ceph-devel-owner@vger.kernel.org
>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Loic Dachary
>> Sent: Monday, August 4, 2014 8:37 PM
>> To: Andreas Joachim Peters
>> Cc: Ma, Jianpeng; Ceph Development
>> Subject: Re: ISA erasure code plugin and cache
>>
>>
>>
>> On 04/08/2014 14:15, Andreas Joachim Peters wrote:> Hi Loic,
>>>
>>> the background relevant to your comments have (unfortunately) never been
>> answered on the mailing list.
>>>
>>> The cache is written in a way, that it is useful for a fixed (k,m) combination
>> and thread-safe.
>>>
>>> So, if there is one instance of the plugin per pool, it is the right
>> implementation. If there is (as you write) one instance for each PG in any pool,
>> it is sort of 'stupid' because the same encoding table is stored for each PG
>> seperatly.
>>
>> I would not called it stupid ;-) Just not as efficient as if the cache was by all PGs.
>> Without cache the decode table has to be calculated for each object in the
>> placement group and there are a lot of objects. The table may be duplicated
>> hundreds of time so it has an impact on the memory footprint, but it should not
>> have a visible impact on the decode performances. An optimisation of your
>> implementation to save memory would be nice, but it is not critical.
>>
> AFAIK, for a recovery pg, all the objects of pg have the same lost chunks. So only the first object miss the cache.
> But the later won't.  It only need a entry to cache.
> Or am I missing something?

It also is my understanding and that's what makes the cache so useful. Now, in the long run the cache stays and grows. Since it is a few mega bytes per PG, it will eventually has a non negligible impact on a long running OSD. But again, it's nothing critical to performances.

Cheers

>
> Jianpeng Ma
>
>> How large are the decode tables ?
>>
>>> So if the final statement is to create one plugin instance per PG, I will change
>> it accordingly and shared encoding & decoding tables for a fixed (k,m), if not it
>> can stay.
>>>
>>> Just need to know that ... this boils down to the fact, that encoding &
>> decoding should not be considered 'stateless'.
>>>
>>> Cheers Andreas.
>>>
>>>
>>> ________________________________________
>>> From: Loic Dachary [loic@dachary.org]
>>> Sent: 04 August 2014 13:56
>>> To: Andreas Joachim Peters
>>> Cc: Ma, Jianpeng; Ceph Development
>>> Subject: ISA erasure code plugin and cache
>>>
>>> Hi,
>>>
>>> Here is how I understand the current code:
>>>
>>> When an OSD is missing, recovery is required and the primary OSD will collect
>> the available chunks to do so. It will then call the decode method via
>> ECUtil::decode which is a small wrapper around the corresponding
>> ErasureCodeInterface::decode method.
>>>
>>>    https://github.com/ceph/ceph/blob/master/src/osd/ECBackend.cc#L361
>>>
>>> The ISA plugin will then use the isa_decode method to perform the work
>>>
>>>
>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
>> Isa.cc#L212
>>>
>>> and will be repeatedly called until all objects in the PGs that were relying on
>> the missing OSD are recovered. To avoid computing the decoding table for each
>> object, it is stored in a LRU cache
>>>
>>>
>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
>> Isa.cc#L480
>>>
>>> and copied in the stack if already there:
>>>
>>>
>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
>> Isa.cc#L433
>>>
>>> Each PG has a separate instance of ErasureCodeIsa, obtained when it is
>> created:
>>>
>>>    https://github.com/ceph/ceph/blob/master/src/osd/PGBackend.cc#L292
>>>
>>> It means that data members of each ErasureCodeIsa are copied as many
>> times as there are PGs. If an OSD handles participates in 200 PG that belong to
>> an erasure coded pool configured to use ErasureCodeIsa, the data members
>> will be duplicated 200 times.
>>>
>>> It is good practice to make it so that the encode/decode methods of
>> ErasureCodeIsa are thread safe. In the jerasure plugin these methods have no
>> side effect on the object. In the isa plugin the LRU cache storing the decode
>> tables is modified by the decode method and guarded by a mutex:
>>>
>>>    get
>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
>> Isa.cc#L281
>>>    put
>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
>> Isa.cc#L310
>>>
>>> Please correct me if I'm mistaken ;-) I've not reviewed the code yet and try to
>> find problems, I just wanted to make sure I get the intention before doing so.
>>>
>>> Cheers
>>> --
>>> Loïc Dachary, Artisan Logiciel Libre
>>>
>>
>> --
>> Loïc Dachary, Artisan Logiciel Libre
>

--
Loïc Dachary, Artisan Logiciel Libre

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: ISA erasure code plugin and cache
  2014-08-04 13:19         ` Andreas Joachim Peters
@ 2014-08-04 14:22           ` Sage Weil
  2014-08-04 15:22             ` Andreas Joachim Peters
  0 siblings, 1 reply; 12+ messages in thread
From: Sage Weil @ 2014-08-04 14:22 UTC (permalink / raw)
  To: Andreas Joachim Peters; +Cc: Loic Dachary, Ma, Jianpeng, Ceph Development

On Mon, 4 Aug 2014, Andreas Joachim Peters wrote:
> Yes, 
> you are both right, it would have only the current failure situation in 
> each cache and it stores the encoding table in each plug-in instance.
> Probably the sharing of cache and encoding table would add more 
> (unwanted) thread synchronization points.
> 
> If this is the final design, I might remove also the lock? Nevertheless, 
> it has no measurable performance impact ...

Perhaps, but I think the memory impact is significant, especially since 
people will want to use erasure coding for "cold" storage pools are 
underpowered hardware.  I think we should adjust the strategy so that 
there is a single instance per pool, or (perhaps better) the cache is 
global to the plugin (and shared across pools that may share the same 
EC profile).

sage


> Cheers Andreas.
> 
> 
> ________________________________________
> From: Loic Dachary [loic@dachary.org]
> Sent: 04 August 2014 14:56
> To: Ma, Jianpeng; Andreas Joachim Peters
> Cc: Ceph Development
> Subject: Re: ISA erasure code plugin and cache
> 
> On 04/08/2014 14:50, Ma, Jianpeng wrote:
> >> -----Original Message-----
> >> From: ceph-devel-owner@vger.kernel.org
> >> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Loic Dachary
> >> Sent: Monday, August 4, 2014 8:37 PM
> >> To: Andreas Joachim Peters
> >> Cc: Ma, Jianpeng; Ceph Development
> >> Subject: Re: ISA erasure code plugin and cache
> >>
> >>
> >>
> >> On 04/08/2014 14:15, Andreas Joachim Peters wrote:> Hi Loic,
> >>>
> >>> the background relevant to your comments have (unfortunately) never been
> >> answered on the mailing list.
> >>>
> >>> The cache is written in a way, that it is useful for a fixed (k,m) combination
> >> and thread-safe.
> >>>
> >>> So, if there is one instance of the plugin per pool, it is the right
> >> implementation. If there is (as you write) one instance for each PG in any pool,
> >> it is sort of 'stupid' because the same encoding table is stored for each PG
> >> seperatly.
> >>
> >> I would not called it stupid ;-) Just not as efficient as if the cache was by all PGs.
> >> Without cache the decode table has to be calculated for each object in the
> >> placement group and there are a lot of objects. The table may be duplicated
> >> hundreds of time so it has an impact on the memory footprint, but it should not
> >> have a visible impact on the decode performances. An optimisation of your
> >> implementation to save memory would be nice, but it is not critical.
> >>
> > AFAIK, for a recovery pg, all the objects of pg have the same lost chunks. So only the first object miss the cache.
> > But the later won't.  It only need a entry to cache.
> > Or am I missing something?
> 
> It also is my understanding and that's what makes the cache so useful. Now, in the long run the cache stays and grows. Since it is a few mega bytes per PG, it will eventually has a non negligible impact on a long running OSD. But again, it's nothing critical to performances.
> 
> Cheers
> 
> >
> > Jianpeng Ma
> >
> >> How large are the decode tables ?
> >>
> >>> So if the final statement is to create one plugin instance per PG, I will change
> >> it accordingly and shared encoding & decoding tables for a fixed (k,m), if not it
> >> can stay.
> >>>
> >>> Just need to know that ... this boils down to the fact, that encoding &
> >> decoding should not be considered 'stateless'.
> >>>
> >>> Cheers Andreas.
> >>>
> >>>
> >>> ________________________________________
> >>> From: Loic Dachary [loic@dachary.org]
> >>> Sent: 04 August 2014 13:56
> >>> To: Andreas Joachim Peters
> >>> Cc: Ma, Jianpeng; Ceph Development
> >>> Subject: ISA erasure code plugin and cache
> >>>
> >>> Hi,
> >>>
> >>> Here is how I understand the current code:
> >>>
> >>> When an OSD is missing, recovery is required and the primary OSD will collect
> >> the available chunks to do so. It will then call the decode method via
> >> ECUtil::decode which is a small wrapper around the corresponding
> >> ErasureCodeInterface::decode method.
> >>>
> >>>    https://github.com/ceph/ceph/blob/master/src/osd/ECBackend.cc#L361
> >>>
> >>> The ISA plugin will then use the isa_decode method to perform the work
> >>>
> >>>
> >> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
> >> Isa.cc#L212
> >>>
> >>> and will be repeatedly called until all objects in the PGs that were relying on
> >> the missing OSD are recovered. To avoid computing the decoding table for each
> >> object, it is stored in a LRU cache
> >>>
> >>>
> >> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
> >> Isa.cc#L480
> >>>
> >>> and copied in the stack if already there:
> >>>
> >>>
> >> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
> >> Isa.cc#L433
> >>>
> >>> Each PG has a separate instance of ErasureCodeIsa, obtained when it is
> >> created:
> >>>
> >>>    https://github.com/ceph/ceph/blob/master/src/osd/PGBackend.cc#L292
> >>>
> >>> It means that data members of each ErasureCodeIsa are copied as many
> >> times as there are PGs. If an OSD handles participates in 200 PG that belong to
> >> an erasure coded pool configured to use ErasureCodeIsa, the data members
> >> will be duplicated 200 times.
> >>>
> >>> It is good practice to make it so that the encode/decode methods of
> >> ErasureCodeIsa are thread safe. In the jerasure plugin these methods have no
> >> side effect on the object. In the isa plugin the LRU cache storing the decode
> >> tables is modified by the decode method and guarded by a mutex:
> >>>
> >>>    get
> >> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
> >> Isa.cc#L281
> >>>    put
> >> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
> >> Isa.cc#L310
> >>>
> >>> Please correct me if I'm mistaken ;-) I've not reviewed the code yet and try to
> >> find problems, I just wanted to make sure I get the intention before doing so.
> >>>
> >>> Cheers
> >>> --
> >>> Lo?c Dachary, Artisan Logiciel Libre
> >>>
> >>
> >> --
> >> Lo?c Dachary, Artisan Logiciel Libre
> >
> 
> --
> Lo?c Dachary, Artisan Logiciel Libre
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: ISA erasure code plugin and cache
  2014-08-04 14:22           ` Sage Weil
@ 2014-08-04 15:22             ` Andreas Joachim Peters
  2014-08-04 17:04               ` Loic Dachary
  0 siblings, 1 reply; 12+ messages in thread
From: Andreas Joachim Peters @ 2014-08-04 15:22 UTC (permalink / raw)
  To: Sage Weil; +Cc: Loic Dachary, Ma, Jianpeng, Ceph Development

Could one not attach the EC instance to the "pg_pool_t" structure which is given by reference to PGBackend *PGBackend::build_pg_backend? Just a naive question ... i don't know the upstream code ...

That would follow the spirit of 'saving/sharing resources' and would follow the logic that the CODEC configuration is by pool and not by PG ?!?!?

Cheers Andreas.



________________________________________
From: ceph-devel-owner@vger.kernel.org [ceph-devel-owner@vger.kernel.org] on behalf of Sage Weil [sweil@redhat.com]
Sent: 04 August 2014 16:22
To: Andreas Joachim Peters
Cc: Loic Dachary; Ma, Jianpeng; Ceph Development
Subject: RE: ISA erasure code plugin and cache

On Mon, 4 Aug 2014, Andreas Joachim Peters wrote:
> Yes,
> you are both right, it would have only the current failure situation in
> each cache and it stores the encoding table in each plug-in instance.
> Probably the sharing of cache and encoding table would add more
> (unwanted) thread synchronization points.
>
> If this is the final design, I might remove also the lock? Nevertheless,
> it has no measurable performance impact ...

Perhaps, but I think the memory impact is significant, especially since
people will want to use erasure coding for "cold" storage pools are
underpowered hardware.  I think we should adjust the strategy so that
there is a single instance per pool, or (perhaps better) the cache is
global to the plugin (and shared across pools that may share the same
EC profile).

sage


> Cheers Andreas.
>
>
> ________________________________________
> From: Loic Dachary [loic@dachary.org]
> Sent: 04 August 2014 14:56
> To: Ma, Jianpeng; Andreas Joachim Peters
> Cc: Ceph Development
> Subject: Re: ISA erasure code plugin and cache
>
> On 04/08/2014 14:50, Ma, Jianpeng wrote:
> >> -----Original Message-----
> >> From: ceph-devel-owner@vger.kernel.org
> >> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Loic Dachary
> >> Sent: Monday, August 4, 2014 8:37 PM
> >> To: Andreas Joachim Peters
> >> Cc: Ma, Jianpeng; Ceph Development
> >> Subject: Re: ISA erasure code plugin and cache
> >>
> >>
> >>
> >> On 04/08/2014 14:15, Andreas Joachim Peters wrote:> Hi Loic,
> >>>
> >>> the background relevant to your comments have (unfortunately) never been
> >> answered on the mailing list.
> >>>
> >>> The cache is written in a way, that it is useful for a fixed (k,m) combination
> >> and thread-safe.
> >>>
> >>> So, if there is one instance of the plugin per pool, it is the right
> >> implementation. If there is (as you write) one instance for each PG in any pool,
> >> it is sort of 'stupid' because the same encoding table is stored for each PG
> >> seperatly.
> >>
> >> I would not called it stupid ;-) Just not as efficient as if the cache was by all PGs.
> >> Without cache the decode table has to be calculated for each object in the
> >> placement group and there are a lot of objects. The table may be duplicated
> >> hundreds of time so it has an impact on the memory footprint, but it should not
> >> have a visible impact on the decode performances. An optimisation of your
> >> implementation to save memory would be nice, but it is not critical.
> >>
> > AFAIK, for a recovery pg, all the objects of pg have the same lost chunks. So only the first object miss the cache.
> > But the later won't.  It only need a entry to cache.
> > Or am I missing something?
>
> It also is my understanding and that's what makes the cache so useful. Now, in the long run the cache stays and grows. Since it is a few mega bytes per PG, it will eventually has a non negligible impact on a long running OSD. But again, it's nothing critical to performances.
>
> Cheers
>
> >
> > Jianpeng Ma
> >
> >> How large are the decode tables ?
> >>
> >>> So if the final statement is to create one plugin instance per PG, I will change
> >> it accordingly and shared encoding & decoding tables for a fixed (k,m), if not it
> >> can stay.
> >>>
> >>> Just need to know that ... this boils down to the fact, that encoding &
> >> decoding should not be considered 'stateless'.
> >>>
> >>> Cheers Andreas.
> >>>
> >>>
> >>> ________________________________________
> >>> From: Loic Dachary [loic@dachary.org]
> >>> Sent: 04 August 2014 13:56
> >>> To: Andreas Joachim Peters
> >>> Cc: Ma, Jianpeng; Ceph Development
> >>> Subject: ISA erasure code plugin and cache
> >>>
> >>> Hi,
> >>>
> >>> Here is how I understand the current code:
> >>>
> >>> When an OSD is missing, recovery is required and the primary OSD will collect
> >> the available chunks to do so. It will then call the decode method via
> >> ECUtil::decode which is a small wrapper around the corresponding
> >> ErasureCodeInterface::decode method.
> >>>
> >>>    https://github.com/ceph/ceph/blob/master/src/osd/ECBackend.cc#L361
> >>>
> >>> The ISA plugin will then use the isa_decode method to perform the work
> >>>
> >>>
> >> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
> >> Isa.cc#L212
> >>>
> >>> and will be repeatedly called until all objects in the PGs that were relying on
> >> the missing OSD are recovered. To avoid computing the decoding table for each
> >> object, it is stored in a LRU cache
> >>>
> >>>
> >> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
> >> Isa.cc#L480
> >>>
> >>> and copied in the stack if already there:
> >>>
> >>>
> >> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
> >> Isa.cc#L433
> >>>
> >>> Each PG has a separate instance of ErasureCodeIsa, obtained when it is
> >> created:
> >>>
> >>>    https://github.com/ceph/ceph/blob/master/src/osd/PGBackend.cc#L292
> >>>
> >>> It means that data members of each ErasureCodeIsa are copied as many
> >> times as there are PGs. If an OSD handles participates in 200 PG that belong to
> >> an erasure coded pool configured to use ErasureCodeIsa, the data members
> >> will be duplicated 200 times.
> >>>
> >>> It is good practice to make it so that the encode/decode methods of
> >> ErasureCodeIsa are thread safe. In the jerasure plugin these methods have no
> >> side effect on the object. In the isa plugin the LRU cache storing the decode
> >> tables is modified by the decode method and guarded by a mutex:
> >>>
> >>>    get
> >> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
> >> Isa.cc#L281
> >>>    put
> >> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
> >> Isa.cc#L310
> >>>
> >>> Please correct me if I'm mistaken ;-) I've not reviewed the code yet and try to
> >> find problems, I just wanted to make sure I get the intention before doing so.
> >>>
> >>> Cheers
> >>> --
> >>> Lo?c Dachary, Artisan Logiciel Libre
> >>>
> >>
> >> --
> >> Lo?c Dachary, Artisan Logiciel Libre
> >
>
> --
> Lo?c Dachary, Artisan Logiciel Libre
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: ISA erasure code plugin and cache
  2014-08-04 15:22             ` Andreas Joachim Peters
@ 2014-08-04 17:04               ` Loic Dachary
  2014-08-04 20:07                 ` Andreas Joachim Peters
  0 siblings, 1 reply; 12+ messages in thread
From: Loic Dachary @ 2014-08-04 17:04 UTC (permalink / raw)
  To: Andreas Joachim Peters; +Cc: Ceph Development

[-- Attachment #1: Type: text/plain, Size: 8834 bytes --]

Hi Andreas, 

What about getting a singleton containing the LRU cache in

    https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodePluginIsa.cc#L48

and making it a parameter to the constructor

    https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodePluginIsa.cc#L56

or the init method

    https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodePluginIsa.cc#L70

I assume LRU cache can only be shared if the parameters are the same because it would not make sense to share the cache of an instance using the Cauchy technique with the cache of an instance using the Vandermonde technique.

If the singleton is created in the plugin entry point:

   https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodePluginIsa.cc#L77

there is no need to protect it against race conditions because the call to the entry point already is guarded:

   https://github.com/ceph/ceph/blob/master/src/erasure-code/ErasureCodePlugin.cc#L74

Cheers

On 04/08/2014 17:22, Andreas Joachim Peters wrote:
> Could one not attach the EC instance to the "pg_pool_t" structure which is given by reference to PGBackend *PGBackend::build_pg_backend? Just a naive question ... i don't know the upstream code ...
> 
> That would follow the spirit of 'saving/sharing resources' and would follow the logic that the CODEC configuration is by pool and not by PG ?!?!?
> 
> Cheers Andreas.
> 
> 
> 
> ________________________________________
> From: ceph-devel-owner@vger.kernel.org [ceph-devel-owner@vger.kernel.org] on behalf of Sage Weil [sweil@redhat.com]
> Sent: 04 August 2014 16:22
> To: Andreas Joachim Peters
> Cc: Loic Dachary; Ma, Jianpeng; Ceph Development
> Subject: RE: ISA erasure code plugin and cache
> 
> On Mon, 4 Aug 2014, Andreas Joachim Peters wrote:
>> Yes,
>> you are both right, it would have only the current failure situation in
>> each cache and it stores the encoding table in each plug-in instance.
>> Probably the sharing of cache and encoding table would add more
>> (unwanted) thread synchronization points.
>>
>> If this is the final design, I might remove also the lock? Nevertheless,
>> it has no measurable performance impact ...
> 
> Perhaps, but I think the memory impact is significant, especially since
> people will want to use erasure coding for "cold" storage pools are
> underpowered hardware.  I think we should adjust the strategy so that
> there is a single instance per pool, or (perhaps better) the cache is
> global to the plugin (and shared across pools that may share the same
> EC profile).
> 
> sage
> 
> 
>> Cheers Andreas.
>>
>>
>> ________________________________________
>> From: Loic Dachary [loic@dachary.org]
>> Sent: 04 August 2014 14:56
>> To: Ma, Jianpeng; Andreas Joachim Peters
>> Cc: Ceph Development
>> Subject: Re: ISA erasure code plugin and cache
>>
>> On 04/08/2014 14:50, Ma, Jianpeng wrote:
>>>> -----Original Message-----
>>>> From: ceph-devel-owner@vger.kernel.org
>>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Loic Dachary
>>>> Sent: Monday, August 4, 2014 8:37 PM
>>>> To: Andreas Joachim Peters
>>>> Cc: Ma, Jianpeng; Ceph Development
>>>> Subject: Re: ISA erasure code plugin and cache
>>>>
>>>>
>>>>
>>>> On 04/08/2014 14:15, Andreas Joachim Peters wrote:> Hi Loic,
>>>>>
>>>>> the background relevant to your comments have (unfortunately) never been
>>>> answered on the mailing list.
>>>>>
>>>>> The cache is written in a way, that it is useful for a fixed (k,m) combination
>>>> and thread-safe.
>>>>>
>>>>> So, if there is one instance of the plugin per pool, it is the right
>>>> implementation. If there is (as you write) one instance for each PG in any pool,
>>>> it is sort of 'stupid' because the same encoding table is stored for each PG
>>>> seperatly.
>>>>
>>>> I would not called it stupid ;-) Just not as efficient as if the cache was by all PGs.
>>>> Without cache the decode table has to be calculated for each object in the
>>>> placement group and there are a lot of objects. The table may be duplicated
>>>> hundreds of time so it has an impact on the memory footprint, but it should not
>>>> have a visible impact on the decode performances. An optimisation of your
>>>> implementation to save memory would be nice, but it is not critical.
>>>>
>>> AFAIK, for a recovery pg, all the objects of pg have the same lost chunks. So only the first object miss the cache.
>>> But the later won't.  It only need a entry to cache.
>>> Or am I missing something?
>>
>> It also is my understanding and that's what makes the cache so useful. Now, in the long run the cache stays and grows. Since it is a few mega bytes per PG, it will eventually has a non negligible impact on a long running OSD. But again, it's nothing critical to performances.
>>
>> Cheers
>>
>>>
>>> Jianpeng Ma
>>>
>>>> How large are the decode tables ?
>>>>
>>>>> So if the final statement is to create one plugin instance per PG, I will change
>>>> it accordingly and shared encoding & decoding tables for a fixed (k,m), if not it
>>>> can stay.
>>>>>
>>>>> Just need to know that ... this boils down to the fact, that encoding &
>>>> decoding should not be considered 'stateless'.
>>>>>
>>>>> Cheers Andreas.
>>>>>
>>>>>
>>>>> ________________________________________
>>>>> From: Loic Dachary [loic@dachary.org]
>>>>> Sent: 04 August 2014 13:56
>>>>> To: Andreas Joachim Peters
>>>>> Cc: Ma, Jianpeng; Ceph Development
>>>>> Subject: ISA erasure code plugin and cache
>>>>>
>>>>> Hi,
>>>>>
>>>>> Here is how I understand the current code:
>>>>>
>>>>> When an OSD is missing, recovery is required and the primary OSD will collect
>>>> the available chunks to do so. It will then call the decode method via
>>>> ECUtil::decode which is a small wrapper around the corresponding
>>>> ErasureCodeInterface::decode method.
>>>>>
>>>>>    https://github.com/ceph/ceph/blob/master/src/osd/ECBackend.cc#L361
>>>>>
>>>>> The ISA plugin will then use the isa_decode method to perform the work
>>>>>
>>>>>
>>>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
>>>> Isa.cc#L212
>>>>>
>>>>> and will be repeatedly called until all objects in the PGs that were relying on
>>>> the missing OSD are recovered. To avoid computing the decoding table for each
>>>> object, it is stored in a LRU cache
>>>>>
>>>>>
>>>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
>>>> Isa.cc#L480
>>>>>
>>>>> and copied in the stack if already there:
>>>>>
>>>>>
>>>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
>>>> Isa.cc#L433
>>>>>
>>>>> Each PG has a separate instance of ErasureCodeIsa, obtained when it is
>>>> created:
>>>>>
>>>>>    https://github.com/ceph/ceph/blob/master/src/osd/PGBackend.cc#L292
>>>>>
>>>>> It means that data members of each ErasureCodeIsa are copied as many
>>>> times as there are PGs. If an OSD handles participates in 200 PG that belong to
>>>> an erasure coded pool configured to use ErasureCodeIsa, the data members
>>>> will be duplicated 200 times.
>>>>>
>>>>> It is good practice to make it so that the encode/decode methods of
>>>> ErasureCodeIsa are thread safe. In the jerasure plugin these methods have no
>>>> side effect on the object. In the isa plugin the LRU cache storing the decode
>>>> tables is modified by the decode method and guarded by a mutex:
>>>>>
>>>>>    get
>>>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
>>>> Isa.cc#L281
>>>>>    put
>>>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
>>>> Isa.cc#L310
>>>>>
>>>>> Please correct me if I'm mistaken ;-) I've not reviewed the code yet and try to
>>>> find problems, I just wanted to make sure I get the intention before doing so.
>>>>>
>>>>> Cheers
>>>>> --
>>>>> Lo?c Dachary, Artisan Logiciel Libre
>>>>>
>>>>
>>>> --
>>>> Lo?c Dachary, Artisan Logiciel Libre
>>>
>>
>> --
>> Lo?c Dachary, Artisan Logiciel Libre
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html--
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
Loïc Dachary, Artisan Logiciel Libre


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: ISA erasure code plugin and cache
  2014-08-04 17:04               ` Loic Dachary
@ 2014-08-04 20:07                 ` Andreas Joachim Peters
  2014-08-04 21:26                   ` Loic Dachary
  0 siblings, 1 reply; 12+ messages in thread
From: Andreas Joachim Peters @ 2014-08-04 20:07 UTC (permalink / raw)
  To: Loic Dachary; +Cc: Ceph Development

Hi Loic, 
brilliant!

That's also a good compromise to share encoding tables and avoid locking for encoding.

If there is no upstream change envisaged, I will prepare a pull request according to your proposal. The decoding cache can be shared for each technique, k+m are intrinsic in the key's used for lookup e.g. we would have two LRU caches if both techniques are used and one encoding table for each (k+m) value in use.
The current setting let's one LRU cache grow to a maximum of 15 MB, however I would consider such an event as 'extremly' rare ... it is mainly triggered artificially using the benchmark tool.

Cheers Andreas.

________________________________________
From: Loic Dachary [loic@dachary.org]
Sent: 04 August 2014 19:04
To: Andreas Joachim Peters
Cc: Ceph Development
Subject: Re: ISA erasure code plugin and cache

Hi Andreas,

What about getting a singleton containing the LRU cache in

    https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodePluginIsa.cc#L48

and making it a parameter to the constructor

    https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodePluginIsa.cc#L56

or the init method

    https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodePluginIsa.cc#L70

I assume LRU cache can only be shared if the parameters are the same because it would not make sense to share the cache of an instance using the Cauchy technique with the cache of an instance using the Vandermonde technique.

If the singleton is created in the plugin entry point:

   https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodePluginIsa.cc#L77

there is no need to protect it against race conditions because the call to the entry point already is guarded:

   https://github.com/ceph/ceph/blob/master/src/erasure-code/ErasureCodePlugin.cc#L74

Cheers

On 04/08/2014 17:22, Andreas Joachim Peters wrote:
> Could one not attach the EC instance to the "pg_pool_t" structure which is given by reference to PGBackend *PGBackend::build_pg_backend? Just a naive question ... i don't know the upstream code ...
>
> That would follow the spirit of 'saving/sharing resources' and would follow the logic that the CODEC configuration is by pool and not by PG ?!?!?
>
> Cheers Andreas.
>
>
>
> ________________________________________
> From: ceph-devel-owner@vger.kernel.org [ceph-devel-owner@vger.kernel.org] on behalf of Sage Weil [sweil@redhat.com]
> Sent: 04 August 2014 16:22
> To: Andreas Joachim Peters
> Cc: Loic Dachary; Ma, Jianpeng; Ceph Development
> Subject: RE: ISA erasure code plugin and cache
>
> On Mon, 4 Aug 2014, Andreas Joachim Peters wrote:
>> Yes,
>> you are both right, it would have only the current failure situation in
>> each cache and it stores the encoding table in each plug-in instance.
>> Probably the sharing of cache and encoding table would add more
>> (unwanted) thread synchronization points.
>>
>> If this is the final design, I might remove also the lock? Nevertheless,
>> it has no measurable performance impact ...
>
> Perhaps, but I think the memory impact is significant, especially since
> people will want to use erasure coding for "cold" storage pools are
> underpowered hardware.  I think we should adjust the strategy so that
> there is a single instance per pool, or (perhaps better) the cache is
> global to the plugin (and shared across pools that may share the same
> EC profile).
>
> sage
>
>
>> Cheers Andreas.
>>
>>
>> ________________________________________
>> From: Loic Dachary [loic@dachary.org]
>> Sent: 04 August 2014 14:56
>> To: Ma, Jianpeng; Andreas Joachim Peters
>> Cc: Ceph Development
>> Subject: Re: ISA erasure code plugin and cache
>>
>> On 04/08/2014 14:50, Ma, Jianpeng wrote:
>>>> -----Original Message-----
>>>> From: ceph-devel-owner@vger.kernel.org
>>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Loic Dachary
>>>> Sent: Monday, August 4, 2014 8:37 PM
>>>> To: Andreas Joachim Peters
>>>> Cc: Ma, Jianpeng; Ceph Development
>>>> Subject: Re: ISA erasure code plugin and cache
>>>>
>>>>
>>>>
>>>> On 04/08/2014 14:15, Andreas Joachim Peters wrote:> Hi Loic,
>>>>>
>>>>> the background relevant to your comments have (unfortunately) never been
>>>> answered on the mailing list.
>>>>>
>>>>> The cache is written in a way, that it is useful for a fixed (k,m) combination
>>>> and thread-safe.
>>>>>
>>>>> So, if there is one instance of the plugin per pool, it is the right
>>>> implementation. If there is (as you write) one instance for each PG in any pool,
>>>> it is sort of 'stupid' because the same encoding table is stored for each PG
>>>> seperatly.
>>>>
>>>> I would not called it stupid ;-) Just not as efficient as if the cache was by all PGs.
>>>> Without cache the decode table has to be calculated for each object in the
>>>> placement group and there are a lot of objects. The table may be duplicated
>>>> hundreds of time so it has an impact on the memory footprint, but it should not
>>>> have a visible impact on the decode performances. An optimisation of your
>>>> implementation to save memory would be nice, but it is not critical.
>>>>
>>> AFAIK, for a recovery pg, all the objects of pg have the same lost chunks. So only the first object miss the cache.
>>> But the later won't.  It only need a entry to cache.
>>> Or am I missing something?
>>
>> It also is my understanding and that's what makes the cache so useful. Now, in the long run the cache stays and grows. Since it is a few mega bytes per PG, it will eventually has a non negligible impact on a long running OSD. But again, it's nothing critical to performances.
>>
>> Cheers
>>
>>>
>>> Jianpeng Ma
>>>
>>>> How large are the decode tables ?
>>>>
>>>>> So if the final statement is to create one plugin instance per PG, I will change
>>>> it accordingly and shared encoding & decoding tables for a fixed (k,m), if not it
>>>> can stay.
>>>>>
>>>>> Just need to know that ... this boils down to the fact, that encoding &
>>>> decoding should not be considered 'stateless'.
>>>>>
>>>>> Cheers Andreas.
>>>>>
>>>>>
>>>>> ________________________________________
>>>>> From: Loic Dachary [loic@dachary.org]
>>>>> Sent: 04 August 2014 13:56
>>>>> To: Andreas Joachim Peters
>>>>> Cc: Ma, Jianpeng; Ceph Development
>>>>> Subject: ISA erasure code plugin and cache
>>>>>
>>>>> Hi,
>>>>>
>>>>> Here is how I understand the current code:
>>>>>
>>>>> When an OSD is missing, recovery is required and the primary OSD will collect
>>>> the available chunks to do so. It will then call the decode method via
>>>> ECUtil::decode which is a small wrapper around the corresponding
>>>> ErasureCodeInterface::decode method.
>>>>>
>>>>>    https://github.com/ceph/ceph/blob/master/src/osd/ECBackend.cc#L361
>>>>>
>>>>> The ISA plugin will then use the isa_decode method to perform the work
>>>>>
>>>>>
>>>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
>>>> Isa.cc#L212
>>>>>
>>>>> and will be repeatedly called until all objects in the PGs that were relying on
>>>> the missing OSD are recovered. To avoid computing the decoding table for each
>>>> object, it is stored in a LRU cache
>>>>>
>>>>>
>>>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
>>>> Isa.cc#L480
>>>>>
>>>>> and copied in the stack if already there:
>>>>>
>>>>>
>>>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
>>>> Isa.cc#L433
>>>>>
>>>>> Each PG has a separate instance of ErasureCodeIsa, obtained when it is
>>>> created:
>>>>>
>>>>>    https://github.com/ceph/ceph/blob/master/src/osd/PGBackend.cc#L292
>>>>>
>>>>> It means that data members of each ErasureCodeIsa are copied as many
>>>> times as there are PGs. If an OSD handles participates in 200 PG that belong to
>>>> an erasure coded pool configured to use ErasureCodeIsa, the data members
>>>> will be duplicated 200 times.
>>>>>
>>>>> It is good practice to make it so that the encode/decode methods of
>>>> ErasureCodeIsa are thread safe. In the jerasure plugin these methods have no
>>>> side effect on the object. In the isa plugin the LRU cache storing the decode
>>>> tables is modified by the decode method and guarded by a mutex:
>>>>>
>>>>>    get
>>>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
>>>> Isa.cc#L281
>>>>>    put
>>>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
>>>> Isa.cc#L310
>>>>>
>>>>> Please correct me if I'm mistaken ;-) I've not reviewed the code yet and try to
>>>> find problems, I just wanted to make sure I get the intention before doing so.
>>>>>
>>>>> Cheers
>>>>> --
>>>>> Lo?c Dachary, Artisan Logiciel Libre
>>>>>
>>>>
>>>> --
>>>> Lo?c Dachary, Artisan Logiciel Libre
>>>
>>
>> --
>> Lo?c Dachary, Artisan Logiciel Libre
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html--
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

--
Loïc Dachary, Artisan Logiciel Libre

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: ISA erasure code plugin and cache
  2014-08-04 20:07                 ` Andreas Joachim Peters
@ 2014-08-04 21:26                   ` Loic Dachary
  0 siblings, 0 replies; 12+ messages in thread
From: Loic Dachary @ 2014-08-04 21:26 UTC (permalink / raw)
  To: Andreas Joachim Peters; +Cc: Ceph Development

[-- Attachment #1: Type: text/plain, Size: 10551 bytes --]

Hi Andreas,

On 04/08/2014 22:07, Andreas Joachim Peters wrote:
> Hi Loic, 
> brilliant!
> 
> That's also a good compromise to share encoding tables and avoid locking for encoding.

I'm glad you like it ;-)

> If there is no upstream change envisaged, I will prepare a pull request according to your proposal. The decoding cache can be shared for each technique, k+m are intrinsic in the key's used for lookup e.g. we would have two LRU caches if both techniques are used and one encoding table for each (k+m) value in use.
> The current setting let's one LRU cache grow to a maximum of 15 MB, however I would consider such an event as 'extremly' rare ... it is mainly triggered artificially using the benchmark tool.

I'll work on adapting the isa plugin to use the common ErasureCode class but that should have little impact on what you'll do. Feel free to ignore this and proceed with the implementation. I'll rebase if necessary once you're done.

Cheers

> Cheers Andreas.
> 
> ________________________________________
> From: Loic Dachary [loic@dachary.org]
> Sent: 04 August 2014 19:04
> To: Andreas Joachim Peters
> Cc: Ceph Development
> Subject: Re: ISA erasure code plugin and cache
> 
> Hi Andreas,
> 
> What about getting a singleton containing the LRU cache in
> 
>     https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodePluginIsa.cc#L48
> 
> and making it a parameter to the constructor
> 
>     https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodePluginIsa.cc#L56
> 
> or the init method
> 
>     https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodePluginIsa.cc#L70
> 
> I assume LRU cache can only be shared if the parameters are the same because it would not make sense to share the cache of an instance using the Cauchy technique with the cache of an instance using the Vandermonde technique.
> 
> If the singleton is created in the plugin entry point:
> 
>    https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCodePluginIsa.cc#L77
> 
> there is no need to protect it against race conditions because the call to the entry point already is guarded:
> 
>    https://github.com/ceph/ceph/blob/master/src/erasure-code/ErasureCodePlugin.cc#L74
> 
> Cheers
> 
> On 04/08/2014 17:22, Andreas Joachim Peters wrote:
>> Could one not attach the EC instance to the "pg_pool_t" structure which is given by reference to PGBackend *PGBackend::build_pg_backend? Just a naive question ... i don't know the upstream code ...
>>
>> That would follow the spirit of 'saving/sharing resources' and would follow the logic that the CODEC configuration is by pool and not by PG ?!?!?
>>
>> Cheers Andreas.
>>
>>
>>
>> ________________________________________
>> From: ceph-devel-owner@vger.kernel.org [ceph-devel-owner@vger.kernel.org] on behalf of Sage Weil [sweil@redhat.com]
>> Sent: 04 August 2014 16:22
>> To: Andreas Joachim Peters
>> Cc: Loic Dachary; Ma, Jianpeng; Ceph Development
>> Subject: RE: ISA erasure code plugin and cache
>>
>> On Mon, 4 Aug 2014, Andreas Joachim Peters wrote:
>>> Yes,
>>> you are both right, it would have only the current failure situation in
>>> each cache and it stores the encoding table in each plug-in instance.
>>> Probably the sharing of cache and encoding table would add more
>>> (unwanted) thread synchronization points.
>>>
>>> If this is the final design, I might remove also the lock? Nevertheless,
>>> it has no measurable performance impact ...
>>
>> Perhaps, but I think the memory impact is significant, especially since
>> people will want to use erasure coding for "cold" storage pools are
>> underpowered hardware.  I think we should adjust the strategy so that
>> there is a single instance per pool, or (perhaps better) the cache is
>> global to the plugin (and shared across pools that may share the same
>> EC profile).
>>
>> sage
>>
>>
>>> Cheers Andreas.
>>>
>>>
>>> ________________________________________
>>> From: Loic Dachary [loic@dachary.org]
>>> Sent: 04 August 2014 14:56
>>> To: Ma, Jianpeng; Andreas Joachim Peters
>>> Cc: Ceph Development
>>> Subject: Re: ISA erasure code plugin and cache
>>>
>>> On 04/08/2014 14:50, Ma, Jianpeng wrote:
>>>>> -----Original Message-----
>>>>> From: ceph-devel-owner@vger.kernel.org
>>>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Loic Dachary
>>>>> Sent: Monday, August 4, 2014 8:37 PM
>>>>> To: Andreas Joachim Peters
>>>>> Cc: Ma, Jianpeng; Ceph Development
>>>>> Subject: Re: ISA erasure code plugin and cache
>>>>>
>>>>>
>>>>>
>>>>> On 04/08/2014 14:15, Andreas Joachim Peters wrote:> Hi Loic,
>>>>>>
>>>>>> the background relevant to your comments have (unfortunately) never been
>>>>> answered on the mailing list.
>>>>>>
>>>>>> The cache is written in a way, that it is useful for a fixed (k,m) combination
>>>>> and thread-safe.
>>>>>>
>>>>>> So, if there is one instance of the plugin per pool, it is the right
>>>>> implementation. If there is (as you write) one instance for each PG in any pool,
>>>>> it is sort of 'stupid' because the same encoding table is stored for each PG
>>>>> seperatly.
>>>>>
>>>>> I would not called it stupid ;-) Just not as efficient as if the cache was by all PGs.
>>>>> Without cache the decode table has to be calculated for each object in the
>>>>> placement group and there are a lot of objects. The table may be duplicated
>>>>> hundreds of time so it has an impact on the memory footprint, but it should not
>>>>> have a visible impact on the decode performances. An optimisation of your
>>>>> implementation to save memory would be nice, but it is not critical.
>>>>>
>>>> AFAIK, for a recovery pg, all the objects of pg have the same lost chunks. So only the first object miss the cache.
>>>> But the later won't.  It only need a entry to cache.
>>>> Or am I missing something?
>>>
>>> It also is my understanding and that's what makes the cache so useful. Now, in the long run the cache stays and grows. Since it is a few mega bytes per PG, it will eventually has a non negligible impact on a long running OSD. But again, it's nothing critical to performances.
>>>
>>> Cheers
>>>
>>>>
>>>> Jianpeng Ma
>>>>
>>>>> How large are the decode tables ?
>>>>>
>>>>>> So if the final statement is to create one plugin instance per PG, I will change
>>>>> it accordingly and shared encoding & decoding tables for a fixed (k,m), if not it
>>>>> can stay.
>>>>>>
>>>>>> Just need to know that ... this boils down to the fact, that encoding &
>>>>> decoding should not be considered 'stateless'.
>>>>>>
>>>>>> Cheers Andreas.
>>>>>>
>>>>>>
>>>>>> ________________________________________
>>>>>> From: Loic Dachary [loic@dachary.org]
>>>>>> Sent: 04 August 2014 13:56
>>>>>> To: Andreas Joachim Peters
>>>>>> Cc: Ma, Jianpeng; Ceph Development
>>>>>> Subject: ISA erasure code plugin and cache
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Here is how I understand the current code:
>>>>>>
>>>>>> When an OSD is missing, recovery is required and the primary OSD will collect
>>>>> the available chunks to do so. It will then call the decode method via
>>>>> ECUtil::decode which is a small wrapper around the corresponding
>>>>> ErasureCodeInterface::decode method.
>>>>>>
>>>>>>    https://github.com/ceph/ceph/blob/master/src/osd/ECBackend.cc#L361
>>>>>>
>>>>>> The ISA plugin will then use the isa_decode method to perform the work
>>>>>>
>>>>>>
>>>>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
>>>>> Isa.cc#L212
>>>>>>
>>>>>> and will be repeatedly called until all objects in the PGs that were relying on
>>>>> the missing OSD are recovered. To avoid computing the decoding table for each
>>>>> object, it is stored in a LRU cache
>>>>>>
>>>>>>
>>>>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
>>>>> Isa.cc#L480
>>>>>>
>>>>>> and copied in the stack if already there:
>>>>>>
>>>>>>
>>>>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
>>>>> Isa.cc#L433
>>>>>>
>>>>>> Each PG has a separate instance of ErasureCodeIsa, obtained when it is
>>>>> created:
>>>>>>
>>>>>>    https://github.com/ceph/ceph/blob/master/src/osd/PGBackend.cc#L292
>>>>>>
>>>>>> It means that data members of each ErasureCodeIsa are copied as many
>>>>> times as there are PGs. If an OSD handles participates in 200 PG that belong to
>>>>> an erasure coded pool configured to use ErasureCodeIsa, the data members
>>>>> will be duplicated 200 times.
>>>>>>
>>>>>> It is good practice to make it so that the encode/decode methods of
>>>>> ErasureCodeIsa are thread safe. In the jerasure plugin these methods have no
>>>>> side effect on the object. In the isa plugin the LRU cache storing the decode
>>>>> tables is modified by the decode method and guarded by a mutex:
>>>>>>
>>>>>>    get
>>>>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
>>>>> Isa.cc#L281
>>>>>>    put
>>>>> https://github.com/ceph/ceph/blob/master/src/erasure-code/isa/ErasureCode
>>>>> Isa.cc#L310
>>>>>>
>>>>>> Please correct me if I'm mistaken ;-) I've not reviewed the code yet and try to
>>>>> find problems, I just wanted to make sure I get the intention before doing so.
>>>>>>
>>>>>> Cheers
>>>>>> --
>>>>>> Lo?c Dachary, Artisan Logiciel Libre
>>>>>>
>>>>>
>>>>> --
>>>>> Lo?c Dachary, Artisan Logiciel Libre
>>>>
>>>
>>> --
>>> Lo?c Dachary, Artisan Logiciel Libre
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html--
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> 
> --
> Loïc Dachary, Artisan Logiciel Libre
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
Loïc Dachary, Artisan Logiciel Libre


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2014-08-04 21:27 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-04 11:56 ISA erasure code plugin and cache Loic Dachary
2014-08-04 12:15 ` Andreas Joachim Peters
2014-08-04 12:37   ` Loic Dachary
2014-08-04 12:43     ` Andreas Joachim Peters
2014-08-04 12:50     ` Ma, Jianpeng
2014-08-04 12:56       ` Loic Dachary
2014-08-04 13:19         ` Andreas Joachim Peters
2014-08-04 14:22           ` Sage Weil
2014-08-04 15:22             ` Andreas Joachim Peters
2014-08-04 17:04               ` Loic Dachary
2014-08-04 20:07                 ` Andreas Joachim Peters
2014-08-04 21:26                   ` Loic Dachary

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.