* GSOC on ceph-mgr : SMARTER SMARTER REWEIGHT-BY-UTILIZATION
@ 2017-05-08  5:45 Spandan Kumar Sahu
  2017-05-08 10:40 ` kefu chai
  0 siblings, 1 reply; 5+ messages in thread
From: Spandan Kumar Sahu @ 2017-05-08  5:45 UTC (permalink / raw)
  To: Ceph Development, kefu chai

Hey everyone,

My name is Spandan Kumar Sahu. I am a second-year undergraduate student
at the Indian Institute of Technology, Kharagpur (India), pursuing a
Bachelor of Technology in Computer Science and Engineering.

It is my pleasure to have been selected under the GSoC program. This
[1] is my proposal. I have also included an example of its working.
[2]

As a start, Kefu Chai suggested that I document the problem under
doc/dev and put together as many documents regarding reweight as
possible.

I would really appreciate it if anyone could go through the proposal
and suggest changes or point out problems.

Thanks

Spandan Kumar Sahu
IIT Kharagpur

[1] : https://github.com/SpandanKumarSahu/Ceph_Proposal/blob/master/GSoCProposalSMARTERREWEIGHT-BY-UTILIZATION.pdf
[2] : https://github.com/SpandanKumarSahu/Ceph_Proposal/blob/master/Readme


* Re: GSOC on ceph-mgr : SMARTER SMARTER REWEIGHT-BY-UTILIZATION
  2017-05-08  5:45 GSOC on ceph-mgr : SMARTER SMARTER REWEIGHT-BY-UTILIZATION Spandan Kumar Sahu
@ 2017-05-08 10:40 ` kefu chai
  2017-05-08 15:03   ` Spandan Kumar Sahu
  0 siblings, 1 reply; 5+ messages in thread
From: kefu chai @ 2017-05-08 10:40 UTC (permalink / raw)
  To: Spandan Kumar Sahu; +Cc: Ceph Development

On Mon, May 8, 2017 at 1:45 PM, Spandan Kumar Sahu
<spandankumarsahu@gmail.com> wrote:
> Hey everyone
>
> My name is Spandan Kumar Sahu, a second year undergraduate student
> from Indian Institute of Technology, Kharagpur (India), pursuing
> Bachelor of Technology in Computer Science and Engineering.


Welcome to this community, Spandan!

>
> It is my pleasure to have been selected under the GSoC program. This
> [1] is my proposal. I have also included an example of its working.
> [2]
>
> As a start, Kefu Chai, suggested me to document the problem under
> doc/dev and attempt to put together as many documents regarding
> reweight, as possible.

I think it would serve as a good reference for anyone interested in
this topic in the future. We can start by maintaining a markdown
document in your ceph repo, and when it's ready for review you can send
a pull request from it.

>
> I would really appreciate if anyone can go through the proposal, and
> suggest me changes/problems.

I like your idea of applying PID to Ceph, but I am not sure that the PID
algorithm applies to it. In other words: is a Ceph cluster a linear
system? What is its transfer function? Does it satisfy the Nyquist
stability criterion? If not, how can we determine its stability? Tuning
the controller parameters is always the most difficult part of
designing a PID-based control system.

Instead, I think Ceph is a stochastic process, as discussed in another
thread (titled "crush multipick anomaly") on this mailing list.

>
> Thanks
>
> Spandan Kumar Sahu
> IIT Kharagpur
>
> [1] : https://github.com/SpandanKumarSahu/Ceph_Proposal/blob/master/GSoCProposalSMARTERREWEIGHT-BY-UTILIZATION.pdf
> [2] : https://github.com/SpandanKumarSahu/Ceph_Proposal/blob/master/Readme



-- 
Regards
Kefu Chai


* Re: GSOC on ceph-mgr : SMARTER SMARTER REWEIGHT-BY-UTILIZATION
  2017-05-08 10:40 ` kefu chai
@ 2017-05-08 15:03   ` Spandan Kumar Sahu
  2017-05-09  1:43     ` Spandan Kumar Sahu
  0 siblings, 1 reply; 5+ messages in thread
From: Spandan Kumar Sahu @ 2017-05-08 15:03 UTC (permalink / raw)
  To: kefu chai; +Cc: Ceph Development

On Mon, May 8, 2017 at 4:10 PM, kefu chai <tchaikov@gmail.com> wrote:
> On Mon, May 8, 2017 at 1:45 PM, Spandan Kumar Sahu
> <spandankumarsahu@gmail.com> wrote:
>> Hey everyone
>>
>> My name is Spandan Kumar Sahu, a second year undergraduate student
>> from Indian Institute of Technology, Kharagpur (India), pursuing
>> Bachelor of Technology in Computer Science and Engineering.
>
>
> Welcome to this community, Spandan!
>
>>
>> It is my pleasure to have been selected under the GSoC program. This
>> [1] is my proposal. I have also included an example of its working.
>> [2]
>>
>> As a start, Kefu Chai, suggested me to document the problem under
>> doc/dev and attempt to put together as many documents regarding
>> reweight, as possible.
>
> I think it would serve a good reference for whomever interested in
> this topic in future. we can start by maintaining a markdown document
> in your ceph repo, and when it's ready for review you can send a pull
> request from it.
>
I am currently working on it. I will send a PR soon.

>>
>> I would really appreciate if anyone can go through the proposal, and
>> suggest me changes/problems.
>
> i like your idea of applying PID to ceph. but i am not sure if the PID
> algorithm applies to Ceph. or put in other words, is a Ceph cluster a
> linear system? what is its transfer function? does it satisfy the
> Nyquist stability criterion? if not, how can we determine its
> stability? as it's always the most difficult part to tune the PID
> controller parameters when designing a PID based control system.
>
The Ceph cluster behaves as a stochastic process over a small number of
trials (or runs/iterations). However, when the number of trials is
reasonably large, the weight distribution is linearly proportional to
the weight of the OSD (if the anomaly were not to happen). Hence, for a
reasonable number of iterations, the weight distribution is linear in
the weight of the OSD.
Under normal conditions, the "load percentage the OSD handles" is an
"expected" linear function of the "ratio of the OSD's weight to the
total weight".
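
The claim above can be checked numerically with a minimal sketch. It
assumes a plain single-replica weighted draw (Python's random.choices),
which is NOT CRUSH and does not reproduce the multipick anomaly; the
function names are hypothetical and chosen only for illustration.

```python
import random


def expected_shares(weights):
    """Expected load share of each OSD: its weight over the total weight."""
    total = sum(weights)
    return [w / total for w in weights]


def observed_shares(weights, num_pgs, seed=0):
    """Place num_pgs PGs with one weighted random draw each (assumption:
    a simple stand-in for placement, not the CRUSH algorithm)."""
    rng = random.Random(seed)
    counts = [0] * len(weights)
    for osd in rng.choices(range(len(weights)), weights=weights, k=num_pgs):
        counts[osd] += 1
    return [c / num_pgs for c in counts]


def max_deviation(weights, num_pgs, seed=0):
    """Largest absolute gap between observed and expected shares."""
    expected = expected_shares(weights)
    observed = observed_shares(weights, num_pgs, seed)
    return max(abs(e - o) for e, o in zip(expected, observed))


if __name__ == "__main__":
    weights = [1.0, 2.0, 3.0, 4.0]
    for num_pgs in (20, 2000, 200000):
        print(num_pgs, max_deviation(weights, num_pgs))
```

With few PGs the gap is dominated by sampling noise; it shrinks as the
number of PGs grows, which is the "reasonably large number of trials"
regime described above.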

> instead, i think Ceph is a stochastic process. as discussed in another
> thread (with the title of "crush multipick anomaly") in this mailing
> list.
>
Also, should I start with the first part of the project, as to how to
analyse a reweight algorithm?

>>
>> Thanks
>>
>> Spandan Kumar Sahu
>> IIT Kharagpur
>>
>> [1] : https://github.com/SpandanKumarSahu/Ceph_Proposal/blob/master/GSoCProposalSMARTERREWEIGHT-BY-UTILIZATION.pdf
>> [2] : https://github.com/SpandanKumarSahu/Ceph_Proposal/blob/master/Readme
>
>
>
> --
> Regards
> Kefu Chai



-- 
Spandan Kumar Sahu
IIT Kharagpur


* Re: GSOC on ceph-mgr : SMARTER SMARTER REWEIGHT-BY-UTILIZATION
  2017-05-08 15:03   ` Spandan Kumar Sahu
@ 2017-05-09  1:43     ` Spandan Kumar Sahu
  2017-05-11  7:18       ` kefu chai
  0 siblings, 1 reply; 5+ messages in thread
From: Spandan Kumar Sahu @ 2017-05-09  1:43 UTC (permalink / raw)
  To: kefu chai; +Cc: Ceph Development

On Mon, May 8, 2017 at 8:33 PM, Spandan Kumar Sahu
<spandankumarsahu@gmail.com> wrote:
> On Mon, May 8, 2017 at 4:10 PM, kefu chai <tchaikov@gmail.com> wrote:
>> On Mon, May 8, 2017 at 1:45 PM, Spandan Kumar Sahu
>> <spandankumarsahu@gmail.com> wrote:
>>> Hey everyone
>>>
>>> My name is Spandan Kumar Sahu, a second year undergraduate student
>>> from Indian Institute of Technology, Kharagpur (India), pursuing
>>> Bachelor of Technology in Computer Science and Engineering.
>>
>>
>> Welcome to this community, Spandan!
>>
>>>
>>> It is my pleasure to have been selected under the GSoC program. This
>>> [1] is my proposal. I have also included an example of its working.
>>> [2]
>>>
>>> As a start, Kefu Chai, suggested me to document the problem under
>>> doc/dev and attempt to put together as many documents regarding
>>> reweight, as possible.
>>
>> I think it would serve a good reference for whomever interested in
>> this topic in future. we can start by maintaining a markdown document
>> in your ceph repo, and when it's ready for review you can send a pull
>> request from it.
>>
> I am currently working on it. I will send a PR soon.
>
>>>
>>> I would really appreciate if anyone can go through the proposal, and
>>> suggest me changes/problems.
>>
>> i like your idea of applying PID to ceph. but i am not sure if the PID
>> algorithm applies to Ceph. or put in other words, is a Ceph cluster a
>> linear system? what is its transfer function? does it satisfy the
>> Nyquist stability criterion? if not, how can we determine its
>> stability? as it's always the most difficult part to tune the PID
>> controller parameters when designing a PID based control system.
>>
> The Ceph cluster is a stochastic process, under a short number of
> trials ( or runs/iterations). However, when the trials are made in
> reasonably large number, the weight distribution is linearly
> proportional to the weight of the OSD, (if the anomaly were not to
> happen). Hence, the weight distribution is linear in terms of weight
> of the OSD for reasonable number of iterations.
> Under normal conditions, the "load percentage the OSD handles" is an
> "expected" linear function of the "ratio of the OSD's weight to the
> total weight".
>
The transfer function and the Nyquist stability criterion are difficult
to determine because the cluster behaves stochastically for small
numbers of trials. The stability can be judged from the difference
between the current load percentage of the OSD and the expected load
percentage.

As an example, Loic implemented a simpler version of PID and gained a
significant improvement. This is his algorithm:
" - Distribute the desired number of PGs
    - Subtract 1% of the weight of the OSD that is the most over used
    - Add the subtracted weight to the OSD that is the most under used
    - Repeat until the Kullback–Leibler divergence[8] is small enough
" (Discussed in the "revisiting uneven Crush" thread on the mailing list)

So, basically, he was more or less implementing only the Proportional
(P) part of the PID system, and only for the most and least used OSDs.
This was the performance gain he obtained:
"In all tests the situation improves at least by an order of
magnitude. For instance when there is a 30% difference between two
OSDs, it is down to less than 3% after optimization."
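
The quoted loop can be sketched as follows. This is only an
illustration under stated assumptions, not Loic's actual code: the
placement function is a hypothetical single-replica, straw2-like draw
built on SHA-256; "most over/under used" is taken to mean the largest
and smallest ratio of observed share to target share; all weights are
assumed to be positive; and the threshold and iteration cap are
arbitrary.

```python
import hashlib
import math


def place_pgs(weights, num_pgs):
    """Hypothetical placement: each OSD draws ln(u)/weight from a hash of
    (pg, osd); the highest draw wins. One replica per PG."""
    counts = [0] * len(weights)
    for pg in range(num_pgs):
        best_osd, best_draw = 0, -math.inf
        for osd, weight in enumerate(weights):
            digest = hashlib.sha256(f"{pg}:{osd}".encode()).digest()
            u = (int.from_bytes(digest[:8], "big") + 1) / 2**64  # in (0, 1]
            draw = math.log(u) / weight
            if draw > best_draw:
                best_osd, best_draw = osd, draw
        counts[best_osd] += 1
    return counts


def kl_divergence(observed, target):
    """Kullback-Leibler divergence D(observed || target)."""
    return sum(p * math.log(p / q) for p, q in zip(observed, target) if p > 0)


def optimize_weights(target_weights, num_pgs, threshold=1e-4,
                     max_iters=200, step=0.01):
    """Move `step` (1%) of the most over-used OSD's weight to the most
    under-used OSD until the KL divergence is small enough. Returns the
    best weights seen and their divergence."""
    total = sum(target_weights)
    target = [w / total for w in target_weights]
    weights = list(target_weights)
    best_weights, best_kl = list(weights), math.inf
    for _ in range(max_iters):
        counts = place_pgs(weights, num_pgs)
        observed = [c / num_pgs for c in counts]
        kl = kl_divergence(observed, target)
        if kl < best_kl:
            best_weights, best_kl = list(weights), kl
        if kl <= threshold:
            break
        ratio = [o / t for o, t in zip(observed, target)]
        over = max(range(len(ratio)), key=lambda i: ratio[i])
        under = min(range(len(ratio)), key=lambda i: ratio[i])
        delta = step * weights[over]
        weights[over] -= delta
        weights[under] += delta
    return best_weights, best_kl


if __name__ == "__main__":
    print(optimize_weights([1.0, 1.0, 2.0, 2.0, 4.0], num_pgs=300))
```

The sketch keeps the best weights seen so far because a fixed 1% step
is not guaranteed to reduce the divergence on every iteration.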

This made me hopeful that we can go ahead with a stronger form of PID
and expect a better optimisation.

I understand that tuning the PID is the most challenging part. But
there are various PID tuning algorithms, and there are certain PID
tuners, one of which I have worked on and have included in my
proposal.
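
For reference, a discrete PID loop on a single OSD weight might look
like the sketch below. Everything here is an assumption made for
illustration: the "plant" is a made-up deterministic linear model
(observed share = 0.8 * weight), not a Ceph cluster, and the gains are
hand-picked for that toy model rather than produced by any tuner.

```python
class PID:
    """Textbook discrete PID controller with a unit time step."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error):
        self.integral += error
        # No derivative contribution on the very first sample.
        derivative = 0.0 if self.prev_error is None else error - self.prev_error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def toy_observed_share(weight):
    """Hypothetical plant: the OSD receives 80% of its nominal share."""
    return 0.8 * weight


def regulate(target_share, nominal_weight, steps, pid):
    """Adjust the weight so the observed share tracks target_share.
    Returns the final weight and the remaining error."""
    weight = nominal_weight
    for _ in range(steps):
        error = target_share - toy_observed_share(weight)
        weight = nominal_weight + pid.update(error)
    return weight, target_share - toy_observed_share(weight)


if __name__ == "__main__":
    final_weight, final_error = regulate(
        target_share=0.25, nominal_weight=0.25, steps=100,
        pid=PID(kp=0.5, ki=0.5, kd=0.1))
    print(final_weight, final_error)
```

The loop settles here only because the toy plant is linear and
noise-free; whether a real cluster behaves that way is exactly the
open question in this thread.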

>> instead, i think Ceph is a stochastic process. as discussed in another
>> thread (with the title of "crush multipick anomaly") in this mailing
>> list.
>>
> Also, should I start with the first part of the project, as to how to
> analyse a reweight algorithm?
>
>>>
>>> Thanks
>>>
>>> Spandan Kumar Sahu
>>> IIT Kharagpur
>>>
>>> [1] : https://github.com/SpandanKumarSahu/Ceph_Proposal/blob/master/GSoCProposalSMARTERREWEIGHT-BY-UTILIZATION.pdf
>>> [2] : https://github.com/SpandanKumarSahu/Ceph_Proposal/blob/master/Readme
>>
>>
>>
>> --
>> Regards
>> Kefu Chai
>
>
>
> --
> Spandan Kumar Sahu
> IIT Kharagpur



-- 
Spandan Kumar Sahu
IIT Kharagpur


* Re: GSOC on ceph-mgr : SMARTER SMARTER REWEIGHT-BY-UTILIZATION
  2017-05-09  1:43     ` Spandan Kumar Sahu
@ 2017-05-11  7:18       ` kefu chai
  0 siblings, 0 replies; 5+ messages in thread
From: kefu chai @ 2017-05-11  7:18 UTC (permalink / raw)
  To: Spandan Kumar Sahu; +Cc: Ceph Development

On Tue, May 9, 2017 at 9:43 AM, Spandan Kumar Sahu
<spandankumarsahu@gmail.com> wrote:
>>
>>>>
>>>> I would really appreciate if anyone can go through the proposal, and
>>>> suggest me changes/problems.
>>>
>>> i like your idea of applying PID to ceph. but i am not sure if the PID
>>> algorithm applies to Ceph. or put in other words, is a Ceph cluster a
>>> linear system? what is its transfer function? does it satisfy the
>>> Nyquist stability criterion? if not, how can we determine its
>>> stability? as it's always the most difficult part to tune the PID
>>> controller parameters when designing a PID based control system.
>>>
>> The Ceph cluster is a stochastic process, under a short number of
>> trials ( or runs/iterations). However, when the trials are made in
>> reasonably large number, the weight distribution is linearly
>> proportional to the weight of the OSD, (if the anomaly were not to

The weight is a *distribution*, not a function of time. Could you
explain what its input and output are, using the model described at
https://en.wikipedia.org/wiki/Linear_system ?

>> happen). Hence, the weight distribution is linear in terms of weight

The "distribution" function does not have the superposition property;
that is the crux of the problem we are facing. We assumed that it did,
but it turns out that it does not. That's why Loïc and other developers
are trying to use optimization methods to reach an optimum solution.

>> of the OSD for reasonable number of iterations.
>> Under normal conditions, the "load percentage the OSD handles" is an
>> "expected" linear function of the "ratio of the OSD's weight to the
>> total weight".
>>
> The transfer function and Nyquist stability criterion are difficult to
> determine because the cluster behaves stochastically for small
> numbers. The stability can be determined by the difference in the
> current load percentage of the OSD and the expected load percentage.

But you mentioned above that it is a linear system if the number of
trials is large enough, so let's assume that we have a large number of
trials.

>
> As an example, Loic did implement a simpler version of the PID, and
> gained significant improvement.This is his algorithm :
> " - Distribute the desired number of PGs
>     - Subtract 1% of the weight of the OSD that is the most over used
>     - Add the subtracted weight to the OSD that is the most under used
>     - Repeat until the Kullback–Leibler divergence[8] is small enough
> " (Discussed on " revisiting uneven Crush" in the mailing list)

This is an iterative method for approaching the optimum weights.

>
> So, basically, he was more or less implementing only the Proportional
> (P) part of the PID system and only for the most and least used OSDs.
> This was the performance gain he obtained :
> " In all tests the situation improves at least by an order of
> magnitude. For instance when there is a 30% difference between two
> OSDs, it is down to less than 3% after optimization. "
>
> This made me hopeful that we can go ahead with a stronger form of PID
> and expect a better optimisation.
>
> I understand that tuning the PID is the most challenging part. But
> there are various PID tuning algorithms, and there are certain PID
> tuners, one of which I have worked on and have included in my
> proposal.

There are, but I think PID has its limitations.

>
>>> instead, i think Ceph is a stochastic process. as discussed in another
>>> thread (with the title of "crush multipick anomaly") in this mailing
>>> list.
>>>
>> Also, should I start with the first part of the project, as to how to
>> analyse a reweight algorithm?

Yes! Please see Loïc's crush tool; it offers an "analyze" subcommand,
which shows how the PGs are distributed among OSDs. The more even, the
better.

>>
>>>>
>>>> Thanks
>>>>
>>>> Spandan Kumar Sahu
>>>> IIT Kharagpur
>>>>
>>>> [1] : https://github.com/SpandanKumarSahu/Ceph_Proposal/blob/master/GSoCProposalSMARTERREWEIGHT-BY-UTILIZATION.pdf
>>>> [2] : https://github.com/SpandanKumarSahu/Ceph_Proposal/blob/master/Readme
>>>
>>>
>>>
>>> --
>>> Regards
>>> Kefu Chai
>>
>>
>>
>> --
>> Spandan Kumar Sahu
>> IIT Kharagpur
>
>
>
> --
> Spandan Kumar Sahu
> IIT Kharagpur



-- 
Regards
Kefu Chai
