* Re: Beta testing crush optimization
@ 2017-05-31  9:52 han vincent
  0 siblings, 0 replies; 24+ messages in thread
From: han vincent @ 2017-05-31  9:52 UTC (permalink / raw)
  To: ceph-devel; +Cc: loic

Hello Loic,

I have a cluster built with hammer 0.94.10, and I used the following
commands to change the bucket algorithm from "straw" to "straw2":
1. ceph osd crush tunables hammer
2. ceph osd getcrushmap -o /tmp/cmap
3. crushtool -d /tmp/cmap -o /tmp/cmap.txt
4. vim /tmp/cmap.txt and change the algorithm of each bucket from
"straw" to "straw2"
5. crushtool -c /tmp/cmap.txt -o /tmp/cmap
6. ceph osd setcrushmap -i /tmp/cmap
7. ceph osd crush reweight-all
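
For the record, step 4 can also be scripted instead of editing the file in
vim. A minimal sketch in Python, assuming the decompiled map from step 3 is
at /tmp/cmap.txt and the buckets use the usual "alg straw" lines:

import re

path = "/tmp/cmap.txt"
with open(path) as f:
    text = f.read()

# Rewrite every bucket's "alg straw" line to "alg straw2"; the word boundary
# leaves buckets that are already straw2 untouched.
with open(path, "w") as f:
    f.write(re.sub(r"\balg straw\b", "alg straw2", text))

The result can then be compiled and set exactly as in steps 5 and 6.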
After that, I used "python-crush" (version 1.0.32) to optimize the cluster:

1. ceph report > report.json
2. crush optimize --crushmap report.json --out-path optimized.crush
Unfortunately, there was an error in the output:

2017-05-30 18:48:01,803 42.1 map to [4, 9] instead of [4, 8]
2017-05-30 18:48:01,838 49.3af map to [9, 2] instead of [9, 3]
2017-05-30 18:48:01,838 49.e3 map to [6, 4] instead of [6, 5]
2017-05-30 18:48:01,838 49.e1 map to [7, 2] instead of [7, 3]
2017-05-30 18:48:01,838 49.e0 map to [5, 1] instead of [5, 0]
2017-05-30 18:48:01,838 49.20d map to [3, 1] instead of [3, 0]
2017-05-30 18:48:01,838 49.20c map to [2, 9] instead of [2, 8]
2017-05-30 18:48:01,838 49.36e map to [6, 1] instead of [6, 0]
......

Traceback (most recent call last):
 File "/usr/bin/crush", line 25, in <module>
sys.exit(Ceph().main(sys.argv[1:]))
 File "/usr/lib64/python2.7/site-packages/crush/main.py", line 136, in main
return self.constructor(argv).run()
 File "/usr/lib64/python2.7/site-packages/crush/optimize.py", line 373, in run
crushmap = self.main.convert_to_crushmap(self.args.crushmap)
 File "/usr/lib64/python2.7/site-packages/crush/ceph/__init__.py",
line 690, in convert_to_crushmap
c.parse(crushmap)
 File "/usr/lib64/python2.7/site-packages/crush/__init__.py", line 138, in parse
return self.parse_crushmap(self._convert_to_crushmap(something))
 File "/usr/lib64/python2.7/site-packages/crush/ceph/__init__.py",
line 416, in _convert_to_crushmap
crushmap = CephReport().parse_report(something)
 File "/usr/lib64/python2.7/site-packages/crush/ceph/__init__.py",
line 137, in parse_report
raise MappingError("some mapping failed, please file a bug at "
crush.ceph.MappingError: some mapping failed, please file a bug at
http://libcrush.org/main/python-crush/issues/new
Do you know what the problem is? Can you help me? I would be very
grateful.


* Re: Beta testing crush optimization
  2017-06-06  9:02                                   ` han vincent
@ 2017-06-06 13:58                                     ` Loic Dachary
  0 siblings, 0 replies; 24+ messages in thread
From: Loic Dachary @ 2017-06-06 13:58 UTC (permalink / raw)
  To: han vincent; +Cc: Ceph Development

Hi,

A new version of python-crush was published (1.0.34); please upgrade to get the fixes. Thanks again for your patience. It's quite interesting to see that although your RGW cluster has many pools, the vast majority of the data is in the rgw.bucket pool alone. That makes it possible to rebalance the cluster simply by rebalancing this pool.
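
A quick way to double-check where the data lives is to look at per-pool
usage. A minimal sketch, assuming "ceph df --format json" is available; the
exact field names may vary slightly between releases:

import json
import subprocess

df = json.loads(subprocess.check_output(["ceph", "df", "--format", "json"]))
for pool in df.get("pools", []):
    used = pool.get("stats", {}).get("bytes_used", 0)
    print("%-30s %10.1f GB" % (pool["name"], used / 1e9))

Any pool other than rgw.bucket should show a comparatively tiny number.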

On 06/06/2017 11:02 AM, han vincent wrote:
> Hi, Loic:
>    I saw you published a new version (v1.0.33) of "python-crush",
> and I tested it with my crushmap.
>    I am glad to tell you that the "optimize" command worked well
> when I used the following command:
> 
>    crush optimize --crushmap report.json --out-path optimized.crush
> --rule replicated_ruleset --pool 49
> 
>    There was no more error output. After that I imported the
> optimized crushmap into my cluster; after rebalancing, all the OSDs are
> +-0.9 over/under filled.
> 
>    When I used the "compare" command, there was an error in the output:
>    crush compare --rule replicated_ruleset --replication-count 2
> --origin /tmp/report.json --destination optimized.crush

You need to specify --destination-choose-args for the optimized crushmap:

$ crush compare --origin /tmp/report.json --pool 49 --destination /tmp/optimized.crush --destination-choose-args 49
There are 2048 PGs.

Replacing the crushmap specified with --origin with the crushmap
specified with --destination will move 75 PGs (3.662109375% of the total)
from one item to another.

The rows below show the number of PGs moved from the given
item to each item named in the columns. The PGs% at the
end of the rows shows the percentage of the total number
of PGs that is moved away from this particular item. The
last row shows the percentage of the total number of PGs
that is moved to the item named in the column.

       osd.0  osd.1  osd.2  osd.3  osd.4  osd.5  osd.6  osd.7  osd.8  osd.9   PGs%
osd.0      0      9      0      0      0      1      0      0      0      1  0.54%
osd.1      0      0      2      5      0      0      1      2      0      1  0.54%
osd.2      0      1      0      1      0      0      0      0      0      0  0.10%
osd.3      0      0      7      0      0      0      0      0      0      0  0.34%
osd.4      2      0      1      3      0      2      0      3      0      0  0.54%
osd.5      2      1      4      0     10      0      0      0      1      1  0.93%
osd.6      0      0      0      0      0      0      0      0      1      0  0.05%
osd.7      0      0      0      2      0      0      7      0      1      0  0.49%
osd.8      1      0      1      0      0      0      0      0      0      1  0.15%
PGs%   0.24%  0.54%  0.73%  0.54%  0.49%  0.15%  0.39%  0.24%  0.15%  0.20%  3.66%
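
As a quick sanity check of the matrix, each row total, divided by the 2048
PGs, gives the PGs% shown at the end of that row; a small sketch with the
osd.0 row copied from the output above:

row_osd0 = [0, 9, 0, 0, 0, 1, 0, 0, 0, 1]   # PGs moving from osd.0 to osd.0..osd.9
total_pgs = 2048
moved = sum(row_osd0)
print(moved)                                # 11 PGs leave osd.0
print(round(100.0 * moved / total_pgs, 2))  # prints 0.54, matching the PGs% column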

> 
>    Traceback (most recent call last):
>    File "/usr/bin/crush", line 25, in <module>
>     sys.exit(Ceph().main(sys.argv[1:]))
>    File "/usr/lib64/python2.7/site-packages/crush/main.py", line 136, in main
>     return self.constructor(argv).run()
>    File "/usr/lib64/python2.7/site-packages/crush/compare.py", line 327, in run
>     self.run_compare()
>    File "/usr/lib64/python2.7/site-packages/crush/compare.py", line
> 332, in run_compare
>     self.set_destination_crushmap(self.args.destination)
>    File "/usr/lib64/python2.7/site-packages/crush/compare.py", line
> 59, in set_destination_crushmap
>     d.parse(self.main.convert_to_crushmap(destination))
>    File "/usr/lib64/python2.7/site-packages/crush/ceph/__init__.py",
> line 739, in convert_to_crushmap
>     self.set_compat_choose_args(c, crushmap, choose_args_name)
>    File "/usr/lib64/python2.7/site-packages/crush/ceph/__init__.py",
> line 716, in set_compat_choose_args
>     assert choose_args_name
>    AssertionError
> 
>    After that, I generated a new report of my cluster and ran "optimize"
> once again, but there was still an error in the output:
>   ceph report > report.new.json
>   crush optimize --crushmap report.json --out-path optimized.crush
> --rule replicated_ruleset --pool 49
>   [root@node-4 ~]# crush optimize --crushmap report.json --out-path
> optimized.crush --rule replicated_ruleset --pool 49
> Traceback (most recent call last):
>   File "/usr/bin/crush", line 25, in <module>
>     sys.exit(Ceph().main(sys.argv[1:]))
>   File "/usr/lib64/python2.7/site-packages/crush/main.py", line 136, in main
>     return self.constructor(argv).run()
>   File "/usr/lib64/python2.7/site-packages/crush/optimize.py", line 373, in run
>     crushmap = self.main.convert_to_crushmap(self.args.crushmap)
>   File "/usr/lib64/python2.7/site-packages/crush/ceph/__init__.py",
> line 731, in convert_to_crushmap
>     self.set_analyze_args(crushmap)
>   File "/usr/lib64/python2.7/site-packages/crush/ceph/__init__.py",
> line 657, in set_analyze_args
>     compat_pool = self.get_compat_choose_args(crushmap)
>   File "/usr/lib64/python2.7/site-packages/crush/ceph/__init__.py",
> line 645, in get_compat_choose_args
>     assert 1 == len(crushmap['private']['pools'])
> AssertionError

This bug was fixed in the latest release:

$ crush optimize --crushmap /tmp/report.new.json --pool 49 --out-path /tmp/optimized.crush
2017-06-06 15:23:27,649 argv = optimize --crushmap /tmp/report.new.json --pool 49 --out-path /tmp/optimized.crush --pool=49 --choose-args=49 --replication-count=2 --pg-num=1024 --pgp-num=1024 --rule=replicated_ruleset --out-version=h --no-positions
2017-06-06 15:23:27,672 default optimizing
2017-06-06 15:23:27,804 default already optimized
2017-06-06 15:23:27,809 node-7v optimizing
2017-06-06 15:23:27,809 node-5v optimizing
2017-06-06 15:23:27,812 node-4 optimizing
2017-06-06 15:23:27,813 node-8v optimizing
2017-06-06 15:23:27,816 node-6v optimizing
2017-06-06 15:23:27,989 node-7v already optimized
2017-06-06 15:23:27,991 node-6v already optimized
2017-06-06 15:23:28,026 node-4 already optimized
2017-06-06 15:23:28,554 node-5v already optimized
2017-06-06 15:23:28,556 node-8v already optimized

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre


* Re: Beta testing crush optimization
  2017-06-02  6:40                                 ` Loic Dachary
  2017-06-02  9:28                                   ` han vincent
@ 2017-06-06  9:02                                   ` han vincent
  2017-06-06 13:58                                     ` Loic Dachary
  1 sibling, 1 reply; 24+ messages in thread
From: han vincent @ 2017-06-06  9:02 UTC (permalink / raw)
  To: Loic Dachary; +Cc: Ceph Development

Hi, Loic:
   I saw you published a new version (v1.0.33) of "python-crush",
and I tested it with my crushmap.
   I am glad to tell you that the "optimize" command worked well
when I used the following command:

   crush optimize --crushmap report.json --out-path optimized.crush
--rule replicated_ruleset --pool 49

   There was no more error output. After that I imported the
optimized crushmap into my cluster; after rebalancing, all the OSDs are
+-0.9 over/under filled.

   When I used the "compare" command, there was an error in the output:
   crush compare --rule replicated_ruleset --replication-count 2
--origin /tmp/report.json --destination optimized.crush

   Traceback (most recent call last):
   File "/usr/bin/crush", line 25, in <module>
    sys.exit(Ceph().main(sys.argv[1:]))
   File "/usr/lib64/python2.7/site-packages/crush/main.py", line 136, in main
    return self.constructor(argv).run()
   File "/usr/lib64/python2.7/site-packages/crush/compare.py", line 327, in run
    self.run_compare()
   File "/usr/lib64/python2.7/site-packages/crush/compare.py", line
332, in run_compare
    self.set_destination_crushmap(self.args.destination)
   File "/usr/lib64/python2.7/site-packages/crush/compare.py", line
59, in set_destination_crushmap
    d.parse(self.main.convert_to_crushmap(destination))
   File "/usr/lib64/python2.7/site-packages/crush/ceph/__init__.py",
line 739, in convert_to_crushmap
    self.set_compat_choose_args(c, crushmap, choose_args_name)
   File "/usr/lib64/python2.7/site-packages/crush/ceph/__init__.py",
line 716, in set_compat_choose_args
    assert choose_args_name
   AssertionError

   After that, I generated a new report of my cluster and ran "optimize"
once again, but there was still an error in the output:
  ceph report > report.new.json
  crush optimize --crushmap report.json --out-path optimized.crush
--rule replicated_ruleset --pool 49
  [root@node-4 ~]# crush optimize --crushmap report.json --out-path
optimized.crush --rule replicated_ruleset --pool 49
Traceback (most recent call last):
  File "/usr/bin/crush", line 25, in <module>
    sys.exit(Ceph().main(sys.argv[1:]))
  File "/usr/lib64/python2.7/site-packages/crush/main.py", line 136, in main
    return self.constructor(argv).run()
  File "/usr/lib64/python2.7/site-packages/crush/optimize.py", line 373, in run
    crushmap = self.main.convert_to_crushmap(self.args.crushmap)
  File "/usr/lib64/python2.7/site-packages/crush/ceph/__init__.py",
line 731, in convert_to_crushmap
    self.set_analyze_args(crushmap)
  File "/usr/lib64/python2.7/site-packages/crush/ceph/__init__.py",
line 657, in set_analyze_args
    compat_pool = self.get_compat_choose_args(crushmap)
  File "/usr/lib64/python2.7/site-packages/crush/ceph/__init__.py",
line 645, in get_compat_choose_args
    assert 1 == len(crushmap['private']['pools'])
AssertionError


* Re: Beta testing crush optimization
  2017-06-02  6:40                                 ` Loic Dachary
@ 2017-06-02  9:28                                   ` han vincent
  2017-06-06  9:02                                   ` han vincent
  1 sibling, 0 replies; 24+ messages in thread
From: han vincent @ 2017-06-02  9:28 UTC (permalink / raw)
  To: Loic Dachary; +Cc: Ceph Development

Thank you for your advice. I will make this change in one of the following
scenarios: when business load is low, with the migration speed throttled, or
as part of some other experiment.
After I do that, I will send you a mail.

2017-06-02 14:40 GMT+08:00 Loic Dachary <loic@dachary.org>:
> Hi,
>
> For the record, here is what could be rebalanced (after changing to straw2). Pool 152 contains the bulk of the data; the other pools contain very little data, so it does not matter if they are unbalanced. The cluster shows hosts +-5% over/under filled and OSDs at most 21% over filled and at most 16% under filled. After rebalancing, the hosts are +-0.1% over/under filled and the OSDs are +-1.5% over/under filled.
>
> Cheers
>
> On 06/02/2017 08:20 AM, Loic Dachary wrote:
>>
>>
>> On 06/02/2017 05:15 AM, han vincent wrote:
>>> Hmm, I forgot to change to straw2.
>>> My cluster is too large, and its total capacity is 1658TB, 808TB is used.
>>> I am afraid I can not do this change, as the change will cause lots of
>>> data to migrate.
>>
>> Each PG in pool 152 has 8GB * 3 replica = 24GB and 16 of them will move (out of the 32768). It means at most 384GB will move. In reality this will be less because you can observe that all remapped PGs have at least one OSD in common and a number of them have two OSD in common.
>>
>> You defined the failure domain of your cluster to be the host. Each host (38) contains ~900 PG which is 10 times more.
>>
>> That being said, I understand you don't want to disturb your cluster. Thank you for the time you spent discussing rebalancing, it was extremely valuable.
>>
>> Cheers
>>
>>
>>> 2017-06-01 20:49 GMT+08:00 Loic Dachary <loic@dachary.org>:
>>>>
>>>>
>>>> On 06/01/2017 02:32 PM, han vincent wrote:
>>>>> OK, but i do not want to upgrade my cluster to Luminous. There are
>>>>> lots of data in my cluster and it have run stable for nearly one year.
>>>>> I think the risk of upgrade to Luminous will be relatively large.
>>>>
>>>> In that case the first step is to move from straw to straw2. It will modify the following mappings:
>>>>
>>>> 2017-06-01 14:26:12,675 152.213a map to [432, 271, 451] instead of [432, 324, 275]
>>>> 2017-06-01 14:26:12,693 152.7115 map to [334, 359, 0] instead of [334, 359, 229]
>>>> 2017-06-01 14:26:12,715 152.1867 map to [416, 368, 433] instead of [416, 204, 19]
>>>> 2017-06-01 14:26:12,741 152.6e3e map to [11, 161, 67] instead of [372, 161, 67]
>>>> 2017-06-01 14:26:12,745 152.3385 map to [430, 325, 35] instead of [430, 325, 387]
>>>> 2017-06-01 14:26:12,747 152.5c2c map to [303, 351, 0] instead of [303, 351, 90]
>>>> 2017-06-01 14:26:12,777 152.d27 map to [171, 133, 52] instead of [171, 133, 63]
>>>> 2017-06-01 14:26:12,780 152.c3d map to [235, 366, 83] instead of [235, 370, 83]
>>>> 2017-06-01 14:26:12,826 152.54ad map to [86, 298, 92] instead of [86, 298, 318]
>>>> 2017-06-01 14:26:12,832 152.18cb map to [17, 389, 97] instead of [17, 439, 97]
>>>> 2017-06-01 14:26:12,852 152.716 map to [93, 37, 306] instead of [318, 37, 306]
>>>> 2017-06-01 14:26:12,866 152.640e map to [220, 21, 120] instead of [220, 397, 120]
>>>> 2017-06-01 14:26:12,883 152.6a3c map to [328, 245, 223] instead of [328, 347, 223]
>>>> 2017-06-01 14:26:12,946 152.1bb6 map to [110, 302, 318] instead of [110, 207, 318]
>>>> 2017-06-01 14:26:12,957 152.3366 map to [128, 78, 75] instead of [128, 78, 327]
>>>> 2017-06-01 14:26:13,005 153.1a98 map to [320, 22, 180] instead of [320, 22, 181]
>>>> 2017-06-01 14:26:13,026 153.5cf map to [9, 418, 344] instead of [9, 418, 435]
>>>> 2017-06-01 14:26:13,069 153.f5d map to [168, 445, 7] instead of [168, 445, 236]
>>>>
>>>>>
>>>>> 2017-06-01 20:23 GMT+08:00 Loic Dachary <loic@dachary.org>:
>>>>>>
>>>>>>
>>>>>> On 06/01/2017 02:17 PM, han vincent wrote:
>>>>>>> you can get the crushmap from
>>>>>>> https://drive.google.com/open?id=0B5Kd4hBGEUvnUU9BcWY1NWtWNHM.
>>>>>>
>>>>>> Got it. You may want to remove it now as you probably don't want to expose all the informations it contains to the general public.
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>>>
>>>>>>> 2017-06-01 20:08 GMT+08:00 Loic Dachary <loic@dachary.org>:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 06/01/2017 01:52 PM, han vincent wrote:
>>>>>>>>> 2017-06-01 19:43 GMT+08:00 Loic Dachary <loic@dachary.org>:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 06/01/2017 01:38 PM, han vincent wrote:
>>>>>>>>>>> 2017-06-01 18:21 GMT+08:00 Loic Dachary <loic@dachary.org>:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 06/01/2017 12:05 PM, han vincent wrote:
>>>>>>>>>>>>> Hi loic:
>>>>>>>>>>>>>    I still have two questions, in the following to commands:
>>>>>>>>>>>>>
>>>>>>>>>>>>>    1. crush optimize --crushmap /tmp/han-vincent-report.json --out-path /tmp/han-vincent-report-optimized.txt --out-format txt --pool 49
>>>>>>>>>>>>>
>>>>>>>>>>>>>    Is "--pool" options must specified in this command? if not, will it optimize all the pools without "--pool" option?
>>>>>>>>>>>>>    if there are several pools in my cluster and each pool has a lot of pgs. If I optimize one of the, will it affect the other pools?
>>>>>>>>>>>>>    how to use it to optimize multiple pools in a cluster of hammer?
>>>>>>>>>>>>>
>>>>>>>>>>>>>    2. crush analyze --crushmap /tmp/han-vincent-report-optimized.txt --pool 49 --replication-count=2 --pg-num=1024 --pgp-num=1024 --rule=replicated_ruleset --choose-args=49
>>>>>>>>>>>>>    In this command, the value of the "--choose-args" option is 49, it is same as the pool id, what is the mean of "--choose-args" option?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> The --pool option is required. You can only optimize one pool at a time.
>>>>>>>>>>>>
>>>>>>>>>>>> If multiple pools use the same crush rule or the same crush hierarchy, rebalancing one of them will hurt the balance of the others. It is possible to optimize multiple pools but only if they have different rules and crush hiearchies.
>>>>>>>>>>>>
>>>>>>>>>>>> The --choose-args option is the name of the weights that achieve the rebalancing. By convention it is the same as the pool id: this is required by Luminous clusters. Even though your cluster is not luminous, we stick to this convention.
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Loïc Dachary, Artisan Logiciel Libre
>>>>>>>>>>>
>>>>>>>>>>>  if the pools use the same crush rule or the same crush hierarchy, is
>>>>>>>>>>> there any way to optimize the cluster?
>>>>>>>>>>
>>>>>>>>>> It depends on the pools. In your cluster it does not matter because all pools except pool 49 are mostly empty. You can rebalance pool 49 and leave the other pool alone.
>>>>>>>>>>
>>>>>>>>>> Cheers
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Loïc Dachary, Artisan Logiciel Libre
>>>>>>>>>
>>>>>>>>> the crushmap of which cluster I send to you is in lab environment, I
>>>>>>>>> have a much bigger cluster in production.
>>>>>>>>> I will send the crushmap to you later, could you help me to optimize
>>>>>>>>> this cluster?
>>>>>>>>
>>>>>>>> It is an interesting use case, I will help.
>>>>>>>>
>>>>>>>>> If you have detailed steps, please send it to me.
>>>>>>>>
>>>>>>>> Using the output of "ceph report" cluster, I will be able to verify it works as expected. The steps are simple but they will require an upgrade to Luminous.
>>>>>>>>
>>>>>>>> Cheers
>>>>>>>>
>>>>>>>> --
>>>>>>>> Loïc Dachary, Artisan Logiciel Libre
>>>>>>> --
>>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Loïc Dachary, Artisan Logiciel Libre
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>
>>>> --
>>>> Loïc Dachary, Artisan Logiciel Libre
>>>
>>
>
> --
> Loïc Dachary, Artisan Logiciel Libre


* Re: Beta testing crush optimization
  2017-06-02  6:20                               ` Loic Dachary
@ 2017-06-02  6:40                                 ` Loic Dachary
  2017-06-02  9:28                                   ` han vincent
  2017-06-06  9:02                                   ` han vincent
  0 siblings, 2 replies; 24+ messages in thread
From: Loic Dachary @ 2017-06-02  6:40 UTC (permalink / raw)
  To: han vincent; +Cc: Ceph Development

Hi,

For the record, here is what could be rebalanced (after changing to straw2). Pool 152 contains the bulk of the data; the other pools contain very little data, so it does not matter if they are unbalanced. The cluster shows hosts +-5% over/under filled and OSDs at most 21% over filled and at most 16% under filled. After rebalancing, the hosts are +-0.1% over/under filled and the OSDs are +-1.5% over/under filled.

Cheers

On 06/02/2017 08:20 AM, Loic Dachary wrote:
> 
> 
> On 06/02/2017 05:15 AM, han vincent wrote:
>> Hmm, I forgot to change to straw2.
>> My cluster is too large, and its total capacity is 1658TB, 808TB is used.
>> I am afraid I can not do this change, as the change will cause lots of
>> data to migrate.
> 
> Each PG in pool 152 has 8GB * 3 replicas = 24GB, and 16 of them will move (out of the 32768). That means at most 384GB will move. In reality it will be less, because you can observe that all remapped PGs have at least one OSD in common with their old mapping, and a number of them have two OSDs in common.
> 
> You defined the failure domain of your cluster to be the host. Each host (38) contains ~900 PGs, which is 10 times more.
> 
> That being said, I understand you don't want to disturb your cluster. Thank you for the time you spent discussing rebalancing, it was extremely valuable.
> 
> Cheers
> 
> 
>> 2017-06-01 20:49 GMT+08:00 Loic Dachary <loic@dachary.org>:
>>>
>>>
>>> On 06/01/2017 02:32 PM, han vincent wrote:
>>>> OK, but i do not want to upgrade my cluster to Luminous. There are
>>>> lots of data in my cluster and it have run stable for nearly one year.
>>>> I think the risk of upgrade to Luminous will be relatively large.
>>>
>>> In that case the first step is to move from straw to straw2. It will modify the following mappings:
>>>
>>> 2017-06-01 14:26:12,675 152.213a map to [432, 271, 451] instead of [432, 324, 275]
>>> 2017-06-01 14:26:12,693 152.7115 map to [334, 359, 0] instead of [334, 359, 229]
>>> 2017-06-01 14:26:12,715 152.1867 map to [416, 368, 433] instead of [416, 204, 19]
>>> 2017-06-01 14:26:12,741 152.6e3e map to [11, 161, 67] instead of [372, 161, 67]
>>> 2017-06-01 14:26:12,745 152.3385 map to [430, 325, 35] instead of [430, 325, 387]
>>> 2017-06-01 14:26:12,747 152.5c2c map to [303, 351, 0] instead of [303, 351, 90]
>>> 2017-06-01 14:26:12,777 152.d27 map to [171, 133, 52] instead of [171, 133, 63]
>>> 2017-06-01 14:26:12,780 152.c3d map to [235, 366, 83] instead of [235, 370, 83]
>>> 2017-06-01 14:26:12,826 152.54ad map to [86, 298, 92] instead of [86, 298, 318]
>>> 2017-06-01 14:26:12,832 152.18cb map to [17, 389, 97] instead of [17, 439, 97]
>>> 2017-06-01 14:26:12,852 152.716 map to [93, 37, 306] instead of [318, 37, 306]
>>> 2017-06-01 14:26:12,866 152.640e map to [220, 21, 120] instead of [220, 397, 120]
>>> 2017-06-01 14:26:12,883 152.6a3c map to [328, 245, 223] instead of [328, 347, 223]
>>> 2017-06-01 14:26:12,946 152.1bb6 map to [110, 302, 318] instead of [110, 207, 318]
>>> 2017-06-01 14:26:12,957 152.3366 map to [128, 78, 75] instead of [128, 78, 327]
>>> 2017-06-01 14:26:13,005 153.1a98 map to [320, 22, 180] instead of [320, 22, 181]
>>> 2017-06-01 14:26:13,026 153.5cf map to [9, 418, 344] instead of [9, 418, 435]
>>> 2017-06-01 14:26:13,069 153.f5d map to [168, 445, 7] instead of [168, 445, 236]
>>>
>>>>
>>>> 2017-06-01 20:23 GMT+08:00 Loic Dachary <loic@dachary.org>:
>>>>>
>>>>>
>>>>> On 06/01/2017 02:17 PM, han vincent wrote:
>>>>>> you can get the crushmap from
>>>>>> https://drive.google.com/open?id=0B5Kd4hBGEUvnUU9BcWY1NWtWNHM.
>>>>>
>>>>> Got it. You may want to remove it now as you probably don't want to expose all the informations it contains to the general public.
>>>>>
>>>>> Cheers
>>>>>
>>>>>>
>>>>>> 2017-06-01 20:08 GMT+08:00 Loic Dachary <loic@dachary.org>:
>>>>>>>
>>>>>>>
>>>>>>> On 06/01/2017 01:52 PM, han vincent wrote:
>>>>>>>> 2017-06-01 19:43 GMT+08:00 Loic Dachary <loic@dachary.org>:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 06/01/2017 01:38 PM, han vincent wrote:
>>>>>>>>>> 2017-06-01 18:21 GMT+08:00 Loic Dachary <loic@dachary.org>:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 06/01/2017 12:05 PM, han vincent wrote:
>>>>>>>>>>>> Hi loic:
>>>>>>>>>>>>    I still have two questions, in the following to commands:
>>>>>>>>>>>>
>>>>>>>>>>>>    1. crush optimize --crushmap /tmp/han-vincent-report.json --out-path /tmp/han-vincent-report-optimized.txt --out-format txt --pool 49
>>>>>>>>>>>>
>>>>>>>>>>>>    Is "--pool" options must specified in this command? if not, will it optimize all the pools without "--pool" option?
>>>>>>>>>>>>    if there are several pools in my cluster and each pool has a lot of pgs. If I optimize one of the, will it affect the other pools?
>>>>>>>>>>>>    how to use it to optimize multiple pools in a cluster of hammer?
>>>>>>>>>>>>
>>>>>>>>>>>>    2. crush analyze --crushmap /tmp/han-vincent-report-optimized.txt --pool 49 --replication-count=2 --pg-num=1024 --pgp-num=1024 --rule=replicated_ruleset --choose-args=49
>>>>>>>>>>>>    In this command, the value of the "--choose-args" option is 49, it is same as the pool id, what is the mean of "--choose-args" option?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The --pool option is required. You can only optimize one pool at a time.
>>>>>>>>>>>
>>>>>>>>>>> If multiple pools use the same crush rule or the same crush hierarchy, rebalancing one of them will hurt the balance of the others. It is possible to optimize multiple pools but only if they have different rules and crush hiearchies.
>>>>>>>>>>>
>>>>>>>>>>> The --choose-args option is the name of the weights that achieve the rebalancing. By convention it is the same as the pool id: this is required by Luminous clusters. Even though your cluster is not luminous, we stick to this convention.
>>>>>>>>>>>
>>>>>>>>>>> Cheers
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Loïc Dachary, Artisan Logiciel Libre
>>>>>>>>>>
>>>>>>>>>>  if the pools use the same crush rule or the same crush hierarchy, is
>>>>>>>>>> there any way to optimize the cluster?
>>>>>>>>>
>>>>>>>>> It depends on the pools. In your cluster it does not matter because all pools except pool 49 are mostly empty. You can rebalance pool 49 and leave the other pool alone.
>>>>>>>>>
>>>>>>>>> Cheers
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Loïc Dachary, Artisan Logiciel Libre
>>>>>>>>
>>>>>>>> the crushmap of which cluster I send to you is in lab environment, I
>>>>>>>> have a much bigger cluster in production.
>>>>>>>> I will send the crushmap to you later, could you help me to optimize
>>>>>>>> this cluster?
>>>>>>>
>>>>>>> It is an interesting use case, I will help.
>>>>>>>
>>>>>>>> If you have detailed steps, please send it to me.
>>>>>>>
>>>>>>> Using the output of "ceph report" cluster, I will be able to verify it works as expected. The steps are simple but they will require an upgrade to Luminous.
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>> --
>>>>>>> Loïc Dachary, Artisan Logiciel Libre
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>>
>>>>> --
>>>>> Loïc Dachary, Artisan Logiciel Libre
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>
>>> --
>>> Loïc Dachary, Artisan Logiciel Libre
>>
> 

-- 
Loïc Dachary, Artisan Logiciel Libre


* Re: Beta testing crush optimization
  2017-06-02  3:15                             ` han vincent
@ 2017-06-02  6:20                               ` Loic Dachary
  2017-06-02  6:40                                 ` Loic Dachary
  0 siblings, 1 reply; 24+ messages in thread
From: Loic Dachary @ 2017-06-02  6:20 UTC (permalink / raw)
  To: han vincent; +Cc: Ceph Development



On 06/02/2017 05:15 AM, han vincent wrote:
> Hmm, I forgot to change to straw2.
> My cluster is too large: its total capacity is 1658TB, of which 808TB is used.
> I am afraid I cannot make this change, as it would cause a lot of
> data to migrate.

Each PG in pool 152 has 8GB * 3 replicas = 24GB, and 16 of them will move (out of the 32768). That means at most 384GB will move. In reality it will be less, because you can observe that all remapped PGs have at least one OSD in common with their old mapping, and a number of them have two OSDs in common.
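
A rough upper bound on the data movement, using the figures above (a sketch,
not tool output):

pg_data_gb = 8       # data per PG replica in pool 152
replicas = 3
remapped_pgs = 16    # pool 152 PGs whose mapping changes with straw2
print(remapped_pgs * pg_data_gb * replicas)  # 384 GB worst case; less in practice
                                             # since remapped PGs keep OSDs in common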

You defined the failure domain of your cluster to be the host. Each host (38) contains ~900 PGs, which is 10 times more.

That being said, I understand you don't want to disturb your cluster. Thank you for the time you spent discussing rebalancing, it was extremely valuable.

Cheers


> 2017-06-01 20:49 GMT+08:00 Loic Dachary <loic@dachary.org>:
>>
>>
>> On 06/01/2017 02:32 PM, han vincent wrote:
>>> OK, but i do not want to upgrade my cluster to Luminous. There are
>>> lots of data in my cluster and it have run stable for nearly one year.
>>> I think the risk of upgrade to Luminous will be relatively large.
>>
>> In that case the first step is to move from straw to straw2. It will modify the following mappings:
>>
>> 2017-06-01 14:26:12,675 152.213a map to [432, 271, 451] instead of [432, 324, 275]
>> 2017-06-01 14:26:12,693 152.7115 map to [334, 359, 0] instead of [334, 359, 229]
>> 2017-06-01 14:26:12,715 152.1867 map to [416, 368, 433] instead of [416, 204, 19]
>> 2017-06-01 14:26:12,741 152.6e3e map to [11, 161, 67] instead of [372, 161, 67]
>> 2017-06-01 14:26:12,745 152.3385 map to [430, 325, 35] instead of [430, 325, 387]
>> 2017-06-01 14:26:12,747 152.5c2c map to [303, 351, 0] instead of [303, 351, 90]
>> 2017-06-01 14:26:12,777 152.d27 map to [171, 133, 52] instead of [171, 133, 63]
>> 2017-06-01 14:26:12,780 152.c3d map to [235, 366, 83] instead of [235, 370, 83]
>> 2017-06-01 14:26:12,826 152.54ad map to [86, 298, 92] instead of [86, 298, 318]
>> 2017-06-01 14:26:12,832 152.18cb map to [17, 389, 97] instead of [17, 439, 97]
>> 2017-06-01 14:26:12,852 152.716 map to [93, 37, 306] instead of [318, 37, 306]
>> 2017-06-01 14:26:12,866 152.640e map to [220, 21, 120] instead of [220, 397, 120]
>> 2017-06-01 14:26:12,883 152.6a3c map to [328, 245, 223] instead of [328, 347, 223]
>> 2017-06-01 14:26:12,946 152.1bb6 map to [110, 302, 318] instead of [110, 207, 318]
>> 2017-06-01 14:26:12,957 152.3366 map to [128, 78, 75] instead of [128, 78, 327]
>> 2017-06-01 14:26:13,005 153.1a98 map to [320, 22, 180] instead of [320, 22, 181]
>> 2017-06-01 14:26:13,026 153.5cf map to [9, 418, 344] instead of [9, 418, 435]
>> 2017-06-01 14:26:13,069 153.f5d map to [168, 445, 7] instead of [168, 445, 236]
>>
>>>
>>> 2017-06-01 20:23 GMT+08:00 Loic Dachary <loic@dachary.org>:
>>>>
>>>>
>>>> On 06/01/2017 02:17 PM, han vincent wrote:
>>>>> you can get the crushmap from
>>>>> https://drive.google.com/open?id=0B5Kd4hBGEUvnUU9BcWY1NWtWNHM.
>>>>
>>>> Got it. You may want to remove it now as you probably don't want to expose all the informations it contains to the general public.
>>>>
>>>> Cheers
>>>>
>>>>>
>>>>> 2017-06-01 20:08 GMT+08:00 Loic Dachary <loic@dachary.org>:
>>>>>>
>>>>>>
>>>>>> On 06/01/2017 01:52 PM, han vincent wrote:
>>>>>>> 2017-06-01 19:43 GMT+08:00 Loic Dachary <loic@dachary.org>:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 06/01/2017 01:38 PM, han vincent wrote:
>>>>>>>>> 2017-06-01 18:21 GMT+08:00 Loic Dachary <loic@dachary.org>:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 06/01/2017 12:05 PM, han vincent wrote:
>>>>>>>>>>> Hi loic:
>>>>>>>>>>>    I still have two questions, in the following to commands:
>>>>>>>>>>>
>>>>>>>>>>>    1. crush optimize --crushmap /tmp/han-vincent-report.json --out-path /tmp/han-vincent-report-optimized.txt --out-format txt --pool 49
>>>>>>>>>>>
>>>>>>>>>>>    Is "--pool" options must specified in this command? if not, will it optimize all the pools without "--pool" option?
>>>>>>>>>>>    if there are several pools in my cluster and each pool has a lot of pgs. If I optimize one of the, will it affect the other pools?
>>>>>>>>>>>    how to use it to optimize multiple pools in a cluster of hammer?
>>>>>>>>>>>
>>>>>>>>>>>    2. crush analyze --crushmap /tmp/han-vincent-report-optimized.txt --pool 49 --replication-count=2 --pg-num=1024 --pgp-num=1024 --rule=replicated_ruleset --choose-args=49
>>>>>>>>>>>    In this command, the value of the "--choose-args" option is 49, it is same as the pool id, what is the mean of "--choose-args" option?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The --pool option is required. You can only optimize one pool at a time.
>>>>>>>>>>
>>>>>>>>>> If multiple pools use the same crush rule or the same crush hierarchy, rebalancing one of them will hurt the balance of the others. It is possible to optimize multiple pools but only if they have different rules and crush hiearchies.
>>>>>>>>>>
>>>>>>>>>> The --choose-args option is the name of the weights that achieve the rebalancing. By convention it is the same as the pool id: this is required by Luminous clusters. Even though your cluster is not luminous, we stick to this convention.
>>>>>>>>>>
>>>>>>>>>> Cheers
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Loïc Dachary, Artisan Logiciel Libre
>>>>>>>>>
>>>>>>>>>  if the pools use the same crush rule or the same crush hierarchy, is
>>>>>>>>> there any way to optimize the cluster?
>>>>>>>>
>>>>>>>> It depends on the pools. In your cluster it does not matter because all pools except pool 49 are mostly empty. You can rebalance pool 49 and leave the other pool alone.
>>>>>>>>
>>>>>>>> Cheers
>>>>>>>>
>>>>>>>> --
>>>>>>>> Loïc Dachary, Artisan Logiciel Libre
>>>>>>>
>>>>>>> the crushmap of which cluster I send to you is in lab environment, I
>>>>>>> have a much bigger cluster in production.
>>>>>>> I will send the crushmap to you later, could you help me to optimize
>>>>>>> this cluster?
>>>>>>
>>>>>> It is an interesting use case, I will help.
>>>>>>
>>>>>>> If you have detailed steps, please send it to me.
>>>>>>
>>>>>> Using the output of "ceph report" cluster, I will be able to verify it works as expected. The steps are simple but they will require an upgrade to Luminous.
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> --
>>>>>> Loïc Dachary, Artisan Logiciel Libre
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>
>>>> --
>>>> Loïc Dachary, Artisan Logiciel Libre
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>
>> --
>> Loïc Dachary, Artisan Logiciel Libre
> 

-- 
Loïc Dachary, Artisan Logiciel Libre


* Re: Beta testing crush optimization
  2017-06-01 12:49                           ` Loic Dachary
@ 2017-06-02  3:15                             ` han vincent
  2017-06-02  6:20                               ` Loic Dachary
  0 siblings, 1 reply; 24+ messages in thread
From: han vincent @ 2017-06-02  3:15 UTC (permalink / raw)
  To: Loic Dachary; +Cc: Ceph Development

Hmm, I forgot to change to straw2.
My cluster is too large: its total capacity is 1658TB, of which 808TB is used.
I am afraid I cannot make this change, as it would cause a lot of
data to migrate.

2017-06-01 20:49 GMT+08:00 Loic Dachary <loic@dachary.org>:
>
>
> On 06/01/2017 02:32 PM, han vincent wrote:
>> OK, but i do not want to upgrade my cluster to Luminous. There are
>> lots of data in my cluster and it have run stable for nearly one year.
>> I think the risk of upgrade to Luminous will be relatively large.
>
> In that case the first step is to move from straw to straw2. It will modify the following mappings:
>
> 2017-06-01 14:26:12,675 152.213a map to [432, 271, 451] instead of [432, 324, 275]
> 2017-06-01 14:26:12,693 152.7115 map to [334, 359, 0] instead of [334, 359, 229]
> 2017-06-01 14:26:12,715 152.1867 map to [416, 368, 433] instead of [416, 204, 19]
> 2017-06-01 14:26:12,741 152.6e3e map to [11, 161, 67] instead of [372, 161, 67]
> 2017-06-01 14:26:12,745 152.3385 map to [430, 325, 35] instead of [430, 325, 387]
> 2017-06-01 14:26:12,747 152.5c2c map to [303, 351, 0] instead of [303, 351, 90]
> 2017-06-01 14:26:12,777 152.d27 map to [171, 133, 52] instead of [171, 133, 63]
> 2017-06-01 14:26:12,780 152.c3d map to [235, 366, 83] instead of [235, 370, 83]
> 2017-06-01 14:26:12,826 152.54ad map to [86, 298, 92] instead of [86, 298, 318]
> 2017-06-01 14:26:12,832 152.18cb map to [17, 389, 97] instead of [17, 439, 97]
> 2017-06-01 14:26:12,852 152.716 map to [93, 37, 306] instead of [318, 37, 306]
> 2017-06-01 14:26:12,866 152.640e map to [220, 21, 120] instead of [220, 397, 120]
> 2017-06-01 14:26:12,883 152.6a3c map to [328, 245, 223] instead of [328, 347, 223]
> 2017-06-01 14:26:12,946 152.1bb6 map to [110, 302, 318] instead of [110, 207, 318]
> 2017-06-01 14:26:12,957 152.3366 map to [128, 78, 75] instead of [128, 78, 327]
> 2017-06-01 14:26:13,005 153.1a98 map to [320, 22, 180] instead of [320, 22, 181]
> 2017-06-01 14:26:13,026 153.5cf map to [9, 418, 344] instead of [9, 418, 435]
> 2017-06-01 14:26:13,069 153.f5d map to [168, 445, 7] instead of [168, 445, 236]
>
>>
>> 2017-06-01 20:23 GMT+08:00 Loic Dachary <loic@dachary.org>:
>>>
>>>
>>> On 06/01/2017 02:17 PM, han vincent wrote:
>>>> you can get the crushmap from
>>>> https://drive.google.com/open?id=0B5Kd4hBGEUvnUU9BcWY1NWtWNHM.
>>>
>>> Got it. You may want to remove it now as you probably don't want to expose all the informations it contains to the general public.
>>>
>>> Cheers
>>>
>>>>
>>>> 2017-06-01 20:08 GMT+08:00 Loic Dachary <loic@dachary.org>:
>>>>>
>>>>>
>>>>> On 06/01/2017 01:52 PM, han vincent wrote:
>>>>>> 2017-06-01 19:43 GMT+08:00 Loic Dachary <loic@dachary.org>:
>>>>>>>
>>>>>>>
>>>>>>> On 06/01/2017 01:38 PM, han vincent wrote:
>>>>>>>> 2017-06-01 18:21 GMT+08:00 Loic Dachary <loic@dachary.org>:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 06/01/2017 12:05 PM, han vincent wrote:
>>>>>>>>>> Hi loic:
>>>>>>>>>>    I still have two questions, in the following to commands:
>>>>>>>>>>
>>>>>>>>>>    1. crush optimize --crushmap /tmp/han-vincent-report.json --out-path /tmp/han-vincent-report-optimized.txt --out-format txt --pool 49
>>>>>>>>>>
>>>>>>>>>>    Is "--pool" options must specified in this command? if not, will it optimize all the pools without "--pool" option?
>>>>>>>>>>    if there are several pools in my cluster and each pool has a lot of pgs. If I optimize one of the, will it affect the other pools?
>>>>>>>>>>    how to use it to optimize multiple pools in a cluster of hammer?
>>>>>>>>>>
>>>>>>>>>>    2. crush analyze --crushmap /tmp/han-vincent-report-optimized.txt --pool 49 --replication-count=2 --pg-num=1024 --pgp-num=1024 --rule=replicated_ruleset --choose-args=49
>>>>>>>>>>    In this command, the value of the "--choose-args" option is 49, it is same as the pool id, what is the mean of "--choose-args" option?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The --pool option is required. You can only optimize one pool at a time.
>>>>>>>>>
>>>>>>>>> If multiple pools use the same crush rule or the same crush hierarchy, rebalancing one of them will hurt the balance of the others. It is possible to optimize multiple pools but only if they have different rules and crush hiearchies.
>>>>>>>>>
>>>>>>>>> The --choose-args option is the name of the weights that achieve the rebalancing. By convention it is the same as the pool id: this is required by Luminous clusters. Even though your cluster is not luminous, we stick to this convention.
>>>>>>>>>
>>>>>>>>> Cheers
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Loïc Dachary, Artisan Logiciel Libre
>>>>>>>>
>>>>>>>>  if the pools use the same crush rule or the same crush hierarchy, is
>>>>>>>> there any way to optimize the cluster?
>>>>>>>
>>>>>>> It depends on the pools. In your cluster it does not matter because all pools except pool 49 are mostly empty. You can rebalance pool 49 and leave the other pool alone.
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>> --
>>>>>>> Loïc Dachary, Artisan Logiciel Libre
>>>>>>
>>>>>> the crushmap of which cluster I send to you is in lab environment, I
>>>>>> have a much bigger cluster in production.
>>>>>> I will send the crushmap to you later, could you help me to optimize
>>>>>> this cluster?
>>>>>
>>>>> It is an interesting use case, I will help.
>>>>>
>>>>>> If you have detailed steps, please send it to me.
>>>>>
>>>>> Using the output of "ceph report" cluster, I will be able to verify it works as expected. The steps are simple but they will require an upgrade to Luminous.
>>>>>
>>>>> Cheers
>>>>>
>>>>> --
>>>>> Loïc Dachary, Artisan Logiciel Libre
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>
>>> --
>>> Loïc Dachary, Artisan Logiciel Libre
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
> --
> Loïc Dachary, Artisan Logiciel Libre


* Re: Beta testing crush optimization
  2017-06-01 12:32                         ` han vincent
@ 2017-06-01 12:49                           ` Loic Dachary
  2017-06-02  3:15                             ` han vincent
  0 siblings, 1 reply; 24+ messages in thread
From: Loic Dachary @ 2017-06-01 12:49 UTC (permalink / raw)
  To: han vincent; +Cc: Ceph Development



On 06/01/2017 02:32 PM, han vincent wrote:
> OK, but I do not want to upgrade my cluster to Luminous. There is a
> lot of data in my cluster and it has run stably for nearly one year.
> I think the risk of upgrading to Luminous would be relatively large.

In that case the first step is to move from straw to straw2. It will modify the following mappings:

2017-06-01 14:26:12,675 152.213a map to [432, 271, 451] instead of [432, 324, 275]
2017-06-01 14:26:12,693 152.7115 map to [334, 359, 0] instead of [334, 359, 229]
2017-06-01 14:26:12,715 152.1867 map to [416, 368, 433] instead of [416, 204, 19]
2017-06-01 14:26:12,741 152.6e3e map to [11, 161, 67] instead of [372, 161, 67]
2017-06-01 14:26:12,745 152.3385 map to [430, 325, 35] instead of [430, 325, 387]
2017-06-01 14:26:12,747 152.5c2c map to [303, 351, 0] instead of [303, 351, 90]
2017-06-01 14:26:12,777 152.d27 map to [171, 133, 52] instead of [171, 133, 63]
2017-06-01 14:26:12,780 152.c3d map to [235, 366, 83] instead of [235, 370, 83]
2017-06-01 14:26:12,826 152.54ad map to [86, 298, 92] instead of [86, 298, 318]
2017-06-01 14:26:12,832 152.18cb map to [17, 389, 97] instead of [17, 439, 97]
2017-06-01 14:26:12,852 152.716 map to [93, 37, 306] instead of [318, 37, 306]
2017-06-01 14:26:12,866 152.640e map to [220, 21, 120] instead of [220, 397, 120]
2017-06-01 14:26:12,883 152.6a3c map to [328, 245, 223] instead of [328, 347, 223]
2017-06-01 14:26:12,946 152.1bb6 map to [110, 302, 318] instead of [110, 207, 318]
2017-06-01 14:26:12,957 152.3366 map to [128, 78, 75] instead of [128, 78, 327]
2017-06-01 14:26:13,005 153.1a98 map to [320, 22, 180] instead of [320, 22, 181]
2017-06-01 14:26:13,026 153.5cf map to [9, 418, 344] instead of [9, 418, 435]
2017-06-01 14:26:13,069 153.f5d map to [168, 445, 7] instead of [168, 445, 236]
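
As a side note, a quick way to see how much of a remapped PG actually has to
move is to compare its old and new OSD sets. A minimal sketch (standard
library only, using the first line of the list above):

import re

line = "2017-06-01 14:26:12,675 152.213a map to [432, 271, 451] instead of [432, 324, 275]"
new, old = re.findall(r"\[([0-9, ]+)\]", line)
new_osds = set(int(x) for x in new.split(","))
old_osds = set(int(x) for x in old.split(","))
print(len(new_osds & old_osds))  # 1 OSD in common, so only part of the PG's data moves

Every line in the list keeps at least one OSD in common, and several keep
two, which is why the actual movement stays well below the worst case.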

> 
> 2017-06-01 20:23 GMT+08:00 Loic Dachary <loic@dachary.org>:
>>
>>
>> On 06/01/2017 02:17 PM, han vincent wrote:
>>> you can get the crushmap from
>>> https://drive.google.com/open?id=0B5Kd4hBGEUvnUU9BcWY1NWtWNHM.
>>
>> Got it. You may want to remove it now as you probably don't want to expose all the informations it contains to the general public.
>>
>> Cheers
>>
>>>
>>> 2017-06-01 20:08 GMT+08:00 Loic Dachary <loic@dachary.org>:
>>>>
>>>>
>>>> On 06/01/2017 01:52 PM, han vincent wrote:
>>>>> 2017-06-01 19:43 GMT+08:00 Loic Dachary <loic@dachary.org>:
>>>>>>
>>>>>>
>>>>>> On 06/01/2017 01:38 PM, han vincent wrote:
>>>>>>> 2017-06-01 18:21 GMT+08:00 Loic Dachary <loic@dachary.org>:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 06/01/2017 12:05 PM, han vincent wrote:
>>>>>>>>> Hi loic:
>>>>>>>>>    I still have two questions, in the following to commands:
>>>>>>>>>
>>>>>>>>>    1. crush optimize --crushmap /tmp/han-vincent-report.json --out-path /tmp/han-vincent-report-optimized.txt --out-format txt --pool 49
>>>>>>>>>
>>>>>>>>>    Is "--pool" options must specified in this command? if not, will it optimize all the pools without "--pool" option?
>>>>>>>>>    if there are several pools in my cluster and each pool has a lot of pgs. If I optimize one of the, will it affect the other pools?
>>>>>>>>>    how to use it to optimize multiple pools in a cluster of hammer?
>>>>>>>>>
>>>>>>>>>    2. crush analyze --crushmap /tmp/han-vincent-report-optimized.txt --pool 49 --replication-count=2 --pg-num=1024 --pgp-num=1024 --rule=replicated_ruleset --choose-args=49
>>>>>>>>>    In this command, the value of the "--choose-args" option is 49, it is same as the pool id, what is the mean of "--choose-args" option?
>>>>>>>>>
>>>>>>>>
>>>>>>>> The --pool option is required. You can only optimize one pool at a time.
>>>>>>>>
>>>>>>>> If multiple pools use the same crush rule or the same crush hierarchy, rebalancing one of them will hurt the balance of the others. It is possible to optimize multiple pools but only if they have different rules and crush hiearchies.
>>>>>>>>
>>>>>>>> The --choose-args option is the name of the weights that achieve the rebalancing. By convention it is the same as the pool id: this is required by Luminous clusters. Even though your cluster is not luminous, we stick to this convention.
>>>>>>>>
>>>>>>>> Cheers
>>>>>>>>
>>>>>>>> --
>>>>>>>> Loïc Dachary, Artisan Logiciel Libre
>>>>>>>
>>>>>>>  if the pools use the same crush rule or the same crush hierarchy, is
>>>>>>> there any way to optimize the cluster?
>>>>>>
>>>>>> It depends on the pools. In your cluster it does not matter because all pools except pool 49 are mostly empty. You can rebalance pool 49 and leave the other pool alone.
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> --
>>>>>> Loïc Dachary, Artisan Logiciel Libre
>>>>>
>>>>> the crushmap of which cluster I send to you is in lab environment, I
>>>>> have a much bigger cluster in production.
>>>>> I will send the crushmap to you later, could you help me to optimize
>>>>> this cluster?
>>>>
>>>> It is an interesting use case, I will help.
>>>>
>>>>> If you have detailed steps, please send it to me.
>>>>
>>>> Using the output of "ceph report" cluster, I will be able to verify it works as expected. The steps are simple but they will require an upgrade to Luminous.
>>>>
>>>> Cheers
>>>>
>>>> --
>>>> Loïc Dachary, Artisan Logiciel Libre
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>
>> --
>> Loïc Dachary, Artisan Logiciel Libre
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
Loïc Dachary, Artisan Logiciel Libre


* Re: Beta testing crush optimization
  2017-06-01 12:23                       ` Loic Dachary
@ 2017-06-01 12:32                         ` han vincent
  2017-06-01 12:49                           ` Loic Dachary
  0 siblings, 1 reply; 24+ messages in thread
From: han vincent @ 2017-06-01 12:32 UTC (permalink / raw)
  To: Loic Dachary; +Cc: Ceph Development

OK, but I do not want to upgrade my cluster to Luminous. There is a
lot of data in my cluster and it has run stably for nearly one year.
I think the risk of upgrading to Luminous would be relatively large.

2017-06-01 20:23 GMT+08:00 Loic Dachary <loic@dachary.org>:
>
>
> On 06/01/2017 02:17 PM, han vincent wrote:
>> you can get the crushmap from
>> https://drive.google.com/open?id=0B5Kd4hBGEUvnUU9BcWY1NWtWNHM.
>
> Got it. You may want to remove it now, as you probably don't want to expose all the information it contains to the general public.
>
> Cheers
>
>>
>> 2017-06-01 20:08 GMT+08:00 Loic Dachary <loic@dachary.org>:
>>>
>>>
>>> On 06/01/2017 01:52 PM, han vincent wrote:
>>>> 2017-06-01 19:43 GMT+08:00 Loic Dachary <loic@dachary.org>:
>>>>>
>>>>>
>>>>> On 06/01/2017 01:38 PM, han vincent wrote:
>>>>>> 2017-06-01 18:21 GMT+08:00 Loic Dachary <loic@dachary.org>:
>>>>>>>
>>>>>>>
>>>>>>> On 06/01/2017 12:05 PM, han vincent wrote:
>>>>>>>> Hi loic:
>>>>>>>>    I still have two questions, in the following to commands:
>>>>>>>>
>>>>>>>>    1. crush optimize --crushmap /tmp/han-vincent-report.json --out-path /tmp/han-vincent-report-optimized.txt --out-format txt --pool 49
>>>>>>>>
>>>>>>>>    Is "--pool" options must specified in this command? if not, will it optimize all the pools without "--pool" option?
>>>>>>>>    if there are several pools in my cluster and each pool has a lot of pgs. If I optimize one of the, will it affect the other pools?
>>>>>>>>    how to use it to optimize multiple pools in a cluster of hammer?
>>>>>>>>
>>>>>>>>    2. crush analyze --crushmap /tmp/han-vincent-report-optimized.txt --pool 49 --replication-count=2 --pg-num=1024 --pgp-num=1024 --rule=replicated_ruleset --choose-args=49
>>>>>>>>    In this command, the value of the "--choose-args" option is 49, it is same as the pool id, what is the mean of "--choose-args" option?
>>>>>>>>
>>>>>>>
>>>>>>> The --pool option is required. You can only optimize one pool at a time.
>>>>>>>
>>>>>>> If multiple pools use the same crush rule or the same crush hierarchy, rebalancing one of them will hurt the balance of the others. It is possible to optimize multiple pools but only if they have different rules and crush hiearchies.
>>>>>>>
>>>>>>> The --choose-args option is the name of the weights that achieve the rebalancing. By convention it is the same as the pool id: this is required by Luminous clusters. Even though your cluster is not luminous, we stick to this convention.
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>> --
>>>>>>> Loïc Dachary, Artisan Logiciel Libre
>>>>>>
>>>>>>  if the pools use the same crush rule or the same crush hierarchy, is
>>>>>> there any way to optimize the cluster?
>>>>>
>>>>> It depends on the pools. In your cluster it does not matter because all pools except pool 49 are mostly empty. You can rebalance pool 49 and leave the other pool alone.
>>>>>
>>>>> Cheers
>>>>>
>>>>> --
>>>>> Loïc Dachary, Artisan Logiciel Libre
>>>>
>>>> the crushmap of which cluster I send to you is in lab environment, I
>>>> have a much bigger cluster in production.
>>>> I will send the crushmap to you later, could you help me to optimize
>>>> this cluster?
>>>
>>> It is an interesting use case, I will help.
>>>
>>>> If you have detailed steps, please send it to me.
>>>
>>> Using the output of "ceph report" cluster, I will be able to verify it works as expected. The steps are simple but they will require an upgrade to Luminous.
>>>
>>> Cheers
>>>
>>> --
>>> Loïc Dachary, Artisan Logiciel Libre
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
> --
> Loïc Dachary, Artisan Logiciel Libre


* Re: Beta testing crush optimization
  2017-06-01 12:17                     ` han vincent
@ 2017-06-01 12:23                       ` Loic Dachary
  2017-06-01 12:32                         ` han vincent
  0 siblings, 1 reply; 24+ messages in thread
From: Loic Dachary @ 2017-06-01 12:23 UTC (permalink / raw)
  To: han vincent; +Cc: Ceph Development



On 06/01/2017 02:17 PM, han vincent wrote:
> you can get the crushmap from
> https://drive.google.com/open?id=0B5Kd4hBGEUvnUU9BcWY1NWtWNHM.

Got it. You may want to remove it now, as you probably don't want to expose all the information it contains to the general public.

Cheers

> 
> 2017-06-01 20:08 GMT+08:00 Loic Dachary <loic@dachary.org>:
>>
>>
>> On 06/01/2017 01:52 PM, han vincent wrote:
>>> 2017-06-01 19:43 GMT+08:00 Loic Dachary <loic@dachary.org>:
>>>>
>>>>
>>>> On 06/01/2017 01:38 PM, han vincent wrote:
>>>>> 2017-06-01 18:21 GMT+08:00 Loic Dachary <loic@dachary.org>:
>>>>>>
>>>>>>
>>>>>> On 06/01/2017 12:05 PM, han vincent wrote:
>>>>>>> Hi loic:
>>>>>>>    I still have two questions, in the following to commands:
>>>>>>>
>>>>>>>    1. crush optimize --crushmap /tmp/han-vincent-report.json --out-path /tmp/han-vincent-report-optimized.txt --out-format txt --pool 49
>>>>>>>
>>>>>>>    Is "--pool" options must specified in this command? if not, will it optimize all the pools without "--pool" option?
>>>>>>>    if there are several pools in my cluster and each pool has a lot of pgs. If I optimize one of the, will it affect the other pools?
>>>>>>>    how to use it to optimize multiple pools in a cluster of hammer?
>>>>>>>
>>>>>>>    2. crush analyze --crushmap /tmp/han-vincent-report-optimized.txt --pool 49 --replication-count=2 --pg-num=1024 --pgp-num=1024 --rule=replicated_ruleset --choose-args=49
>>>>>>>    In this command, the value of the "--choose-args" option is 49, it is same as the pool id, what is the mean of "--choose-args" option?
>>>>>>>
>>>>>>
>>>>>> The --pool option is required. You can only optimize one pool at a time.
>>>>>>
>>>>>> If multiple pools use the same crush rule or the same crush hierarchy, rebalancing one of them will hurt the balance of the others. It is possible to optimize multiple pools but only if they have different rules and crush hiearchies.
>>>>>>
>>>>>> The --choose-args option is the name of the weights that achieve the rebalancing. By convention it is the same as the pool id: this is required by Luminous clusters. Even though your cluster is not luminous, we stick to this convention.
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> --
>>>>>> Loïc Dachary, Artisan Logiciel Libre
>>>>>
>>>>>  if the pools use the same crush rule or the same crush hierarchy, is
>>>>> there any way to optimize the cluster?
>>>>
>>>> It depends on the pools. In your cluster it does not matter because all pools except pool 49 are mostly empty. You can rebalance pool 49 and leave the other pool alone.
>>>>
>>>> Cheers
>>>>
>>>> --
>>>> Loïc Dachary, Artisan Logiciel Libre
>>>
>>> the crushmap of which cluster I send to you is in lab environment, I
>>> have a much bigger cluster in production.
>>> I will send the crushmap to you later, could you help me to optimize
>>> this cluster?
>>
>> It is an interesting use case, I will help.
>>
>>> If you have detailed steps, please send it to me.
>>
>> Using the output of "ceph report" cluster, I will be able to verify it works as expected. The steps are simple but they will require an upgrade to Luminous.
>>
>> Cheers
>>
>> --
>> Loïc Dachary, Artisan Logiciel Libre
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
Loïc Dachary, Artisan Logiciel Libre


* Re: Beta testing crush optimization
  2017-06-01 12:08                   ` Loic Dachary
@ 2017-06-01 12:17                     ` han vincent
  2017-06-01 12:23                       ` Loic Dachary
  0 siblings, 1 reply; 24+ messages in thread
From: han vincent @ 2017-06-01 12:17 UTC (permalink / raw)
  To: Loic Dachary; +Cc: Ceph Development

you can get the crushmap from
https://drive.google.com/open?id=0B5Kd4hBGEUvnUU9BcWY1NWtWNHM.

2017-06-01 20:08 GMT+08:00 Loic Dachary <loic@dachary.org>:
>
>
> On 06/01/2017 01:52 PM, han vincent wrote:
>> 2017-06-01 19:43 GMT+08:00 Loic Dachary <loic@dachary.org>:
>>>
>>>
>>> On 06/01/2017 01:38 PM, han vincent wrote:
>>>> 2017-06-01 18:21 GMT+08:00 Loic Dachary <loic@dachary.org>:
>>>>>
>>>>>
>>>>> On 06/01/2017 12:05 PM, han vincent wrote:
>>>>>> Hi loic:
>>>>>>    I still have two questions, in the following to commands:
>>>>>>
>>>>>>    1. crush optimize --crushmap /tmp/han-vincent-report.json --out-path /tmp/han-vincent-report-optimized.txt --out-format txt --pool 49
>>>>>>
>>>>>>    Is "--pool" options must specified in this command? if not, will it optimize all the pools without "--pool" option?
>>>>>>    if there are several pools in my cluster and each pool has a lot of pgs. If I optimize one of the, will it affect the other pools?
>>>>>>    how to use it to optimize multiple pools in a cluster of hammer?
>>>>>>
>>>>>>    2. crush analyze --crushmap /tmp/han-vincent-report-optimized.txt --pool 49 --replication-count=2 --pg-num=1024 --pgp-num=1024 --rule=replicated_ruleset --choose-args=49
>>>>>>    In this command, the value of the "--choose-args" option is 49, it is same as the pool id, what is the mean of "--choose-args" option?
>>>>>>
>>>>>
>>>>> The --pool option is required. You can only optimize one pool at a time.
>>>>>
>>>>> If multiple pools use the same crush rule or the same crush hierarchy, rebalancing one of them will hurt the balance of the others. It is possible to optimize multiple pools but only if they have different rules and crush hiearchies.
>>>>>
>>>>> The --choose-args option is the name of the weights that achieve the rebalancing. By convention it is the same as the pool id: this is required by Luminous clusters. Even though your cluster is not luminous, we stick to this convention.
>>>>>
>>>>> Cheers
>>>>>
>>>>> --
>>>>> Loïc Dachary, Artisan Logiciel Libre
>>>>
>>>>  if the pools use the same crush rule or the same crush hierarchy, is
>>>> there any way to optimize the cluster?
>>>
>>> It depends on the pools. In your cluster it does not matter because all pools except pool 49 are mostly empty. You can rebalance pool 49 and leave the other pool alone.
>>>
>>> Cheers
>>>
>>> --
>>> Loïc Dachary, Artisan Logiciel Libre
>>
>> the crushmap of which cluster I send to you is in lab environment, I
>> have a much bigger cluster in production.
>> I will send the crushmap to you later, could you help me to optimize
>> this cluster?
>
> It is an interesting use case, I will help.
>
>> If you have detailed steps, please send it to me.
>
> Using the output of "ceph report" cluster, I will be able to verify it works as expected. The steps are simple but they will require an upgrade to Luminous.
>
> Cheers
>
> --
> Loïc Dachary, Artisan Logiciel Libre

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Beta testing crush optimization
  2017-06-01 11:52                 ` han vincent
@ 2017-06-01 12:08                   ` Loic Dachary
  2017-06-01 12:17                     ` han vincent
  0 siblings, 1 reply; 24+ messages in thread
From: Loic Dachary @ 2017-06-01 12:08 UTC (permalink / raw)
  To: han vincent; +Cc: Ceph Development



On 06/01/2017 01:52 PM, han vincent wrote:
> 2017-06-01 19:43 GMT+08:00 Loic Dachary <loic@dachary.org>:
>>
>>
>> On 06/01/2017 01:38 PM, han vincent wrote:
>>> 2017-06-01 18:21 GMT+08:00 Loic Dachary <loic@dachary.org>:
>>>>
>>>>
>>>> On 06/01/2017 12:05 PM, han vincent wrote:
>>>>> Hi loic:
>>>>>    I still have two questions, in the following to commands:
>>>>>
>>>>>    1. crush optimize --crushmap /tmp/han-vincent-report.json --out-path /tmp/han-vincent-report-optimized.txt --out-format txt --pool 49
>>>>>
>>>>>    Is "--pool" options must specified in this command? if not, will it optimize all the pools without "--pool" option?
>>>>>    if there are several pools in my cluster and each pool has a lot of pgs. If I optimize one of the, will it affect the other pools?
>>>>>    how to use it to optimize multiple pools in a cluster of hammer?
>>>>>
>>>>>    2. crush analyze --crushmap /tmp/han-vincent-report-optimized.txt --pool 49 --replication-count=2 --pg-num=1024 --pgp-num=1024 --rule=replicated_ruleset --choose-args=49
>>>>>    In this command, the value of the "--choose-args" option is 49, it is same as the pool id, what is the mean of "--choose-args" option?
>>>>>
>>>>
>>>> The --pool option is required. You can only optimize one pool at a time.
>>>>
>>>> If multiple pools use the same crush rule or the same crush hierarchy, rebalancing one of them will hurt the balance of the others. It is possible to optimize multiple pools but only if they have different rules and crush hiearchies.
>>>>
>>>> The --choose-args option is the name of the weights that achieve the rebalancing. By convention it is the same as the pool id: this is required by Luminous clusters. Even though your cluster is not luminous, we stick to this convention.
>>>>
>>>> Cheers
>>>>
>>>> --
>>>> Loïc Dachary, Artisan Logiciel Libre
>>>
>>>  if the pools use the same crush rule or the same crush hierarchy, is
>>> there any way to optimize the cluster?
>>
>> It depends on the pools. In your cluster it does not matter because all pools except pool 49 are mostly empty. You can rebalance pool 49 and leave the other pool alone.
>>
>> Cheers
>>
>> --
>> Loïc Dachary, Artisan Logiciel Libre
> 
> the crushmap of which cluster I send to you is in lab environment, I
> have a much bigger cluster in production.
> I will send the crushmap to you later, could you help me to optimize
> this cluster?

It is an interesting use case; I will help.

> If you have detailed steps, please send it to me.

Using the output of "ceph report" from that cluster, I will be able to verify it works as expected. The steps are simple but they will require an upgrade to Luminous.

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Beta testing crush optimization
  2017-06-01 11:43               ` Loic Dachary
@ 2017-06-01 11:52                 ` han vincent
  2017-06-01 12:08                   ` Loic Dachary
  0 siblings, 1 reply; 24+ messages in thread
From: han vincent @ 2017-06-01 11:52 UTC (permalink / raw)
  To: Loic Dachary; +Cc: Ceph Development

2017-06-01 19:43 GMT+08:00 Loic Dachary <loic@dachary.org>:
>
>
> On 06/01/2017 01:38 PM, han vincent wrote:
>> 2017-06-01 18:21 GMT+08:00 Loic Dachary <loic@dachary.org>:
>>>
>>>
>>> On 06/01/2017 12:05 PM, han vincent wrote:
>>>> Hi loic:
>>>>    I still have two questions, in the following to commands:
>>>>
>>>>    1. crush optimize --crushmap /tmp/han-vincent-report.json --out-path /tmp/han-vincent-report-optimized.txt --out-format txt --pool 49
>>>>
>>>>    Is "--pool" options must specified in this command? if not, will it optimize all the pools without "--pool" option?
>>>>    if there are several pools in my cluster and each pool has a lot of pgs. If I optimize one of the, will it affect the other pools?
>>>>    how to use it to optimize multiple pools in a cluster of hammer?
>>>>
>>>>    2. crush analyze --crushmap /tmp/han-vincent-report-optimized.txt --pool 49 --replication-count=2 --pg-num=1024 --pgp-num=1024 --rule=replicated_ruleset --choose-args=49
>>>>    In this command, the value of the "--choose-args" option is 49, it is same as the pool id, what is the mean of "--choose-args" option?
>>>>
>>>
>>> The --pool option is required. You can only optimize one pool at a time.
>>>
>>> If multiple pools use the same crush rule or the same crush hierarchy, rebalancing one of them will hurt the balance of the others. It is possible to optimize multiple pools but only if they have different rules and crush hiearchies.
>>>
>>> The --choose-args option is the name of the weights that achieve the rebalancing. By convention it is the same as the pool id: this is required by Luminous clusters. Even though your cluster is not luminous, we stick to this convention.
>>>
>>> Cheers
>>>
>>> --
>>> Loïc Dachary, Artisan Logiciel Libre
>>
>>  if the pools use the same crush rule or the same crush hierarchy, is
>> there any way to optimize the cluster?
>
> It depends on the pools. In your cluster it does not matter because all pools except pool 49 are mostly empty. You can rebalance pool 49 and leave the other pool alone.
>
> Cheers
>
> --
> Loïc Dachary, Artisan Logiciel Libre

The crushmap of the cluster I sent to you is from a lab environment; I
have a much bigger cluster in production.
I will send its crushmap to you later. Could you help me to optimize
that cluster?
If you have detailed steps, please send them to me.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Beta testing crush optimization
  2017-06-01 11:38             ` han vincent
@ 2017-06-01 11:43               ` Loic Dachary
  2017-06-01 11:52                 ` han vincent
  0 siblings, 1 reply; 24+ messages in thread
From: Loic Dachary @ 2017-06-01 11:43 UTC (permalink / raw)
  To: han vincent; +Cc: Ceph Development



On 06/01/2017 01:38 PM, han vincent wrote:
> 2017-06-01 18:21 GMT+08:00 Loic Dachary <loic@dachary.org>:
>>
>>
>> On 06/01/2017 12:05 PM, han vincent wrote:
>>> Hi loic:
>>>    I still have two questions, in the following to commands:
>>>
>>>    1. crush optimize --crushmap /tmp/han-vincent-report.json --out-path /tmp/han-vincent-report-optimized.txt --out-format txt --pool 49
>>>
>>>    Is "--pool" options must specified in this command? if not, will it optimize all the pools without "--pool" option?
>>>    if there are several pools in my cluster and each pool has a lot of pgs. If I optimize one of the, will it affect the other pools?
>>>    how to use it to optimize multiple pools in a cluster of hammer?
>>>
>>>    2. crush analyze --crushmap /tmp/han-vincent-report-optimized.txt --pool 49 --replication-count=2 --pg-num=1024 --pgp-num=1024 --rule=replicated_ruleset --choose-args=49
>>>    In this command, the value of the "--choose-args" option is 49, it is same as the pool id, what is the mean of "--choose-args" option?
>>>
>>
>> The --pool option is required. You can only optimize one pool at a time.
>>
>> If multiple pools use the same crush rule or the same crush hierarchy, rebalancing one of them will hurt the balance of the others. It is possible to optimize multiple pools but only if they have different rules and crush hiearchies.
>>
>> The --choose-args option is the name of the weights that achieve the rebalancing. By convention it is the same as the pool id: this is required by Luminous clusters. Even though your cluster is not luminous, we stick to this convention.
>>
>> Cheers
>>
>> --
>> Loïc Dachary, Artisan Logiciel Libre
> 
>  if the pools use the same crush rule or the same crush hierarchy, is
> there any way to optimize the cluster?

It depends on the pools. In your cluster it does not matter because all pools except pool 49 are mostly empty. You can rebalance pool 49 and leave the other pools alone.
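
If you want to double check where the data actually lives before deciding, something along these lines is enough (a rough sketch; the exact column layout differs between Ceph versions):

$ ceph df detail     # the per-pool USED and OBJECTS columns show which pools hold data
$ rados df           # the same per-pool numbers, seen from the rados side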

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Beta testing crush optimization
  2017-06-01 10:21           ` Loic Dachary
@ 2017-06-01 11:38             ` han vincent
  2017-06-01 11:43               ` Loic Dachary
  0 siblings, 1 reply; 24+ messages in thread
From: han vincent @ 2017-06-01 11:38 UTC (permalink / raw)
  To: Loic Dachary; +Cc: Ceph Development

2017-06-01 18:21 GMT+08:00 Loic Dachary <loic@dachary.org>:
>
>
> On 06/01/2017 12:05 PM, han vincent wrote:
>> Hi loic:
>>    I still have two questions, in the following to commands:
>>
>>    1. crush optimize --crushmap /tmp/han-vincent-report.json --out-path /tmp/han-vincent-report-optimized.txt --out-format txt --pool 49
>>
>>    Is "--pool" options must specified in this command? if not, will it optimize all the pools without "--pool" option?
>>    if there are several pools in my cluster and each pool has a lot of pgs. If I optimize one of the, will it affect the other pools?
>>    how to use it to optimize multiple pools in a cluster of hammer?
>>
>>    2. crush analyze --crushmap /tmp/han-vincent-report-optimized.txt --pool 49 --replication-count=2 --pg-num=1024 --pgp-num=1024 --rule=replicated_ruleset --choose-args=49
>>    In this command, the value of the "--choose-args" option is 49, it is same as the pool id, what is the mean of "--choose-args" option?
>>
>
> The --pool option is required. You can only optimize one pool at a time.
>
> If multiple pools use the same crush rule or the same crush hierarchy, rebalancing one of them will hurt the balance of the others. It is possible to optimize multiple pools but only if they have different rules and crush hiearchies.
>
> The --choose-args option is the name of the weights that achieve the rebalancing. By convention it is the same as the pool id: this is required by Luminous clusters. Even though your cluster is not luminous, we stick to this convention.
>
> Cheers
>
> --
> Loïc Dachary, Artisan Logiciel Libre

 If the pools use the same crush rule or the same crush hierarchy, is
there any way to optimize the cluster?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Beta testing crush optimization
       [not found]         ` <CANNfkuZbfNSW5CQHaV0yyF3744FPf-gh0vKBj45bZKrvZ27MhA@mail.gmail.com>
@ 2017-06-01 10:21           ` Loic Dachary
  2017-06-01 11:38             ` han vincent
  0 siblings, 1 reply; 24+ messages in thread
From: Loic Dachary @ 2017-06-01 10:21 UTC (permalink / raw)
  To: han vincent; +Cc: Ceph Development



On 06/01/2017 12:05 PM, han vincent wrote:
> Hi loic:
>    I still have two questions, in the following to commands:
>    
>    1. crush optimize --crushmap /tmp/han-vincent-report.json --out-path /tmp/han-vincent-report-optimized.txt --out-format txt --pool 49
> 
>    Is "--pool" options must specified in this command? if not, will it optimize all the pools without "--pool" option?
>    if there are several pools in my cluster and each pool has a lot of pgs. If I optimize one of the, will it affect the other pools?
>    how to use it to optimize multiple pools in a cluster of hammer?
> 
>    2. crush analyze --crushmap /tmp/han-vincent-report-optimized.txt --pool 49 --replication-count=2 --pg-num=1024 --pgp-num=1024 --rule=replicated_ruleset --choose-args=49
>    In this command, the value of the "--choose-args" option is 49, it is same as the pool id, what is the mean of "--choose-args" option?
>    

The --pool option is required. You can only optimize one pool at a time.

If multiple pools use the same crush rule or the same crush hierarchy, rebalancing one of them will hurt the balance of the others. It is possible to optimize multiple pools, but only if they have different rules and crush hierarchies.

The --choose-args option is the name of the weights that achieve the rebalancing. By convention it is the same as the pool id: this is required by Luminous clusters. Even though your cluster is not luminous, we stick to this convention.
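
To see which pools share a rule before deciding what can be optimized independently, something like this is enough (a sketch; the field is named crush_ruleset on hammer and crush_rule on Luminous):

$ ceph osd dump | grep '^pool'
# each line shows the pool id, its name and its crush rule; pools that share
# a rule (and therefore a hierarchy) cannot be rebalanced independently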

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Beta testing crush optimization
  2017-06-01  7:17       ` Loic Dachary
@ 2017-06-01 10:07         ` han vincent
       [not found]         ` <CANNfkuZbfNSW5CQHaV0yyF3744FPf-gh0vKBj45bZKrvZ27MhA@mail.gmail.com>
  1 sibling, 0 replies; 24+ messages in thread
From: han vincent @ 2017-06-01 10:07 UTC (permalink / raw)
  To: Loic Dachary; +Cc: Ceph Development

Hi loic:
   I still have two questions about the following two commands:

   1. crush optimize --crushmap /tmp/han-vincent-report.json
--out-path /tmp/han-vincent-report-optimized.txt --out-format txt
--pool 49
   Is "--pool" options must specified in this command? if not, will it
optimize all the pools without "--pool" option?
   if there are several pools in my cluster and each pool has a lot of
pgs. If I optimize one of the, will it affect the other pools?
   how to use it to optimize multiple pools in a cluster of hammer?

   2. crush analyze --crushmap /tmp/han-vincent-report-optimized.txt
--pool 49 --replication-count=2 --pg-num=1024 --pgp-num=1024
--rule=replicated_ruleset --choose-args=49
   In this command, the value of the "--choose-args" option is 49, the
same as the pool id. What is the meaning of the "--choose-args" option?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Beta testing crush optimization
       [not found]     ` <CANNfkubxHfAYn-aLkHCQV_YZoMuVTzug9nRbvaTSE4UyKiQLuw@mail.gmail.com>
@ 2017-06-01  7:17       ` Loic Dachary
  2017-06-01 10:07         ` han vincent
       [not found]         ` <CANNfkuZbfNSW5CQHaV0yyF3744FPf-gh0vKBj45bZKrvZ27MhA@mail.gmail.com>
  0 siblings, 2 replies; 24+ messages in thread
From: Loic Dachary @ 2017-06-01  7:17 UTC (permalink / raw)
  To: han vincent; +Cc: Ceph Development



On 06/01/2017 08:09 AM, han vincent wrote:
> Hi, Loic:
>   Thanks for your reply. And I still have some questions that trouble me.
> 
>>>>Hi,
> 
>>>>I found the reason for the map problem, thanks a lot for reporting it. In a nutshell the "stable" tunable was implemented after hammer and your ceph report does not mention it at all. python-crush incorrectly assumes this means it should default to 1. It must default to 0 instead. When I do that manually, all mappings are correct. I'll fix this and publish a new version by tomorrow.
> If you fix the bug, please tell me, thanks.
> You said there was a "stable" tunable implemented after hammer. Does it mean that the version of "straw2" in hammer is unstable?

The tunable is meant to improve the straw2 placement. It does not indicate the straw2 code is unstable; it is a different concept.

> Do you know in which version the "stable" tunable was published?

It was added Fri Nov 13 09:21:03 2015 -0500 by https://github.com/ceph/ceph/commit/fdb3f664448e80d984470f32f04e2e6f03ab52ec

It was released with Jewel http://docs.ceph.com/docs/master/release-notes/#v10.2.0-jewel
crush: add chooseleaf_stable tunable (pr#6572, Sangdi Xu, Sage Weil)
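
If you want to check whether a given cluster or report already knows about the tunable, a quick sketch (assuming "ceph osd crush show-tunables" is available on your version):

$ ceph osd crush show-tunables | grep chooseleaf_stable   # prints nothing on hammer
$ grep -c chooseleaf_stable report.json                   # 0 when the report predates the tunable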

Cheers

>>>>Your cluster is not very unbalanced (less than 10% overfilled):
> 
>>>>(virtualenv) loic@fold:~/software/libcrush/python-crush$ crush analyze --crushmap /tmp/han-vincent-report.json --pool 49https://github.com/ceph/ceph/commit/fdb3f664448e80d984470f32f04e2e6f03ab52ec
>>>>         ~id~  ~weight~  ~PGs~  ~over/under filled %~ ~name~
>>>>node-6v    -4      1.08    427                   4.25
>>>>node-4     -2      1.08    416                   1.56
>>>>node-7v    -5      1.08    407                  -0.63
>>>>node-8v    -6      1.08    405                  -1.12
>>>>node-5v    -3      1.08    393                  -4.05
> 
>>>>Worst case scenario if a host fails:
> 
>>>>        ~over filled %~
>>>>~type~
>>>>device             7.81
>>>>host               4.10
>>>>root               0.00
>>>>(virtualenv) loic@fold:~/software/libcrush/python-crush$ crush analyze --type device --crushmap /tmp/han-vincent-report.json --pool 49
>>>>        ~id~  ~weight~  ~PGs~  ~over/under filled %~ ~name~
>>>>osd.5      5      0.54    221                   7.91
>>>>osd.0      0      0.54    211                   3.03
>>>>osd.7      7      0.54    210                   2.54
>>>>osd.4      4      0.54    206                   0.59
>>>>osd.8      8      0.54    206                   0.59
>>>>osd.1      1      0.54    205                   0.10
>>>>osd.3      3      0.54    200                  -2.34
>>>>osd.9      9      0.54    199                  -2.83
>>>>osd.6      6      0.54    197                  -3.81
>>>>osd.2      2      0.54    193                  -5.76
> 
>>>>Worst case scenario if a host fails:
> 
>>>>        ~over filled %~
>>>>~type~
>>>>device             7.81
>>>>host               4.10
>>>>root               0.00
> 
>>>>With optimization things will improve:
> 
>>>>$ crush optimize --crushmap /tmp/han-vincent-report.json --out-path /tmp/han-vincent-report-optimized.txt --out-format txt --pool 49
>>>>2017-05-31 15:17:59,917 argv = optimize --crushmap /tmp/han-vincent-report.json --out-path /tmp/han-vincent-report-optimized.txt --out-format txt --pool 49 --replication-count=2 --pg-num=1024 --pgp-num=1024 --rule=replicated_ruleset --out-version=h --no-positions --choose-args=49
>>>>2017-05-31 15:17:59,940 default optimizing
>>>>2017-05-31 15:18:05,007 default wants to swap 43 PGs
>>>>2017-05-31 15:18:05,013 node-6v optimizing
>>>>2017-05-31 15:18:05,013 node-4 optimizing
>>>>2017-05-31 15:18:05,016 node-8v optimizing
>>>>2017-05-31 15:18:05,016 node-7v optimizing
>>>>2017-05-31 15:18:05,018 node-5v optimizing
>>>>2017-05-31 15:18:05,369 node-4 wants to swap 8 PGs
>>>>2017-05-31 15:18:05,742 node-6v wants to swap 10 PGs
>>>>2017-05-31 15:18:06,382 node-5v wants to swap 7 PGs
>>>>2017-05-31 15:18:06,602 node-7v wants to swap 7 PGs
>>>>2017-05-31 15:18:07,346 node-8v already optimized
> Is "--pool" options must specified in this command? if not, will it optimize all the pools without "--pool" option?
> if there are several pools in my cluster and each pool has a lot of pgs. If I optimize one of the, will it affect the other pools?
> how to use it to optimize multiple pools in a cluster of hammer?
> 
>>>>(virtualenv) loic@fold:~/software/libcrush/python-crush$ crush analyze --crushmap /tmp/han-vincent-report-optimized.txt --pool 49 --replication-count=2 --pg-num=1024 --pgp-num=1024 --rule=replicated_ruleset --choose-args=49
>>>>         ~id~  ~weight~  ~PGs~  ~over/under filled %~ ~name~
>>>>node-4     -2      1.08    410                   0.10
>>>>node-5v    -3      1.08    410                   0.10
>>>>node-6v    -4      1.08    410                   0.10
>>>>node-7v    -5      1.08    409                  -0.15
>>>>node-8v    -6      1.08    409                  -0.15
> In this command, the value of the "--choose-args" option is 49, the same as the pool id. What is the meaning of the "--choose-args" option?
> 
>>>>Worst case scenario if a host fails:
> 
>>>>        ~over filled %~
>>>>~type~
>>>>device             5.47
>>>>host               3.71
>>>>root               0.00
>>>>(virtualenv) loic@fold:~/software/libcrush/python-crush$ crush analyze --type device --crushmap /tmp/han-vincent-report-optimized.txt --pool 49 --replication-count=2 --pg-num=1024 --pgp-num=1024 --rule=replicated_ruleset --choose-args=49
>>>>        ~id~  ~weight~  ~PGs~  ~over/under filled %~ ~name~
>>>>osd.2      2      0.54    206                   0.59
>>>>osd.8      8      0.54    206                   0.59
>>>>osd.0      0      0.54    205                   0.10
>>>>osd.1      1      0.54    205                   0.10
>>>>osd.4      4      0.54    205                   0.10
>>>>osd.5      5      0.54    205                   0.10
>>>>osd.7      7      0.54    205                   0.10
>>>>osd.3      3      0.54    204                  -0.39
>>>>osd.6      6      0.54    204                  -0.39
>>>>osd.9      9      0.54    203                  -0.88
> 
>>>>Worst case scenario if a host fails:
> 
>>>>        ~over filled %~
>>>>~type~
>>>>device             5.47
> 
>>>>host               3.71
>>>>root               0.00
> 
>>>>Note that the other pools won't be optimized and their PGs will be moved around for no good reason. However, since they contain very few PGs each (8 for most of them, 32 for one of them) and very little data (less than 1MB total), it won't matter much.
> 
>>>>Cheers
> 
> 
> 
>>On 05/31/2017 02:34 PM, Loic Dachary wrote:
>> Hi,
>>
>> On 05/31/2017 12:32 PM, han vincent wrote:
>>> hello, loic:
>>>
>>> I had a cluster build with hammer 0.94.10, then I used the following commands to change the algorithm from "straw" to "straw2".
>>> 1. ceph osd crush tunables hammer
>>> 2. ceph osd getcrushmap -o /tmp/cmap
>>> 3. crushtool -d /tmp/cmap -o /tmp/cmap.txt 4. vim /tmp/cmap.txt and 
>>> change the algorithm of each bucket from "straw" to "straw2"
>>> 5. crushtool -c /tmp/cmap.txt -o /tmp/cmap 6. ceph osd setcrushmap -i 
>>> /tmp/cmap 7. ceph osd crush reweight-all after that, I used "python 
>>> crush" to optimize the cluster, the version of "python crush" is 
>>> 1.0.32
>>>
>>> 1. ceph report > report.json
>>> 2. crush optimize --crushmap report.json --out-path optimized.crush 
>>> Unfortunately, there was an error in the output:
>>>
>>> 2017-05-30 18:48:01,803 42.1 map to [4, 9] instead of [4, 8]
>>> 2017-05-30 18:48:01,838 49.3af map to [9, 2] instead of [9, 3]
>>> 2017-05-30 18:48:01,838 49.e3 map to [6, 4] instead of [6, 5]
>>> 2017-05-30 18:48:01,838 49.e1 map to [7, 2] instead of [7, 3]
>>> 2017-05-30 18:48:01,838 49.e0 map to [5, 1] instead of [5, 0]
>>> 2017-05-30 18:48:01,838 49.20d map to [3, 1] instead of [3, 0]
>>> 2017-05-30 18:48:01,838 49.20c map to [2, 9] instead of [2, 8]
>>> 2017-05-30 18:48:01,838 49.36e map to [6, 1] instead of [6, 0] ......
>>>
>>> Traceback (most recent call last):
>>>  File "/usr/bin/crush", line 25, in <module>
>>> sys.exit(Ceph().main(sys.argv[1:]))
>>>  File "/usr/lib64/python2.7/site-packages/crush/main.py", line 136, 
>>> in main return self.constructor(argv).run()  File 
>>> "/usr/lib64/python2.7/site-packages/crush/optimize.py", line 373, in 
>>> run crushmap = self.main.convert_to_crushmap(self.args.crushmap)
>>>  File "/usr/lib64/python2.7/site-packages/crush/ceph/__init__.py", 
>>> line 690, in convert_to_crushmap
>>> c.parse(crushmap)
>>>  File "/usr/lib64/python2.7/site-packages/crush/__init__.py", line 
>>> 138, in parse return 
>>> self.parse_crushmap(self._convert_to_crushmap(something))
>>>  File "/usr/lib64/python2.7/site-packages/crush/ceph/__init__.py", 
>>> line 416, in _convert_to_crushmap crushmap = 
>>> CephReport().parse_report(something)
>>>  File "/usr/lib64/python2.7/site-packages/crush/ceph/__init__.py", 
>>> line 137, in parse_report raise MappingError("some mapping failed, please file a bug at "
>>> crush.ceph.MappingError: some mapping failed, please file a bug at 
>>> http://libcrush.org/main/python-crush/issues/new
>>> Do you know what the problem is? can you help me? I would be very grateful to you.
>>
>> This is a safeguard to make sure python-crush maps exactly as expected. I'm not sure yet why there is a difference but I'll work on that, using the crush implementation found in hammer 0.94.10. For your information, the full output of:
>>
>> $ crush analyze --crushmap /tmp/han-vincent-report.json
>>
>> is at https://paste2.org/PyeHe2dC What I find strange is that your output regarding pool 42 is different than mine. You have:
>>
>>
>> 2017-05-30 18:48:01,803 42.1 map to [4, 9] instead of [4, 8]
>>
>> and I have
>>
>> 2017-05-31 12:55:04,207 42.3 map to [4, 3] instead of [4, 2]
>> 2017-05-31 12:55:04,207 42.7 map to [8, 0] instead of [8, 1]
>> 2017-05-31 12:55:04,207 42.1 map to [4, 9] instead of [4, 8]
>>
>> I wonder if that's a sign that the changes to the crushmap following your change to straw2 are still going on. Would you mind sending me the output of ceph report (please run it again after receiving this mail) ?
>>
>> Cheers
>>
> 
>>--
>>Loïc Dachary, Artisan Logiciel Libre
> 
> 
> 2017-05-31 20:40 GMT+08:00 Loic Dachary <loic@dachary.org <mailto:loic@dachary.org>>:
> 
>     Hi,
> 
>     I found the reason for the map problem, thanks a lot for reporting it. In a nutshell the "stable" tunable was implemented after hammer and your ceph report does not mention it at all. python-crush incorrectly assumes this means it should default to 1. It must default to 0 instead. When I do that manually, all mappings are correct. I'll fix this and publish a new version by tomorrow.
> 
>     Your cluster is not very unbalanced (less than 10% overfilled):
> 
>     (virtualenv) loic@fold:~/software/libcrush/python-crush$ crush analyze --crushmap /tmp/han-vincent-report.json --pool 49
>              ~id~  ~weight~  ~PGs~  ~over/under filled %~
>     ~name~
>     node-6v    -4      1.08    427                   4.25
>     node-4     -2      1.08    416                   1.56
>     node-7v    -5      1.08    407                  -0.63
>     node-8v    -6      1.08    405                  -1.12
>     node-5v    -3      1.08    393                  -4.05
> 
>     Worst case scenario if a host fails:
> 
>             ~over filled %~
>     ~type~
>     device             7.81
>     host               4.10
>     root               0.00
>     (virtualenv) loic@fold:~/software/libcrush/python-crush$ crush analyze --type device --crushmap /tmp/han-vincent-report.json --pool 49
>             ~id~  ~weight~  ~PGs~  ~over/under filled %~
>     ~name~
>     osd.5      5      0.54    221                   7.91
>     osd.0      0      0.54    211                   3.03
>     osd.7      7      0.54    210                   2.54
>     osd.4      4      0.54    206                   0.59
>     osd.8      8      0.54    206                   0.59
>     osd.1      1      0.54    205                   0.10
>     osd.3      3      0.54    200                  -2.34
>     osd.9      9      0.54    199                  -2.83
>     osd.6      6      0.54    197                  -3.81
>     osd.2      2      0.54    193                  -5.76
> 
>     Worst case scenario if a host fails:
> 
>             ~over filled %~
>     ~type~
>     device             7.81
>     host               4.10
>     root               0.00
> 
>     With optimization things will improve:
> 
>     $ crush optimize --crushmap /tmp/han-vincent-report.json --out-path /tmp/han-vincent-report-optimized.txt --out-format txt --pool 49
>     2017-05-31 15:17:59,917 argv = optimize --crushmap /tmp/han-vincent-report.json --out-path /tmp/han-vincent-report-optimized.txt --out-format txt --pool 49 --replication-count=2 --pg-num=1024 --pgp-num=1024 --rule=replicated_ruleset --out-version=h --no-positions --choose-args=49
>     2017-05-31 15:17:59,940 default optimizing
>     2017-05-31 15:18:05,007 default wants to swap 43 PGs
>     2017-05-31 15:18:05,013 node-6v optimizing
>     2017-05-31 15:18:05,013 node-4 optimizing
>     2017-05-31 15:18:05,016 node-8v optimizing
>     2017-05-31 15:18:05,016 node-7v optimizing
>     2017-05-31 15:18:05,018 node-5v optimizing
>     2017-05-31 15:18:05,369 node-4 wants to swap 8 PGs
>     2017-05-31 15:18:05,742 node-6v wants to swap 10 PGs
>     2017-05-31 15:18:06,382 node-5v wants to swap 7 PGs
>     2017-05-31 15:18:06,602 node-7v wants to swap 7 PGs
>     2017-05-31 15:18:07,346 node-8v already optimized
> 
>     (virtualenv) loic@fold:~/software/libcrush/python-crush$ crush analyze --crushmap /tmp/han-vincent-report-optimized.txt --pool 49 --replication-count=2 --pg-num=1024 --pgp-num=1024 --rule=replicated_ruleset --choose-args=49
>              ~id~  ~weight~  ~PGs~  ~over/under filled %~
>     ~name~
>     node-4     -2      1.08    410                   0.10
>     node-5v    -3      1.08    410                   0.10
>     node-6v    -4      1.08    410                   0.10
>     node-7v    -5      1.08    409                  -0.15
>     node-8v    -6      1.08    409                  -0.15
> 
>     Worst case scenario if a host fails:
> 
>             ~over filled %~
>     ~type~
>     device             5.47
>     host               3.71
>     root               0.00
>     (virtualenv) loic@fold:~/software/libcrush/python-crush$ crush analyze --type device --crushmap /tmp/han-vincent-report-optimized.txt --pool 49 --replication-count=2 --pg-num=1024 --pgp-num=1024 --rule=replicated_ruleset --choose-args=49
>             ~id~  ~weight~  ~PGs~  ~over/under filled %~
>     ~name~
>     osd.2      2      0.54    206                   0.59
>     osd.8      8      0.54    206                   0.59
>     osd.0      0      0.54    205                   0.10
>     osd.1      1      0.54    205                   0.10
>     osd.4      4      0.54    205                   0.10
>     osd.5      5      0.54    205                   0.10
>     osd.7      7      0.54    205                   0.10
>     osd.3      3      0.54    204                  -0.39
>     osd.6      6      0.54    204                  -0.39
>     osd.9      9      0.54    203                  -0.88
> 
>     Worst case scenario if a host fails:
> 
>             ~over filled %~
>     ~type~
>     device             5.47
> 
>     host               3.71
>     root               0.00
> 
>     Note that the other pools won't be optimized and their PGs will be moved around for no good reason. However, since they contain very few PGs each (8 for most of them, 32 for one of them) and very little data (less than 1MB total), it won't matter much.
> 
>     Cheers
> 
>     On 05/31/2017 02:34 PM, Loic Dachary wrote:
>     > Hi,
>     >
>     > On 05/31/2017 12:32 PM, han vincent wrote:
>     >> hello, loic:
>     >>
>     >> I had a cluster build with hammer 0.94.10, then I used the following commands to change the algorithm from "straw" to "straw2".
>     >> 1. ceph osd crush tunables hammer
>     >> 2. ceph osd getcrushmap -o /tmp/cmap
>     >> 3. crushtool -d /tmp/cmap -o /tmp/cmap.txt
>     >> 4. vim /tmp/cmap.txt and change the algorithm of each bucket from "straw" to "straw2"
>     >> 5. crushtool -c /tmp/cmap.txt -o /tmp/cmap
>     >> 6. ceph osd setcrushmap -i /tmp/cmap
>     >> 7. ceph osd crush reweight-all
>     >> after that, I used "python crush" to optimize the cluster, the version of "python crush" is 1.0.32
>     >>
>     >> 1. ceph report > report.json
>     >> 2. crush optimize --crushmap report.json --out-path optimized.crush
>     >> Unfortunately, there was an error in the output:
>     >>
>     >> 2017-05-30 18:48:01,803 42.1 map to [4, 9] instead of [4, 8]
>     >> 2017-05-30 18:48:01,838 49.3af map to [9, 2] instead of [9, 3]
>     >> 2017-05-30 18:48:01,838 49.e3 map to [6, 4] instead of [6, 5]
>     >> 2017-05-30 18:48:01,838 49.e1 map to [7, 2] instead of [7, 3]
>     >> 2017-05-30 18:48:01,838 49.e0 map to [5, 1] instead of [5, 0]
>     >> 2017-05-30 18:48:01,838 49.20d map to [3, 1] instead of [3, 0]
>     >> 2017-05-30 18:48:01,838 49.20c map to [2, 9] instead of [2, 8]
>     >> 2017-05-30 18:48:01,838 49.36e map to [6, 1] instead of [6, 0]
>     >> ......
>     >>
>     >> Traceback (most recent call last):
>     >>  File "/usr/bin/crush", line 25, in <module>
>     >> sys.exit(Ceph().main(sys.argv[1:]))
>     >>  File "/usr/lib64/python2.7/site-packages/crush/main.py", line 136, in main
>     >> return self.constructor(argv).run()
>     >>  File "/usr/lib64/python2.7/site-packages/crush/optimize.py", line 373, in run
>     >> crushmap = self.main.convert_to_crushmap(self.args.crushmap)
>     >>  File "/usr/lib64/python2.7/site-packages/crush/ceph/__init__.py", line 690, in convert_to_crushmap
>     >> c.parse(crushmap)
>     >>  File "/usr/lib64/python2.7/site-packages/crush/__init__.py", line 138, in parse
>     >> return self.parse_crushmap(self._convert_to_crushmap(something))
>     >>  File "/usr/lib64/python2.7/site-packages/crush/ceph/__init__.py", line 416, in _convert_to_crushmap
>     >> crushmap = CephReport().parse_report(something)
>     >>  File "/usr/lib64/python2.7/site-packages/crush/ceph/__init__.py", line 137, in parse_report
>     >> raise MappingError("some mapping failed, please file a bug at "
>     >> crush.ceph.MappingError: some mapping failed, please file a bug at http://libcrush.org/main/python-crush/issues/new <http://libcrush.org/main/python-crush/issues/new>
>     >> Do you know what the problem is? can you help me? I would be very grateful to you.
>     >
>     > This is a safeguard to make sure python-crush maps exactly as expected. I'm not sure yet why there is a difference but I'll work on that, using the crush implementation found in hammer 0.94.10. For your information, the full output of:
>     >
>     > $ crush analyze --crushmap /tmp/han-vincent-report.json
>     >
>     > is at https://paste2.org/PyeHe2dC What I find strange is that your output regarding pool 42 is different than mine. You have:
>     >
>     >
>     > 2017-05-30 18:48:01,803 42.1 map to [4, 9] instead of [4, 8]
>     >
>     > and I have
>     >
>     > 2017-05-31 12:55:04,207 42.3 map to [4, 3] instead of [4, 2]
>     > 2017-05-31 12:55:04,207 42.7 map to [8, 0] instead of [8, 1]
>     > 2017-05-31 12:55:04,207 42.1 map to [4, 9] instead of [4, 8]
>     >
>     > I wonder if that's a sign that the changes to the crushmap following your change to straw2 are still going on. Would you mind sending me the output of ceph report (please run it again after receiving this mail) ?
>     >
>     > Cheers
>     >
> 
>     --
>     Loïc Dachary, Artisan Logiciel Libre
> 
> 

-- 
Loïc Dachary, Artisan Logiciel Libre

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Beta testing crush optimization
  2017-05-31 11:34 ` Loic Dachary
@ 2017-05-31 12:40   ` Loic Dachary
       [not found]     ` <CANNfkubxHfAYn-aLkHCQV_YZoMuVTzug9nRbvaTSE4UyKiQLuw@mail.gmail.com>
  0 siblings, 1 reply; 24+ messages in thread
From: Loic Dachary @ 2017-05-31 12:40 UTC (permalink / raw)
  To: han vincent; +Cc: Ceph Development

Hi,

I found the reason for the map problem, thanks a lot for reporting it. In a nutshell the "stable" tunable was implemented after hammer and your ceph report does not mention it at all. python-crush incorrectly assumes this means it should default to 1. It must default to 0 instead. When I do that manually, all mappings are correct. I'll fix this and publish a new version by tomorrow.

Your cluster is not very unbalanced (less than 10% overfilled):

(virtualenv) loic@fold:~/software/libcrush/python-crush$ crush analyze --crushmap /tmp/han-vincent-report.json --pool 49
         ~id~  ~weight~  ~PGs~  ~over/under filled %~
~name~                                               
node-6v    -4      1.08    427                   4.25
node-4     -2      1.08    416                   1.56
node-7v    -5      1.08    407                  -0.63
node-8v    -6      1.08    405                  -1.12
node-5v    -3      1.08    393                  -4.05

Worst case scenario if a host fails:

        ~over filled %~
~type~                 
device             7.81
host               4.10
root               0.00
(virtualenv) loic@fold:~/software/libcrush/python-crush$ crush analyze --type device --crushmap /tmp/han-vincent-report.json --pool 49
        ~id~  ~weight~  ~PGs~  ~over/under filled %~
~name~                                              
osd.5      5      0.54    221                   7.91
osd.0      0      0.54    211                   3.03
osd.7      7      0.54    210                   2.54
osd.4      4      0.54    206                   0.59
osd.8      8      0.54    206                   0.59
osd.1      1      0.54    205                   0.10
osd.3      3      0.54    200                  -2.34
osd.9      9      0.54    199                  -2.83
osd.6      6      0.54    197                  -3.81
osd.2      2      0.54    193                  -5.76

Worst case scenario if a host fails:

        ~over filled %~
~type~                 
device             7.81
host               4.10
root               0.00

With optimization things will improve:

$ crush optimize --crushmap /tmp/han-vincent-report.json --out-path /tmp/han-vincent-report-optimized.txt --out-format txt --pool 49
2017-05-31 15:17:59,917 argv = optimize --crushmap /tmp/han-vincent-report.json --out-path /tmp/han-vincent-report-optimized.txt --out-format txt --pool 49 --replication-count=2 --pg-num=1024 --pgp-num=1024 --rule=replicated_ruleset --out-version=h --no-positions --choose-args=49
2017-05-31 15:17:59,940 default optimizing
2017-05-31 15:18:05,007 default wants to swap 43 PGs
2017-05-31 15:18:05,013 node-6v optimizing
2017-05-31 15:18:05,013 node-4 optimizing
2017-05-31 15:18:05,016 node-8v optimizing
2017-05-31 15:18:05,016 node-7v optimizing
2017-05-31 15:18:05,018 node-5v optimizing
2017-05-31 15:18:05,369 node-4 wants to swap 8 PGs
2017-05-31 15:18:05,742 node-6v wants to swap 10 PGs
2017-05-31 15:18:06,382 node-5v wants to swap 7 PGs
2017-05-31 15:18:06,602 node-7v wants to swap 7 PGs
2017-05-31 15:18:07,346 node-8v already optimized

(virtualenv) loic@fold:~/software/libcrush/python-crush$ crush analyze --crushmap /tmp/han-vincent-report-optimized.txt --pool 49 --replication-count=2 --pg-num=1024 --pgp-num=1024 --rule=replicated_ruleset --choose-args=49
         ~id~  ~weight~  ~PGs~  ~over/under filled %~
~name~                                               
node-4     -2      1.08    410                   0.10
node-5v    -3      1.08    410                   0.10
node-6v    -4      1.08    410                   0.10
node-7v    -5      1.08    409                  -0.15
node-8v    -6      1.08    409                  -0.15

Worst case scenario if a host fails:

        ~over filled %~
~type~                 
device             5.47
host               3.71
root               0.00
(virtualenv) loic@fold:~/software/libcrush/python-crush$ crush analyze --type device --crushmap /tmp/han-vincent-report-optimized.txt --pool 49 --replication-count=2 --pg-num=1024 --pgp-num=1024 --rule=replicated_ruleset --choose-args=49
        ~id~  ~weight~  ~PGs~  ~over/under filled %~
~name~                                              
osd.2      2      0.54    206                   0.59
osd.8      8      0.54    206                   0.59
osd.0      0      0.54    205                   0.10
osd.1      1      0.54    205                   0.10
osd.4      4      0.54    205                   0.10
osd.5      5      0.54    205                   0.10
osd.7      7      0.54    205                   0.10
osd.3      3      0.54    204                  -0.39
osd.6      6      0.54    204                  -0.39
osd.9      9      0.54    203                  -0.88

Worst case scenario if a host fails:

        ~over filled %~
~type~                 
device             5.47

host               3.71
root               0.00

Note that the other pools won't be optimized and their PGs will be moved around for no good reason. However, since they contain very few PGs each (8 for most of them, 32 for one of them) and very little data (less than 1MB total), it won't matter much.
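
In case it helps, applying the optimized text map would look something like the following (a sketch only, mirroring the compile/inject steps you already used for the straw2 change; please try it on a test cluster first):

$ crushtool -c /tmp/han-vincent-report-optimized.txt -o /tmp/han-vincent-report-optimized.map
$ ceph osd setcrushmap -i /tmp/han-vincent-report-optimized.map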

Cheers

On 05/31/2017 02:34 PM, Loic Dachary wrote:
> Hi,
> 
> On 05/31/2017 12:32 PM, han vincent wrote:
>> hello, loic:
>>      
>> I had a cluster build with hammer 0.94.10, then I used the following commands to change the algorithm from "straw" to "straw2".
>> 1. ceph osd crush tunables hammer
>> 2. ceph osd getcrushmap -o /tmp/cmap
>> 3. crushtool -d /tmp/cmap -o /tmp/cmap.txt
>> 4. vim /tmp/cmap.txt and change the algorithm of each bucket from "straw" to "straw2"
>> 5. crushtool -c /tmp/cmap.txt -o /tmp/cmap
>> 6. ceph osd setcrushmap -i /tmp/cmap
>> 7. ceph osd crush reweight-all
>> after that, I used "python crush" to optimize the cluster, the version of "python crush" is 1.0.32
>>
>> 1. ceph report > report.json
>> 2. crush optimize --crushmap report.json --out-path optimized.crush
>> Unfortunately, there was an error in the output:
>>
>> 2017-05-30 18:48:01,803 42.1 map to [4, 9] instead of [4, 8]
>> 2017-05-30 18:48:01,838 49.3af map to [9, 2] instead of [9, 3]
>> 2017-05-30 18:48:01,838 49.e3 map to [6, 4] instead of [6, 5]
>> 2017-05-30 18:48:01,838 49.e1 map to [7, 2] instead of [7, 3]
>> 2017-05-30 18:48:01,838 49.e0 map to [5, 1] instead of [5, 0]
>> 2017-05-30 18:48:01,838 49.20d map to [3, 1] instead of [3, 0]
>> 2017-05-30 18:48:01,838 49.20c map to [2, 9] instead of [2, 8]
>> 2017-05-30 18:48:01,838 49.36e map to [6, 1] instead of [6, 0]
>> ......
>>
>> Traceback (most recent call last):
>>  File "/usr/bin/crush", line 25, in <module>
>> sys.exit(Ceph().main(sys.argv[1:]))
>>  File "/usr/lib64/python2.7/site-packages/crush/main.py", line 136, in main
>> return self.constructor(argv).run()
>>  File "/usr/lib64/python2.7/site-packages/crush/optimize.py", line 373, in run
>> crushmap = self.main.convert_to_crushmap(self.args.crushmap)
>>  File "/usr/lib64/python2.7/site-packages/crush/ceph/__init__.py", line 690, in convert_to_crushmap
>> c.parse(crushmap)
>>  File "/usr/lib64/python2.7/site-packages/crush/__init__.py", line 138, in parse
>> return self.parse_crushmap(self._convert_to_crushmap(something))
>>  File "/usr/lib64/python2.7/site-packages/crush/ceph/__init__.py", line 416, in _convert_to_crushmap
>> crushmap = CephReport().parse_report(something)
>>  File "/usr/lib64/python2.7/site-packages/crush/ceph/__init__.py", line 137, in parse_report
>> raise MappingError("some mapping failed, please file a bug at "
>> crush.ceph.MappingError: some mapping failed, please file a bug at http://libcrush.org/main/python-crush/issues/new
>> Do you know what the problem is? can you help me? I would be very grateful to you.    
> 
> This is a safeguard to make sure python-crush maps exactly as expected. I'm not sure yet why there is a difference but I'll work on that, using the crush implementation found in hammer 0.94.10. For your information, the full output of:
> 
> $ crush analyze --crushmap /tmp/han-vincent-report.json
> 
> is at https://paste2.org/PyeHe2dC What I find strange is that your output regarding pool 42 is different than mine. You have:
> 
> 
> 2017-05-30 18:48:01,803 42.1 map to [4, 9] instead of [4, 8]
> 
> and I have
> 
> 2017-05-31 12:55:04,207 42.3 map to [4, 3] instead of [4, 2]
> 2017-05-31 12:55:04,207 42.7 map to [8, 0] instead of [8, 1]
> 2017-05-31 12:55:04,207 42.1 map to [4, 9] instead of [4, 8]
> 
> I wonder if that's a sign that the changes to the crushmap following your change to straw2 are still going on. Would you mind sending me the output of ceph report (please run it again after receiving this mail) ?
> 
> Cheers
> 

-- 
Loïc Dachary, Artisan Logiciel Libre

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Beta testing crush optimization
       [not found] <CANNfkubYdJqJDRV8kuNDBy368OxOyXkXKU3wr9ULFeafPnBoHg@mail.gmail.com>
@ 2017-05-31 11:34 ` Loic Dachary
  2017-05-31 12:40   ` Loic Dachary
  0 siblings, 1 reply; 24+ messages in thread
From: Loic Dachary @ 2017-05-31 11:34 UTC (permalink / raw)
  To: han vincent; +Cc: ceph-devel

Hi,

On 05/31/2017 12:32 PM, han vincent wrote:
> hello, loic:
>      
> I had a cluster build with hammer 0.94.10, then I used the following commands to change the algorithm from "straw" to "straw2".
> 1. ceph osd crush tunables hammer
> 2. ceph osd getcrushmap -o /tmp/cmap
> 3. crushtool -d /tmp/cmap -o /tmp/cmap.txt
> 4. vim /tmp/cmap.txt and change the algorithm of each bucket from "straw" to "straw2"
> 5. crushtool -c /tmp/cmap.txt -o /tmp/cmap
> 6. ceph osd setcrushmap -i /tmp/cmap
> 7. ceph osd crush reweight-all
> after that, I used "python crush" to optimize the cluster, the version of "python crush" is 1.0.32
> 
> 1. ceph report > report.json
> 2. crush optimize --crushmap report.json --out-path optimized.crush
> Unfortunately, there was an error in the output:
> 
> 2017-05-30 18:48:01,803 42.1 map to [4, 9] instead of [4, 8]
> 2017-05-30 18:48:01,838 49.3af map to [9, 2] instead of [9, 3]
> 2017-05-30 18:48:01,838 49.e3 map to [6, 4] instead of [6, 5]
> 2017-05-30 18:48:01,838 49.e1 map to [7, 2] instead of [7, 3]
> 2017-05-30 18:48:01,838 49.e0 map to [5, 1] instead of [5, 0]
> 2017-05-30 18:48:01,838 49.20d map to [3, 1] instead of [3, 0]
> 2017-05-30 18:48:01,838 49.20c map to [2, 9] instead of [2, 8]
> 2017-05-30 18:48:01,838 49.36e map to [6, 1] instead of [6, 0]
> ......
> 
> Traceback (most recent call last):
>  File "/usr/bin/crush", line 25, in <module>
> sys.exit(Ceph().main(sys.argv[1:]))
>  File "/usr/lib64/python2.7/site-packages/crush/main.py", line 136, in main
> return self.constructor(argv).run()
>  File "/usr/lib64/python2.7/site-packages/crush/optimize.py", line 373, in run
> crushmap = self.main.convert_to_crushmap(self.args.crushmap)
>  File "/usr/lib64/python2.7/site-packages/crush/ceph/__init__.py", line 690, in convert_to_crushmap
> c.parse(crushmap)
>  File "/usr/lib64/python2.7/site-packages/crush/__init__.py", line 138, in parse
> return self.parse_crushmap(self._convert_to_crushmap(something))
>  File "/usr/lib64/python2.7/site-packages/crush/ceph/__init__.py", line 416, in _convert_to_crushmap
> crushmap = CephReport().parse_report(something)
>  File "/usr/lib64/python2.7/site-packages/crush/ceph/__init__.py", line 137, in parse_report
> raise MappingError("some mapping failed, please file a bug at "
> crush.ceph.MappingError: some mapping failed, please file a bug at http://libcrush.org/main/python-crush/issues/new
> Do you know what the problem is? can you help me? I would be very grateful to you.    

This is a safeguard to make sure python-crush maps exactly as expected. I'm not sure yet why there is a difference but I'll work on that, using the crush implementation found in hammer 0.94.10. For your information, the full output of:

$ crush analyze --crushmap /tmp/han-vincent-report.json

is at https://paste2.org/PyeHe2dC. What I find strange is that your output regarding pool 42 is different from mine. You have:


2017-05-30 18:48:01,803 42.1 map to [4, 9] instead of [4, 8]

and I have

2017-05-31 12:55:04,207 42.3 map to [4, 3] instead of [4, 2]
2017-05-31 12:55:04,207 42.7 map to [8, 0] instead of [8, 1]
2017-05-31 12:55:04,207 42.1 map to [4, 9] instead of [4, 8]

I wonder if that's a sign that the changes to the crushmap following your change to straw2 are still going on. Would you mind sending me the output of ceph report (please run it again after receiving this mail)?
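
Something like this is enough on your side (a sketch; the date suffix is only there to keep the files apart):

$ ceph -s                                  # ideally wait until no PGs are remapped or backfilling
$ ceph report > report-$(date +%F).json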

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Beta testing crush optimization
  2017-05-24 14:01   ` Loic Dachary
@ 2017-05-31  7:01     ` Loic Dachary
  0 siblings, 0 replies; 24+ messages in thread
From: Loic Dachary @ 2017-05-31  7:01 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: Ceph Development

(as a conclusion to this thread)

Thanks for testing (and your patience while I was fixing a few bugs). 

I'm glad your cluster is now almost even (+/- 1.5% over/under filled for the OSDs and 0.5% for the hosts). It is better than it was before (+/- 25% over/under filled for the OSDs and 6% for the hosts).

Worst case scenario if a host fails was (before optimization):

        ~over filled %~
~type~                 
device            30.15
host              10.53

After optimization it is down to:

        ~over filled %~
~type~                 
device             7.94
host               4.55

Since you have a full SSD cluster you chose to optimize all at once. It means the incremental approach (--step) was not used.

Cheers

On 05/24/2017 05:01 PM, Loic Dachary wrote:
> 
> 
> On 05/24/2017 04:50 PM, Stefan Priebe - Profihost AG wrote:
>> Hello,
>>
>> great! What means pool 3? Is it just the pool nr from the poll dump / ls
>> command?
> 
> Yes. In the report you sent me, this is the number of the only pool in the cluster.
> 
>>
>> Stefan
>>
>> Am 24.05.2017 um 15:48 schrieb Loic Dachary:
>>> Hi Stefan,
>>>
>>> Thanks for volunteering to beta test the crush optimization on a live cluster :-)
>>>
>>> The "crush optimize" command was published today[1] and you should be able to improve your cluster distribution with the following:
>>>
>>> ceph report > report.json
>>> crush optimize --no-forecast --step 64 --crushmap report.json --pool 3 --out-path optimized.crush
>>> ceph osd setcrushmap -i optimized.crush
>>>
>>> Note that it will only perform a first optimization step (moving around 64 PGs). You will need to repeat this command a dozen time to fully optimize the cluster. I assume that's what you will want to control the workload. If you want a minimal change at each step, you can try --step 1 but it will require more than a hundred steps.
>>>
>>> If you're not worried about the load of the cluster, you can optimize it in one go with:
>>>
>>> ceph report > report.json
>>> crush optimize --crushmap report.json --pool 3 --out-path optimized.crush
>>> ceph osd setcrushmap -i optimized.crush
>>>
>>> Cheers
>>>
>>> [1] http://crush.readthedocs.io/en/latest/ceph/optimize.html
>>>
>>
> 

-- 
Loïc Dachary, Artisan Logiciel Libre

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Beta testing crush optimization
  2017-05-24 13:50 ` Stefan Priebe - Profihost AG
@ 2017-05-24 14:01   ` Loic Dachary
  2017-05-31  7:01     ` Loic Dachary
  0 siblings, 1 reply; 24+ messages in thread
From: Loic Dachary @ 2017-05-24 14:01 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: Ceph Development



On 05/24/2017 04:50 PM, Stefan Priebe - Profihost AG wrote:
> Hello,
> 
> great! What means pool 3? Is it just the pool nr from the poll dump / ls
> command?

Yes. In the report you sent me, this is the number of the only pool in the cluster.
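
For reference, a couple of ways to look the id up (a sketch):

$ ceph osd lspools     # prints "<id> <name>," pairs
$ ceph df              # also lists each pool with its id and usage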

> 
> Stefan
> 
> Am 24.05.2017 um 15:48 schrieb Loic Dachary:
>> Hi Stefan,
>>
>> Thanks for volunteering to beta test the crush optimization on a live cluster :-)
>>
>> The "crush optimize" command was published today[1] and you should be able to improve your cluster distribution with the following:
>>
>> ceph report > report.json
>> crush optimize --no-forecast --step 64 --crushmap report.json --pool 3 --out-path optimized.crush
>> ceph osd setcrushmap -i optimized.crush
>>
>> Note that it will only perform a first optimization step (moving around 64 PGs). You will need to repeat this command a dozen time to fully optimize the cluster. I assume that's what you will want to control the workload. If you want a minimal change at each step, you can try --step 1 but it will require more than a hundred steps.
>>
>> If you're not worried about the load of the cluster, you can optimize it in one go with:
>>
>> ceph report > report.json
>> crush optimize --crushmap report.json --pool 3 --out-path optimized.crush
>> ceph osd setcrushmap -i optimized.crush
>>
>> Cheers
>>
>> [1] http://crush.readthedocs.io/en/latest/ceph/optimize.html
>>
> 

-- 
Loïc Dachary, Artisan Logiciel Libre

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Beta testing crush optimization
  2017-05-24 13:48 Loic Dachary
@ 2017-05-24 13:50 ` Stefan Priebe - Profihost AG
  2017-05-24 14:01   ` Loic Dachary
  0 siblings, 1 reply; 24+ messages in thread
From: Stefan Priebe - Profihost AG @ 2017-05-24 13:50 UTC (permalink / raw)
  To: Loic Dachary; +Cc: Ceph Development

Hello,

great! What does pool 3 mean? Is it just the pool number from the pool dump / ls
command?

Stefan

Am 24.05.2017 um 15:48 schrieb Loic Dachary:
> Hi Stefan,
> 
> Thanks for volunteering to beta test the crush optimization on a live cluster :-)
> 
> The "crush optimize" command was published today[1] and you should be able to improve your cluster distribution with the following:
> 
> ceph report > report.json
> crush optimize --no-forecast --step 64 --crushmap report.json --pool 3 --out-path optimized.crush
> ceph osd setcrushmap -i optimized.crush
> 
> Note that it will only perform a first optimization step (moving around 64 PGs). You will need to repeat this command a dozen time to fully optimize the cluster. I assume that's what you will want to control the workload. If you want a minimal change at each step, you can try --step 1 but it will require more than a hundred steps.
> 
> If you're not worried about the load of the cluster, you can optimize it in one go with:
> 
> ceph report > report.json
> crush optimize --crushmap report.json --pool 3 --out-path optimized.crush
> ceph osd setcrushmap -i optimized.crush
> 
> Cheers
> 
> [1] http://crush.readthedocs.io/en/latest/ceph/optimize.html
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Beta testing crush optimization
@ 2017-05-24 13:48 Loic Dachary
  2017-05-24 13:50 ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 24+ messages in thread
From: Loic Dachary @ 2017-05-24 13:48 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: Ceph Development

Hi Stefan,

Thanks for volunteering to beta test the crush optimization on a live cluster :-)

The "crush optimize" command was published today[1] and you should be able to improve your cluster distribution with the following:

ceph report > report.json
crush optimize --no-forecast --step 64 --crushmap report.json --pool 3 --out-path optimized.crush
ceph osd setcrushmap -i optimized.crush

Note that it will only perform a first optimization step (moving around 64 PGs). You will need to repeat this command a dozen times to fully optimize the cluster. I assume that's what you will want in order to control the workload. If you want a minimal change at each step, you can try --step 1, but it will require more than a hundred steps.
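
If you end up scripting the repetition, a rough sketch of the loop could look like this (illustrative only: the health check and the number of iterations are assumptions, adjust them to your cluster):

for i in $(seq 1 12); do
    # take a fresh report so each step starts from the map that is actually in place
    ceph report > report.json
    crush optimize --no-forecast --step 64 --crushmap report.json --pool 3 --out-path optimized.crush
    ceph osd setcrushmap -i optimized.crush
    # wait for the data movement triggered by this step to settle before the next one
    while ceph health | grep -Eq 'backfill|recover|remapped'; do sleep 60; done
done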

If you're not worried about the load of the cluster, you can optimize it in one go with:

ceph report > report.json
crush optimize --crushmap report.json --pool 3 --out-path optimized.crush
ceph osd setcrushmap -i optimized.crush

Cheers

[1] http://crush.readthedocs.io/en/latest/ceph/optimize.html

-- 
Loïc Dachary, Artisan Logiciel Libre

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2017-06-06 13:58 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-31  9:52 Beta testing crush optimization han vincent
     [not found] <CANNfkubYdJqJDRV8kuNDBy368OxOyXkXKU3wr9ULFeafPnBoHg@mail.gmail.com>
2017-05-31 11:34 ` Loic Dachary
2017-05-31 12:40   ` Loic Dachary
     [not found]     ` <CANNfkubxHfAYn-aLkHCQV_YZoMuVTzug9nRbvaTSE4UyKiQLuw@mail.gmail.com>
2017-06-01  7:17       ` Loic Dachary
2017-06-01 10:07         ` han vincent
     [not found]         ` <CANNfkuZbfNSW5CQHaV0yyF3744FPf-gh0vKBj45bZKrvZ27MhA@mail.gmail.com>
2017-06-01 10:21           ` Loic Dachary
2017-06-01 11:38             ` han vincent
2017-06-01 11:43               ` Loic Dachary
2017-06-01 11:52                 ` han vincent
2017-06-01 12:08                   ` Loic Dachary
2017-06-01 12:17                     ` han vincent
2017-06-01 12:23                       ` Loic Dachary
2017-06-01 12:32                         ` han vincent
2017-06-01 12:49                           ` Loic Dachary
2017-06-02  3:15                             ` han vincent
2017-06-02  6:20                               ` Loic Dachary
2017-06-02  6:40                                 ` Loic Dachary
2017-06-02  9:28                                   ` han vincent
2017-06-06  9:02                                   ` han vincent
2017-06-06 13:58                                     ` Loic Dachary
  -- strict thread matches above, loose matches on Subject: below --
2017-05-24 13:48 Loic Dachary
2017-05-24 13:50 ` Stefan Priebe - Profihost AG
2017-05-24 14:01   ` Loic Dachary
2017-05-31  7:01     ` Loic Dachary
