* Tuning placement group
@ 2012-07-20 16:33 François Charlier
2012-07-20 18:08 ` Florian Haas
2012-07-20 19:43 ` Sage Weil
0 siblings, 2 replies; 4+ messages in thread
From: François Charlier @ 2012-07-20 16:33 UTC (permalink / raw)
To: ceph-devel
Hello,
Reading http://ceph.com/docs/master/ops/manage/grow/placement-groups/
and thinking to build a ceph cluster with potentially 1000 OSDs.
Using the recommendations in the previously cited link, it would require
pg_num to be set between 10,000 and 30,000. Okay with that. Let's use the
recommended value of 16,384; this is already about 160 placement groups
per OSD.
What if, for a start, we choose to reach this number of 1000 OSDs
slowly, starting with 100 OSDs? It's now 1600 placement groups per OSD.
What if we chose 30,000 (or 32,768) placement groups to leave room for
expansion?
My question is: how will a Ceph pool behave with 1000, 5000, or even
10,000 placement groups per OSD? Will this impact performance? How badly?
Can it be worked around? Is this a problem of RAM size? CPU usage?
Any hint about this would be much appreciated.
Thanks !
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Tuning placement group
2012-07-20 16:33 Tuning placement group François Charlier
@ 2012-07-20 18:08 ` Florian Haas
2012-07-20 19:33 ` Yehuda Sadeh
2012-07-20 19:43 ` Sage Weil
1 sibling, 1 reply; 4+ messages in thread
From: Florian Haas @ 2012-07-20 18:08 UTC (permalink / raw)
To: François Charlier; +Cc: ceph-devel
On Fri, Jul 20, 2012 at 9:33 AM, François Charlier
<francois.charlier@enovance.com> wrote:
> Hello,
>
> Reading http://ceph.com/docs/master/ops/manage/grow/placement-groups/
> and thinking to build a ceph cluster with potentially 1000 OSDs.
>
> Using the recommendations in the previously cited link, it would require
> pg_num to be set between 10,000 and 30,000. Okay with that. Let's use the
> recommended value of 16,384; this is already about 160 placement groups
> per OSD.
>
> What if, for a start, we choose to reach this number of 1000 OSDs
> slowly, starting with 100 OSDs? It's now 1600 placement groups per OSD.
>
> What if we chose 30,000 (or 32,768) placement groups to leave room for
> expansion?
>
> My question is: how will a Ceph pool behave with 1000, 5000, or even
> 10,000 placement groups per OSD? Will this impact performance? How badly?
> Can it be worked around? Is this a problem of RAM size? CPU usage?
>
> Any hint about this would be much appreciated.
If I may, I'd like to add an additional point of consideration,
specifically for radosgw setups:
What's the recommended way to set the number of PGs for the half-dozen
pools that radosgw normally creates on its own (.rgw, .rgw.users,
.rgw.buckets and so on)? I *think* wanting to set a custom number of
PGs would require pre-creating these pools manually, but there may be
a way -- undocumented? -- to instruct radosgw to set a user-configured
number of PGs on pool creation. Insight on that would be much
appreciated.
Cheers,
Florian
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Tuning placement group
2012-07-20 18:08 ` Florian Haas
@ 2012-07-20 19:33 ` Yehuda Sadeh
0 siblings, 0 replies; 4+ messages in thread
From: Yehuda Sadeh @ 2012-07-20 19:33 UTC (permalink / raw)
To: Florian Haas; +Cc: François Charlier, ceph-devel
On Fri, Jul 20, 2012 at 11:08 AM, Florian Haas <florian@hastexo.com> wrote:
>
> On Fri, Jul 20, 2012 at 9:33 AM, François Charlier
> <francois.charlier@enovance.com> wrote:
> > Hello,
> >
> > Reading http://ceph.com/docs/master/ops/manage/grow/placement-groups/
> > and thinking to build a ceph cluster with potentially 1000 OSDs.
> >
> > Using the recommendations in the previously cited link, it would require
> > pg_num to be set between 10,000 and 30,000. Okay with that. Let's use the
> > recommended value of 16,384; this is already about 160 placement groups
> > per OSD.
> >
> > What if, for a start, we choose to reach this number of 1000 OSDs
> > slowly, starting with 100 OSDs? It's now 1600 placement groups per OSD.
> >
> > What if we chose 30,000 (or 32,768) placement groups to leave room for
> > expansion?
> >
> > My question is: how will a Ceph pool behave with 1000, 5000, or even
> > 10,000 placement groups per OSD? Will this impact performance? How badly?
> > Can it be worked around? Is this a problem of RAM size? CPU usage?
> >
> > Any hint about this would be much appreciated.
>
> If I may, I'd like to add an additional point of consideration,
> specifically for radosgw setups:
>
> What's the recommended way to set the number of PGs for the half-dozen
> pools that radosgw normally creates on its own (.rgw, .rgw.users,
> .rgw.buckets and so on)? I *think* wanting to set a custom number of
> PGs would require pre-creating these pools manually, but there may be
> a way -- undocumented? -- to instruct radosgw to set a user-configured
> number of PGs on pool creation. Insight on that would be much
> appreciated.
>
At the moment there's no way to tell radosgw how many PGs should be in
the pools it creates automatically. One way around that is to create
these pools before running radosgw for the first time. For the data
pools, you can modify the set of pools used for data placement with the
radosgw-admin 'pool add', 'pool rm', and 'pool list' commands. Note that
buckets that have already been created will retain their original pool.
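A minimal sketch of that pre-creation approach. The pool names are the radosgw defaults mentioned in the thread; the pg_num values are purely illustrative and should be sized for your own cluster:

```shell
# Create the radosgw pools by hand, with explicit PG counts, *before*
# radosgw ever starts and creates them itself (illustrative pg_num values):
ceph osd pool create .rgw 64
ceph osd pool create .rgw.users 64
ceph osd pool create .rgw.buckets 8192

# Register the data pool for bucket placement, then verify the set:
radosgw-admin pool add --pool=.rgw.buckets
radosgw-admin pool list
```

These commands only make sense against a running cluster; run `radosgw-admin pool list` afterwards to confirm the placement set.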
Data in pools that were created automatically can now be copied to a
different pool (rados cppool), and pools can now be renamed (ceph osd
pool rename <oldname> <newname>). So you can create a new pool with the
required number of PGs, copy the old data into it, and swap the old and
new pools' names. NOTE: this must not be done for the data pool
(.rgw.buckets by default)! It can only be done for the pools that hold
the various indexes and metadata. The bucket index in the data pool
relies on internal PG state, which will break if the pool moves around.
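The copy-and-rename recipe above, sketched for one of the small metadata pools (pool name and pg_num are illustrative; per the warning above, never do this to .rgw.buckets):

```shell
# Create a replacement pool with the desired PG count (illustrative value).
ceph osd pool create .rgw.users.new 256

# Copy the contents of the old pool into the new one.
rados cppool .rgw.users .rgw.users.new

# Swap the names so clients find the new pool under the old name.
ceph osd pool rename .rgw.users .rgw.users.old
ceph osd pool rename .rgw.users.new .rgw.users
```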
Yehuda
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Tuning placement group
2012-07-20 16:33 Tuning placement group François Charlier
2012-07-20 18:08 ` Florian Haas
@ 2012-07-20 19:43 ` Sage Weil
1 sibling, 0 replies; 4+ messages in thread
From: Sage Weil @ 2012-07-20 19:43 UTC (permalink / raw)
To: François Charlier; +Cc: ceph-devel
On Fri, 20 Jul 2012, François Charlier wrote:
> Hello,
>
> Reading http://ceph.com/docs/master/ops/manage/grow/placement-groups/
> and thinking to build a ceph cluster with potentially 1000 OSDs.
>
> Using the recommendations in the previously cited link, it would require
> pg_num to be set between 10,000 and 30,000. Okay with that. Let's use the
> recommended value of 16,384; this is already about 160 placement groups
> per OSD.
I think you mean (16384 * 3x) / 1000 osds ~= 50 pgs per osd?
> What if, for a start, we choose to reach this number of 1000 OSDs
> slowly, starting with 100 OSDs? It's now 1600 placement groups per OSD.
~500
> What if we chose 30,000 (or 32,768) placement groups to leave room for
> expansion?
~1000
> My question is: how will a Ceph pool behave with 1000, 5000, or even
> 10,000 placement groups per OSD? Will this impact performance? How badly?
> Can it be worked around? Is this a problem of RAM size? CPU usage?
>
> Any hint about this would be much appreciated.
It will work, but peering will be slower and more memory will be used.
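Sage's per-OSD figures come from pg_num multiplied by the replica count and divided by the OSD count; a quick sketch of that arithmetic, assuming the 3x replication his reply implies:

```shell
# PGs-per-OSD estimate: each PG is stored on `replicas` OSDs, so the
# cluster holds pg_num * replicas PG copies spread across `osds` OSDs.
pgs_per_osd() {
  echo $(( $1 * $2 / $3 ))   # pg_num * replicas / osds
}

pgs_per_osd 16384 3 1000   # ~49  at the 1000-OSD target
pgs_per_osd 16384 3 100    # ~491 while starting out with 100 OSDs
pgs_per_osd 32768 3 100    # ~983 with room left for expansion
```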
The other question is when you expect to move beyond 1000 osds. The next
project we'll be doing on the OSD is PG splitting, which will make this
problem adjustable. It won't be backported to argonaut, but it will be in
the next stable release, and will probably appear in our regular
development release in 2-3 months.
sage
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2012-07-20 19:43 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-20 16:33 Tuning placement group François Charlier
2012-07-20 18:08 ` Florian Haas
2012-07-20 19:33 ` Yehuda Sadeh
2012-07-20 19:43 ` Sage Weil