explicitly mapping pgs in OSDMap

* explicitly mapping pgs in OSDMap
@ 2017-03-01 19:44 Sage Weil
  2017-03-01 20:49 ` Dan van der Ster
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Sage Weil @ 2017-03-01 19:44 UTC (permalink / raw)
  To: ceph-devel

There's been a longstanding desire to improve the balance of PGs and data 
across OSDs to better utilize storage and balance workload.  We had a few 
ideas about this in a meeting last week and I wrote up a summary/proposal 
here:

	http://pad.ceph.com/p/osdmap-explicit-mapping

The basic idea is to have the ability to explicitly map individual PGs 
to certain OSDs so that we can move PGs from overfull to underfull 
devices.  The idea is that the mon or mgr would do this based on some 
heuristics or policy and should result in a better distribution than teh 
current osd weight adjustments we make now with reweight-by-utilization.

The other key property is that one reason why we need as many PGs as we do 
now is to get a good balance; if we can remap some of them explicitly, we 
can get a better balance with fewer.  In essense, CRUSH gives an 
approximate distribution, and then we correct to make it perfect (or close 
to it).

The main challenge is less about figuring out when/how to remap PGs to 
correct balance, but figuring out when to remove those remappings after 
CRUSH map changes.  Some simple greedy strategies are obvious starting 
points (e.g., to move PGs off OSD X, first adjust or remove existing remap 
entries targetting OSD X before adding new ones), but there are a few 
ways we could structure the remap entries themselves so that they 
more gracefully disappear after a change.

For example, a remap entry might move a PG from OSD A to B if it maps to 
A; if the CRUSH topology changes and the PG no longer maps to A, the entry 
would be removed or ignored.  There are a few ways to do this in the pad; 
I'm sure there are other options.

I put this on the agenda for CDM tonight.  If anyone has any other ideas 
about this we'd love to hear them!

sage

^ permalink raw reply	[flat|nested] 13+ messages in thread