* Ceph tier’ing enhancements blue print for jewel
@ 2015-06-10  2:52 Shishir Gowda
  2015-06-10  6:46 ` Gregory Farnum
  2015-06-11  1:42 ` Li Wang
  0 siblings, 2 replies; 5+ messages in thread
From: Shishir Gowda @ 2015-06-10  2:52 UTC (permalink / raw)
  To: ceph-devel

Hi All,

We have uploaded the blueprint for the enhancements we are proposing to Ceph's tiering functionality for the Jewel release:

http://tracker.ceph.com/projects/ceph/wiki/Tiering-enhacement

Soliciting comments/feedback for the same.

With regards,
Shishir

________________________________

PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: Ceph tier’ing enhancements blue print for jewel
  2015-06-10  2:52 Ceph tier’ing enhancements blue print for jewel Shishir Gowda
@ 2015-06-10  6:46 ` Gregory Farnum
  2015-06-11  8:14   ` Shishir Gowda
  2015-06-11  1:42 ` Li Wang
  1 sibling, 1 reply; 5+ messages in thread
From: Gregory Farnum @ 2015-06-10  6:46 UTC (permalink / raw)
  To: Shishir Gowda; +Cc: ceph-devel

On Tue, Jun 9, 2015 at 7:52 PM, Shishir Gowda <Shishir.Gowda@sandisk.com> wrote:
> Hi All,
>
> We have uploaded the blueprint for the enhancements we are proposing for ceph tier’ing functionality for Jewel release @
>
> http://tracker.ceph.com/projects/ceph/wiki/Tiering-enhacement
>
> Soliciting comments/feedback for the same.

By and large this looks pretty sensible to me in a quick read. The
things I noticed:
1) There's a reference to the policy function getting passed in data
about how full the pool is. Note that while we expose stuff to cache
pool users in terms of pools, in the internal implementations the
flushing functions are based on how full the local PG is — that's
because we don't have any up-to-date information about the global pool
(and we really can't). I imagine just substituting PG for pool in your
description should work, but if not that's something to address.

2) Are you sure you want to expose these policies via RGW? That sounds
both excessively complicated (from the UI perspective) and liable to
abuse by users. Plus it seems a little redundant — I could imagine
people wanting the very fastest storage, but then they should just
store those objects in RGW buckets which are stored on pools with
appropriate policies (I forget what this mechanism is called, but
there's some sort of placement thing when creating buckets). Otherwise
the enhancements for things like direct-read-from-EC-shards etc seem
to cover RGW's performance needs pretty well.

3) While a bunch of the docs and possibly some of the code imply that
you have a single cache and a single base tier, I think in general you
can set up tier chains. We want to preserve that, so the $CACHE and
$BASE language used in the tier functions needs to be capable of that.
-Greg
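
The point in 1) about flushing decisions being driven by local PG fullness rather than global pool fullness can be sketched as below. This is only an illustration, not Ceph's implementation: `PGStats`, `pg_full_ratio`, `should_flush`, the even split of a pool-wide byte target across PGs, and the 0.4 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class PGStats:
    """Usage known locally for one placement group (hypothetical type)."""
    bytes_used: int


def pg_full_ratio(pg: PGStats, pool_target_max_bytes: int, pg_num: int) -> float:
    """Estimate fullness using only data the local PG has.

    Assumption: the pool-wide byte target is divided evenly across PGs,
    so each PG compares its own usage against target / pg_num.
    """
    if pg_num <= 0 or pool_target_max_bytes <= 0:
        raise ValueError("pg_num and pool_target_max_bytes must be positive")
    per_pg_target = pool_target_max_bytes / pg_num
    return pg.bytes_used / per_pg_target


def should_flush(pg: PGStats, pool_target_max_bytes: int, pg_num: int,
                 flush_threshold: float = 0.4) -> bool:
    """A policy function would receive the PG-level ratio, not a pool-level one."""
    return pg_full_ratio(pg, pool_target_max_bytes, pg_num) >= flush_threshold
```

A policy function written against "how full the pool is" would then take the value returned by `pg_full_ratio` in place of a global figure that no single OSD can know accurately.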

* Re: Ceph tier’ing enhancements blue print for jewel
  2015-06-10  2:52 Ceph tier’ing enhancements blue print for jewel Shishir Gowda
  2015-06-10  6:46 ` Gregory Farnum
@ 2015-06-11  1:42 ` Li Wang
  1 sibling, 0 replies; 5+ messages in thread
From: Li Wang @ 2015-06-11  1:42 UTC (permalink / raw)
  To: Shishir Gowda, ceph-devel

I just had a quick look. The idea behind it seems to be to give
flexible, very fine-grained, object-level control over behavior,
for example, how long an object will stay in a pool.
However, I am not convinced that this fine-grained control is worth
the effort; the benefit may not offset the algorithmic overhead it
introduces.

On 2015/6/10 11:11, Shishir Gowda wrote:
> Hi All,
>
> We have uploaded the blueprint for the enhancements we are proposing for ceph tier’ing functionality for Jewel release @
>
> http://tracker.ceph.com/projects/ceph/wiki/Tiering-enhacement
>
> Soliciting comments/feedback for the same.
>
> With regards,
> Shishir

* RE: Ceph tier’ing enhancements blue print for jewel
  2015-06-10  6:46 ` Gregory Farnum
@ 2015-06-11  8:14   ` Shishir Gowda
  2015-06-17  0:56     ` Allen Samuels
  0 siblings, 1 reply; 5+ messages in thread
From: Shishir Gowda @ 2015-06-11  8:14 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: ceph-devel

Hi Greg,

Please find responses inline.

With regards,
Shishir

> -----Original Message-----
> From: Gregory Farnum [mailto:greg@gregs42.com]
> Sent: Wednesday, June 10, 2015 12:17 PM
> To: Shishir Gowda
> Cc: ceph-devel@vger.kernel.org
> Subject: Re: Ceph tier’ing enhancements blue print for jewel
>
> On Tue, Jun 9, 2015 at 7:52 PM, Shishir Gowda
> <Shishir.Gowda@sandisk.com> wrote:
> > Hi All,
> >
> > We have uploaded the blueprint for the enhancements we are proposing
> > for ceph tier’ing functionality for Jewel release @
> >
> > http://tracker.ceph.com/projects/ceph/wiki/Tiering-enhacement
> >
> > Soliciting comments/feedback for the same.
>
> By and large this looks pretty sensible to me in a quick read. The things I
> noticed:
> 1) There's a reference to the policy function getting passed in data about
> how full the pool is. Note that while we expose stuff to cache pool users in
> terms of pools, in the internal implementations the flushing functions are
> based on how full the local PG is — that's because we don't have any up-to-
> date information about the global pool (and we really can't). I imagine just
> substituting PG for pool in your description should work, but if not that's
> something to address.
>
Sure, we will keep this in mind.

> 2) Are you sure you want to expose these policies via RGW? That sounds
> both excessively complicated (from the UI perspective) and liable to abuse
> by users. Plus it seems a little redundant — I could imagine people wanting
> the very fastest storage, but then they should just store those objects in
> RGW buckets which are stored on pools with appropriate policies (I forget
> what this mechanism is called, but there's some sort of placement thing
> when creating buckets). Otherwise the enhancements for things like direct-
> read-from-EC-shards etc seem to cover RGW's performance needs pretty
> well.
>

I agree that exposing these policies via RGW is prone to abuse, but it does give users fine-grained control.
Additionally, what we have in mind is an all-flash array as the cache tier, so it helps in reducing write amplification.

> 3) While a bunch of the docs and possibly some of the code imply that you
> have a single cache and a single base tier, I think in general you can set up
> tier chains. We want to preserve that, so the $CACHE and $BASE language
> used in the tier functions needs to be capable of that.

Agreed, that is why using pool names is also mentioned, which would allow multiple tiers to exist. If the resolution/lookup of the pool fails, we would fail the I/O for now, but in the future it could instead be moved to a default pool.

> -Greg

* RE: Ceph tier’ing enhancements blue print for jewel
  2015-06-11  8:14   ` Shishir Gowda
@ 2015-06-17  0:56     ` Allen Samuels
  0 siblings, 0 replies; 5+ messages in thread
From: Allen Samuels @ 2015-06-17  0:56 UTC (permalink / raw)
  To: Shishir Gowda, Gregory Farnum; +Cc: ceph-devel

> -----Original Message-----
> From: Gregory Farnum [mailto:greg@gregs42.com]
> Sent: Wednesday, June 10, 2015 12:17 PM
> To: Shishir Gowda
> Cc: ceph-devel@vger.kernel.org
> Subject: Re: Ceph tier’ing enhancements blue print for jewel
>
> On Tue, Jun 9, 2015 at 7:52 PM, Shishir Gowda 
> <Shishir.Gowda@sandisk.com> wrote:
> > Hi All,
> >
> > We have uploaded the blueprint for the enhancements we are proposing 
> > for ceph tier’ing functionality for Jewel release @
> >
> > http://tracker.ceph.com/projects/ceph/wiki/Tiering-enhacement
> >
> > Soliciting comments/feedback for the same.
>
> By and large this looks pretty sensible to me in a quick read. The
> things I noticed:
> 1) There's a reference to the policy function getting passed in data 
> about how full the pool is. Note that while we expose stuff to cache 
> pool users in terms of pools, in the internal implementations the 
> flushing functions are based on how full the local PG is — that's 
> because we don't have any up-to-date information about the global
> pool (and we really can't). I imagine just substituting PG for pool in 
> your description should work, but if not that's something to address.
>
Sure, we will keep this in mind.

[Allen] PG level occupancy should be a good enough stand-in for pool level. I don't see this as a problem at all.

> 2) Are you sure you want to expose these policies via RGW? That sounds 
> both excessively complicated (from the UI perspective) and liable to 
> abuse by users. Plus it seems a little redundant — I could imagine 
> people wanting the very fastest storage, but then they should just 
> store those objects in RGW buckets which are stored on pools with 
> appropriate policies (I forget what this mechanism is called, but 
> there's some sort of placement thing when creating buckets). Otherwise 
> the enhancements for things like direct-read-from-EC-shards etc seem
> to cover RGW's performance needs pretty well.
>

I agree that exposing these policies via RGW is prone to abuse, but it does give users fine-grained control.
Additionally, what we have in mind is an all-flash array as the cache tier, so it helps in reducing write amplification.

[Allen] No reason not to have a toggle of some kind to suppress in-band alteration of the policies. While it's not discussed, it's assumed that out-of-band mechanisms (CLI-based) are also available.
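
One way to picture such a toggle is the minimal sketch below. It is not taken from the blueprint or from Ceph/RGW: `TierPolicyStore`, `allow_inband`, and the `"inband"`/`"cli"` source labels are hypothetical names chosen for illustration.

```python
class TierPolicyStore:
    """Holds per-pool tiering policies (hypothetical, for illustration only)."""

    VALID_SOURCES = ("inband", "cli")

    def __init__(self, allow_inband: bool = False):
        # When False, policy changes arriving with client I/O are rejected;
        # out-of-band (administrative) changes are always accepted.
        self.allow_inband = allow_inband
        self._policies = {}

    def set_policy(self, pool: str, policy: str, *, source: str) -> None:
        if source not in self.VALID_SOURCES:
            raise ValueError(f"unknown source {source!r}")
        if source == "inband" and not self.allow_inband:
            raise PermissionError("in-band policy changes are disabled")
        self._policies[pool] = policy

    def get_policy(self, pool: str):
        # Returns None when no policy has been set for the pool.
        return self._policies.get(pool)
```

With the toggle off, an operator can still change a policy through the out-of-band path, while requests that try to alter it in-band are refused.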

> 3) While a bunch of the docs and possibly some of the code imply that 
> you have a single cache and a single base tier, I think in general you 
> can set up tier chains. We want to preserve that, so the $CACHE and 
> $BASE language used in the tier functions needs to be capable of that.

Agreed, that is why using pool names is also mentioned, which would allow multiple tiers to exist. If the resolution/lookup of the pool fails, we would fail the I/O for now, but in the future it could instead be moved to a default pool.

[Allen] The "$" naming was intended to allow some level of indirection; the table of $ names should be variable (per-pool). Essentially this is the same as shell substitution. There is no reason for the list of names to be limited to two.
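
The shell-style indirection described here could look roughly like the sketch below, which uses Python's standard `string.Template` for the `$NAME` substitution. The table contents, the pool names, and the `resolve_target` helper are assumptions for illustration; the blueprint does not define them. A failed lookup raises an error, standing in for "fail the I/O" as described above.

```python
from string import Template

# Hypothetical per-pool tables mapping $ names to concrete pool names.
# Nothing limits a table to two entries, so a tier chain fits naturally.
TIER_NAMES = {
    "hot-ssd": {"CACHE": "hot-ssd", "WARM": "warm-sas", "BASE": "cold-ec"},
}


def resolve_target(pool: str, expr: str, known_pools: set) -> str:
    """Expand $NAME in expr using the table of the given pool.

    A literal pool name passes through unchanged. An unknown $ name or an
    unknown pool raises LookupError, modelling a failed I/O.
    """
    table = TIER_NAMES.get(pool, {})
    try:
        name = Template(expr).substitute(table)
    except KeyError as exc:
        raise LookupError(
            f"unknown tier name ${exc.args[0]} for pool {pool!r}") from exc
    if name not in known_pools:
        raise LookupError(f"pool {name!r} does not exist")
    return name
```

For example, with the table above, `resolve_target("hot-ssd", "$WARM", {"hot-ssd", "warm-sas", "cold-ec"})` resolves to `"warm-sas"`, and a literal `"cold-ec"` resolves to itself.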

> -Greg

