* which osds get used for ec pool reads?
@ 2015-05-14 16:27 Deneau, Tom
  2015-05-14 16:52 ` Samuel Just
  0 siblings, 1 reply; 8+ messages in thread
From: Deneau, Tom @ 2015-05-14 16:27 UTC (permalink / raw)
  To: ceph-devel

I am looking at disk activity on reads from an erasure-coded pool (k=2, m=1).
I have a contrived setup where I am reading a bunch of objects whose names
all map to the same PG.  I see disk activity only on the 2 data (k) OSDs,
not on the coding (m) OSD.

As I understand http://ceph.com/docs/master/architecture/, in this situation
all 3 OSDs would be read and the first two to return would be used, but that
is not what I see.

In this particular contrived setup all of the OSDs are on a single node;
could that be causing the 2-OSD read behavior that I am seeing?
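
To make the layout concrete, here is a minimal sketch of a k=2, m=1 code
(single XOR parity; purely illustrative, not Ceph's actual erasure-code
plugins) showing why a healthy read can be served from the data shards alone:

# Toy model of a k=2, m=1 code (single XOR parity); illustrative only.

def encode(obj: bytes):
    """Split an object into 2 data shards plus 1 parity shard."""
    half = (len(obj) + 1) // 2
    d0, d1 = obj[:half], obj[half:].ljust(half, b"\x00")
    parity = bytes(a ^ b for a, b in zip(d0, d1))
    return [d0, d1, parity]

def read_healthy(shards, size: int) -> bytes:
    # Healthy read path: only the 2 data shards are touched and no
    # decode is needed -- just concatenate them.
    return (shards[0] + shards[1])[:size]

def read_degraded(shards, size: int) -> bytes:
    # Degraded read path: a missing data shard is rebuilt from the
    # surviving data shard XOR the parity shard.
    d0, d1, p = shards
    if d0 is None:
        d0 = bytes(a ^ b for a, b in zip(d1, p))
    if d1 is None:
        d1 = bytes(a ^ b for a, b in zip(d0, p))
    return (d0 + d1)[:size]

obj = b"which osds get used for ec pool reads?"
shards = encode(obj)
assert read_healthy(shards, len(obj)) == obj
assert read_degraded([None, shards[1], shards[2]], len(obj)) == obj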

-- Tom Deneau



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: which osds get used for ec pool reads?
  2015-05-14 16:27 which osds get used for ec pool reads? Deneau, Tom
@ 2015-05-14 16:52 ` Samuel Just
  2015-05-14 18:23   ` Somnath Roy
  0 siblings, 1 reply; 8+ messages in thread
From: Samuel Just @ 2015-05-14 16:52 UTC (permalink / raw)
  To: Tom Deneau; +Cc: ceph-devel

There is a branch, which may merge soonish, that will optionally read from all shards and use the first N; it's not merged yet.  If the PGs are healthy, the current behavior is to read from the data shards (since you don't need to perform a decode in that case).
-Sam
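
A rough sketch of that strategy (names invented here, not the branch's
actual code): issue reads to all k+m shards and keep whichever k complete
first:

# Sketch of "read all k+m shards, use the first k to respond".
from concurrent.futures import ThreadPoolExecutor, as_completed

def read_first_k(shard_readers, k):
    """shard_readers: one zero-argument callable per shard (k+m total).
    Returns {shard_index: data} for the first k shards to complete."""
    pool = ThreadPoolExecutor(max_workers=len(shard_readers))
    futures = {pool.submit(reader): i for i, reader in enumerate(shard_readers)}
    results = {}
    for fut in as_completed(futures):
        results[futures[fut]] = fut.result()
        if len(results) == k:
            break  # the slowest shard no longer matters
    pool.shutdown(wait=False)  # don't block on the stragglers
    # If the winners are exactly the data shards, no decode is needed;
    # if a parity shard is among them, a decode step reconstructs the data.
    return results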

----- Original Message -----
From: "Tom Deneau" <tom.deneau@amd.com>
To: "ceph-devel" <ceph-devel@vger.kernel.org>
Sent: Thursday, May 14, 2015 9:27:14 AM
Subject: which osds get used for ec pool reads?

I am looking at disk activity on reads from an erasure-coded pool (k=2, m=1).
I have a contrived setup where I am reading a bunch of objects whose names
all map to the same PG.  I see disk activity only on the 2 data (k) OSDs,
not on the coding (m) OSD.

As I understand http://ceph.com/docs/master/architecture/, in this situation
all 3 OSDs would be read and the first two to return would be used, but that
is not what I see.

In this particular contrived setup all of the OSDs are on a single node;
could that be causing the 2-OSD read behavior that I am seeing?

-- Tom Deneau





^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: which osds get used for ec pool reads?
  2015-05-14 16:52 ` Samuel Just
@ 2015-05-14 18:23   ` Somnath Roy
  2015-05-14 18:33     ` Gregory Farnum
  0 siblings, 1 reply; 8+ messages in thread
From: Somnath Roy @ 2015-05-14 18:23 UTC (permalink / raw)
  To: Samuel Just, Tom Deneau; +Cc: ceph-devel

Sam,
It seems the current code is already optimized for performance, so what advantage do the new changes bring?
Will we be doing a decode every time then?

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Samuel Just
Sent: Thursday, May 14, 2015 9:52 AM
To: Tom Deneau
Cc: ceph-devel
Subject: Re: which osds get used for ec pool reads?

There is a branch, which may merge soonish, that will optionally read from all shards and use the first N; it's not merged yet.  If the PGs are healthy, the current behavior is to read from the data shards (since you don't need to perform a decode in that case).
-Sam

----- Original Message -----
From: "Tom Deneau" <tom.deneau@amd.com>
To: "ceph-devel" <ceph-devel@vger.kernel.org>
Sent: Thursday, May 14, 2015 9:27:14 AM
Subject: which osds get used for ec pool reads?

I am looking at disk activity on reads from an erasure-coded pool (k=2, m=1).
I have a contrived setup where I am reading a bunch of objects whose names all map to the same PG.  I see disk activity only on the 2 data (k) OSDs, not on the coding (m) OSD.

As I understand http://ceph.com/docs/master/architecture/, in this situation all 3 OSDs would be read and the first two to return would be used, but that is not what I see.

In this particular contrived setup all of the OSDs are on a single node; could that be causing the 2-OSD read behavior that I am seeing?

-- Tom Deneau





^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: which osds get used for ec pool reads?
  2015-05-14 18:23   ` Somnath Roy
@ 2015-05-14 18:33     ` Gregory Farnum
  2015-05-14 19:07       ` Nathan Cutler
  2015-05-14 19:54       ` Somnath Roy
  0 siblings, 2 replies; 8+ messages in thread
From: Gregory Farnum @ 2015-05-14 18:33 UTC (permalink / raw)
  To: Somnath Roy; +Cc: Samuel Just, Tom Deneau, ceph-devel

On Thu, May 14, 2015 at 11:23 AM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
> Sam,
> It seems the current code is already optimized for performance, so what advantage do the new changes bring?
> Will we be doing a decode every time then?

If you read every block, you increase the amount of data accessed but
can avoid long-tail latencies. There are a bunch of research
papers about these tradeoffs and the surprising latency improvements
you can get on the aggregate read, and Yahoo! talked about this in
their blog post on their use of Ceph. :)
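
A toy simulation (latency numbers invented purely to illustrate the shape
of the effect) shows how keeping the fastest k of k+m reads clips the tail:

# Toy tail-latency simulation for k=2, m=1 reads; numbers are made up.
import random

random.seed(1)
K, M, TRIALS = 2, 1, 100_000

def shard_latency_ms():
    # Mostly fast, but ~1% of shard reads hit a 100 ms straggler.
    return random.expovariate(1 / 5.0) + (100.0 if random.random() < 0.01 else 0.0)

wait_fixed_k, read_all_first_k = [], []
for _ in range(TRIALS):
    lats = sorted(shard_latency_ms() for _ in range(K + M))
    read_all_first_k.append(lats[K - 1])              # fastest K of all K+M
    wait_fixed_k.append(max(random.sample(lats, K)))  # wait on K specific shards

def p99(xs):
    return sorted(xs)[int(0.99 * len(xs))]

print(f"wait for the K data shards: p99 ~ {p99(wait_fixed_k):.1f} ms")
print(f"read all, keep first K:     p99 ~ {p99(read_all_first_k):.1f} ms")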
-Greg

>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Samuel Just
> Sent: Thursday, May 14, 2015 9:52 AM
> To: Tom Deneau
> Cc: ceph-devel
> Subject: Re: which osds get used for ec pool reads?
>
> There is a branch, which may merge soonish, that will optionally read from all shards and use the first N; it's not merged yet.  If the PGs are healthy, the current behavior is to read from the data shards (since you don't need to perform a decode in that case).
> -Sam
>
> ----- Original Message -----
> From: "Tom Deneau" <tom.deneau@amd.com>
> To: "ceph-devel" <ceph-devel@vger.kernel.org>
> Sent: Thursday, May 14, 2015 9:27:14 AM
> Subject: which osds get used for ec pool reads?
>
> I am looking at disk activity on reads from an erasure-coded pool (k=2, m=1).
> I have a contrived setup where I am reading a bunch of objects whose names all map to the same PG.  I see disk activity only on the 2 data (k) OSDs, not on the coding (m) OSD.
>
> As I understand http://ceph.com/docs/master/architecture/, in this situation all 3 OSDs would be read and the first two to return would be used, but that is not what I see.
>
> In this particular contrived setup all of the OSDs are on a single node; could that be causing the 2-OSD read behavior that I am seeing?
>
> -- Tom Deneau
>
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: which osds get used for ec pool reads?
  2015-05-14 18:33     ` Gregory Farnum
@ 2015-05-14 19:07       ` Nathan Cutler
  2015-05-14 19:54       ` Somnath Roy
  1 sibling, 0 replies; 8+ messages in thread
From: Nathan Cutler @ 2015-05-14 19:07 UTC (permalink / raw)
  To: ceph-devel

On 05/14/2015 08:33 PM, Gregory Farnum wrote:
> If you read every block, you increase the amount of data accessed but
> can avoid long-tail latencies. There are a bunch of research
> papers about these tradeoffs and the surprising latency improvements
> you can get on the aggregate read, and Yahoo! talked about this in
> their blog post on their use of Ceph. :)

Here's the link; it might save some googling:

http://yahooeng.tumblr.com/post/116391291701/yahoo-cloud-object-store-object-storage-at



^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: which osds get used for ec pool reads?
  2015-05-14 18:33     ` Gregory Farnum
  2015-05-14 19:07       ` Nathan Cutler
@ 2015-05-14 19:54       ` Somnath Roy
  2015-05-14 19:56         ` Sage Weil
  1 sibling, 1 reply; 8+ messages in thread
From: Somnath Roy @ 2015-05-14 19:54 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Samuel Just, Tom Deneau, ceph-devel

Greg,
I think the Yahoo data is missing an important factor: what is the CPU overhead of doing that, since we have to decode every time?  In a flash environment CPU is an important factor, as we can run out of it easily.  Yes, it reduces tail latencies, but probably not every application is latency sensitive.
It would be good if we could evaluate all the pros and cons of this approach.  IMO there should be a config option for selecting one over the other; in that case, we can easily evaluate the benefits.

Thanks & Regards
Somnath

-----Original Message-----
From: Gregory Farnum [mailto:greg@gregs42.com] 
Sent: Thursday, May 14, 2015 11:34 AM
To: Somnath Roy
Cc: Samuel Just; Tom Deneau; ceph-devel
Subject: Re: which osds get used for ec pool reads?

On Thu, May 14, 2015 at 11:23 AM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
> Sam,
> It seems the current code is already optimized for performance, so what advantage do the new changes bring?
> Will we be doing a decode every time then?

If you read every block, you increase the amount of data accessed but can avoid long-tail latencies. There are a bunch of research papers about these tradeoffs and the surprising latency improvements you can get on the aggregate read, and Yahoo! talked about this in their blog post on their use of Ceph. :)
-Greg

>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org 
> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Samuel Just
> Sent: Thursday, May 14, 2015 9:52 AM
> To: Tom Deneau
> Cc: ceph-devel
> Subject: Re: which osds get used for ec pool reads?
>
> There is a branch, which may merge soonish, that will optionally read from all shards and use the first N; it's not merged yet.  If the PGs are healthy, the current behavior is to read from the data shards (since you don't need to perform a decode in that case).
> -Sam
>
> ----- Original Message -----
> From: "Tom Deneau" <tom.deneau@amd.com>
> To: "ceph-devel" <ceph-devel@vger.kernel.org>
> Sent: Thursday, May 14, 2015 9:27:14 AM
> Subject: which osds get used for ec pool reads?
>
> I am looking at disk activity on reads from an erasure-coded pool (k=2, m=1).
> I have a contrived setup where I am reading a bunch of objects whose names all map to the same PG.  I see disk activity only on the 2 data (k) OSDs, not on the coding (m) OSD.
>
> As I understand http://ceph.com/docs/master/architecture/, in this situation all 3 OSDs would be read and the first two to return would be used, but that is not what I see.
>
> In this particular contrived setup all of the OSDs are on a single node; could that be causing the 2-OSD read behavior that I am seeing?
>
> -- Tom Deneau
>
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: which osds get used for ec pool reads?
  2015-05-14 19:54       ` Somnath Roy
@ 2015-05-14 19:56         ` Sage Weil
  2015-05-14 20:20           ` Samuel Just
  0 siblings, 1 reply; 8+ messages in thread
From: Sage Weil @ 2015-05-14 19:56 UTC (permalink / raw)
  To: Somnath Roy; +Cc: Gregory Farnum, Samuel Just, Tom Deneau, ceph-devel

On Thu, 14 May 2015, Somnath Roy wrote:
> Greg,
> I think the Yahoo data is missing an important factor: what is the CPU
> overhead of doing that, since we have to decode every time?  In a flash
> environment CPU is an important factor, as we can run out of it easily.
> Yes, it reduces tail latencies, but probably not every application is
> latency sensitive.
> It would be good if we could evaluate all the pros and cons of this
> approach.  IMO there should be a config option for selecting one over
> the other; in that case, we can easily evaluate the benefits.

Don't worry, there is a config option needed to enable this, and it's off by
default.

sage


> 
> Thanks & Regards
> Somnath
> 
> -----Original Message-----
> From: Gregory Farnum [mailto:greg@gregs42.com] 
> Sent: Thursday, May 14, 2015 11:34 AM
> To: Somnath Roy
> Cc: Samuel Just; Tom Deneau; ceph-devel
> Subject: Re: which osds get used for ec pool reads?
> 
> On Thu, May 14, 2015 at 11:23 AM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
> > Sam,
> > It seems the current code is already optimized for performance, so what advantage do the new changes bring?
> > Will we be doing a decode every time then?
> 
> If you read every block, you increase the amount of data accessed but can avoid long-tail latencies. There are a bunch of research papers about these tradeoffs and the surprising latency improvements you can get on the aggregate read, and Yahoo! talked about this in their blog post on their use of Ceph. :)
> -Greg
> 
> >
> > Thanks & Regards
> > Somnath
> >
> > -----Original Message-----
> > From: ceph-devel-owner@vger.kernel.org 
> > [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Samuel Just
> > Sent: Thursday, May 14, 2015 9:52 AM
> > To: Tom Deneau
> > Cc: ceph-devel
> > Subject: Re: which osds get used for ec pool reads?
> >
> > There is a branch, which may merge soonish, that will optionally read from all shards and use the first N; it's not merged yet.  If the PGs are healthy, the current behavior is to read from the data shards (since you don't need to perform a decode in that case).
> > -Sam
> >
> > ----- Original Message -----
> > From: "Tom Deneau" <tom.deneau@amd.com>
> > To: "ceph-devel" <ceph-devel@vger.kernel.org>
> > Sent: Thursday, May 14, 2015 9:27:14 AM
> > Subject: which osds get used for ec pool reads?
> >
> > I am looking at disk activity on reads from an erasure-coded pool (k=2, m=1).
> > I have a contrived setup where I am reading a bunch of objects whose names all map to the same PG.  I see disk activity only on the 2 data (k) OSDs, not on the coding (m) OSD.
> >
> > As I understand http://ceph.com/docs/master/architecture/, in this situation all 3 OSDs would be read and the first two to return would be used, but that is not what I see.
> >
> > In this particular contrived setup all of the OSDs are on a single node; could that be causing the 2-OSD read behavior that I am seeing?
> >
> > -- Tom Deneau
> >
> >

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: which osds get used for ec pool reads?
  2015-05-14 19:56         ` Sage Weil
@ 2015-05-14 20:20           ` Samuel Just
  0 siblings, 0 replies; 8+ messages in thread
From: Samuel Just @ 2015-05-14 20:20 UTC (permalink / raw)
  To: Sage Weil; +Cc: Somnath Roy, Gregory Farnum, Tom Deneau, ceph-devel

It should really be a pool property.
-Sam
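
From the admin side, a per-pool toggle would look something like the sketch
below; this assumes it lands as a boolean pool option, and the fast_read
name is an assumption, not something merged at the time of this thread:

# Sketch of flipping a per-pool flag via the ceph CLI from a script.
# Assumes admin credentials; the "fast_read" option name is an assumption.
import subprocess

def set_pool_option(pool: str, option: str, value: str) -> None:
    subprocess.run(["ceph", "osd", "pool", "set", pool, option, value],
                   check=True)

def get_pool_option(pool: str, option: str) -> str:
    out = subprocess.run(["ceph", "osd", "pool", "get", pool, option],
                         check=True, capture_output=True, text=True)
    return out.stdout.strip()

set_pool_option("ecpool", "fast_read", "true")  # "ecpool" is a placeholder
print(get_pool_option("ecpool", "fast_read"))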

----- Original Message -----
From: "Sage Weil" <sage@newdream.net>
To: "Somnath Roy" <Somnath.Roy@sandisk.com>
Cc: "Gregory Farnum" <greg@gregs42.com>, "Samuel Just" <sjust@redhat.com>, "Tom Deneau" <tom.deneau@amd.com>, "ceph-devel" <ceph-devel@vger.kernel.org>
Sent: Thursday, May 14, 2015 12:56:51 PM
Subject: RE: which osds get used for ec pool reads?

On Thu, 14 May 2015, Somnath Roy wrote:
> Greg,
> I think the Yahoo data is missing an important factor: what is the CPU
> overhead of doing that, since we have to decode every time?  In a flash
> environment CPU is an important factor, as we can run out of it easily.
> Yes, it reduces tail latencies, but probably not every application is
> latency sensitive.
> It would be good if we could evaluate all the pros and cons of this
> approach.  IMO there should be a config option for selecting one over
> the other; in that case, we can easily evaluate the benefits.

Don't worry, there is a config option needed to enable this, and it's off by
default.

sage


> 
> Thanks & Regards
> Somnath
> 
> -----Original Message-----
> From: Gregory Farnum [mailto:greg@gregs42.com] 
> Sent: Thursday, May 14, 2015 11:34 AM
> To: Somnath Roy
> Cc: Samuel Just; Tom Deneau; ceph-devel
> Subject: Re: which osds get used for ec pool reads?
> 
> On Thu, May 14, 2015 at 11:23 AM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
> > Sam,
> > It seems the current code is already optimized for performance, so what advantage do the new changes bring?
> > Will we be doing a decode every time then?
> 
> If you read every block, you increase the amount of data accessed but can avoid long-tail latencies. There are a bunch of research papers about these tradeoffs and the surprising latency improvements you can get on the aggregate read, and Yahoo! talked about this in their blog post on their use of Ceph. :)
> -Greg
> 
> >
> > Thanks & Regards
> > Somnath
> >
> > -----Original Message-----
> > From: ceph-devel-owner@vger.kernel.org 
> > [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Samuel Just
> > Sent: Thursday, May 14, 2015 9:52 AM
> > To: Tom Deneau
> > Cc: ceph-devel
> > Subject: Re: which osds get used for ec pool reads?
> >
> > There is a branch, which may merge soonish, that will optionally read from all shards and use the first N; it's not merged yet.  If the PGs are healthy, the current behavior is to read from the data shards (since you don't need to perform a decode in that case).
> > -Sam
> >
> > ----- Original Message -----
> > From: "Tom Deneau" <tom.deneau@amd.com>
> > To: "ceph-devel" <ceph-devel@vger.kernel.org>
> > Sent: Thursday, May 14, 2015 9:27:14 AM
> > Subject: which osds get used for ec pool reads?
> >
> > I am looking at disk activity on reads from an erasure-coded pool (k=2, m=1).
> > I have a contrived setup where I am reading a bunch of objects whose names all map to the same PG.  I see disk activity only on the 2 data (k) OSDs, not on the coding (m) OSD.
> >
> > As I understand http://ceph.com/docs/master/architecture/, in this situation all 3 OSDs would be read and the first two to return would be used, but that is not what I see.
> >
> > In this particular contrived setup all of the OSDs are on a single node; could that be causing the 2-OSD read behavior that I am seeing?
> >
> > -- Tom Deneau
> >
> >



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2015-05-14 20:20 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-14 16:27 which osds get used for ec pool reads? Deneau, Tom
2015-05-14 16:52 ` Samuel Just
2015-05-14 18:23   ` Somnath Roy
2015-05-14 18:33     ` Gregory Farnum
2015-05-14 19:07       ` Nathan Cutler
2015-05-14 19:54       ` Somnath Roy
2015-05-14 19:56         ` Sage Weil
2015-05-14 20:20           ` Samuel Just
