* Bluestore-Performance regression in the latest master
@ 2016-10-06 21:04 Somnath Roy
  2016-10-06 21:16 ` Sage Weil
  0 siblings, 1 reply; 8+ messages in thread
From: Somnath Roy @ 2016-10-06 21:04 UTC (permalink / raw)
  To: Sage Weil (sweil@redhat.com); +Cc: Ceph Development

Sage,
I am seeing that performance with the latest master is not stable compared to a 3-day-old master. The peak performance is higher, but it keeps jumping up and down. Analyzing further, I suspect the recent cache changes have an impact. While performance is high, I am seeing fewer disk reads (probably because of cache hits), but the low point is much lower, probably when it is missing the cache. You mentioned at the standup adding buffers to the cache in the write path; is that merged into master?

The following pull request for cache that got merged seems reasonable.

https://github.com/ceph/ceph/pull/11295

Any hunch about what is causing this recent degradation? Otherwise, I need to dig in to find out :-(

See the graph in the following link for the performance difference.

https://docs.google.com/spreadsheets/d/1JO3FaBbfT5jEdyYGB8EhHnAl5bAHe06ojLE85mPzsK4/edit?usp=sharing

The left-side graph is with the old master and the right one is with the latest master. I ran the old one for 10 hours and the latest master for 2 hours.
The new one has *4X more* latency as well.

Thanks & Regards
Somnath
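
One way to put a number on "jumping up and down" is the coefficient of variation (standard deviation divided by mean) of the per-interval throughput. The sketch below is only an illustration: the sample values are hypothetical and are not taken from the linked spreadsheet, and the function name is made up for this example.

```python
from statistics import mean, pstdev


def coefficient_of_variation(samples):
    """Return stddev / mean for a series of throughput samples."""
    avg = mean(samples)
    if avg == 0:
        raise ValueError("mean throughput is zero")
    return pstdev(samples) / avg


# Hypothetical per-interval IOPS samples (assumption, not measured data).
steady = [20000, 20500, 19800, 20200, 20100]   # flat, like the old master
choppy = [35000, 12000, 33000, 9000, 31000]    # higher peaks, deep dips

print(f"steady CV: {coefficient_of_variation(steady):.3f}")
print(f"choppy CV: {coefficient_of_variation(choppy):.3f}")
```

A run can have a higher mean and a higher peak while still having a much larger coefficient of variation, which is the pattern described above.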



PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).


* Re: Bluestore-Performance regression in the latest master
  2016-10-06 21:04 Bluestore-Performance regression in the latest master Somnath Roy
@ 2016-10-06 21:16 ` Sage Weil
  2016-10-06 21:52   ` Somnath Roy
  0 siblings, 1 reply; 8+ messages in thread
From: Sage Weil @ 2016-10-06 21:16 UTC (permalink / raw)
  To: Somnath Roy; +Cc: Ceph Development

On Thu, 6 Oct 2016, Somnath Roy wrote:
> Sage,
> I am seeing that performance with the latest master is not stable compared to a 3-day-old master. The peak performance is higher, but it keeps jumping up and down. Analyzing further, I suspect the recent cache changes have an impact. While performance is high, I am seeing fewer disk reads (probably because of cache hits), but the low point is much lower, probably when it is missing the cache. You mentioned at the standup adding buffers to the cache in the write path; is that merged into master?

The change that enables buffered writes by default is not merged:

	https://github.com/ceph/ceph/pull/11301
 
> The following pull request for cache that got merged seems reasonable.
> 
> https://github.com/ceph/ceph/pull/11295

This cut the cache size down by 4x, plus whatever your sharding factor 
is/was.  Could that be it?

> Any hunch about what is causing this recent degradation? Otherwise, I need to dig in to find out :-(
> 
> See the graph in the following link for the performance difference.
> 
> https://docs.google.com/spreadsheets/d/1JO3FaBbfT5jEdyYGB8EhHnAl5bAHe06ojLE85mPzsK4/edit?usp=sharing
> 
> The left-side graph is with the old master and the right one is with the latest master. I ran the old one for 10 hours and the latest master for 2 hours.
> The new one has *4X more* latency as well.

My guess is the cache size.  I can't think of what else would have 
much of an effect on latency...

s


* RE: Bluestore-Performance regression in the latest master
  2016-10-06 21:16 ` Sage Weil
@ 2016-10-06 21:52   ` Somnath Roy
  2016-10-06 22:12     ` Mark Nelson
  0 siblings, 1 reply; 8+ messages in thread
From: Somnath Roy @ 2016-10-06 21:52 UTC (permalink / raw)
  To: Sage Weil; +Cc: Ceph Development

I have increased the cache size to the point where it starts swapping after a 30-minute run or so, but I am still seeing the huge latency difference.
I will try to find out what went wrong recently.

Thanks & Regards
Somnath


* Re: Bluestore-Performance regression in the latest master
  2016-10-06 21:52   ` Somnath Roy
@ 2016-10-06 22:12     ` Mark Nelson
  2016-10-06 23:04       ` Somnath Roy
  0 siblings, 1 reply; 8+ messages in thread
From: Mark Nelson @ 2016-10-06 22:12 UTC (permalink / raw)
  To: Somnath Roy, Sage Weil; +Cc: Ceph Development

This is bluestore specific?  My guess is the cache sizes as well. 
Bluestore on master is actually significantly faster for me in all of 
the tests I've thrown at it vs earlier this week, but my onode hit rate 
is still "decent".

I still see higher performance by increasing the min alloc size to 16k 
and then bumping up the onode cache size a bit with the memory savings.

Mark
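
Mark does not say where the memory savings come from. One plausible reading, which is an assumption here and not something stated in the thread, is that a larger minimum allocation size means fewer allocation units to track per object. The arithmetic can be sketched as follows; the 4 MiB object size and the 4 KiB baseline are hypothetical values chosen for illustration.

```python
def max_alloc_units(object_size, min_alloc_size):
    """Upper bound on allocation units needed to cover one object."""
    if min_alloc_size <= 0:
        raise ValueError("min_alloc_size must be positive")
    # Ceiling division: a partial unit still occupies a whole unit.
    return -(-object_size // min_alloc_size)


OBJECT_SIZE = 4 * 1024 * 1024  # assumed 4 MiB object (hypothetical)

for alloc in (4 * 1024, 16 * 1024):  # 4 KiB baseline is an assumption
    units = max_alloc_units(OBJECT_SIZE, alloc)
    print(f"min alloc {alloc // 1024:>2} KiB -> up to {units} units per object")
```

Under these assumptions, going from 4 KiB to 16 KiB cuts the worst-case number of units per object by a factor of four, which would leave room for a larger onode cache in the same memory budget.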


* RE: Bluestore-Performance regression in the latest master
  2016-10-06 22:12     ` Mark Nelson
@ 2016-10-06 23:04       ` Somnath Roy
  2016-10-07 13:17         ` Sage Weil
  0 siblings, 1 reply; 8+ messages in thread
From: Somnath Roy @ 2016-10-06 23:04 UTC (permalink / raw)
  To: Mark Nelson, Sage Weil; +Cc: Ceph Development

Mark,
If you look at the graph linked earlier in this thread, you will see it is choppy and faster for the first hour; beyond that it degrades fast.
The left-side graph (old master) holds constant performance throughout the 10-hour run.
Have you checked the latency with a shorter run on the latest master as well? Even with the shorter run I see 4X more latency (even though aggregate throughput is a bit higher).
Anyway, I am trying to reproduce this on another setup, just to rule out HW issues.

Thanks & Regards
Somnath


* RE: Bluestore-Performance regression in the latest master
  2016-10-06 23:04       ` Somnath Roy
@ 2016-10-07 13:17         ` Sage Weil
  2016-10-07 16:20           ` Somnath Roy
  2016-10-07 22:16           ` Somnath Roy
  0 siblings, 2 replies; 8+ messages in thread
From: Sage Weil @ 2016-10-07 13:17 UTC (permalink / raw)
  To: Somnath Roy; +Cc: Mark Nelson, Ceph Development

On Thu, 6 Oct 2016, Somnath Roy wrote:
> Mark,
> If you look at the graph linked earlier in this thread, you will see it is choppy and faster for the first hour; beyond that it degrades fast.
> The left-side graph (old master) holds constant performance throughout the 10-hour run.
> Have you checked the latency with a shorter run on the latest master as well? Even with the shorter run I see 4X more latency (even though aggregate throughput is a bit higher).
> Anyway, I am trying to reproduce this on another setup, just to rule out HW issues.

You mentioned yesterday you had tried switching the rocksdb compaction 
mode.. could that be it?

sage


* RE: Bluestore-Performance regression in the latest master
  2016-10-07 13:17         ` Sage Weil
@ 2016-10-07 16:20           ` Somnath Roy
  2016-10-07 22:16           ` Somnath Roy
  1 sibling, 0 replies; 8+ messages in thread
From: Somnath Roy @ 2016-10-07 16:20 UTC (permalink / raw)
  To: Sage Weil; +Cc: Mark Nelson, Ceph Development

It shouldn't be: the amount of data we are writing per write is the same as in my last test, so level compaction shouldn't cause this. I am trying to nail it down and will hopefully update you soon.


* RE: Bluestore-Performance regression in the latest master
  2016-10-07 13:17         ` Sage Weil
  2016-10-07 16:20           ` Somnath Roy
@ 2016-10-07 22:16           ` Somnath Roy
  1 sibling, 0 replies; 8+ messages in thread
From: Somnath Roy @ 2016-10-07 22:16 UTC (permalink / raw)
  To: Sage Weil; +Cc: Mark Nelson, Ceph Development

My bad. I modified my script to run with a high QD for read benchmarking and forgot to change it back to a lower QD for the write test. That's why the latency went up 4X.
Now I am running with the lower QD (the same as I was using earlier) and performance is similar to before (at least through a 3-hour run so far), no regression. Sorry for the noise.
But why a higher QD makes bluestore run bumpy, with the weird pattern I showed earlier, is probably a separate issue, which I will take up later.
Ideally, you should know what QD your cluster is capable of handling; beyond that, a lot of things could go wrong performance-wise :-)

Thanks & Regards
Somnath
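
The link between queue depth and latency follows from Little's law: the number of outstanding requests equals throughput times mean latency. Once a cluster is saturated, raising the QD mostly raises latency. The sketch below illustrates this with hypothetical numbers; the thread does not state the actual queue depths or IOPS, so the values and the function name are assumptions.

```python
def mean_latency_ms(queue_depth, iops):
    """Little's law: outstanding requests = throughput * latency."""
    if iops <= 0:
        raise ValueError("iops must be positive")
    return queue_depth / iops * 1000.0


SATURATED_IOPS = 20000  # hypothetical throughput ceiling of the cluster

for qd in (32, 128):  # hypothetical low and high queue depths
    latency = mean_latency_ms(qd, SATURATED_IOPS)
    print(f"QD {qd:>3}: mean latency about {latency:.1f} ms")
```

With throughput held constant, a 4X higher queue depth gives 4X the mean latency, which is consistent with the 4X figure reported in this thread if the benchmark script's QD differed by roughly that factor.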


end of thread, other threads:[~2016-10-07 22:33 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
2016-10-06 21:04 Bluestore-Performance regression in the latest master Somnath Roy
2016-10-06 21:16 ` Sage Weil
2016-10-06 21:52   ` Somnath Roy
2016-10-06 22:12     ` Mark Nelson
2016-10-06 23:04       ` Somnath Roy
2016-10-07 13:17         ` Sage Weil
2016-10-07 16:20           ` Somnath Roy
2016-10-07 22:16           ` Somnath Roy
