All of lore.kernel.org
 help / color / mirror / Atom feed
* [Performance] Improvement on DB Performance
@ 2014-05-21 10:15 Haomai Wang
  2014-05-21 10:21 ` Haomai Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Haomai Wang @ 2014-05-21 10:15 UTC (permalink / raw)
  To: ceph-devel

Hi all,

I remember there has been some discussion about DB (MySQL) performance on rbd.
Recently I ran mysql-bench against rbd and found awful performance, so I
dug into it and found that the main cause is the "flush" requests coming
from the guest. As we know, applications such as MySQL (and Ceph itself)
keep their own journal for durability, and the journal is usually written
with sync & direct I/O. If the filesystem barrier is on, each sync I/O
operation makes the kernel issue a "sync" (barrier) request to the block
device. In the guest case, qemu translates that into a call to "rbd_aio_flush".
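(Roughly, each journal commit looks like the sketch below; the path is a
placeholder and the write size is arbitrary. Every fdatasync() on a
barrier-enabled filesystem ends up as a flush on the virtual block device,
which qemu forwards as rbd_aio_flush.)

    /* Minimal sketch of the journal-style I/O a database issues per commit.
     * Compile with g++ (it defines _GNU_SOURCE, which O_DIRECT needs). */
    #include <fcntl.h>
    #include <unistd.h>
    #include <cstdlib>
    #include <cstring>

    int main() {
      // Placeholder path; a real journal file would already exist.
      int fd = open("/var/lib/mysql/ib_logfile0.example",
                    O_WRONLY | O_CREAT | O_DIRECT, 0600);
      if (fd < 0) return 1;

      void *buf;
      if (posix_memalign(&buf, 4096, 4096) != 0) return 1;  // O_DIRECT needs aligned buffers
      memset(buf, 0, 4096);

      if (pwrite(fd, buf, 4096, 0) != 4096) return 1;  // append a journal record
      fdatasync(fd);  // commit: the kernel issues a flush/barrier to the block device

      free(buf);
      close(fd);
      return 0;
    }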

Via systemtap, I found something surprising:
aio_flush    sum: 4177085   avg: 24145   count: 173      max: 28172   min: 22747
flush_set    sum: 4172116   avg: 24116   count: 173      max: 28034   min: 22733
flush        sum: 3029910   avg: 4       count: 670477   max: 1893    min: 3

These statistics were gathered over 5s. Most of the time is spent in
"ObjectCacher::flush", and the flush count keeps growing as time goes on.

After reading the source, I found the root cause in "ObjectCacher::flush_set":
it iterates over the "object_set" looking for dirty buffers, and "object_set"
contains every object ever opened. For example:

2014-05-21 18:01:37.959013 7f785c7c6700  0 objectcacher flush_set
total: 5919 flushed: 5
2014-05-21 18:01:37.999698 7f785c7c6700  0 objectcacher flush_set
total: 5919 flushed: 5
2014-05-21 18:01:38.038405 7f785c7c6700  0 objectcacher flush_set
total: 5920 flushed: 5
2014-05-21 18:01:38.080118 7f785c7c6700  0 objectcacher flush_set
total: 5920 flushed: 5
2014-05-21 18:01:38.119792 7f785c7c6700  0 objectcacher flush_set
total: 5921 flushed: 5
2014-05-21 18:01:38.162004 7f785c7c6700  0 objectcacher flush_set
total: 5922 flushed: 5
2014-05-21 18:01:38.202755 7f785c7c6700  0 objectcacher flush_set
total: 5923 flushed: 5
2014-05-21 18:01:38.243880 7f785c7c6700  0 objectcacher flush_set
total: 5923 flushed: 5
2014-05-21 18:01:38.284399 7f785c7c6700  0 objectcacher flush_set
total: 5923 flushed: 5

These logs record the iteration: the loop checks about 5920 objects, but
only 5 of them are dirty.

So I think the solution is to make "ObjectCacher::flush_set" iterate only
over the objects that are actually dirty.
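(To illustrate the idea only: this is not the actual pull request, and the
names below are simplified, but the change amounts to flushing from a list
of dirty buffers instead of walking every object in the set. The fast loop
touches 5 buffers instead of scanning ~5920 objects, which is exactly the
asymmetry the logs above show.)

    // Sketch only; not the real ObjectCacher code, and the member names are made up.
    #include <list>
    #include <set>

    struct BufferHead { bool dirty = false; };

    struct Object {
      std::set<BufferHead*> data;            // all cached buffers of this object
    };

    struct ObjectSet {
      std::set<Object*> objects;             // every object ever opened; keeps growing
      std::list<BufferHead*> dirty_buffers;  // hypothetical list maintained as buffers get dirtied
    };

    // Before: cost grows with every object ever opened, even if almost nothing is dirty.
    int flush_set_slow(ObjectSet *oset) {
      int flushed = 0;
      for (Object *ob : oset->objects)
        for (BufferHead *bh : ob->data)
          if (bh->dirty) { bh->dirty = false; ++flushed; }  // write-back would happen here
      return flushed;  // matches the logs: "total: 5920 flushed: 5"
    }

    // After: only the buffers that are actually dirty are visited.
    int flush_set_fast(ObjectSet *oset) {
      int flushed = 0;
      for (BufferHead *bh : oset->dirty_buffers) {
        bh->dirty = false;  // write-back would happen here
        ++flushed;
      }
      oset->dirty_buffers.clear();
      return flushed;
    }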

-- 
Best Regards,

Wheat

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Performance] Improvement on DB Performance
  2014-05-21 10:15 [Performance] Improvement on DB Performance Haomai Wang
@ 2014-05-21 10:21 ` Haomai Wang
  2014-05-21 12:00   ` Luke Jing Yuan
  2014-05-21 15:23   ` Sage Weil
  0 siblings, 2 replies; 13+ messages in thread
From: Haomai Wang @ 2014-05-21 10:21 UTC (permalink / raw)
  To: ceph-devel

I pushed a commit to fix this problem (https://github.com/ceph/ceph/pull/1848).

With a test program (each sync request is issued along with ten write
requests), a significant improvement is seen:

aio_flush    sum: 914750   avg: 1239   count: 738    max: 4714   min: 1011
flush_set    sum: 904200   avg: 1225   count: 738    max: 4698   min: 999
flush        sum: 641648   avg: 173    count: 3690   max: 1340   min: 128

Compared to the last mail, this reduces each aio_flush request to 1239 ns
instead of 24145 ns.

I hope this is the root cause of the poor DB-on-rbd performance.
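(For reference, the I/O pattern of that test program, ten writes followed by
one flush, can be approximated with the librbd sketch below. The pool and
image names are placeholders, the image is assumed to already exist and be
large enough, and error handling is abbreviated.)

    /* Rough approximation of the test pattern (ten writes, then one flush)
     * using the librbd C API. Build with: g++ test.cc -lrbd -lrados */
    #include <rados/librados.h>
    #include <rbd/librbd.h>
    #include <cstring>

    int main() {
      rados_t cluster;
      rados_ioctx_t ioctx;
      rbd_image_t image;

      if (rados_create(&cluster, "admin") < 0) return 1;
      rados_conf_read_file(cluster, "/etc/ceph/ceph.conf");
      if (rados_connect(cluster) < 0) return 1;
      if (rados_ioctx_create(cluster, "rbd", &ioctx) < 0) return 1;   // placeholder pool
      if (rbd_open(ioctx, "testimg", &image, NULL) < 0) return 1;     // placeholder image

      char buf[4096];
      memset(buf, 0xab, sizeof(buf));

      for (int round = 0; round < 100; ++round) {
        for (int i = 0; i < 10; ++i)   // ten writes ...
          rbd_write(image, (uint64_t)(round * 10 + i) * sizeof(buf), sizeof(buf), buf);
        rbd_flush(image);              // ... then one flush, like a journal sync
      }

      rbd_close(image);
      rados_ioctx_destroy(ioctx);
      rados_shutdown(cluster);
      return 0;
    }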

On Wed, May 21, 2014 at 6:15 PM, Haomai Wang <haomaiwang@gmail.com> wrote:
> Hi all,
>
> I remember there exists discuss about DB(mysql) performance on rbd.
> Recently I test mysql-bench with rbd and found awful performance. So I
> dive into it and find that main cause is "flush" request from guest.
> As we know, applications such as mysql, ceph has own journal for
> durable and journal usually send sync&direct io. If fs barrier is on,
> each sync io operation make kernel issue "sync"(barrier) request to
> block device. Here, qemu will call "rbd_aio_flush" to apply.
>
> Via systemtap, I found a amazing thing:
> aio_flush                          sum: 4177085    avg: 24145  count:
> 173      max: 28172  min: 22747
> flush_set                          sum: 4172116    avg: 24116  count:
> 173      max: 28034  min: 22733
> flush                              sum: 3029910    avg: 4      count:
> 670477   max: 1893   min: 3
>
> This statistic info is gathered in 5s. Most of consuming time is on
> "ObjectCacher::flush". What's more, with time increasing, the flush
> count will be increasing.
>
> After view source, I find the root cause is "ObjectCacher::flush_set",
> it will iterator the "object_set" and look for dirty buffer. And
> "object_set"  contains all objects ever opened.  For example:
>
> 2014-05-21 18:01:37.959013 7f785c7c6700  0 objectcacher flush_set
> total: 5919 flushed: 5
> 2014-05-21 18:01:37.999698 7f785c7c6700  0 objectcacher flush_set
> total: 5919 flushed: 5
> 2014-05-21 18:01:38.038405 7f785c7c6700  0 objectcacher flush_set
> total: 5920 flushed: 5
> 2014-05-21 18:01:38.080118 7f785c7c6700  0 objectcacher flush_set
> total: 5920 flushed: 5
> 2014-05-21 18:01:38.119792 7f785c7c6700  0 objectcacher flush_set
> total: 5921 flushed: 5
> 2014-05-21 18:01:38.162004 7f785c7c6700  0 objectcacher flush_set
> total: 5922 flushed: 5
> 2014-05-21 18:01:38.202755 7f785c7c6700  0 objectcacher flush_set
> total: 5923 flushed: 5
> 2014-05-21 18:01:38.243880 7f785c7c6700  0 objectcacher flush_set
> total: 5923 flushed: 5
> 2014-05-21 18:01:38.284399 7f785c7c6700  0 objectcacher flush_set
> total: 5923 flushed: 5
>
> These logs record the iteration info, the loop will check 5920 objects
> but only 5 objects are dirty.
>
> So I think the solution is make "ObjectCacher::flush_set" only
> iterator the objects which is dirty.
>
> --
> Best Regards,
>
> Wheat



-- 
Best Regards,

Wheat

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [Performance] Improvement on DB Performance
  2014-05-21 10:21 ` Haomai Wang
@ 2014-05-21 12:00   ` Luke Jing Yuan
  2014-05-21 12:12     ` Haomai Wang
  2014-05-21 15:23   ` Sage Weil
  1 sibling, 1 reply; 13+ messages in thread
From: Luke Jing Yuan @ 2014-05-21 12:00 UTC (permalink / raw)
  To: Haomai Wang, ceph-devel

Hi,

I am just curious: would this issue also apply to other databases such as PostgreSQL? My team is currently looking at this, but we are using a VM with PostgreSQL 9.3 installed on an attached device backed by RBD (via libvirt). We have yet to complete our planned tests, but initial observations did indicate possible performance issues, though we have also yet to go through all the relevant options within PostgreSQL.
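(For what it's worth, the PostgreSQL settings that control how often it forces a flush are roughly the ones below; they are listed only as a hint of where to look, not as tuning advice.)

    # postgresql.conf (9.3) settings that govern how WAL commits are flushed
    fsync = on                   # force WAL writes to disk at all
    synchronous_commit = on      # 'off' batches WAL flushes instead of flushing per commit
    wal_sync_method = fdatasync  # how the flush is issued (Linux default)
    commit_delay = 0             # microseconds to wait so that commits can be grouped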

Regards,
Luke

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Haomai Wang
Sent: Wednesday, 21 May, 2014 6:22 PM
To: ceph-devel@vger.kernel.org
Subject: Re: [Performance] Improvement on DB Performance

I pushed the commit to fix this problem(https://github.com/ceph/ceph/pull/1848).

With test program(Each sync request is issued with ten write request), a significant improvement is noticed.

aio_flush                          sum: 914750     avg: 1239   count:
738      max: 4714   min: 1011
flush_set                          sum: 904200     avg: 1225   count:
738      max: 4698   min: 999
flush                              sum: 641648     avg: 173    count:
3690     max: 1340   min: 128

Compared to last mail, it reduce each aio_flush request to 1239 ns instead of 24145 ns.

I hope it's the root cause for db on rbd performance.

On Wed, May 21, 2014 at 6:15 PM, Haomai Wang <haomaiwang@gmail.com> wrote:
> Hi all,
>
> I remember there exists discuss about DB(mysql) performance on rbd.
> Recently I test mysql-bench with rbd and found awful performance. So I
> dive into it and find that main cause is "flush" request from guest.
> As we know, applications such as mysql, ceph has own journal for
> durable and journal usually send sync&direct io. If fs barrier is on,
> each sync io operation make kernel issue "sync"(barrier) request to
> block device. Here, qemu will call "rbd_aio_flush" to apply.
>
> Via systemtap, I found a amazing thing:
> aio_flush                          sum: 4177085    avg: 24145  count:
> 173      max: 28172  min: 22747
> flush_set                          sum: 4172116    avg: 24116  count:
> 173      max: 28034  min: 22733
> flush                              sum: 3029910    avg: 4      count:
> 670477   max: 1893   min: 3
>
> This statistic info is gathered in 5s. Most of consuming time is on
> "ObjectCacher::flush". What's more, with time increasing, the flush
> count will be increasing.
>
> After view source, I find the root cause is "ObjectCacher::flush_set",
> it will iterator the "object_set" and look for dirty buffer. And
> "object_set"  contains all objects ever opened.  For example:
>
> 2014-05-21 18:01:37.959013 7f785c7c6700  0 objectcacher flush_set
> total: 5919 flushed: 5
> 2014-05-21 18:01:37.999698 7f785c7c6700  0 objectcacher flush_set
> total: 5919 flushed: 5
> 2014-05-21 18:01:38.038405 7f785c7c6700  0 objectcacher flush_set
> total: 5920 flushed: 5
> 2014-05-21 18:01:38.080118 7f785c7c6700  0 objectcacher flush_set
> total: 5920 flushed: 5
> 2014-05-21 18:01:38.119792 7f785c7c6700  0 objectcacher flush_set
> total: 5921 flushed: 5
> 2014-05-21 18:01:38.162004 7f785c7c6700  0 objectcacher flush_set
> total: 5922 flushed: 5
> 2014-05-21 18:01:38.202755 7f785c7c6700  0 objectcacher flush_set
> total: 5923 flushed: 5
> 2014-05-21 18:01:38.243880 7f785c7c6700  0 objectcacher flush_set
> total: 5923 flushed: 5
> 2014-05-21 18:01:38.284399 7f785c7c6700  0 objectcacher flush_set
> total: 5923 flushed: 5
>
> These logs record the iteration info, the loop will check 5920 objects
> but only 5 objects are dirty.
>
> So I think the solution is make "ObjectCacher::flush_set" only
> iterator the objects which is dirty.
>
> --
> Best Regards,
>
> Wheat



--
Best Regards,

Wheat
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Performance] Improvement on DB Performance
  2014-05-21 12:00   ` Luke Jing Yuan
@ 2014-05-21 12:12     ` Haomai Wang
  2014-05-21 15:06       ` Mark Nelson
  0 siblings, 1 reply; 13+ messages in thread
From: Haomai Wang @ 2014-05-21 12:12 UTC (permalink / raw)
  To: Luke Jing Yuan; +Cc: ceph-devel

If the rbd cache is enabled, I think it would be affected as well.
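(For reference, the rbd cache is typically enabled with something like the
following on the hypervisor side; the exact settings depend on your setup.)

    # ceph.conf on the client/hypervisor
    [client]
        rbd cache = true
        rbd cache writethrough until flush = true

    # libvirt disk definition: expose a writeback cache for the rbd disk
    <driver name='qemu' type='raw' cache='writeback'/>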

On Wed, May 21, 2014 at 8:00 PM, Luke Jing Yuan <jyluke@mimos.my> wrote:
> Hi,
>
> I am just curious would this issue be also applied to other DB like postgresql? My team is currently looking at this but we  are using a VM and install postgresql 9.3 on an attached device using RBD (via libvirt). We had yet to complete our planned tests but initial observations did indicate possible performance issues though we had yet to go through all possible options within postgresql.
>
> Regards,
> Luke
>
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Haomai Wang
> Sent: Wednesday, 21 May, 2014 6:22 PM
> To: ceph-devel@vger.kernel.org
> Subject: Re: [Performance] Improvement on DB Performance
>
> I pushed the commit to fix this problem(https://github.com/ceph/ceph/pull/1848).
>
> With test program(Each sync request is issued with ten write request), a significant improvement is noticed.
>
> aio_flush                          sum: 914750     avg: 1239   count:
> 738      max: 4714   min: 1011
> flush_set                          sum: 904200     avg: 1225   count:
> 738      max: 4698   min: 999
> flush                              sum: 641648     avg: 173    count:
> 3690     max: 1340   min: 128
>
> Compared to last mail, it reduce each aio_flush request to 1239 ns instead of 24145 ns.
>
> I hope it's the root cause for db on rbd performance.
>
> On Wed, May 21, 2014 at 6:15 PM, Haomai Wang <haomaiwang@gmail.com> wrote:
>> Hi all,
>>
>> I remember there exists discuss about DB(mysql) performance on rbd.
>> Recently I test mysql-bench with rbd and found awful performance. So I
>> dive into it and find that main cause is "flush" request from guest.
>> As we know, applications such as mysql, ceph has own journal for
>> durable and journal usually send sync&direct io. If fs barrier is on,
>> each sync io operation make kernel issue "sync"(barrier) request to
>> block device. Here, qemu will call "rbd_aio_flush" to apply.
>>
>> Via systemtap, I found a amazing thing:
>> aio_flush                          sum: 4177085    avg: 24145  count:
>> 173      max: 28172  min: 22747
>> flush_set                          sum: 4172116    avg: 24116  count:
>> 173      max: 28034  min: 22733
>> flush                              sum: 3029910    avg: 4      count:
>> 670477   max: 1893   min: 3
>>
>> This statistic info is gathered in 5s. Most of consuming time is on
>> "ObjectCacher::flush". What's more, with time increasing, the flush
>> count will be increasing.
>>
>> After view source, I find the root cause is "ObjectCacher::flush_set",
>> it will iterator the "object_set" and look for dirty buffer. And
>> "object_set"  contains all objects ever opened.  For example:
>>
>> 2014-05-21 18:01:37.959013 7f785c7c6700  0 objectcacher flush_set
>> total: 5919 flushed: 5
>> 2014-05-21 18:01:37.999698 7f785c7c6700  0 objectcacher flush_set
>> total: 5919 flushed: 5
>> 2014-05-21 18:01:38.038405 7f785c7c6700  0 objectcacher flush_set
>> total: 5920 flushed: 5
>> 2014-05-21 18:01:38.080118 7f785c7c6700  0 objectcacher flush_set
>> total: 5920 flushed: 5
>> 2014-05-21 18:01:38.119792 7f785c7c6700  0 objectcacher flush_set
>> total: 5921 flushed: 5
>> 2014-05-21 18:01:38.162004 7f785c7c6700  0 objectcacher flush_set
>> total: 5922 flushed: 5
>> 2014-05-21 18:01:38.202755 7f785c7c6700  0 objectcacher flush_set
>> total: 5923 flushed: 5
>> 2014-05-21 18:01:38.243880 7f785c7c6700  0 objectcacher flush_set
>> total: 5923 flushed: 5
>> 2014-05-21 18:01:38.284399 7f785c7c6700  0 objectcacher flush_set
>> total: 5923 flushed: 5
>>
>> These logs record the iteration info, the loop will check 5920 objects
>> but only 5 objects are dirty.
>>
>> So I think the solution is make "ObjectCacher::flush_set" only
>> iterator the objects which is dirty.
>>
>> --
>> Best Regards,
>>
>> Wheat
>
>
>
> --
> Best Regards,
>
> Wheat
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>



-- 
Best Regards,

Wheat

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Performance] Improvement on DB Performance
  2014-05-21 12:12     ` Haomai Wang
@ 2014-05-21 15:06       ` Mark Nelson
  0 siblings, 0 replies; 13+ messages in thread
From: Mark Nelson @ 2014-05-21 15:06 UTC (permalink / raw)
  To: Haomai Wang, Luke Jing Yuan; +Cc: ceph-devel

Hi Guys,

FWIW, the test suite I ran through was DBT3 on mariadb using a KVM 
virtual machine, rbd cache, and Ceph dumpling.  What I saw was that in 
some tests performance was reasonably good given the replication level, 
but in other cases it was slow (at least relative to a locally attached 
disk).  I haven't really dug into the queries enough in DBT3 to tell 
which ones were doing exactly what.  I believe there has been some work 
on the objectcacher in the last couple of releases, so it's possible 
that you may be seeing effects that I didn't see as well.

Mark

On 05/21/2014 07:12 AM, Haomai Wang wrote:
> If rbd cache enabled, I think it should be affected.
>
> On Wed, May 21, 2014 at 8:00 PM, Luke Jing Yuan <jyluke@mimos.my> wrote:
>> Hi,
>>
>> I am just curious would this issue be also applied to other DB like postgresql? My team is currently looking at this but we  are using a VM and install postgresql 9.3 on an attached device using RBD (via libvirt). We had yet to complete our planned tests but initial observations did indicate possible performance issues though we had yet to go through all possible options within postgresql.
>>
>> Regards,
>> Luke
>>
>> -----Original Message-----
>> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Haomai Wang
>> Sent: Wednesday, 21 May, 2014 6:22 PM
>> To: ceph-devel@vger.kernel.org
>> Subject: Re: [Performance] Improvement on DB Performance
>>
>> I pushed the commit to fix this problem(https://github.com/ceph/ceph/pull/1848).
>>
>> With test program(Each sync request is issued with ten write request), a significant improvement is noticed.
>>
>> aio_flush                          sum: 914750     avg: 1239   count:
>> 738      max: 4714   min: 1011
>> flush_set                          sum: 904200     avg: 1225   count:
>> 738      max: 4698   min: 999
>> flush                              sum: 641648     avg: 173    count:
>> 3690     max: 1340   min: 128
>>
>> Compared to last mail, it reduce each aio_flush request to 1239 ns instead of 24145 ns.
>>
>> I hope it's the root cause for db on rbd performance.
>>
>> On Wed, May 21, 2014 at 6:15 PM, Haomai Wang <haomaiwang@gmail.com> wrote:
>>> Hi all,
>>>
>>> I remember there exists discuss about DB(mysql) performance on rbd.
>>> Recently I test mysql-bench with rbd and found awful performance. So I
>>> dive into it and find that main cause is "flush" request from guest.
>>> As we know, applications such as mysql, ceph has own journal for
>>> durable and journal usually send sync&direct io. If fs barrier is on,
>>> each sync io operation make kernel issue "sync"(barrier) request to
>>> block device. Here, qemu will call "rbd_aio_flush" to apply.
>>>
>>> Via systemtap, I found a amazing thing:
>>> aio_flush                          sum: 4177085    avg: 24145  count:
>>> 173      max: 28172  min: 22747
>>> flush_set                          sum: 4172116    avg: 24116  count:
>>> 173      max: 28034  min: 22733
>>> flush                              sum: 3029910    avg: 4      count:
>>> 670477   max: 1893   min: 3
>>>
>>> This statistic info is gathered in 5s. Most of consuming time is on
>>> "ObjectCacher::flush". What's more, with time increasing, the flush
>>> count will be increasing.
>>>
>>> After view source, I find the root cause is "ObjectCacher::flush_set",
>>> it will iterator the "object_set" and look for dirty buffer. And
>>> "object_set"  contains all objects ever opened.  For example:
>>>
>>> 2014-05-21 18:01:37.959013 7f785c7c6700  0 objectcacher flush_set
>>> total: 5919 flushed: 5
>>> 2014-05-21 18:01:37.999698 7f785c7c6700  0 objectcacher flush_set
>>> total: 5919 flushed: 5
>>> 2014-05-21 18:01:38.038405 7f785c7c6700  0 objectcacher flush_set
>>> total: 5920 flushed: 5
>>> 2014-05-21 18:01:38.080118 7f785c7c6700  0 objectcacher flush_set
>>> total: 5920 flushed: 5
>>> 2014-05-21 18:01:38.119792 7f785c7c6700  0 objectcacher flush_set
>>> total: 5921 flushed: 5
>>> 2014-05-21 18:01:38.162004 7f785c7c6700  0 objectcacher flush_set
>>> total: 5922 flushed: 5
>>> 2014-05-21 18:01:38.202755 7f785c7c6700  0 objectcacher flush_set
>>> total: 5923 flushed: 5
>>> 2014-05-21 18:01:38.243880 7f785c7c6700  0 objectcacher flush_set
>>> total: 5923 flushed: 5
>>> 2014-05-21 18:01:38.284399 7f785c7c6700  0 objectcacher flush_set
>>> total: 5923 flushed: 5
>>>
>>> These logs record the iteration info, the loop will check 5920 objects
>>> but only 5 objects are dirty.
>>>
>>> So I think the solution is make "ObjectCacher::flush_set" only
>>> iterator the objects which is dirty.
>>>
>>> --
>>> Best Regards,
>>>
>>> Wheat
>>
>>
>>
>> --
>> Best Regards,
>>
>> Wheat
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
>>
>
>
>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Performance] Improvement on DB Performance
  2014-05-21 10:21 ` Haomai Wang
  2014-05-21 12:00   ` Luke Jing Yuan
@ 2014-05-21 15:23   ` Sage Weil
  2014-05-21 15:50     ` Mike Dawson
  2014-05-26 13:57     ` Haomai Wang
  1 sibling, 2 replies; 13+ messages in thread
From: Sage Weil @ 2014-05-21 15:23 UTC (permalink / raw)
  To: Haomai Wang; +Cc: ceph-devel

On Wed, 21 May 2014, Haomai Wang wrote:
> I pushed the commit to fix this problem(https://github.com/ceph/ceph/pull/1848).
> 
> With test program(Each sync request is issued with ten write request),
> a significant improvement is noticed.
> 
> aio_flush                          sum: 914750     avg: 1239   count:
> 738      max: 4714   min: 1011
> flush_set                          sum: 904200     avg: 1225   count:
> 738      max: 4698   min: 999
> flush                              sum: 641648     avg: 173    count:
> 3690     max: 1340   min: 128
> 
> Compared to last mail, it reduce each aio_flush request to 1239 ns
> instead of 24145 ns.

Good catch!  That's a great improvement.

The patch looks clearly correct.  We can probably do even better by 
putting the Objects on a list when they get the first dirty buffer so that 
we only cycle through the dirty ones.  Or, have a global list of dirty 
buffers (instead of dirty objects -> dirty buffers).
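(To sketch that idea with made-up names, not the actual ObjectCacher
structures: an object could enlist itself the moment it gets its first
dirty buffer, and delist itself when its last dirty buffer is flushed.)

    // Sketch of the dirty-object-list idea; names are hypothetical.
    #include <iterator>
    #include <list>

    struct ObjectSet;

    struct Object {
      ObjectSet *oset;
      int dirty_buffers = 0;
      bool on_dirty_list = false;
      std::list<Object*>::iterator dirty_pos;  // position in oset->dirty_objects

      void mark_buffer_dirty();
      void mark_buffer_clean();
    };

    struct ObjectSet {
      std::list<Object*> dirty_objects;  // only objects that currently hold dirty data
    };

    void Object::mark_buffer_dirty() {
      if (++dirty_buffers == 1 && !on_dirty_list) {  // first dirty buffer: enlist the object
        oset->dirty_objects.push_back(this);
        dirty_pos = std::prev(oset->dirty_objects.end());
        on_dirty_list = true;
      }
    }

    void Object::mark_buffer_clean() {
      if (--dirty_buffers == 0 && on_dirty_list) {   // last dirty buffer flushed: delist
        oset->dirty_objects.erase(dirty_pos);
        on_dirty_list = false;
      }
    }

    // flush_set() would then walk oset->dirty_objects only, never the full object_set.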

sage

> 
> I hope it's the root cause for db on rbd performance.
> 
> On Wed, May 21, 2014 at 6:15 PM, Haomai Wang <haomaiwang@gmail.com> wrote:
> > Hi all,
> >
> > I remember there exists discuss about DB(mysql) performance on rbd.
> > Recently I test mysql-bench with rbd and found awful performance. So I
> > dive into it and find that main cause is "flush" request from guest.
> > As we know, applications such as mysql, ceph has own journal for
> > durable and journal usually send sync&direct io. If fs barrier is on,
> > each sync io operation make kernel issue "sync"(barrier) request to
> > block device. Here, qemu will call "rbd_aio_flush" to apply.
> >
> > Via systemtap, I found a amazing thing:
> > aio_flush                          sum: 4177085    avg: 24145  count:
> > 173      max: 28172  min: 22747
> > flush_set                          sum: 4172116    avg: 24116  count:
> > 173      max: 28034  min: 22733
> > flush                              sum: 3029910    avg: 4      count:
> > 670477   max: 1893   min: 3
> >
> > This statistic info is gathered in 5s. Most of consuming time is on
> > "ObjectCacher::flush". What's more, with time increasing, the flush
> > count will be increasing.
> >
> > After view source, I find the root cause is "ObjectCacher::flush_set",
> > it will iterator the "object_set" and look for dirty buffer. And
> > "object_set"  contains all objects ever opened.  For example:
> >
> > 2014-05-21 18:01:37.959013 7f785c7c6700  0 objectcacher flush_set
> > total: 5919 flushed: 5
> > 2014-05-21 18:01:37.999698 7f785c7c6700  0 objectcacher flush_set
> > total: 5919 flushed: 5
> > 2014-05-21 18:01:38.038405 7f785c7c6700  0 objectcacher flush_set
> > total: 5920 flushed: 5
> > 2014-05-21 18:01:38.080118 7f785c7c6700  0 objectcacher flush_set
> > total: 5920 flushed: 5
> > 2014-05-21 18:01:38.119792 7f785c7c6700  0 objectcacher flush_set
> > total: 5921 flushed: 5
> > 2014-05-21 18:01:38.162004 7f785c7c6700  0 objectcacher flush_set
> > total: 5922 flushed: 5
> > 2014-05-21 18:01:38.202755 7f785c7c6700  0 objectcacher flush_set
> > total: 5923 flushed: 5
> > 2014-05-21 18:01:38.243880 7f785c7c6700  0 objectcacher flush_set
> > total: 5923 flushed: 5
> > 2014-05-21 18:01:38.284399 7f785c7c6700  0 objectcacher flush_set
> > total: 5923 flushed: 5
> >
> > These logs record the iteration info, the loop will check 5920 objects
> > but only 5 objects are dirty.
> >
> > So I think the solution is make "ObjectCacher::flush_set" only
> > iterator the objects which is dirty.
> >
> > --
> > Best Regards,
> >
> > Wheat
> 
> 
> 
> -- 
> Best Regards,
> 
> Wheat
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Performance] Improvement on DB Performance
  2014-05-21 15:23   ` Sage Weil
@ 2014-05-21 15:50     ` Mike Dawson
  2014-05-21 15:53       ` Mark Nelson
  2014-05-21 16:15       ` Sage Weil
  2014-05-26 13:57     ` Haomai Wang
  1 sibling, 2 replies; 13+ messages in thread
From: Mike Dawson @ 2014-05-21 15:50 UTC (permalink / raw)
  To: Sage Weil, Haomai Wang; +Cc: ceph-devel

Haomai,

Thanks for finding this!


Sage,

We have a client that runs an I/O-intensive, closed-source software 
package that seems to issue overzealous flushes and may benefit from 
this patch (or the other methods you mention). If you were to spin a wip 
build based on Dumpling, I'd be a willing tester.

Thanks,
Mike Dawson

On 5/21/2014 11:23 AM, Sage Weil wrote:
> On Wed, 21 May 2014, Haomai Wang wrote:
>> I pushed the commit to fix this problem(https://github.com/ceph/ceph/pull/1848).
>>
>> With test program(Each sync request is issued with ten write request),
>> a significant improvement is noticed.
>>
>> aio_flush                          sum: 914750     avg: 1239   count:
>> 738      max: 4714   min: 1011
>> flush_set                          sum: 904200     avg: 1225   count:
>> 738      max: 4698   min: 999
>> flush                              sum: 641648     avg: 173    count:
>> 3690     max: 1340   min: 128
>>
>> Compared to last mail, it reduce each aio_flush request to 1239 ns
>> instead of 24145 ns.
>
> Good catch!  That's a great improvement.
>
> The patch looks clearly correct.  We can probably do even better by
> putting the Objects on a list when they get the first dirty buffer so that
> we only cycle through the dirty ones.  Or, have a global list of dirty
> buffers (instead of dirty objects -> dirty buffers).
>
> sage
>
>>
>> I hope it's the root cause for db on rbd performance.
>>
>> On Wed, May 21, 2014 at 6:15 PM, Haomai Wang <haomaiwang@gmail.com> wrote:
>>> Hi all,
>>>
>>> I remember there exists discuss about DB(mysql) performance on rbd.
>>> Recently I test mysql-bench with rbd and found awful performance. So I
>>> dive into it and find that main cause is "flush" request from guest.
>>> As we know, applications such as mysql, ceph has own journal for
>>> durable and journal usually send sync&direct io. If fs barrier is on,
>>> each sync io operation make kernel issue "sync"(barrier) request to
>>> block device. Here, qemu will call "rbd_aio_flush" to apply.
>>>
>>> Via systemtap, I found a amazing thing:
>>> aio_flush                          sum: 4177085    avg: 24145  count:
>>> 173      max: 28172  min: 22747
>>> flush_set                          sum: 4172116    avg: 24116  count:
>>> 173      max: 28034  min: 22733
>>> flush                              sum: 3029910    avg: 4      count:
>>> 670477   max: 1893   min: 3
>>>
>>> This statistic info is gathered in 5s. Most of consuming time is on
>>> "ObjectCacher::flush". What's more, with time increasing, the flush
>>> count will be increasing.
>>>
>>> After view source, I find the root cause is "ObjectCacher::flush_set",
>>> it will iterator the "object_set" and look for dirty buffer. And
>>> "object_set"  contains all objects ever opened.  For example:
>>>
>>> 2014-05-21 18:01:37.959013 7f785c7c6700  0 objectcacher flush_set
>>> total: 5919 flushed: 5
>>> 2014-05-21 18:01:37.999698 7f785c7c6700  0 objectcacher flush_set
>>> total: 5919 flushed: 5
>>> 2014-05-21 18:01:38.038405 7f785c7c6700  0 objectcacher flush_set
>>> total: 5920 flushed: 5
>>> 2014-05-21 18:01:38.080118 7f785c7c6700  0 objectcacher flush_set
>>> total: 5920 flushed: 5
>>> 2014-05-21 18:01:38.119792 7f785c7c6700  0 objectcacher flush_set
>>> total: 5921 flushed: 5
>>> 2014-05-21 18:01:38.162004 7f785c7c6700  0 objectcacher flush_set
>>> total: 5922 flushed: 5
>>> 2014-05-21 18:01:38.202755 7f785c7c6700  0 objectcacher flush_set
>>> total: 5923 flushed: 5
>>> 2014-05-21 18:01:38.243880 7f785c7c6700  0 objectcacher flush_set
>>> total: 5923 flushed: 5
>>> 2014-05-21 18:01:38.284399 7f785c7c6700  0 objectcacher flush_set
>>> total: 5923 flushed: 5
>>>
>>> These logs record the iteration info, the loop will check 5920 objects
>>> but only 5 objects are dirty.
>>>
>>> So I think the solution is make "ObjectCacher::flush_set" only
>>> iterator the objects which is dirty.
>>>
>>> --
>>> Best Regards,
>>>
>>> Wheat
>>
>>
>>
>> --
>> Best Regards,
>>
>> Wheat
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Performance] Improvement on DB Performance
  2014-05-21 15:50     ` Mike Dawson
@ 2014-05-21 15:53       ` Mark Nelson
  2014-05-21 16:15       ` Sage Weil
  1 sibling, 0 replies; 13+ messages in thread
From: Mark Nelson @ 2014-05-21 15:53 UTC (permalink / raw)
  To: Mike Dawson, Sage Weil, Haomai Wang; +Cc: ceph-devel

On 05/21/2014 10:50 AM, Mike Dawson wrote:
> Haomai,
>
> Thanks for finding this!

Yes agreed, this looks very exciting. :D

>
>
> Sage,
>
> We have a client that runs an io intensive, closed-source software
> package that seems to issue overzealous flushes which may benefit from
> this patch (or the other methods you mention). If you were to spin a wip
> build based on Dumpling, I'll be a willing tester.

I'd be happy to jump on the bandwagon too.  I'm in the middle of RBD 
testing using fio with the librbd engine.
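For anyone who wants to follow along, a minimal fio job for the rbd engine 
looks roughly like this (the pool and image names are placeholders):

    [global]
    ioengine=rbd
    clientname=admin
    pool=rbd
    rbdname=fio-test
    bs=4k
    iodepth=32

    [rbd-randwrite]
    rw=randwrite
    time_based=1
    runtime=60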

>
> Thanks,
> Mike Dawson
>
> On 5/21/2014 11:23 AM, Sage Weil wrote:
>> On Wed, 21 May 2014, Haomai Wang wrote:
>>> I pushed the commit to fix this
>>> problem(https://github.com/ceph/ceph/pull/1848).
>>>
>>> With test program(Each sync request is issued with ten write request),
>>> a significant improvement is noticed.
>>>
>>> aio_flush                          sum: 914750     avg: 1239   count:
>>> 738      max: 4714   min: 1011
>>> flush_set                          sum: 904200     avg: 1225   count:
>>> 738      max: 4698   min: 999
>>> flush                              sum: 641648     avg: 173    count:
>>> 3690     max: 1340   min: 128
>>>
>>> Compared to last mail, it reduce each aio_flush request to 1239 ns
>>> instead of 24145 ns.
>>
>> Good catch!  That's a great improvement.
>>
>> The patch looks clearly correct.  We can probably do even better by
>> putting the Objects on a list when they get the first dirty buffer so
>> that
>> we only cycle through the dirty ones.  Or, have a global list of dirty
>> buffers (instead of dirty objects -> dirty buffers).
>>
>> sage
>>
>>>
>>> I hope it's the root cause for db on rbd performance.
>>>
>>> On Wed, May 21, 2014 at 6:15 PM, Haomai Wang <haomaiwang@gmail.com>
>>> wrote:
>>>> Hi all,
>>>>
>>>> I remember there exists discuss about DB(mysql) performance on rbd.
>>>> Recently I test mysql-bench with rbd and found awful performance. So I
>>>> dive into it and find that main cause is "flush" request from guest.
>>>> As we know, applications such as mysql, ceph has own journal for
>>>> durable and journal usually send sync&direct io. If fs barrier is on,
>>>> each sync io operation make kernel issue "sync"(barrier) request to
>>>> block device. Here, qemu will call "rbd_aio_flush" to apply.
>>>>
>>>> Via systemtap, I found a amazing thing:
>>>> aio_flush                          sum: 4177085    avg: 24145  count:
>>>> 173      max: 28172  min: 22747
>>>> flush_set                          sum: 4172116    avg: 24116  count:
>>>> 173      max: 28034  min: 22733
>>>> flush                              sum: 3029910    avg: 4      count:
>>>> 670477   max: 1893   min: 3
>>>>
>>>> This statistic info is gathered in 5s. Most of consuming time is on
>>>> "ObjectCacher::flush". What's more, with time increasing, the flush
>>>> count will be increasing.
>>>>
>>>> After view source, I find the root cause is "ObjectCacher::flush_set",
>>>> it will iterator the "object_set" and look for dirty buffer. And
>>>> "object_set"  contains all objects ever opened.  For example:
>>>>
>>>> 2014-05-21 18:01:37.959013 7f785c7c6700  0 objectcacher flush_set
>>>> total: 5919 flushed: 5
>>>> 2014-05-21 18:01:37.999698 7f785c7c6700  0 objectcacher flush_set
>>>> total: 5919 flushed: 5
>>>> 2014-05-21 18:01:38.038405 7f785c7c6700  0 objectcacher flush_set
>>>> total: 5920 flushed: 5
>>>> 2014-05-21 18:01:38.080118 7f785c7c6700  0 objectcacher flush_set
>>>> total: 5920 flushed: 5
>>>> 2014-05-21 18:01:38.119792 7f785c7c6700  0 objectcacher flush_set
>>>> total: 5921 flushed: 5
>>>> 2014-05-21 18:01:38.162004 7f785c7c6700  0 objectcacher flush_set
>>>> total: 5922 flushed: 5
>>>> 2014-05-21 18:01:38.202755 7f785c7c6700  0 objectcacher flush_set
>>>> total: 5923 flushed: 5
>>>> 2014-05-21 18:01:38.243880 7f785c7c6700  0 objectcacher flush_set
>>>> total: 5923 flushed: 5
>>>> 2014-05-21 18:01:38.284399 7f785c7c6700  0 objectcacher flush_set
>>>> total: 5923 flushed: 5
>>>>
>>>> These logs record the iteration info, the loop will check 5920 objects
>>>> but only 5 objects are dirty.
>>>>
>>>> So I think the solution is make "ObjectCacher::flush_set" only
>>>> iterator the objects which is dirty.
>>>>
>>>> --
>>>> Best Regards,
>>>>
>>>> Wheat
>>>
>>>
>>>
>>> --
>>> Best Regards,
>>>
>>> Wheat
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Performance] Improvement on DB Performance
  2014-05-21 15:50     ` Mike Dawson
  2014-05-21 15:53       ` Mark Nelson
@ 2014-05-21 16:15       ` Sage Weil
       [not found]         ` <77004F70-7FE7-4EBE-A34D-46A8DC290936@profihost.ag>
  1 sibling, 1 reply; 13+ messages in thread
From: Sage Weil @ 2014-05-21 16:15 UTC (permalink / raw)
  To: Mike Dawson; +Cc: Haomai Wang, ceph-devel

On Wed, 21 May 2014, Mike Dawson wrote:
> Haomai,
> 
> Thanks for finding this!
> 
> 
> Sage,
> 
> We have a client that runs an io intensive, closed-source software package
> that seems to issue overzealous flushes which may benefit from this patch (or
> the other methods you mention). If you were to spin a wip build based on
> Dumpling, I'll be a willing tester.

Pushed wip-librbd-flush-dumpling, should be built shortly.

sage

> 
> Thanks,
> Mike Dawson
> 
> On 5/21/2014 11:23 AM, Sage Weil wrote:
> > On Wed, 21 May 2014, Haomai Wang wrote:
> > > I pushed the commit to fix this
> > > problem(https://github.com/ceph/ceph/pull/1848).
> > > 
> > > With test program(Each sync request is issued with ten write request),
> > > a significant improvement is noticed.
> > > 
> > > aio_flush                          sum: 914750     avg: 1239   count:
> > > 738      max: 4714   min: 1011
> > > flush_set                          sum: 904200     avg: 1225   count:
> > > 738      max: 4698   min: 999
> > > flush                              sum: 641648     avg: 173    count:
> > > 3690     max: 1340   min: 128
> > > 
> > > Compared to last mail, it reduce each aio_flush request to 1239 ns
> > > instead of 24145 ns.
> > 
> > Good catch!  That's a great improvement.
> > 
> > The patch looks clearly correct.  We can probably do even better by
> > putting the Objects on a list when they get the first dirty buffer so that
> > we only cycle through the dirty ones.  Or, have a global list of dirty
> > buffers (instead of dirty objects -> dirty buffers).
> > 
> > sage
> > 
> > > 
> > > I hope it's the root cause for db on rbd performance.
> > > 
> > > On Wed, May 21, 2014 at 6:15 PM, Haomai Wang <haomaiwang@gmail.com> wrote:
> > > > Hi all,
> > > > 
> > > > I remember there exists discuss about DB(mysql) performance on rbd.
> > > > Recently I test mysql-bench with rbd and found awful performance. So I
> > > > dive into it and find that main cause is "flush" request from guest.
> > > > As we know, applications such as mysql, ceph has own journal for
> > > > durable and journal usually send sync&direct io. If fs barrier is on,
> > > > each sync io operation make kernel issue "sync"(barrier) request to
> > > > block device. Here, qemu will call "rbd_aio_flush" to apply.
> > > > 
> > > > Via systemtap, I found a amazing thing:
> > > > aio_flush                          sum: 4177085    avg: 24145  count:
> > > > 173      max: 28172  min: 22747
> > > > flush_set                          sum: 4172116    avg: 24116  count:
> > > > 173      max: 28034  min: 22733
> > > > flush                              sum: 3029910    avg: 4      count:
> > > > 670477   max: 1893   min: 3
> > > > 
> > > > This statistic info is gathered in 5s. Most of consuming time is on
> > > > "ObjectCacher::flush". What's more, with time increasing, the flush
> > > > count will be increasing.
> > > > 
> > > > After view source, I find the root cause is "ObjectCacher::flush_set",
> > > > it will iterator the "object_set" and look for dirty buffer. And
> > > > "object_set"  contains all objects ever opened.  For example:
> > > > 
> > > > 2014-05-21 18:01:37.959013 7f785c7c6700  0 objectcacher flush_set
> > > > total: 5919 flushed: 5
> > > > 2014-05-21 18:01:37.999698 7f785c7c6700  0 objectcacher flush_set
> > > > total: 5919 flushed: 5
> > > > 2014-05-21 18:01:38.038405 7f785c7c6700  0 objectcacher flush_set
> > > > total: 5920 flushed: 5
> > > > 2014-05-21 18:01:38.080118 7f785c7c6700  0 objectcacher flush_set
> > > > total: 5920 flushed: 5
> > > > 2014-05-21 18:01:38.119792 7f785c7c6700  0 objectcacher flush_set
> > > > total: 5921 flushed: 5
> > > > 2014-05-21 18:01:38.162004 7f785c7c6700  0 objectcacher flush_set
> > > > total: 5922 flushed: 5
> > > > 2014-05-21 18:01:38.202755 7f785c7c6700  0 objectcacher flush_set
> > > > total: 5923 flushed: 5
> > > > 2014-05-21 18:01:38.243880 7f785c7c6700  0 objectcacher flush_set
> > > > total: 5923 flushed: 5
> > > > 2014-05-21 18:01:38.284399 7f785c7c6700  0 objectcacher flush_set
> > > > total: 5923 flushed: 5
> > > > 
> > > > These logs record the iteration info, the loop will check 5920 objects
> > > > but only 5 objects are dirty.
> > > > 
> > > > So I think the solution is make "ObjectCacher::flush_set" only
> > > > iterator the objects which is dirty.
> > > > 
> > > > --
> > > > Best Regards,
> > > > 
> > > > Wheat
> > > 
> > > 
> > > 
> > > --
> > > Best Regards,
> > > 
> > > Wheat
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > 
> > > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Performance] Improvement on DB Performance
       [not found]         ` <77004F70-7FE7-4EBE-A34D-46A8DC290936@profihost.ag>
@ 2014-05-21 18:41           ` Sage Weil
  2014-05-21 18:51             ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 13+ messages in thread
From: Sage Weil @ 2014-05-21 18:41 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: Mike Dawson, Haomai Wang, ceph-devel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 10192 bytes --]

On Wed, 21 May 2014, Stefan Priebe - Profihost AG wrote:
> Hi sage,
> 
> what about cuttlefish customers?

We stopped backporting fixes to cuttlefish a while ago.  Please upgrade to 
dumpling!

That said, this patch should apply cleanly to cuttlefish.

sage


> 
> Greets,
> Stefan
> Excuse my typo sent from my mobile phone.
> 
> Am 21.05.2014 um 18:15 schrieb Sage Weil <sage@inktank.com>:
> 
> > On Wed, 21 May 2014, Mike Dawson wrote:
> > > Haomai,
> > >
> > > Thanks for finding this!
> > >
> > >
> > > Sage,
> > >
> > > We have a client that runs an io intensive, closed-source software
> > > package that seems to issue overzealous flushes which may benefit from
> > > this patch (or the other methods you mention). If you were to spin a wip
> > > build based on Dumpling, I'll be a willing tester.
> >
> > Pushed wip-librbd-flush-dumpling, should be built shortly.
> >
> > sage

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Performance] Improvement on DB Performance
  2014-05-21 18:41           ` Sage Weil
@ 2014-05-21 18:51             ` Stefan Priebe - Profihost AG
  2014-05-21 20:05               ` Stefan Priebe
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Priebe - Profihost AG @ 2014-05-21 18:51 UTC (permalink / raw)
  To: Sage Weil; +Cc: Mike Dawson, Haomai Wang, ceph-devel


> On 21.05.2014 at 20:41, Sage Weil <sage@inktank.com> wrote:
> 
>> On Wed, 21 May 2014, Stefan Priebe - Profihost AG wrote:
>> Hi sage,
>> 
>> what about cuttlefish customers?
> 
> We stopped backporting fixes to cuttlefish a while ago.  Please upgrade to 
> dumpling!

Did I miss an announcement from Inktank that we should update to dumpling? I thought we were supposed to stay on cuttlefish and then upgrade to firefly.

> 
> That said, this patch should apply cleanly to cuttlefish.
> 
> sage

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Performance] Improvement on DB Performance
  2014-05-21 18:51             ` Stefan Priebe - Profihost AG
@ 2014-05-21 20:05               ` Stefan Priebe
  0 siblings, 0 replies; 13+ messages in thread
From: Stefan Priebe @ 2014-05-21 20:05 UTC (permalink / raw)
  To: Sage Weil; +Cc: Mike Dawson, Haomai Wang, ceph-devel

*argh* sorry, I mixed up emperor and dumpling. Sorry.

Stefan

On 21.05.2014 20:51, Stefan Priebe - Profihost AG wrote:
>
>> On 21.05.2014 at 20:41, Sage Weil <sage@inktank.com> wrote:
>>
>>> On Wed, 21 May 2014, Stefan Priebe - Profihost AG wrote:
>>> Hi sage,
>>>
>>> what about cuttlefish customers?
>>
>> We stopped backporting fixes to cuttlefish a while ago.  Please upgrade to
>> dumpling!
>
> Did I miss an announcement from Inktank that we should update to dumpling? I thought we were supposed to stay on cuttlefish and then upgrade to firefly.
>
>>
>> That said, this patch should apply cleanly to cuttlefish.
>>
>> sage
>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Performance] Improvement on DB Performance
  2014-05-21 15:23   ` Sage Weil
  2014-05-21 15:50     ` Mike Dawson
@ 2014-05-26 13:57     ` Haomai Wang
  1 sibling, 0 replies; 13+ messages in thread
From: Haomai Wang @ 2014-05-26 13:57 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

On Wed, May 21, 2014 at 11:23 PM, Sage Weil <sage@inktank.com> wrote:
> On Wed, 21 May 2014, Haomai Wang wrote:
>> I pushed the commit to fix this problem(https://github.com/ceph/ceph/pull/1848).
>>
>> With test program(Each sync request is issued with ten write request),
>> a significant improvement is noticed.
>>
>> aio_flush                          sum: 914750     avg: 1239   count:
>> 738      max: 4714   min: 1011
>> flush_set                          sum: 904200     avg: 1225   count:
>> 738      max: 4698   min: 999
>> flush                              sum: 641648     avg: 173    count:
>> 3690     max: 1340   min: 128
>>
>> Compared to last mail, it reduce each aio_flush request to 1239 ns
>> instead of 24145 ns.
>
> Good catch!  That's a great improvement.
>
> The patch looks clearly correct.  We can probably do even better by
> putting the Objects on a list when they get the first dirty buffer so that
> we only cycle through the dirty ones.  Or, have a global list of dirty
> buffers (instead of dirty objects -> dirty buffers).

Yes, I think a dirty objects list is worth maintaining.
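
To make that concrete, here is a minimal standalone sketch of such a
dirty-object list in plain C++ (simplified; the class and member names
below are made up for illustration and are not the real ObjectCacher
interface):

#include <set>
#include <vector>

struct Object {
  int id;
  bool dirty = false;              // object holds at least one dirty buffer
};

class Cache {
  std::set<Object*> object_set;    // every object ever opened
  std::set<Object*> dirty_set;     // only objects currently holding dirty data

public:
  void open(Object* o) { object_set.insert(o); }

  // called when a write dirties the first buffer of this object
  void mark_dirty(Object* o) {
    o->dirty = true;
    dirty_set.insert(o);           // paid once per dirtying, not once per flush
  }

  // called when the last dirty buffer of this object has been written back
  void mark_clean(Object* o) {
    o->dirty = false;
    dirty_set.erase(o);
  }

  // current behaviour: walk every object ever opened (e.g. 5919 total, 5 dirty)
  size_t flush_set_all() {
    size_t flushed = 0;
    for (Object* o : object_set)
      if (o->dirty) { mark_clean(o); ++flushed; }
    return flushed;
  }

  // proposed behaviour: walk only the dirty objects
  size_t flush_set_dirty_only() {
    // copy first because mark_clean() erases from dirty_set while we iterate
    std::vector<Object*> dirty(dirty_set.begin(), dirty_set.end());
    size_t flushed = 0;
    for (Object* o : dirty) { mark_clean(o); ++flushed; }
    return flushed;
  }
};

With something like this the cost of a flush becomes proportional to the
number of dirty objects instead of the number of objects ever opened.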

In my test I created a 1TB volume holding 500GB of DB data and ran the
test program over a 100GB hot data set, which leaves nearly 29000
objects in object_set (see the totals in the logs below). I measured
the time flush_set consumed and compared it to the total flush latency
reported by "ceph perf dump" (aio_flush_latency). The flush_set loop
usually consumes 1-4ms, while the whole flush request takes 3-8ms, so
the loop is still a large proportion of the total.

Sample logs:
2014-05-26 22:11:16.392538 7fa6c6e41700  0 objectcacher flush_set
total: 28952 flushed: 4
2014-05-26 22:11:16.398726 7fa6c6e41700  0 objectcacher flush_set: time 0.004436
2014-05-26 22:11:16.398745 7fa6c6e41700  0 objectcacher flush_set
total: 28955 flushed: 15
2014-05-26 22:11:16.400992 7fa6c6e41700  0 objectcacher flush_set: time 0.002149
2014-05-26 22:11:17.039226 7fa6c6e41700  0 objectcacher flush_set: time 0.001304
2014-05-26 22:11:17.039274 7fa6c6e41700  0 objectcacher flush_set
total: 28950 flushed: 4
2014-05-26 22:11:17.045580 7fa6c6e41700  0 objectcacher flush_set: time 0.004523
2014-05-26 22:11:17.045636 7fa6c6e41700  0 objectcacher flush_set
total: 28959 flushed: 15

I think spending 4ms on the client side for a single flush request is
still unaffordable, and with a 2 or 3TB volume the time will grow even
larger.
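
As a rough back-of-the-envelope check (only an estimate from the numbers
above): a ~4ms flush_set loop over ~29000 opened objects works out to
about 4ms / 29000 ~= 140ns per object visited. If a 2-3TB volume ends up
with 60000-90000 objects in object_set, the loop alone would grow to
roughly 8-13ms per flush before any data is actually flushed, which is
why iterating only the dirty objects (or a global dirty-buffer list)
matters more and more as the volume grows.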

>
> sage



-- 
Best Regards,

Wheat

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2014-05-26 13:57 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-21 10:15 [Performance] Improvement on DB Performance Haomai Wang
2014-05-21 10:21 ` Haomai Wang
2014-05-21 12:00   ` Luke Jing Yuan
2014-05-21 12:12     ` Haomai Wang
2014-05-21 15:06       ` Mark Nelson
2014-05-21 15:23   ` Sage Weil
2014-05-21 15:50     ` Mike Dawson
2014-05-21 15:53       ` Mark Nelson
2014-05-21 16:15       ` Sage Weil
     [not found]         ` <77004F70-7FE7-4EBE-A34D-46A8DC290936@profihost.ag>
2014-05-21 18:41           ` Sage Weil
2014-05-21 18:51             ` Stefan Priebe - Profihost AG
2014-05-21 20:05               ` Stefan Priebe
2014-05-26 13:57     ` Haomai Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.