* More ondisk_finisher thread?
@ 2015-08-04  9:59 Ding Dinghua
  2015-08-04 16:13 ` Somnath Roy
  0 siblings, 1 reply; 6+ messages in thread
From: Ding Dinghua @ 2015-08-04  9:59 UTC (permalink / raw)
  To: ceph-devel

Hi,
   We are doing some Ceph performance tuning work. Our setup has ten
Ceph nodes, with an SSD as the journal and HDDs for the filestore, and
the Ceph version is 0.80.9.
   We ran fio in a virtual machine with a random 4KB write workload and
found that ondisk_finisher took about 1 ms on average, while the
journal write itself only took 0.4 ms, which seems unreasonable.
   Since the ondisk callback is called with the pg lock held, if the pg
lock has been grabbed by another thread (for example, osd->op_wq), all
ondisk callbacks are delayed, and then all write ops are delayed.
   I found that op_commit must be called with the pg lock held, so what
about increasing the number of ondisk_finisher threads, so that ondisk
callbacks are less likely to be delayed?
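
   Roughly, the idea is something like the sketch below. This is only
an illustration, not Ceph's actual Finisher code, and all of the names
in it are made up: on-disk completions are sharded by pg id across a
small pool of threads, so a callback that is stuck behind one pg's lock
only delays completions routed to the same shard, not everything.

    #include <condition_variable>
    #include <cstdint>
    #include <deque>
    #include <functional>
    #include <mutex>
    #include <thread>
    #include <vector>

    // Hypothetical sharded finisher: N threads, each owning one queue.
    class ShardedFinisher {
      struct Shard {
        std::mutex lock;
        std::condition_variable cond;
        std::deque<std::function<void()>> q;
        bool stop = false;
      };
      std::vector<Shard> shards;
      std::vector<std::thread> threads;

      void run(Shard &s) {
        std::unique_lock<std::mutex> l(s.lock);
        for (;;) {
          s.cond.wait(l, [&] { return s.stop || !s.q.empty(); });
          if (s.q.empty())
            return;                 // stop requested and queue drained
          auto fn = std::move(s.q.front());
          s.q.pop_front();
          l.unlock();
          fn();                     // the ondisk callback takes its pg lock here
          l.lock();
        }
      }

    public:
      explicit ShardedFinisher(size_t n) : shards(n) {
        for (auto &s : shards)
          threads.emplace_back([this, &s] { run(s); });
      }
      ~ShardedFinisher() {
        for (auto &s : shards) {
          std::lock_guard<std::mutex> l(s.lock);
          s.stop = true;
          s.cond.notify_one();
        }
        for (auto &t : threads)
          t.join();
      }
      // The same pg always maps to the same shard, so per-pg completion
      // ordering is preserved; only callbacks for different pgs can now
      // run in parallel.
      void queue(uint64_t pg_id, std::function<void()> fn) {
        Shard &s = shards[pg_id % shards.size()];
        std::lock_guard<std::mutex> l(s.lock);
        s.q.push_back(std::move(fn));
        s.cond.notify_one();
      }
    };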

-- 
Ding Dinghua

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: More ondisk_finisher thread?
  2015-08-04  9:59 More ondisk_finisher thread? Ding Dinghua
@ 2015-08-04 16:13 ` Somnath Roy
  2015-08-05  2:52   ` Ding Dinghua
  2015-08-05  5:23   ` Haomai Wang
  0 siblings, 2 replies; 6+ messages in thread
From: Somnath Roy @ 2015-08-04 16:13 UTC (permalink / raw)
  To: Ding Dinghua, ceph-devel

Yes, it has to re-acquire the pg_lock today.
But between the journal write and initiating the ondisk ack there is one context switch in the code path, so I suspect the pg_lock is not the only thing causing this 1 ms delay.
I'm not sure increasing the finisher threads will help in the pg_lock case, as it will be more or less serialized by this pg_lock.
But increasing the finisher threads for the other context switches I was talking about (see queue_completion_thru) may help.

Thanks & Regards
Somnath


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: More ondisk_finisher thread?
  2015-08-04 16:13 ` Somnath Roy
@ 2015-08-05  2:52   ` Ding Dinghua
  2015-08-05 12:29     ` Sage Weil
  2015-08-05  5:23   ` Haomai Wang
  1 sibling, 1 reply; 6+ messages in thread
From: Ding Dinghua @ 2015-08-05  2:52 UTC (permalink / raw)
  To: Somnath Roy; +Cc: ceph-devel

Please see my comments inline below:

2015-08-05 0:13 GMT+08:00 Somnath Roy <Somnath.Roy@sandisk.com>:
> Yes, it has to re-acquire the pg_lock today.
> But between the journal write and initiating the ondisk ack there is one context switch in the code path, so I suspect the pg_lock is not the only thing causing this 1 ms delay.
> I'm not sure increasing the finisher threads will help in the pg_lock case, as it will be more or less serialized by this pg_lock.
My concern is that if the pg lock of pg A has been grabbed, it is not
only the ondisk callback of pg A that is delayed: since ondisk_finisher
has only one thread, the ondisk callbacks of other pgs will be delayed
too.
> But increasing the finisher threads for the other context switches I was talking about (see queue_completion_thru) may help.
We also measured that latency, and it does not take much time in our case.
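
To make the head-of-line blocking concrete, here is a toy,
self-contained sketch; the types are invented for the example and are
not Ceph's. A single finisher thread drains one queue for all pgs and
takes each completion's pg lock before running it, so one contended
lock stalls every completion queued behind it:

    #include <deque>
    #include <functional>
    #include <mutex>

    struct PG { std::mutex lock; };

    struct Completion {
      PG *pg;
      std::function<void()> finish;   // e.g. send the ondisk ack
    };

    // One thread drains completions for *all* pgs.  If another thread
    // (say, an op worker) holds pg A's lock, the completion for A blocks
    // on the lock_guard below, and completions for B, C, ... queued
    // behind it are delayed as well, even though their own locks are free.
    void drain(std::deque<Completion> &q) {
      while (!q.empty()) {
        Completion c = std::move(q.front());
        q.pop_front();
        std::lock_guard<std::mutex> l(c.pg->lock);
        c.finish();
      }
    }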

-- 
Ding Dinghua

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: More ondisk_finisher thread?
  2015-08-04 16:13 ` Somnath Roy
  2015-08-05  2:52   ` Ding Dinghua
@ 2015-08-05  5:23   ` Haomai Wang
  1 sibling, 0 replies; 6+ messages in thread
From: Haomai Wang @ 2015-08-05  5:23 UTC (permalink / raw)
  To: Somnath Roy; +Cc: Ding Dinghua, ceph-devel

It's interesting that ondisk_finisher takes 1 ms. Could you replay
this workload and check with iostat whether there is any read IO? I
guess that may help find the cause.
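
For example, something like the following -- the device names here are
just placeholders for the SSD journal and HDD filestore devices on the
OSD node:

    iostat -x 1 sdb sdc

For a pure random-write workload, r/s and rkB/s should stay near zero;
unexpected reads on the journal or data device during the replay would
point at the cause.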

On Wed, Aug 5, 2015 at 12:13 AM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
> Yes, it has to re-acquire the pg_lock today.
> But between the journal write and initiating the ondisk ack there is one context switch in the code path, so I suspect the pg_lock is not the only thing causing this 1 ms delay.
> I'm not sure increasing the finisher threads will help in the pg_lock case, as it will be more or less serialized by this pg_lock.
> But increasing the finisher threads for the other context switches I was talking about (see queue_completion_thru) may help.
>
> Thanks & Regards
> Somnath



-- 
Best Regards,

Wheat

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: More ondisk_finisher thread?
  2015-08-05  2:52   ` Ding Dinghua
@ 2015-08-05 12:29     ` Sage Weil
  2015-08-06 10:47       ` Ding Dinghua
  0 siblings, 1 reply; 6+ messages in thread
From: Sage Weil @ 2015-08-05 12:29 UTC (permalink / raw)
  To: Ding Dinghua; +Cc: Somnath Roy, ceph-devel

On Wed, 5 Aug 2015, Ding Dinghua wrote:
> 2015-08-05 0:13 GMT+08:00 Somnath Roy <Somnath.Roy@sandisk.com>:
> > Yes, it has to re-acquire the pg_lock today.
> > But between the journal write and initiating the ondisk ack there is one context switch in the code path, so I suspect the pg_lock is not the only thing causing this 1 ms delay.
> > I'm not sure increasing the finisher threads will help in the pg_lock case, as it will be more or less serialized by this pg_lock.
> My concern is that if the pg lock of pg A has been grabbed, it is not
> only the ondisk callback of pg A that is delayed: since ondisk_finisher
> has only one thread, the ondisk callbacks of other pgs will be delayed
> too.

I wonder if an optimistic approach might help here by making the 
completion synchronous and doing something like

   if (pg->lock.TryLock()) {
      pg->_finish_thing(completion->op);
      delete completion;
   } else {
      finisher.queue(completion);
   }

or whatever.  We'd need to ensure that we aren't holding any lock or 
throttle budget that the pg could deadlock against.
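
Fleshed out a bit -- just a sketch with stand-in types (std::mutex for
the pg lock, a trivial Finisher), not actual OSD code -- the
try-lock-or-queue idea would look something like:

    #include <deque>
    #include <functional>
    #include <mutex>

    // Stand-ins for the real types; only the locking pattern matters here.
    struct Finisher {
      std::deque<std::function<void()>> q;   // drained by a finisher thread (not shown)
      void queue(std::function<void()> fn) { q.push_back(std::move(fn)); }
    };

    struct PG {
      std::mutex lock;
      void finish_op() { /* send the ondisk ack, etc. */ }
    };

    // Complete inline on the journal-completion path when possible, and
    // fall back to the finisher (and its extra context switch) only when
    // the pg lock is contended right now.
    void on_journal_commit(PG *pg, Finisher &finisher) {
      if (pg->lock.try_lock()) {
        pg->finish_op();
        pg->lock.unlock();
      } else {
        finisher.queue([pg] {
          std::lock_guard<std::mutex> l(pg->lock);
          pg->finish_op();
        });
      }
    }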

sage

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: More ondisk_finisher thread?
  2015-08-05 12:29     ` Sage Weil
@ 2015-08-06 10:47       ` Ding Dinghua
  0 siblings, 0 replies; 6+ messages in thread
From: Ding Dinghua @ 2015-08-06 10:47 UTC (permalink / raw)
  To: Sage Weil; +Cc: Somnath Roy, ceph-devel

Sorry for the noise.
I have found the cause in our setup and case: we gathered too many
logs in our RADOS IO path, and the latency looks reasonable (about
0.026 ms) when we don't gather that many logs.
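
For anyone hitting the same thing: one common way to cut that logging
overhead (assuming the usual subsystems were turned up) is to lower the
debug levels on the OSD IO path, either in ceph.conf:

    [osd]
      debug osd = 0/0
      debug filestore = 0/0
      debug journal = 0/0
      debug ms = 0/0

or at runtime:

    ceph tell osd.* injectargs '--debug-osd 0/0 --debug-filestore 0/0 --debug-journal 0/0 --debug-ms 0/0'

Which subsystems are worth lowering depends on which ones were raised
in the first place.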

-- 
Ding Dinghua

^ permalink raw reply	[flat|nested] 6+ messages in thread

Thread overview: 6+ messages
2015-08-04  9:59 More ondisk_finisher thread? Ding Dinghua
2015-08-04 16:13 ` Somnath Roy
2015-08-05  2:52   ` Ding Dinghua
2015-08-05 12:29     ` Sage Weil
2015-08-06 10:47       ` Ding Dinghua
2015-08-05  5:23   ` Haomai Wang
