* Bucket index op - lock contention hangs op threads
From: GuangYang @ 2015-02-05  9:36 UTC
To: ceph-devel, Sage Weil

Hi ceph-devel,
In our Ceph cluster (with rgw), we ran into a problem where all rgw processes were stuck: every worker thread was waiting for a response from an OSD, and rgw started returning 500s to clients. An objecter_requests dump showed that the slow in-flight requests all targeted one OSD, which had 2 PGs backfilling and held 2 bucket index objects.

On the OSD side we configure 8 op threads. When this problem occurred, several op threads took seconds (even tens of seconds) to handle a bucket index op, with most of that time spent waiting for the ondisk_read_lock. As a result, op thread throughput dropped (qlen kept increasing).

I am wondering what options we could pursue to improve the situation. Some general ideas on my mind:
1> Similar to ObjectContext::rwstate: instead of blocking the op thread, put the op on a waiting list and notify it when the lock becomes available (see the sketch after this message). I am not sure whether this is worth it or whether it breaks anything.
2> Differentiate the service class at the filestore level for such an op - somebody is waiting for it to release the lock. Does this break any assumptions at the filestore layer?

Since we are using EC (8+3), the fan-out is larger than for a replicated pool, so this kind of slowness on one OSD can cascade to more OSDs more easily.

BTW, I created a tracker for this - http://tracker.ceph.com/issues/10739

Looking forward to your suggestions.

Thanks,
Guang
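A minimal sketch of suggestion 1> above, assuming a simplified per-object lock and an opaque op queue. SimpleReadLock, Op/OpRef, and the requeue callback are illustrative names only, not Ceph's actual interfaces: an op that cannot take the read lock is parked on a wait list and the op thread moves on; when the in-flight write commits, the parked ops go back onto the work queue.

#include <deque>
#include <functional>
#include <memory>
#include <mutex>

struct Op {};                          // stand-in for a queued client operation
using OpRef = std::shared_ptr<Op>;

// Hypothetical per-object lock: a read arriving while a write is still
// uncommitted is parked on a wait list instead of blocking the op thread.
class SimpleReadLock {
  std::mutex m;
  bool write_in_progress = false;
  std::deque<OpRef> waiters;           // ops parked instead of blocking a thread
  std::function<void(OpRef)> requeue;  // puts an op back on the OSD work queue

public:
  explicit SimpleReadLock(std::function<void(OpRef)> rq)
    : requeue(std::move(rq)) {}

  // Returns true if the read may proceed now; otherwise the op is parked
  // and the calling op thread is free to service other PGs.
  bool try_read_or_wait(OpRef op) {
    std::lock_guard<std::mutex> l(m);
    if (!write_in_progress)
      return true;
    waiters.push_back(std::move(op));
    return false;
  }

  void start_write() {
    std::lock_guard<std::mutex> l(m);
    write_in_progress = true;
  }

  // Called once the pending write commits to disk: requeue every parked
  // reader (outside the mutex, so requeue itself cannot deadlock on it).
  void write_committed() {
    std::deque<OpRef> wake;
    {
      std::lock_guard<std::mutex> l(m);
      write_in_progress = false;
      wake.swap(waiters);
    }
    for (auto &op : wake)
      requeue(std::move(op));
  }
};

With this shape, the latency of a slow commit is paid only by the ops that actually need that object, rather than stalling an op thread that could be serving other PGs on the same OSD.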
* Re: Bucket index op - lock contention hangs op threads
From: Samuel Just @ 2015-02-05 19:52 UTC
To: GuangYang; +Cc: ceph-devel, Sage Weil

Recent changes already merged for hammer should prevent blocking the thread on the ondisk_read_lock, by expanding the ObjectContext::rwstate lists mostly as you suggested.
-Sam

On Thu, Feb 5, 2015 at 1:36 AM, GuangYang <yguang11@outlook.com> wrote:
> Hi ceph-devel,
> [...]
* Re: Bucket index op - lock contention hangs op threads
From: Samuel Just @ 2015-02-05 21:42 UTC
To: GuangYang; +Cc: ceph-devel, Sage Weil

Sure, a81f3e6e61abfc7eca7743a83bf4af810705b449. The intention was actually to allow writes on degraded objects for replicated pools (to avoid a 4k rbd write blocking on a 4MB recovery), but I think it solves this issue as well.
-Sam

On Thu, Feb 5, 2015 at 1:39 PM, GuangYang <yguang11@outlook.com> wrote:
> Thanks Sam! Do you mind sharing the pull request / commit Id of the change?
>
>> Date: Thu, 5 Feb 2015 11:52:04 -0800
>> Subject: Re: Bucket index op - lock contention hangs op threads
>> From: sam.just@inktank.com
>> To: yguang11@outlook.com
>> CC: ceph-devel@vger.kernel.org; sweil@redhat.com
>>
>> [...]
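For anyone hitting this on an older build: whether a given tree already contains the fix can be checked against the Ceph git repository, e.g. "git tag --contains a81f3e6e61abfc7eca7743a83bf4af810705b449" lists the release tags that include that commit.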
* RE: Bucket index op - lock contention hangs op threads
From: GuangYang @ 2015-02-06 0:53 UTC
To: sjust; +Cc: ceph-devel, Sage Weil

Thanks Sam! I just took a look at the patch - it should be very helpful for our use case.

Thanks,
Guang

----------------------------------------
> Date: Thu, 5 Feb 2015 13:42:13 -0800
> Subject: Re: Bucket index op - lock contention hangs op threads
> From: sam.just@inktank.com
> To: yguang11@outlook.com
> CC: ceph-devel@vger.kernel.org; sweil@redhat.com
>
> Sure, a81f3e6e61abfc7eca7743a83bf4af810705b449. The intention was
> actually to allow writes on degraded objects for replicated pools (to
> avoid a 4k rbd write blocking on a 4MB recovery), but I think it
> solves this issue as well.
> -Sam
>
> [...]