* Bucket index op - lock contention hang op threads
@ 2015-02-05  9:36 GuangYang
  2015-02-05 19:52 ` Samuel Just
  0 siblings, 1 reply; 4+ messages in thread
From: GuangYang @ 2015-02-05  9:36 UTC (permalink / raw)
  To: ceph-devel, Weil Sage

Hi ceph-devel,
In our Ceph cluster (with rgw), we came across a problem where all rgw processes were stuck (all worker threads were waiting for responses from the OSDs and started returning 500s to clients). An objecter_requests dump showed that the slow in-flight requests were all caused by one OSD, which had 2 PGs doing backfilling and hosted 2 bucket index objects.

On the OSD side we configure 8 op threads. It turned out that when this problem occurred, several op threads took seconds (even tens of seconds) to handle a bucket index op, with most of the time spent waiting for the ondisk_read_lock. As a result, the throughput of the op threads dropped (qlen kept increasing).

I am wondering what options we could pursue to improve the situation; some general ideas on my mind:
 1> Similar to OpContext::rwstate: instead of making the op thread block, put the op on a waiting list and notify it once the lock is available (a rough sketch follows below). I am not sure whether this is worth it or whether it would break anything.
 2> Differentiate the service class at the filestore level for such an op, since somebody is waiting for it to release the lock. Does this break any assumption in the filestore layer?
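
To make 1> concrete, here is a very rough sketch of the pattern I have in mind (ObjectLockState, get_read_or_wait and put_write are made-up names for illustration, not the actual Ceph code):

#include <functional>
#include <list>
#include <mutex>

// Sketch only: park a conflicting read instead of blocking the op thread.
struct ObjectLockState {
  std::mutex m;
  bool write_in_progress = false;            // stands in for the ondisk write lock
  std::list<std::function<void()>> waiters;  // parked ops, no blocked threads

  // Called by the op thread before reading the bucket index object.
  // Returns true if the read may proceed now; otherwise the op is parked
  // and the thread goes back to the work queue.
  bool get_read_or_wait(std::function<void()> requeue_op) {
    std::lock_guard<std::mutex> l(m);
    if (!write_in_progress)
      return true;
    waiters.push_back(std::move(requeue_op));  // park the op, keep the thread free
    return false;
  }

  // Called once the in-flight write has committed to disk.
  void put_write() {
    std::list<std::function<void()>> requeue;
    {
      std::lock_guard<std::mutex> l(m);
      write_in_progress = false;
      requeue.swap(waiters);
    }
    for (auto &f : requeue)
      f();  // push the parked ops back onto the OSD work queue
  }
};

The point is only that the op thread never sleeps on the lock; the parked op gets re-dispatched like any other queued op once the write commits.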

As we are using EC (8+3), the fan-out is larger than with a replicated pool, so this kind of slowness on one OSD can cascade to other OSDs more easily.

BTW, I created a tracker for this - http://tracker.ceph.com/issues/10739

Look forward to your suggestions.

Thanks,
Guang


* Re: Bucket index op - lock contention hang op threads
  2015-02-05  9:36 Bucket index op - lock contention hang op threads GuangYang
@ 2015-02-05 19:52 ` Samuel Just
       [not found]   ` <BLU175-W1776AE5CA99C76AAE06468DF3B0@phx.gbl>
  0 siblings, 1 reply; 4+ messages in thread
From: Samuel Just @ 2015-02-05 19:52 UTC (permalink / raw)
  To: GuangYang; +Cc: ceph-devel, Weil Sage

Recent changes already merged for hammer should prevent blocking the
thread on the ondisk_read_lock by expanding the ObjectContext::rwstate
lists mostly as you suggested.
-Sam

On Thu, Feb 5, 2015 at 1:36 AM, GuangYang <yguang11@outlook.com> wrote:
> Hi ceph-devel,
> In our Ceph cluster (with rgw), we came across a problem where all rgw processes were stuck (all worker threads were waiting for responses from the OSDs and started returning 500s to clients). An objecter_requests dump showed that the slow in-flight requests were all caused by one OSD, which had 2 PGs doing backfilling and hosted 2 bucket index objects.
>
> On the OSD side we configure 8 op threads. It turned out that when this problem occurred, several op threads took seconds (even tens of seconds) to handle a bucket index op, with most of the time spent waiting for the ondisk_read_lock. As a result, the throughput of the op threads dropped (qlen kept increasing).
>
> I am wondering what options we could pursue to improve the situation; some general ideas on my mind:
>  1> Similar to OpContext::rwstate: instead of making the op thread block, put the op on a waiting list and notify it once the lock is available. I am not sure whether this is worth it or whether it would break anything.
>  2> Differentiate the service class at the filestore level for such an op, since somebody is waiting for it to release the lock. Does this break any assumption in the filestore layer?
>
> As we are using EC (8+3), the fan-out is larger than with a replicated pool, so this kind of slowness on one OSD can cascade to other OSDs more easily.
>
> BTW, I created a tracker for this - http://tracker.ceph.com/issues/10739
>
> Look forward to your suggestions.
>
> Thanks,
> Guang


* Re: Bucket index op - lock contention hang op threads
       [not found]   ` <BLU175-W1776AE5CA99C76AAE06468DF3B0@phx.gbl>
@ 2015-02-05 21:42     ` Samuel Just
  2015-02-06  0:53       ` GuangYang
  0 siblings, 1 reply; 4+ messages in thread
From: Samuel Just @ 2015-02-05 21:42 UTC (permalink / raw)
  To: GuangYang; +Cc: ceph-devel, Sage Weil

Sure, a81f3e6e61abfc7eca7743a83bf4af810705b449.  The intention was
actually to allow writes on degraded objects for replicated pools (to
avoid a 4k rbd write blocking on a 4mb recovery), but I think it
solves this issue as well.
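
To sketch the idea (this is not the actual code from that commit, and the names below are made up for illustration): on a replicated pool, a write to a degraded object no longer has to wait for the whole object to be recovered; the write proceeds and the object is simply remembered for recovery.

#include <set>
#include <string>

enum class PoolKind { Replicated, ErasureCoded };

struct ObjState {
  bool degraded = false;  // some replicas still missing, recovery pending
};

// Sketch: decide whether a client write may proceed right away.  A small
// write to a degraded object on a replicated pool is allowed through and
// the object is queued for later recovery, instead of blocking the op
// behind a full-object recovery.
bool can_write_now(PoolKind pool, const ObjState &obj,
                   const std::string &oid,
                   std::set<std::string> *needs_recovery) {
  if (!obj.degraded)
    return true;
  if (pool == PoolKind::Replicated) {
    needs_recovery->insert(oid);  // recover the updated object afterwards
    return true;                  // don't block a 4k write on a 4MB recovery
  }
  // This sketch keeps the old, stricter behaviour for EC pools.
  return false;
}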
-Sam

On Thu, Feb 5, 2015 at 1:39 PM, GuangYang <yguang11@outlook.com> wrote:
> Thanks Sam! Do you mind sharing the pull request / commit ID of the change?
>
>> Date: Thu, 5 Feb 2015 11:52:04 -0800
>> Subject: Re: Bucket index op - lock contention hang op threads
>> From: sam.just@inktank.com
>> To: yguang11@outlook.com
>> CC: ceph-devel@vger.kernel.org; sweil@redhat.com
>
>>
>> Recent changes already merged for hammer should prevent blocking the
>> thread on the ondisk_read_lock by expanding the ObjectContext::rwstate
>> lists mostly as you suggested.
>> -Sam
>>
>> On Thu, Feb 5, 2015 at 1:36 AM, GuangYang <yguang11@outlook.com> wrote:
>> > Hi ceph-devel,
>> > In our Ceph cluster (with rgw), we came across a problem where all rgw
>> > processes were stuck (all worker threads were waiting for responses from
>> > the OSDs and started returning 500s to clients). An objecter_requests
>> > dump showed that the slow in-flight requests were all caused by one OSD,
>> > which had 2 PGs doing backfilling and hosted 2 bucket index objects.
>> >
>> > On the OSD side we configure 8 op threads. It turned out that when this
>> > problem occurred, several op threads took seconds (even tens of seconds)
>> > to handle a bucket index op, with most of the time spent waiting for the
>> > ondisk_read_lock. As a result, the throughput of the op threads dropped
>> > (qlen kept increasing).
>> >
>> > I am wondering what options we could pursue to improve the situation;
>> > some general ideas on my mind:
>> > 1> Similar to OpContext::rwstate: instead of making the op thread block,
>> > put the op on a waiting list and notify it once the lock is available. I
>> > am not sure whether this is worth it or whether it would break anything.
>> > 2> Differentiate the service class at the filestore level for such an
>> > op, since somebody is waiting for it to release the lock. Does this
>> > break any assumption in the filestore layer?
>> >
>> > As we are using EC (8+3), the fan-out is larger than with a replicated
>> > pool, so this kind of slowness on one OSD can cascade to other OSDs more
>> > easily.
>> >
>> > BTW, I created a tracker for this - http://tracker.ceph.com/issues/10739
>> >
>> > Look forward to your suggestions.
>> >
>> > Thanks,
>> > Guang


* RE: Bucket index op - lock contention hang op threads
  2015-02-05 21:42     ` Samuel Just
@ 2015-02-06  0:53       ` GuangYang
  0 siblings, 0 replies; 4+ messages in thread
From: GuangYang @ 2015-02-06  0:53 UTC (permalink / raw)
  To: sjust; +Cc: ceph-devel, Weil Sage

Thanks Sam! I just took a look at the patch; it should be very helpful for our use case.

Thanks,
Guang


----------------------------------------
> Date: Thu, 5 Feb 2015 13:42:13 -0800
> Subject: Re: Bucket index op - lock contention hang op threads
> From: sam.just@inktank.com
> To: yguang11@outlook.com
> CC: ceph-devel@vger.kernel.org; sweil@redhat.com
>
> Sure, a81f3e6e61abfc7eca7743a83bf4af810705b449. The intention was
> actually to allow writes on degraded objects for replicated pools (to
> avoid a 4k rbd write blocking on a 4mb recovery), but I think it
> solves this issue as well.
> -Sam
>
> On Thu, Feb 5, 2015 at 1:39 PM, GuangYang <yguang11@outlook.com> wrote:
>> Thanks Sam! Do you mind sharing the pull request / commit ID of the change?
>>
>>> Date: Thu, 5 Feb 2015 11:52:04 -0800
>>> Subject: Re: Bucket index op - lock contention hang op threads
>>> From: sam.just@inktank.com
>>> To: yguang11@outlook.com
>>> CC: ceph-devel@vger.kernel.org; sweil@redhat.com
>>
>>>
>>> Recent changes already merged for hammer should prevent blocking the
>>> thread on the ondisk_read_lock by expanding the ObjectContext::rwstate
>>> lists mostly as you suggested.
>>> -Sam
>>>
>>> On Thu, Feb 5, 2015 at 1:36 AM, GuangYang <yguang11@outlook.com> wrote:
>>>> Hi ceph-devel,
>>>> In our Ceph cluster (with rgw), we came across a problem where all rgw
>>>> processes were stuck (all worker threads were waiting for responses from
>>>> the OSDs and started returning 500s to clients). An objecter_requests
>>>> dump showed that the slow in-flight requests were all caused by one OSD,
>>>> which had 2 PGs doing backfilling and hosted 2 bucket index objects.
>>>>
>>>> On the OSD side we configure 8 op threads. It turned out that when this
>>>> problem occurred, several op threads took seconds (even tens of seconds)
>>>> to handle a bucket index op, with most of the time spent waiting for the
>>>> ondisk_read_lock. As a result, the throughput of the op threads dropped
>>>> (qlen kept increasing).
>>>>
>>>> I am wondering what options we could pursue to improve the situation;
>>>> some general ideas on my mind:
>>>> 1> Similar to OpContext::rwstate: instead of making the op thread block,
>>>> put the op on a waiting list and notify it once the lock is available. I
>>>> am not sure whether this is worth it or whether it would break anything.
>>>> 2> Differentiate the service class at the filestore level for such an
>>>> op, since somebody is waiting for it to release the lock. Does this
>>>> break any assumption in the filestore layer?
>>>>
>>>> As we are using EC (8+3), the fan-out is larger than with a replicated
>>>> pool, so this kind of slowness on one OSD can cascade to other OSDs more
>>>> easily.
>>>>
>>>> BTW, I created a tracker for this - http://tracker.ceph.com/issues/10739
>>>>
>>>> Look forward to your suggestions.
>>>>
>>>> Thanks,
>>>> Guang

