* proposal : read/write threads and handles separation in civetweb and rgw
@ 2017-07-10 11:47 Abhishek Varshney
       [not found] ` <CAKOnarkBg16Cxxa2eTFbMT8K_BMqPbsnoJhjDyfzWxen-boYow@mail.gmail.com>
  0 siblings, 1 reply; 4+ messages in thread
From: Abhishek Varshney @ 2017-07-10 11:47 UTC (permalink / raw)
  To: Ceph Development

TL;DR
---------
The proposal is to separate read and write threads/handles in
civetweb/rgw to reduce the blast radius of an outage caused by one
type of op (GET or PUT) being blocked or latent. Proposal
PR : https://github.com/ceph/civetweb/pull/21

Problem Statement
------------------------
Our production clusters, primarily running object gateway workloads
on hammer, have quite a few times seen one type of op (GET or PUT)
being blocked or latent for different reasons. This has resulted in a
complete outage, with rgw becoming totally unresponsive and unable to
accept connections. After root-causing the issue, we found that there
is no separation of resources (threads and handles) at the civetweb
and rgw layers, which causes a complete blackout.

Scenarios
--------------
Some scenarios that are known to block one kind of op (GET or PUT):

* PUTs are blocked when the pool holding the bucket index is
degraded. We have large omap objects, whose recovery/rebalancing is
known to block PUT ops for long durations (~ a couple of hours). We
are also working to address this issue separately.

* GETs are blocked when the rgw data pool (which is front-ended by a
writeback cache tier on a different crush root) is degraded.

There could be other such scenarios too.

Proposed Approach
---------------------------
The proposal here is to separate read and write resources, in terms
of threads in civetweb and rados handles in rgw, which would help
limit the blast radius and reduce the impact of any outage that may
happen.

* civetweb : currently, civetweb has a common pool of worker threads
which consume sockets from a queue for processing. When requests are
blocked in ceph, the queue becomes full, the civetweb master thread
gets stuck in a loop waiting for the queue to drain [1], and it is
unable to process any more requests.
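
Purely for illustration, here is a minimal, self-contained sketch of
the kind of bounded queue where the producer blocks while the queue
is full (hypothetical names such as sock_queue_t and
sq_push_blocking, not the actual civetweb code); this is essentially
why the master thread stalls once the workers stop draining the
queue:

    /* Hypothetical sketch of a bounded socket queue with a blocking
     * producer; initialisation of the mutex/conds is omitted. */
    #include <pthread.h>

    #define QUEUE_LEN 64

    typedef struct {
        int fds[QUEUE_LEN];
        int head, tail, count;
        pthread_mutex_t mtx;
        pthread_cond_t not_full, not_empty;
    } sock_queue_t;

    /* Called by the master/accept thread for every new connection. */
    static void sq_push_blocking(sock_queue_t *q, int fd)
    {
        pthread_mutex_lock(&q->mtx);
        while (q->count == QUEUE_LEN)       /* queue full: the accept loop  */
            pthread_cond_wait(&q->not_full, /* sits here until a worker     */
                              &q->mtx);     /* finally frees a slot         */
        q->fds[q->tail] = fd;
        q->tail = (q->tail + 1) % QUEUE_LEN;
        q->count++;
        pthread_cond_signal(&q->not_empty);
        pthread_mutex_unlock(&q->mtx);
    }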

The proposal is to introduce 2 additional queues, a read connection
queue and a write connection queue, along with a dispatcher thread
which picks sockets from the socket queue and puts them into one of
these queues based on the type of the op. If a queue is full, the
dispatcher thread would return a 503 instead of waiting for that
queue to drain again.
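
As a rough sketch of the dispatcher idea (again with hypothetical
names, reusing the sock_queue_t type from the sketch above; this is
not the PR code): the dispatcher peeks at the request line to
classify the op, tries a non-blocking push into the matching queue,
and answers 503 itself when that queue is full:

    /* Hypothetical dispatcher sketch, not the actual PR code. */
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static sock_queue_t read_q, write_q;  /* assumed initialized elsewhere */

    /* Non-blocking push: 0 on success, -1 if the queue is full. */
    static int sq_try_push(sock_queue_t *q, int fd)
    {
        int ret = -1;
        pthread_mutex_lock(&q->mtx);
        if (q->count < QUEUE_LEN) {
            q->fds[q->tail] = fd;
            q->tail = (q->tail + 1) % QUEUE_LEN;
            q->count++;
            pthread_cond_signal(&q->not_empty);
            ret = 0;
        }
        pthread_mutex_unlock(&q->mtx);
        return ret;
    }

    /* Classify the op by HTTP method and route the socket. */
    static void dispatch(int fd)
    {
        char buf[8] = {0};
        static const char busy[] =
            "HTTP/1.1 503 Service Unavailable\r\nContent-Length: 0\r\n\r\n";

        /* Peek at the method without consuming it from the socket;
         * error/short-read handling omitted for brevity. */
        recv(fd, buf, sizeof(buf) - 1, MSG_PEEK);
        sock_queue_t *q = (strncmp(buf, "GET", 3) == 0 ||
                           strncmp(buf, "HEAD", 4) == 0) ? &read_q : &write_q;

        if (sq_try_push(q, fd) != 0) {            /* target queue full:    */
            send(fd, busy, sizeof(busy) - 1, 0);  /* shed load with a 503  */
            close(fd);                            /* instead of blocking   */
        }
    }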

This should limit failures and thus improve the availability of the
clusters.

The ideas described above are presented in the form of a PR here :
https://github.com/ceph/civetweb/pull/21

* rgw : while the proposed changes in civetweb should give the major
returns, a next level of optimisation can be done in rgw, where the
rados handles can also be separated based on the type of op, so that
civetweb worker threads don't end up contending on rados handles.
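
A minimal sketch of what this could look like using the librados C
API (the "admin" user id and the split into exactly two handles are
assumptions for illustration, not actual rgw code):

    /* Hypothetical sketch: separate librados handles for read and
     * write ops, so blocked writes cannot starve reads of a handle. */
    #include <rados/librados.h>

    static rados_t read_cluster, write_cluster;

    static int init_handles(const char *conf_path)
    {
        int r;

        r = rados_create(&read_cluster, "admin");
        if (r < 0) return r;
        r = rados_create(&write_cluster, "admin");
        if (r < 0) return r;

        rados_conf_read_file(read_cluster, conf_path);
        rados_conf_read_file(write_cluster, conf_path);

        r = rados_connect(read_cluster);
        if (r < 0) return r;
        return rados_connect(write_cluster);
    }

    /* Pick the handle based on the op type decided by the front-end. */
    static rados_t handle_for_op(int is_read)
    {
        return is_read ? read_cluster : write_cluster;
    }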

Would love to hear suggestions, opinions and feedback from the community.

PS : Due to the lack of a branch that tracks the latest civetweb, and
as per the suggestions received on the irc channel, the PR is raised
against the wip-listen4 branch of civetweb.

1. https://github.com/ceph/civetweb/blob/wip-listen4/src/civetweb.c#L12558

Thanks
Abhishek Varshney


* Re: proposal : read/write threads and handles separation in civetweb and rgw
       [not found] ` <CAKOnarkBg16Cxxa2eTFbMT8K_BMqPbsnoJhjDyfzWxen-boYow@mail.gmail.com>
@ 2017-07-11  4:09   ` Matt Benjamin
  2017-07-11  5:33     ` Abhishek Varshney
  2017-07-11 13:22     ` Sage Weil
  0 siblings, 2 replies; 4+ messages in thread
From: Matt Benjamin @ 2017-07-11  4:09 UTC (permalink / raw)
  Cc: Ceph Development

Hi Abhishek,

There are plans in place to provide for enhanced scheduling and
fairness intrinsically, somewhat in tandem with new front-end
(boost::asio/beast) and librados interfacing work by Adam.  I'm not
clear whether this proposal advances that goal, or not.  It seems like
it adds complexity that we won't want to retain for the long term, but
maybe it's helpful in ways I don't understand yet.

It seems like it would definitely make sense to have a focused
discussion in one of our standups of the broader issues, approaches
being taken, and so on.

regards,

Matt


* Re: proposal : read/write threads and handles separation in civetweb and rgw
  2017-07-11  4:09   ` Matt Benjamin
@ 2017-07-11  5:33     ` Abhishek Varshney
  2017-07-11 13:22     ` Sage Weil
  1 sibling, 0 replies; 4+ messages in thread
From: Abhishek Varshney @ 2017-07-11  5:33 UTC (permalink / raw)
  To: Matt Benjamin; +Cc: Ceph Development

Hi Matt,

On Tue, Jul 11, 2017 at 9:39 AM, Matt Benjamin <mbenjami@redhat.com> wrote:
> Hi Abhishek,
>
> There are plans in place to provide for enhanced scheduling and
> fairness intrinsically, somewhat in tandem with new front-end
> (boost::asio/beast) and librados interfacing work by Adam.  I'm not

Where can I get more details on this work?

> clear whether this proposal advances that goal, or not.  It seems like
> it adds complexity that we won't want to retain for the long term, but
> maybe it's helpful in ways I don't understand yet.

Right. The proposed approach may not be the best way to solve for
fairness and QoS end-to-end. Looking forward to the work already on
the roadmap, as you mentioned.

>
> It seems like it would definitely make sense to have a focused
> discussion in one of our standups of the broader issues, approaches
> being taken, and so on.
>
> regards,
>
> Matt


* Re: proposal : read/write threads and handles separation in civetweb and rgw
  2017-07-11  4:09   ` Matt Benjamin
  2017-07-11  5:33     ` Abhishek Varshney
@ 2017-07-11 13:22     ` Sage Weil
  1 sibling, 0 replies; 4+ messages in thread
From: Sage Weil @ 2017-07-11 13:22 UTC (permalink / raw)
  To: Matt Benjamin; +Cc: Ceph Development

On Tue, 11 Jul 2017, Matt Benjamin wrote:
> Hi Abhishek,
> 
> There are plans in place to provide for enhanced scheduling and
> fairness intrinsically, somewhat in tandem with new front-end
> (boost::asio/beast) and librados interfacing work by Adam.  I'm not
> clear whether this proposal advances that goal, or not.  It seems like
> it adds complexity that we won't want to retain for the long term, but
> maybe it's helpful in ways I don't understand yet.
> 
> It seems like it would definitely make sense to have a focused
> discussion in one of our standups of the broader issues, approaches
> being taken, and so on.

The (currently empty) agenda for the next CDM (Aug 2) is here:

	http://tracker.ceph.com/projects/ceph/wiki/CDM_02-AUG-2017?parent=Planning

sage


