* Re: chaos monkeys
       [not found] <5074541B.4000504@inktank.com>
@ 2012-10-09 16:46 ` Sage Weil
       [not found] ` <CAPYLRzhqfbWcP0_oXDQmoGSmzPOEGO9DutPSRBS32kzPjuLoZw@mail.gmail.com>
  1 sibling, 0 replies; 4+ messages in thread
From: Sage Weil @ 2012-10-09 16:46 UTC (permalink / raw)
  To: Sam Lang; +Cc: ceph-devel

[moved to ceph-devel]

On Tue, 9 Oct 2012, Sam Lang wrote:
> Could we add some other chaos monkeys to the network/storage infrastructure
> besides ms_inject_socket_failures?  In particular, I would like to add
> ms_inject_delay_msg and ms_inject_reorder_msgs?  I think those could
> potentially help flush out some bugs (such as:
> https://github.com/ceph/ceph/commit/fa66eaa162542ac01752ada91a46051dde060831).
> 
> Right now I'm particularly interested in network delays, but we should
> probably add storage delays as well.
> 
> Maybe something like this already exists and I'm just not finding it?

Injecting message delays is a great idea, and pretty straightforward... 
let's do it!  Maybe a combination of 'ms inject delay max' and 'ms inject 
delay probability' (default 0)?
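
A minimal sketch of how the two proposed options might combine, assuming
"delay max" is an upper bound in seconds and "delay probability" is the
chance that any one message is stalled. The function name and parameters
are hypothetical and are not taken from the Ceph source:

```python
import random


def injected_delay(max_delay, probability, rng=random):
    """Return how many seconds to stall one message (0.0 means no delay).

    max_delay   -- hypothetical stand-in for 'ms inject delay max'
    probability -- hypothetical stand-in for 'ms inject delay probability'
    rng         -- the random module or a random.Random instance
    """
    # With the default probability of 0 the injection is switched off.
    if probability <= 0 or max_delay <= 0:
        return 0.0
    # Stall this message with the configured probability...
    if rng.random() < probability:
        # ...for a uniformly chosen time up to the configured maximum.
        return rng.uniform(0.0, max_delay)
    return 0.0
```

With a probability of 0 this always returns 0.0, so a default of 0 would
leave normal operation untouched.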

Reordering messages isn't allowed to happen; if we do see out-of-order
delivery, it is a bug in the messenger code, and not something higher
layers should need to worry about.

sage


* Re: chaos monkeys
       [not found]   ` <507457EB.4000802@inktank.com>
@ 2012-10-09 17:16     ` Gregory Farnum
  2012-10-09 18:32       ` Sam Lang
  0 siblings, 1 reply; 4+ messages in thread
From: Gregory Farnum @ 2012-10-09 17:16 UTC (permalink / raw)
  To: Sam Lang; +Cc: ceph-devel

<also moved to ceph-devel>
On Tue, Oct 9, 2012 at 9:59 AM, Sam Lang <sam.lang@inktank.com> wrote:
> On 10/09/2012 11:46 AM, Gregory Farnum wrote:
>>
>> On Tue, Oct 9, 2012 at 9:43 AM, Sam Lang <sam.lang@inktank.com> wrote:
>>>
>>>
>>> Could we add some other chaos monkeys to the network/storage
>>> infrastructure
>>> besides ms_inject_socket_failures?  In particular, I would like to add
>>> ms_inject_delay_msg and ms_inject_reorder_msgs?  I think those could
>>> potentially help flush out some bugs (such as:
>>>
>>> https://github.com/ceph/ceph/commit/fa66eaa162542ac01752ada91a46051dde060831).
>>
>>
>> You're going to have to explain these more — ordered delivery over a
>> connection is one of the guarantees that the messaging layer provides,
>> so that doesn't sound like a configurable we're going to add.
>
>
> That's true, but there's no guarantee that the source will always send them
> in the same order.  The bug I linked above is a good example, the mds was
> sending out two messages, one the open session reply, and another the stale
> session async message.  The bug is only expressed when the stale comes
> before the open session, which is possible in some cases.  The stale
> originates from a timer expiring, and the open session is sent after the
> journal commit, so the timing (and ordering) of those two messages can vary
> based on when the timer thread gets scheduled to execute, how long the
> journal commit takes, etc.
>
> Reordering messages at the destination would act to simulate all the
> asynchronous paths like this that exist in our code.

The sending messenger also maintains ordering invariants. The endpoint
(the MDS) might not dispatch them in the same order all the time, but
that's at a different semantic layer and is not something we can
simulate inside the messenger — it requires semantic knowledge of
which messages are okay to reorder. If we just did random reordering
like you're suggesting, absolutely everything would break.
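
To illustrate the ordering guarantee, here is a rough simulation of a
single connection in which a delay stalls the whole queue rather than
letting later messages overtake earlier ones. It is only a sketch of the
idea under that assumption, not the messenger's implementation, and all
names are made up for the example:

```python
import random


def deliver_in_order(messages, max_delay, probability, seed=0):
    """Simulate one ordered connection with injected delays.

    messages -- list of (send_time, payload), already in send order
    Returns a list of (arrival_time, payload) in delivery order.
    """
    rng = random.Random(seed)
    clock = 0.0
    delivered = []
    for send_time, payload in messages:
        # A message cannot leave before it was queued, nor before the
        # message ahead of it on the same connection has left.
        clock = max(clock, send_time)
        # An injected delay holds up this message and everything behind it.
        if rng.random() < probability:
            clock += rng.uniform(0.0, max_delay)
        delivered.append((clock, payload))
    return delivered
```

However large the delays are, the payloads come out in the order they
went in; only the arrival times change.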
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: chaos monkeys
  2012-10-09 17:16     ` Gregory Farnum
@ 2012-10-09 18:32       ` Sam Lang
  2012-10-09 18:40         ` Gregory Farnum
  0 siblings, 1 reply; 4+ messages in thread
From: Sam Lang @ 2012-10-09 18:32 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: ceph-devel

On 10/09/2012 12:16 PM, Gregory Farnum wrote:
> <also moved to ceph-devel>
> On Tue, Oct 9, 2012 at 9:59 AM, Sam Lang <sam.lang@inktank.com> wrote:
>> On 10/09/2012 11:46 AM, Gregory Farnum wrote:
>>>
>>> On Tue, Oct 9, 2012 at 9:43 AM, Sam Lang <sam.lang@inktank.com> wrote:
>>>>
>>>>
>>>> Could we add some other chaos monkeys to the network/storage
>>>> infrastructure
>>>> besides ms_inject_socket_failures?  In particular, I would like to add
>>>> ms_inject_delay_msg and ms_inject_reorder_msgs?  I think those could
>>>> potentially help flush out some bugs (such as:
>>>>
>>>> https://github.com/ceph/ceph/commit/fa66eaa162542ac01752ada91a46051dde060831).
>>>
>>>
>>> You're going to have to explain these more — ordered delivery over a
>>> connection is one of the guarantees that the messaging layer provides,
>>> so that doesn't sound like a configurable we're going to add.
>>
>>
>> That's true, but there's no guarantee that the source will always send them
>> in the same order.  The bug I linked above is a good example, the mds was
>> sending out two messages, one the open session reply, and another the stale
>> session async message.  The bug is only expressed when the stale comes
>> before the open session, which is possible in some cases.  The stale
>> originates from a timer expiring, and the open session is sent after the
>> journal commit, so the timing (and ordering) of those two messages can vary
>> based on when the timer thread gets scheduled to execute, how long the
>> journal commit takes, etc.
>>
>> Reordering messages at the destination would act to simulate all the
>> asynchronous paths like this that exist in our code.
>
> The sending messenger also maintains ordering invariants. The endpoint
> (the MDS) might not dispatch them in the same order all the time, but
> that's at a different semantic layer and is not something we can
> simulate inside the messenger — it requires semantic knowledge of
> which messages are okay to reorder. If we just did random reordering
> like you're suggesting, absolutely everything would break.

Putting a delay on the sender would avoid the reordering of messages 
that have semantic meaning but allow delay-caused reordering to occur 
for those that have no semantic dependency.
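
A small sketch of what a sender-side delay might look like, assuming the
delay is applied per source (for example a timer thread or a journal
commit callback) before the message is queued on the connection. The
source and message labels are illustrative only:

```python
def enqueue_order(events, extra_delay=None):
    """Return payloads in the order they would be queued on one connection.

    events      -- list of (ready_time, source, payload)
    extra_delay -- optional dict mapping a source name to added seconds
    """
    extra_delay = extra_delay or {}
    timed = [
        (ready + extra_delay.get(source, 0.0), index, payload)
        for index, (ready, source, payload) in enumerate(events)
    ]
    # Sort by delayed time; the original index breaks ties, so messages
    # from the same source never swap places with each other.
    timed.sort()
    return [payload for _, _, payload in timed]


events = [
    (1.0, "journal", "open_session_reply"),
    (2.0, "timer", "stale_session"),
]
print(enqueue_order(events))                    # open_session_reply first
print(enqueue_order(events, {"journal": 1.5}))  # stale_session first
```

Delaying one source changes how its messages interleave with another
source's, while each source's own messages keep their relative order.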

You're right that reordering at the receiver won't work, but it would be 
nice to have more concrete examples.  The only example I can come up 
with is the unsafe/safe messages from mds to client.  Even in that case 
it looks like we handle it by throwing away the unsafe message.  What 
other examples exist?  Caps issue/revoke?

-sam





* Re: chaos monkeys
  2012-10-09 18:32       ` Sam Lang
@ 2012-10-09 18:40         ` Gregory Farnum
  0 siblings, 0 replies; 4+ messages in thread
From: Gregory Farnum @ 2012-10-09 18:40 UTC (permalink / raw)
  To: Sam Lang; +Cc: ceph-devel

On Tue, Oct 9, 2012 at 11:32 AM, Sam Lang <sam.lang@inktank.com> wrote:
> Putting a delay on the sender would avoid the reordering of messages that
> have semantic meaning but allow delay-caused reordering to occur for those
> that have no semantic dependency.
>
> You're right that reordering at the receiver won't work, but it would be
> nice to have more concrete examples.  The only example I can come up with is
> the unsafe/safe messages from mds to client.  Even in that case it looks
> like we handle it by throwing away the unsafe message.  What other examples
> exist?  Caps issue/revoke?

In the OSDs, requests to the same object are all strictly ordered,
and responses need to be ordered in the same way; everybody through
the whole chain asserts out if that's not the case.
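
As a rough illustration of that kind of check (not the actual OSD code),
a per-object ordering assertion could look like the sketch below. The
class name and the use of an integer sequence number are assumptions
made for the example:

```python
class OrderChecker:
    """Track the last sequence number seen for each object."""

    def __init__(self):
        self._last_seq = {}

    def observe(self, object_id, seq):
        """Record one request or response; fail hard if it is out of order."""
        last = self._last_seq.get(object_id)
        if last is not None and seq <= last:
            # Mimics "asserting out": ordering violations are treated as bugs.
            raise AssertionError(
                f"out-of-order message for {object_id}: {seq} after {last}"
            )
        self._last_seq[object_id] = seq
```

Messages for different objects are tracked independently, so only
reordering within a single object trips the check.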

I haven't thought it through on the MDS, but yeah, caps messages on
the same inode (or inodes in a hierarchical relationship) all need to
be ordered. Most of the other MDS messages I can think of don't carry
semantic meaning unless the client is already waiting on them, so
the MDS has fewer ordering problems than the OSDs do.


