* MonClient hunt interval
From: Ilya Dryomov @ 2016-01-25 13:14 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Ceph Development

Hi Greg,

With 794c86fd289b ("monc: backoff the timeout period when
reconnecting") you made it so that the backoff is applied to the hunt
interval.  When the session is established, the multiplier is reduced
by 50% and that's it - I don't see any per-tick reduction or anything
like that.

If a client had some bad luck and couldn't establish the session for
a while (so that the multiplier went all the way up to 10), its initial
timeout upon the next connection break is going to be 15 seconds no
matter how much time has passed in the interim.  Was that your intent?
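To make the arithmetic above concrete, here is a minimal sketch. It is a simplified model, not the MonClient code: the 3 second base interval is inferred from the 30s/15s figures in this thread, while the doubling factor, the floor of 1, and all function names are assumptions made for illustration.

```python
# Simplified model of the behaviour described above; not the MonClient code.
BASE_HUNT_INTERVAL = 3.0   # seconds; assumed, consistent with 30s at multiplier 10
BACKOFF_FACTOR = 2.0       # assumed growth per failed hunt round
MAX_MULTIPLIER = 10.0      # cap mentioned in this thread


def on_hunt_timeout(multiplier):
    """Grow the multiplier after a hunt round times out, up to the cap."""
    return min(multiplier * BACKOFF_FACTOR, MAX_MULTIPLIER)


def on_session_established(multiplier):
    """Cut the multiplier in half once; the floor of 1 is an assumption."""
    return max(multiplier / 2.0, 1.0)


def hunt_timeout(multiplier):
    """Effective hunt timeout in seconds for a given multiplier."""
    return BASE_HUNT_INTERVAL * multiplier


multiplier = 1.0
for _ in range(4):  # four failed rounds: 2, 4, 8, then capped at 10
    multiplier = on_hunt_timeout(multiplier)
worst_timeout = hunt_timeout(multiplier)

# The session is finally established: the multiplier is halved and then
# nothing else touches it while the session stays up.
multiplier = on_session_established(multiplier)
next_initial_timeout = hunt_timeout(multiplier)
print(worst_timeout, next_initial_timeout)
```

Under these assumptions the multiplier sits at 5 for as long as the session stays up, which is where the 15 second figure comes from.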

Thanks,

                Ilya


* Re: MonClient hunt interval
From: Gregory Farnum @ 2016-01-25 14:45 UTC (permalink / raw)
  To: Ilya Dryomov; +Cc: Ceph Development

On Mon, Jan 25, 2016 at 5:14 AM, Ilya Dryomov <idryomov@gmail.com> wrote:
> Hi Greg,
>
> With 794c86fd289b ("monc: backoff the timeout period when
> reconnecting") you made it so that the backoff is applied to the hunt
> interval.  When the session is established, the multiplier is reduced
> by 50% and that's it - I don't see any per-tick reduction or anything
> like that.
>
> If a client had some bad luck and couldn't establish the session for
> a while (so that the multiplier went all the way up to 10), its initial
> timeout upon the next connection break is going to be 15 seconds no
> matter how much time has passed in the interim.  Was that your intent?

I don't remember this, but looking at the sha I logged that behavior
in the commit message, so I'd have to say "yes". As it says, we're
trying to respond to monitor load; if they're doing so badly that we
had to increase our timeout when re-establishing a session, there's
every chance it will continue to be slow. If we reset the timeout back
to default, we'd have to go through a lot more monitor-punishing
timeout rounds on the next failure than just cutting it in half would
take.
-Greg


* Re: MonClient hunt interval
From: Ilya Dryomov @ 2016-01-25 15:03 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Ceph Development

On Mon, Jan 25, 2016 at 3:45 PM, Gregory Farnum <gfarnum@redhat.com> wrote:
> On Mon, Jan 25, 2016 at 5:14 AM, Ilya Dryomov <idryomov@gmail.com> wrote:
>> Hi Greg,
>>
>> With 794c86fd289b ("monc: backoff the timeout period when
>> reconnecting") you made it so that the backoff is applied to the hunt
>> interval.  When the session is established, the multiplier is reduced
>> by 50% and that's it - I don't see any per-tick reduction or anything
>> like that.
>>
>> If a client had some bad luck and couldn't establish the session for
>> a while (so that the multiplier went all the way up to 10), its initial
>> timeout upon the next connection break is going to be 15 seconds no
>> matter how much time has passed in the interim.  Was that your intent?
>
> I don't remember this, but looking at the sha I logged that behavior
> in the commit message, so I'd have to say "yes". As it says, we're
> trying to respond to monitor load; if they're doing so badly that we
> had to increase our timeout when re-establishing a session, there's
> every chance it will continue to be slow. If we reset the timeout back
> to default, we'd have to go through a lot more monitor-punishing
> timeout rounds on the next failure than just cutting it in half would
> take.

The timeout could have been increased due to intermittent networking
issues between the client and the monitor cluster.  The problem I see
here is that once it's increased to 30s, it's effectively never
decreased - since it's cut in half only once, that MonClient instance
is stuck with 15s as its initial timeout forever.

I'm not advocating resetting it back to default right away; I just
expected to see some kind of slow decay back to default.

Thanks,

                Ilya


* Re: MonClient hunt interval
From: Gregory Farnum @ 2016-01-25 15:20 UTC (permalink / raw)
  To: Ilya Dryomov; +Cc: Ceph Development

On Mon, Jan 25, 2016 at 7:03 AM, Ilya Dryomov <idryomov@gmail.com> wrote:
> On Mon, Jan 25, 2016 at 3:45 PM, Gregory Farnum <gfarnum@redhat.com> wrote:
>> On Mon, Jan 25, 2016 at 5:14 AM, Ilya Dryomov <idryomov@gmail.com> wrote:
>>> Hi Greg,
>>>
>>> With 794c86fd289b ("monc: backoff the timeout period when
>>> reconnecting") you made it so that the backoff is applied to the hunt
>>> interval.  When the session is established, the multiplier is reduced
>>> by 50% and that's it - I don't see any per-tick reduction or anything
>>> like that.
>>>
>>> If a client had some bad luck and couldn't establish the session for
>>> a while (so that the multiplier went all the way up to 10), its initial
>>> timeout upon the next connection break is going to be 15 seconds no
>>> matter how much time has passed in the interim.  Was that your intent?
>>
>> I don't remember this, but looking at the sha I logged that behavior
>> in the commit message, so I'd have to say "yes". As it says, we're
>> trying to respond to monitor load; if they're doing so badly that we
>> had to increase our timeout when re-establishing a session, there's
>> every chance it will continue to be slow. If we reset the timeout back
>> to default, we'd have to go through a lot more monitor-punishing
>> timeout rounds on the next failure than just cutting it in half would
>> take.
>
> The timeout could have been increased due to intermittent networking
> issues between the client and the monitor cluster.  The problem I see
> here is that once it's increased to 30s, it's effectively never
> decreased - since it's cut in half only once, that MonClient instance
> is stuck with 15s as its initial timeout forever.
>
> I'm not advocating resetting it back to default right away; I just
> expected to see some kind of slow decay back to default.

Mmm, that might make sense. There's also a limit to how much this is
worth worrying about: longer timeouts are bad only in the presence
of actually-dead monitors, and only when your connection to one of the
monitors dies. Any sort of gradual decay here would require more
complicated state and some mechanism for determining the monitors have
gotten happy now. Maybe you could feed it in based on response times
of other requests...
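One way the response-time idea could look, as a rough sketch only: keep a moving average of how long other monitor requests take, and let the multiplier shrink only while that average looks healthy. Nothing here exists in Ceph; the class name, the 0.5 second threshold, the smoothing weight and the 0.9 step are all made up for illustration.

```python
class HuntBackoff:
    """Hypothetical helper; not part of Ceph."""

    def __init__(self, multiplier, healthy_latency=0.5, alpha=0.2):
        self.multiplier = multiplier
        self.healthy_latency = healthy_latency  # seconds; made-up threshold
        self.alpha = alpha                      # weight of the newest sample
        self.avg_latency = None                 # moving average, in seconds

    def observe_response(self, latency):
        """Feed in the response time of any other monitor request."""
        if self.avg_latency is None:
            self.avg_latency = latency
        else:
            self.avg_latency = (self.alpha * latency
                                + (1.0 - self.alpha) * self.avg_latency)
        # Only relax the multiplier while the monitors look responsive.
        if self.avg_latency < self.healthy_latency:
            self.multiplier = max(1.0, self.multiplier * 0.9)


backoff = HuntBackoff(multiplier=5.0)
for _ in range(10):
    backoff.observe_response(2.0)   # slow replies: multiplier is left alone
after_slow = backoff.multiplier
for _ in range(100):
    backoff.observe_response(0.1)   # fast replies: multiplier drifts down
after_fast = backoff.multiplier
print(after_slow, after_fast)
```

The extra state is one float for the average, at the cost of having to hook every reply path that can report a latency.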
-Greg
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: MonClient hunt interval
From: Ilya Dryomov @ 2016-01-25 15:29 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Ceph Development

On Mon, Jan 25, 2016 at 4:20 PM, Gregory Farnum <gfarnum@redhat.com> wrote:
> On Mon, Jan 25, 2016 at 7:03 AM, Ilya Dryomov <idryomov@gmail.com> wrote:
>> On Mon, Jan 25, 2016 at 3:45 PM, Gregory Farnum <gfarnum@redhat.com> wrote:
>>> On Mon, Jan 25, 2016 at 5:14 AM, Ilya Dryomov <idryomov@gmail.com> wrote:
>>>> Hi Greg,
>>>>
>>>> With 794c86fd289b ("monc: backoff the timeout period when
>>>> reconnecting") you made it so that the backoff is applied to the hunt
>>>> interval.  When the session is established, the multiplier is reduced
>>>> by 50% and that's it - I don't see any per-tick reduction or anything
>>>> like that.
>>>>
>>>> If a client had some bad luck and couldn't establish the session for
>>>> a while (so that the multiplier went all the way up to 10), its initial
>>>> timeout upon the next connection break is going to be 15 seconds no
>>>> matter how much time has passed in the interim.  Was that your intent?
>>>
>>> I don't remember this, but looking at the sha I logged that behavior
>>> in the commit message, so I'd have to say "yes". As it says, we're
>>> trying to respond to monitor load; if they're doing so badly that we
>>> had to increase our timeout when re-establishing a session, there's
>>> every chance it will continue to be slow. If we reset the timeout back
>>> to default, we'd have to go through a lot more monitor-punishing
>>> timeout rounds on the next failure than just cutting it in half would
>>> take.
>>
>> The timeout could have been increased due to intermittent networking
>> issues between the client and the monitor cluster.  The problem I see
>> here is that once it's increased to 30s, it's effectively never
>> decreased - since it's cut in half only once, that MonClient instance
>> is stuck with 15s as its initial timeout forever.
>>
>> I'm not advocating resetting it back to default right away; I just
>> expected to see some kind of slow decay back to default.
>
> Mmm, that might make sense. There's also a limit to how much this is
> worth worrying about: longer timeouts are bad only in the presence
> of actually-dead monitors, and only when your connection to one of the
> monitors dies. Any sort of gradual decay here would require more
> complicated state and some mechanism for determining the monitors have
> gotten happy now. Maybe you could feed it in based on response times
> of other requests...

Well, a *really* slow decay might not need to check for whether the
monitors are happy or not and so won't require any additional state.
Anyway, I'm not super worried about this either - I'm bringing it into
the kernel client and just wanted to make sure it behaves as intended
before I merge it in.
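A minimal sketch of what such a slow, unconditional decay could look like, assuming it runs from the periodic tick while a session is established. The 0.99 factor and the function name are hypothetical, chosen only to show that the multiplier itself is the only state involved.

```python
DECAY_PER_TICK = 0.99  # hypothetical: shrink the multiplier by 1% per tick


def on_tick(multiplier, session_established):
    """Decay the multiplier toward 1 on every tick with a live session."""
    if not session_established:
        return multiplier  # while hunting, the backoff logic is in charge
    return max(1.0, multiplier * DECAY_PER_TICK)


multiplier = 5.0  # value left behind after a long hunt, as discussed above
ticks = 0
while multiplier > 1.0:
    multiplier = on_tick(multiplier, session_established=True)
    ticks += 1
print(ticks, multiplier)
```

With these made-up numbers the multiplier takes on the order of 160 ticks to get from 5 back to 1, so a short quiet period does not undo the backoff, while a long healthy session does.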

Thanks,

                Ilya
