[Lustre-devel] Imperative Recovery - forcing failover server stop blocking

* [Lustre-devel] Imperative Recovery - forcing failover server stop blocking
@ 2009-06-18 23:10 Chris Horn
  2009-06-19 21:18 ` Johann Lombardi
  0 siblings, 1 reply; 10+ messages in thread
From: Chris Horn @ 2009-06-18 23:10 UTC (permalink / raw)
  To: lustre-devel

Hello lustre-devel,

Some thoughts/questions on one aspect of Imperative Recovery.

Via Eric Barton:

"Actually imperative recovery is just explicit notification of clients
to reconnect and explicit notification of the failover server not to
block for any more clients to reconnect."

Since backup servers begin replay immediately once all clients have
reconnected, we only care about the case where we have dead/dying
clients, or maybe clients that are "too slow".  In these cases we are
seeking the ability to short circuit the recovery window; however, this
is equivalent to simply having a short(er) recovery window in the first
place.

It seems as though an ability to short circuit is only going to be
useful if we can distinguish between the case where we only need a short
recovery window vs. the case where we need that extra time.  My question
is, what are the use cases where this applies?

My intuition is the following:
Case 1:  x/y clients which are dead, (y-x)/y clients connected to the
backup server (all clients that can connect have done so).  We want to
go ahead and short circuit. 

Case 2:  x/y clients which are dead, (y-x-z)/y clients connected to the
backup server (z slow clients).  We want more time for the z slow
clients to connect.

Am I missing a use case?  If not then my next question is, do we want to
distinguish between these two cases?  If we do want to distinguish
between these two cases, then imperative recovery needs a mechanism to
distinguish between them in addition to the explicit notification of
clients and explicit notification of the failover server. 

If we want to treat these cases the same then imperative recovery
reduces to allowing the recovery window timeout to be tunable (if it
isn't already), and the explicit notification of clients to reconnect
(which still nets a huge improvement over the current implementation). 
I can imagine having an ability to end a server's recovery window early
might be useful to system admins in some circumstances, but I don't see
its utility in an automated failover solution.

Chris Horn

* [Lustre-devel] Imperative Recovery - forcing failover server stop blocking
  2009-06-18 23:10 [Lustre-devel] Imperative Recovery - forcing failover server stop blocking Chris Horn
@ 2009-06-19 21:18 ` Johann Lombardi
  2009-06-19 22:10   ` Chris Horn
  0 siblings, 1 reply; 10+ messages in thread
From: Johann Lombardi @ 2009-06-19 21:18 UTC (permalink / raw)
  To: lustre-devel

On Jun 19, 2009, at 1:10 AM, Chris Horn wrote:
> Since backup servers immediately begin replay once all clients have
> reconnected we only care about the case where we have dead/dying
> clients, or maybe when clients are "too slow".

Actually, the problem is that all the clients can be considered "too
slow". Before reconnecting to the failover partner, a client will
first wait for a request to time out, then it will retry connecting to
the same server (for a flappy network), and only after those 2 timeouts
will the client attempt to connect to the backup server.
This means that the server has to extend the recovery window
accordingly to make sure that all clients can join recovery.
The situation is even worse if you do failover with N servers or
if each target can be reached via several nids on the same host.
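
As an illustration of the delay described above, here is a minimal Python
sketch of the arithmetic. The function names, the parameters and the
example timeout values are assumptions made for this sketch; they are not
Lustre code or Lustre defaults. It only adds up the waits a client sits
through before it first tries the failover partner.

```python
def delay_before_failover_attempt(request_timeout, connect_timeout,
                                  failed_connects=1):
    """Seconds until a client first tries the failover partner.

    Hypothetical model: the client waits for one request to time out, then
    for `failed_connects` connection attempts (same server, other nids, or
    other servers) to time out before it reaches the failover partner.
    """
    if failed_connects < 0:
        raise ValueError("failed_connects must be non-negative")
    return request_timeout + failed_connects * connect_timeout


def minimum_recovery_window(client_delays):
    """The window must outlast the slowest client expected to join."""
    return max(client_delays)


# Illustrative numbers only (not Lustre defaults).
two_timeouts = delay_before_failover_attempt(100, 100)                  # 200
several_nids = delay_before_failover_attempt(100, 100, failed_connects=3)  # 400
window = minimum_recovery_window([two_timeouts, several_nids])          # 400
```

Under this model every extra server or nid the client tries first adds
another connect timeout, which is why the window has to grow with the
failover configuration.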

The idea of imperative recovery is to tell the clients to reconnect
immediately to the failover partner. This should reduce the overall
recovery time.

> In these cases we are seeking the ability to short circuit the
> recovery window, however this is equivalent to simply having a
> short(er) recovery window in the first place.

Well, if we just reduce the recovery window, some clients will not
join recovery, which then cannot fully complete (i.e. with everything
replayed).

> It seems as though an ability to short circuit is only going to be
> useful if we can distinguish between the case where we only need a
> short recovery window vs. the case where we need that extra time.
> My question is, what are the use cases where this applies?
>
> My intuition is the following:
> Case 1:  x/y clients which are dead, (y-x)/y clients connected to the
> backup server (all clients that can connect have done so).  We want to
> go ahead and short circuit.

That's the 2nd aspect of imperative recovery. We want to notify the
server when all clients that were supposed to reconnect should
have done so already. Basically, the idea is to tell the server that
no new clients will reconnect now and that it does not need to wait
any longer for new clients to join (the x clients).

> Case 2:  x/y clients which are dead, (y-x-z)/y clients connected to
> the backup server (z slow clients).  We want more time for the z slow
> clients to connect.

In fact, with imperative recovery, we should no longer have slow
clients since a client no longer needs to detect the server failure
by itself, but instead will be told explicitly to reconnect to the
failover partner w/o any delay.

HTH

Cheers,
Johann

* [Lustre-devel] Imperative Recovery - forcing failover server stop blocking
  2009-06-19 21:18 ` Johann Lombardi
@ 2009-06-19 22:10   ` Chris Horn
  2009-06-22 17:53     ` Eric Barton
  0 siblings, 1 reply; 10+ messages in thread
From: Chris Horn @ 2009-06-19 22:10 UTC (permalink / raw)
  To: lustre-devel

Oops, I forgot to cc lustre-devel.

Johann Lombardi wrote:
> On Jun 19, 2009, at 1:10 AM, Chris Horn wrote:
>> It seems as though an ability to short circuit is only going to be
>> useful if we can distinguish between the case where we only need a short
>> recovery window vs. the case where we need that extra time.  My question
>> is, what are the use cases where this applies?
>>
>> My intuition is the following:
>> Case 1:  x/y clients which are dead, (y-x)/y clients connected to the
>> backup server (all clients that can connect have done so).  We want to
>> go ahead and short circuit.
>
> That's the 2nd aspect of imperative recovery. We want to notify the
> server when all clients that were supposed to reconnect should
> have done so already. Basically, the idea is to tell the server that
> no new clients will reconnect now and that it is not needed to wait
> any longer for new clients to join (the x clients).
>
I just want to verify that in order to use this 2nd aspect of imperative
recovery we need some method of determining client health, yes?


Chris Horn

* [Lustre-devel] Imperative Recovery - forcing failover server stop blocking
  2009-06-19 22:10   ` Chris Horn
@ 2009-06-22 17:53     ` Eric Barton
  2009-06-22 18:21       ` Chris Horn
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Barton @ 2009-06-22 17:53 UTC (permalink / raw)
  To: lustre-devel

Chris,

Comment inline...

> -----Original Message-----
> From: lustre-devel-bounces at lists.lustre.org [mailto:lustre-devel-bounces at lists.lustre.org] On Behalf Of Chris Horn
> Sent: 19 June 2009 11:11 PM
> To: lustre-devel at lists.lustre.org
> Subject: Re: [Lustre-devel] Imperative Recovery - forcing failover server stop blocking
> 
> Oops, I forgot to cc lustre-devel.
> 
> Johann Lombardi wrote:
> 
>> On Jun 19, 2009, at 1:10 AM, Chris Horn wrote:
>>> It seems as though an ability to short circuit is only going to be
>>> useful if we can distinguish between the case where we only need a short
>>> recovery window vs. the case where we need that extra time.  My question
>>> is, what are the use cases where this applies?
>>>
>>> My intuition is the following:
>>> Case 1:  x/y clients which are dead, (y-x)/y clients connected to the
>>> backup server (all clients that can connect have done so).  We want to
>>> go ahead and short circuit.
>>
>> That's the 2nd aspect of imperative recovery. We want to notify the
>> server when all clients that were supposed to reconnect should
>> have done so already. Basically, the idea is to tell the server that
>> no new clients will reconnect now and that it is not needed to wait
>> any longer for new clients to join (the x clients).
>
> I just want to verify that in order to use this 2nd aspect of imperative
> recovery we need some method of determining client health, yes?

Yes.  

Consider a utility that runs on a client to notify it to reconnect to a
failover server, and which completes with a success status only when the
client has reconnected successfully.

If you run this utility on all clients after starting a failover server,
you can notify the server to close the recovery window once all instances have
completed since that tells you that all clients are healthy and ready to
participate in recovery.

Of course, you can decide to stop waiting and proceed with the server
notification at any time you like.  You can base this decision on a timeout,
knowing how many clients have reconnected successfully, or any other criterion
you choose - i.e. you are now the effective arbiter of client health.
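
The decision described above could be sketched roughly as follows. This is
a minimal Python illustration, not an existing Lustre tool: the function
name, its arguments, and the idea of passing client ids as sets are all
assumptions made for the example.

```python
def recovery_decision(expected, reconnected, elapsed, deadline,
                      known_dead=frozenset()):
    """Return (close_window, reason) for an external recovery arbiter.

    expected    -- ids of clients the failover server is waiting for
    reconnected -- ids whose notification utility exited successfully
    elapsed     -- seconds since the failover server was started
    deadline    -- seconds the operator is willing to wait (site policy)
    known_dead  -- ids reported dead by some external health monitor
    """
    waiting_for = set(expected) - set(known_dead) - set(reconnected)
    if not waiting_for:
        return True, "all live clients have reconnected"
    if elapsed >= deadline:
        return True, "deadline expired with %d client(s) missing" % len(waiting_for)
    return False, "still waiting for %d client(s)" % len(waiting_for)


# Client "c3" is known to be dead, so only "c2" is still outstanding.
print(recovery_decision({"c1", "c2", "c3"}, {"c1"}, 10, 60, known_dead={"c3"}))
```

Whatever calls this function is then the arbiter of client health: it
chooses the deadline and decides which clients count as dead.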

    Cheers,
              Eric

* [Lustre-devel] Imperative Recovery - forcing failover server stop blocking
  2009-06-22 17:53     ` Eric Barton
@ 2009-06-22 18:21       ` Chris Horn
  2009-06-22 19:27         ` Brian Behlendorf
  2009-06-23 12:49         ` Eric Barton
  0 siblings, 2 replies; 10+ messages in thread
From: Chris Horn @ 2009-06-22 18:21 UTC (permalink / raw)
  To: lustre-devel

Eric Barton wrote:
> Consider a utility that runs on a client to notify it to reconnect to a
> failover server, and which completes with a success status only when the
> client has reconnected successfully.

Would this be equivalent to monitoring the "completed_clients" field of
the recovery_status proc file?

> If you run this utility on all clients after starting a failover server,
> you can notify the server to close the recovery window once all instances have
> completed since that tells you that all clients are healthy and ready to
> participate in recovery.

Won't the server already begin replay by this time, since it has
received connections from all clients?  Thus rendering our notification
to the server (to close the recovery window) redundant?

> Of course, you can decide to stop waiting and proceed with the server
> notification at any time you like.  You can base this decision on a timeout,
> knowing how many clients have reconnected successfully, or any other criterion
> you choose - i.e. you are now the effective arbiter of client health.

Our initial plan was to do just this.  We would have a proxy running on
the bootnode to aggregate client responses.  It would wait some
configurable timeout period, say clnt_timeout, and if it received a number
of responses equal to obd->obd_max_recoverable_clients, it would go ahead
and notify the server to stop waiting for responses immediately (though
this is the situation described in the last comment).  If the timeout
expired it would notify the server to stop waiting.  However, it
occurred to me that we would get the same behavior by simply tuning the
server's recovery window down to whatever value we were going to assign
clnt_timeout.  It seemed we were going through an awful lot of trouble
to gain a tunable recovery_window.  I'm not sure if this is a result of
our choosing poor criteria upon which to notify the server to stop
waiting, or if there is something else (a use case perhaps) that I'm
missing.

* [Lustre-devel] Imperative Recovery - forcing failover server stop blocking
  2009-06-22 18:21       ` Chris Horn
@ 2009-06-22 19:27         ` Brian Behlendorf
  2009-06-23 12:49         ` Eric Barton
  1 sibling, 0 replies; 10+ messages in thread
From: Brian Behlendorf @ 2009-06-22 19:27 UTC (permalink / raw)
  To: lustre-devel

> However, it occurred to me that we would get the same behavior by simply
> tuning the server's recovery window down to whatever value we were going
> to assign  clnt_timeout.

Chris, for similar reasons I put together a patch to do exactly this with a
Lustre server mount option.  There is a 1.6.x version in bugzilla 18948,
attachment 23447, and a version pending inclusion in Lustre 1.8.2.  It adds
the following two options, the idea being that they can be set to whatever is
reasonable for your system.

recovery_time_soft=timeout
Allow 'timeout' seconds for clients to reconnect for recovery after a server
crash.  This timeout will be incrementally extended if it is about to expire
and the server is still handling new connections from recoverable clients.
The default soft recovery timeout is set to 300 seconds (5 minutes).

recovery_time_hard=timeout
The server will be allowed to incrementally extend its timeout up to a hard
maximum of 'timeout' seconds.  The default hard recovery timeout is set to
900 seconds (15 minutes).
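
The interaction between the soft and hard limits can be sketched in a few
lines of Python. This is only an illustration of the behaviour described
above, not the patch itself: the function name and the `increment` and
`margin` values are assumptions, since the description does not say how
large each extension is or how close to expiry it is triggered.

```python
def next_recovery_deadline(deadline, now, hard_limit, still_connecting,
                           increment=30, margin=10):
    """Return the (possibly extended) recovery deadline in seconds.

    All times are seconds since recovery started.  `increment` and `margin`
    are made-up values for this sketch.
    """
    about_to_expire = deadline - now <= margin
    if still_connecting and about_to_expire:
        # Extend, but never beyond the hard limit and never backwards.
        return max(deadline, min(deadline + increment, hard_limit))
    return deadline


soft, hard = 300, 900  # the defaults quoted above
extended = next_recovery_deadline(soft, 295, hard, still_connecting=True)  # 330
capped = next_recovery_deadline(890, 885, hard, still_connecting=True)     # 900
```

The soft value is where the deadline starts, and the hard value caps how
far repeated extensions can push it.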

-- 
Thanks,
Brian

* [Lustre-devel] Imperative Recovery - forcing failover server stop blocking
  2009-06-22 18:21       ` Chris Horn
  2009-06-22 19:27         ` Brian Behlendorf
@ 2009-06-23 12:49         ` Eric Barton
  2009-06-23 14:53           ` Andreas Dilger
  1 sibling, 1 reply; 10+ messages in thread
From: Eric Barton @ 2009-06-23 12:49 UTC (permalink / raw)
  To: lustre-devel


Chris,

> Eric Barton wrote:
> > Consider a utility that runs on a client to notify it to reconnect
> > to a failover server, and which completes with a success status
> > only when the client has reconnected successfully.
>
> Would this be equivalent to monitoring the "completed_clients" field
> of the recovery_status proc file?

No, this is for accounting clients that have actually completed
recovery, not clients which have reconnected and are therefore ready
to participate in recovery - you'd want 'connected_clients' for that.

But actually, counting reconnected clients is only half the story.
Currently clients don't even start to participate in recovery until
they detect an error communicating with the failed server - i.e. after
a timeout _and_ a failed reconnection attempt.  This utility
eliminates this latency by notifying the client explicitly to
reconnect NOW.
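
To illustrate the difference between the two counters, here is a small
Python sketch that reads both from a recovery_status dump. Only the field
names connected_clients and completed_clients come from this thread; the
"key: value" layout, the "n/m" count format and the sample text are
assumptions made for the example and may not match a real proc file.

```python
# Hypothetical sample; the real file layout may differ.
SAMPLE = """\
status: RECOVERING
connected_clients: 3/5
completed_clients: 1/5
"""


def parse_status(text):
    """Split 'key: value' lines into a dict, ignoring anything else."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def client_count(fields, name):
    """Return (count, total) for an 'n/m' field; total is None if absent."""
    count, _, total = fields[name].partition("/")
    return int(count), (int(total) if total else None)


fields = parse_status(SAMPLE)
ready = client_count(fields, "connected_clients")      # (3, 5)
recovered = client_count(fields, "completed_clients")  # (1, 5)
```

In this sample three clients are ready to take part in recovery while
only one has finished it, which is why connected_clients is the counter
to watch when deciding whether to keep waiting.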

> > If you run this utility on all clients after starting a failover
> > server, you can notify the server to close the recovery window
> > once all instances have completed since that tells you that all
> > clients are healthy and ready to participate in recovery.
>
> Won't the server already begin replay by this time, since it has
> received connections from all clients?  Thus rendering our
> notification to the server (to close the recovery window) redundant?

Yes, in the optimistic event that all clients reconnected.  

> > Of course, you can decide to stop waiting and proceed with the
> > server notification at any time you like.  You can base this
> > decision on a timeout, knowing how many clients have reconnected
> > successfully, or any other criterion you choose - i.e. you are now
> > the effective arbiter of client health.
>
> Our initial plan was to do just this.  We would have a proxy running
> on the bootnode to aggregate client responses.  It would wait some
> configurable timeout period, say clnt_timeout, and if it received a
> # of responses equal to obd->obd_max_recoverable_clients, it would
> go ahead and notify the server to stop waiting for responses
> immediately (though this is the situation described in the last
> comment).  If the timeout expired it would notify the server to stop
> waiting.  However, it occurred to me that we would get the same
> behavior by simply tuning the server's recovery window down to
> whatever value we were going to assign clnt_timeout.  It seemed we
> were going through an awful lot of trouble to gain a tunable
> recovery_window.  I'm not sure if this is a result of our choosing
> poor criteria upon which to notify the server to stop waiting, or
> if there is something else (a use case perhaps) that I'm missing.

Yes, of course, you can just tune down the recovery window in the
knowledge that explicit notification has speeded the whole process of
client reconnection.  However if you have better knowledge about
client health than Lustre can have - e.g. hardware-specific health
monitoring, or just using the success/failure of the explicit
notification method itself - then why not use it to control exactly
when to stop waiting for dead clients?

-- 

        Cheers,
                   Eric

* [Lustre-devel] Imperative Recovery - forcing failover server stop blocking
  2009-06-23 12:49         ` Eric Barton
@ 2009-06-23 14:53           ` Andreas Dilger
  2009-06-23 14:59             ` Chris Horn
  2009-06-23 17:20             ` Robert Read
  0 siblings, 2 replies; 10+ messages in thread
From: Andreas Dilger @ 2009-06-23 14:53 UTC (permalink / raw)
  To: lustre-devel

On Jun 23, 2009  13:49 +0100, Eric Barton wrote:
> Yes, of course, you can just tune down the recovery window in the
> knowledge that explicit notification has speeded the whole process of
> client reconnection.  However if you have better knowledge about
> client health than Lustre can have - e.g. hardware-specific health
> monitoring, or just using the success/failure of the explicit
> notification method itself - then why not use it to control exactly
> when to stop waiting for dead clients?

Yes, to restate this in a different way - the only way that Lustre itself
knows that some client will NOT be participating is after the timeout has
expired.  If there is some external mechanism that can inform Lustre that
one or more clients are dead and will not be participating in recovery
then the recovery does not need to wait for the timeout.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

* [Lustre-devel] Imperative Recovery - forcing failover server stop blocking
  2009-06-23 14:53           ` Andreas Dilger
@ 2009-06-23 14:59             ` Chris Horn
  2009-06-23 17:20             ` Robert Read
  1 sibling, 0 replies; 10+ messages in thread
From: Chris Horn @ 2009-06-23 14:59 UTC (permalink / raw)
  To: lustre-devel

Okay, thanks everyone for your input.  Utilizing an external source of
client health information is not something that we had incorporated into
our initial design, so when I saw the need for this information I wanted
to ensure that it was in fact necessary.

@Brian, thanks for doing the work on tunable recovery and pointing me to
that patch.

* [Lustre-devel] Imperative Recovery - forcing failover server stop blocking
  2009-06-23 14:53           ` Andreas Dilger
  2009-06-23 14:59             ` Chris Horn
@ 2009-06-23 17:20             ` Robert Read
  1 sibling, 0 replies; 10+ messages in thread
From: Robert Read @ 2009-06-23 17:20 UTC (permalink / raw)
  To: lustre-devel


On Jun 23, 2009, at 07:53 , Andreas Dilger wrote:

> On Jun 23, 2009  13:49 +0100, Eric Barton wrote:
>> Yes, of course, you can just tune down the recovery window in the
>> knowledge that explicit notification has speeded the whole process of
>> client reconnection.  However if you have better knowledge about
>> client health than Lustre can have - e.g. hardware-specific health
>> monitoring, or just using the success/failure of the explicit
>> notification method itself - then why not use it to control exactly
>> when to stop waiting for dead clients?
>
> Yes, to restate this in a different way - the only way that Lustre
> itself knows that some client will NOT be participating is after the
> timeout has expired.  If there is some external mechanism that can
> inform Lustre that one or more clients are dead and will not be
> participating in recovery then the recovery does not need to wait for
> the timeout.

The external mechanism should just evict the known dead clients from  
the server as soon as it discovers them so the server can begin  
recovery as soon as the live clients connect. Then we don't need to  
worry about the timeout.
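
A minimal Python model of this idea is sketched below. The class and
method names are invented for the illustration and do not correspond to
Lustre internals; the point is only that evicting a dead client shrinks
the set the server is waiting for, so no timeout is involved.

```python
class RecoveryWindow:
    """Toy model of a server waiting for clients before starting replay."""

    def __init__(self, expected_clients):
        self.expected = set(expected_clients)
        self.connected = set()

    def connect(self, client):
        # Only clients the server knows about take part in recovery.
        if client in self.expected:
            self.connected.add(client)

    def evict(self, client):
        # An external monitor has declared this client dead.
        self.expected.discard(client)
        self.connected.discard(client)

    def can_start_replay(self):
        return self.connected >= self.expected


window = RecoveryWindow(["c1", "c2", "c3"])
window.connect("c1")
window.connect("c2")
before = window.can_start_replay()   # False: still waiting for c3
window.evict("c3")
after = window.can_start_replay()    # True: every remaining client is connected
```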

robert
