* [Lustre-devel] server-side resending & bulk transfer
       [not found] <20100205163524.GW236@granier.hd.free.fr>
@ 2010-02-05 17:12 ` Eric Barton
  2010-02-05 20:20   ` Nicolas Williams
  2010-02-09 19:21   ` Nathan Rutman
  0 siblings, 2 replies; 4+ messages in thread
From: Eric Barton @ 2010-02-05 17:12 UTC
  To: lustre-devel

Johann,

cc-ing lustre-devel.

Yes, the server could retry the bulk if it times out and this
will be safe for the client since its bulk buffer is auto-unlinked,
so only 1 bulk PUT/GET can match it.  But if the problem happens
on the way back to the server rather than the way out to the client,
you're hosed since the bulk has completed from the client's POV.
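
A toy model of those two cases, as a minimal sketch only: the class and
function names are hypothetical and none of this is the LNET API. It
assumes the client posts one buffer that unlinks after a single match,
and that the server retries once over a healthy path.

```python
# Toy model of a client bulk buffer that is auto-unlinked after one match.
# All names are hypothetical; this is not the LNET API.

class ClientBulkBuffer:
    def __init__(self, matchbits, data):
        self.matchbits = matchbits
        self.data = data
        self.linked = True   # posted and waiting for exactly one GET/PUT

    def handle_get(self, matchbits):
        """Return the data for the first matching GET, then unlink."""
        if not self.linked or matchbits != self.matchbits:
            return None      # no match: the request is dropped
        self.linked = False  # auto-unlink after a single operation
        return self.data


def server_get(buf, matchbits, lose_request=False, lose_reply=False):
    """One server-side GET attempt; returns the data, or None on timeout."""
    if lose_request:
        return None          # the GET never reached the client
    reply = buf.handle_get(matchbits)
    if lose_reply:
        return None          # the client replied, but the reply was lost
    return reply


def bulk_with_one_retry(buf, matchbits, **first_attempt_loss):
    """Try the bulk once with the given loss, then retry on a healthy path."""
    data = server_get(buf, matchbits, **first_attempt_loss)
    if data is None:
        data = server_get(buf, matchbits)
    return data


# Loss on the way out: the buffer is still linked, so the retry succeeds.
out_lost = bulk_with_one_retry(ClientBulkBuffer(0x2A, b"payload"), 0x2A,
                               lose_request=True)
# Loss on the way back: the buffer is already unlinked, so the retry fails.
back_lost = bulk_with_one_retry(ClientBulkBuffer(0x2A, b"payload"), 0x2A,
                                lose_reply=True)
print(out_lost, back_lost)
```

In this model the retry only helps when the loss is on the outbound
leg; after a lost reply the client still has to resend from scratch.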

This should be an exceptional circumstance - i.e. a router has
actually failed - so I think it's better just to stick with the
client retrying from scratch rather than tying down a server thread
until it has decided whether there was a router failure or the
client really crashed.

Roll on the health network! :)

    Cheers,
              Eric

> -----Original Message-----
> From: Johann Lombardi [mailto:johann at sun.com]
> Sent: 05 February 2010 4:35 PM
> To: lustre-tech-leads at sun.com
> Subject: server-side resending & bulk transfer
> 
> Hi,
> 
> As you know, the most important part of server-side resending is to resend
> lock callbacks, since the loss of such a message ends in a client eviction
> (except for glimpses, which are resent indefinitely, causing other problems).
> 
> That being said, another aspect is losing a message during bulk transfer, and
> more particularly the start bulk signal issued by LNET.
> Unlike lock callback RPCs, losing the start bulk signal is not fatal, since
> the bulk transfer will time out on the server side, the request will be
> dropped, and the client will resend after reconnection. This is indeed harmless,
> but still causes slowdown which could be avoided according to LLNL if we
> try to resend the start bulk signal (bug 21714). Brian Behlendorf's
> proposal is to resend the start bulk signal after the first l_wait_event()
> timeout in ost_brw_write(). However, we don't know if this is safe to do,
> e.g. how does the client react if it receives duplicated start bulk signals?
> 
> Johann
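
The proposal quoted above (resend the start bulk signal after the first
wait times out) can be sketched roughly as follows. This is an
illustration under assumptions, not the ost_brw_write() code:
wait_for_bulk, send_start_bulk and the fake transport are hypothetical
names, and a threading.Event stands in for the bulk completion event.

```python
import threading

def wait_for_bulk(done, send_start_bulk, timeout, max_resends=1):
    """Wait for bulk completion, resending the start signal after a timeout.

    Returns the number of resends used, or -1 if the bulk never completed.
    """
    send_start_bulk()
    for resends in range(max_resends + 1):
        if done.wait(timeout):
            return resends
        if resends < max_resends:
            send_start_bulk()   # first wait timed out: signal again
    return -1


# Fake transport (hypothetical): the first signal is lost in transit,
# the second one gets through and the bulk completes.
done = threading.Event()
sent = []

def send_start_bulk():
    sent.append(len(sent) + 1)
    if len(sent) >= 2:
        done.set()

result = wait_for_bulk(done, send_start_bulk, timeout=0.01)
print(result, len(sent))
```

The sketch says nothing about whether a duplicate signal is safe on the
client side; that is exactly the open question in the message.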

* [Lustre-devel] server-side resending & bulk transfer
  2010-02-05 17:12 ` [Lustre-devel] server-side resending & bulk transfer Eric Barton
@ 2010-02-05 20:20   ` Nicolas Williams
  2010-02-06 12:28     ` Eric Barton
  2010-02-09 19:21   ` Nathan Rutman
  1 sibling, 1 reply; 4+ messages in thread
From: Nicolas Williams @ 2010-02-05 20:20 UTC
  To: lustre-devel

On Fri, Feb 05, 2010 at 05:12:51PM +0000, Eric Barton wrote:
> On Feb 5, 2010, at 8:35 AM, Johann Lombardi wrote:
> > Unlike lock callback RPCs, losing the start bulk signal is not fatal, since
> > the bulk transfer will time out on the server side, the request will be
> > dropped, and the client will resend after reconnection. This is indeed harmless,
> > but still causes slowdown which could be avoided according to LLNL if we
> > try to resend the start bulk signal (bug 21714). Brian Behlendorf's
> > proposal is to resend the start bulk signal after the first l_wait_event()
> > timeout in ost_brw_write(). However, we don't know if this is safe to do,
> > e.g. how does the client react if it receives duplicated start bulk signals?
> 
> Yes, the server could retry the bulk if it times out and this
> will be safe for the client since its bulk buffer is auto-unlinked,
> so only 1 bulk PUT/GET can match it.  But if the problem happens
> on the way back to the server rather than the way out to the client,
> you're hosed since the bulk has completed from the client's POV.
> 
> This should be an exceptional circumstance - i.e. a router has
> actually failed - so I think it's better just to stick with the
> client retrying from scratch rather than tying down a server thread
> until it has decided whether there was a router failure or the
> client really crashed.

I agree that tying down a server thread on a long block is not a good
thing.  If the LLNL proposal (resend the start bulk signal) is on the
money, then the thing to do would be to create a queue and separate
service thread(s) to handle such resends.
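
A rough sketch of what such a queue plus a separate resend thread could
look like. Everything here is hypothetical (resend_worker, the xid
values, the stand-in resend callable); it only shows the hand-off
pattern, not how Lustre service threads are actually organised.

```python
import queue
import threading

def resend_worker(pending, resend, results):
    """Drain timed-out bulk requests and resend their start signal."""
    while True:
        req = pending.get()
        if req is None:          # sentinel: shut the worker down
            break
        results.append(resend(req))


pending = queue.Queue()
results = []
worker = threading.Thread(
    target=resend_worker,
    args=(pending, lambda req: ("resent", req), results),
)
worker.start()

# An I/O service thread that hits a bulk timeout hands the request off
# and goes back to serving other requests instead of blocking on it.
for xid in (101, 102):
    pending.put(xid)

pending.put(None)
worker.join()
print(results)
```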

> Roll on the health network! :)

Well, if the deadline here is on the order of 1s or thereabouts then the
health network isn't likely to help much because we're not going to get
sub-second dead node detection.  (Well, if we jack up the ping rate,
lower the time-to-declare-death enough, and make sure that HN
threads and messaging are suitably prioritized, then we might be able to
get sub-second dead node detection, but my gut feeling is that any
heuristic approach should wait for longer than 1s.)

Nico
-- 

* [Lustre-devel] server-side resending & bulk transfer
  2010-02-05 20:20   ` Nicolas Williams
@ 2010-02-06 12:28     ` Eric Barton
  0 siblings, 0 replies; 4+ messages in thread
From: Eric Barton @ 2010-02-06 12:28 UTC
  To: lustre-devel


Nico,

> -----Original Message-----
> From: Nicolas Williams [mailto:Nicolas.Williams at Sun.COM]
> Sent: 05 February 2010 8:20 PM
> To: Eric Barton
> Cc: 'Johann Lombardi'; lustre-devel at lists.lustre.org
> Subject: Re: [Lustre-devel] server-side resending & bulk transfer

<snip>
 
> I agree that tying down a server thread on a long block is not a good
> thing.  If the LLNL proposal (resend the start bulk signal) is on the
> money, then the thing to do would be to create a queue and separate
> service thread(s) to handle such resends.

That's a dreadful layering violation - how LNET implements
GET and PUT is down to each LND in each network traversed.  The only
thing you can do at the Lustre level is retry the GET or PUT on the
assumption that router failure caused the timeout, not the client's
death.
 
> > Roll on the health network! :)
> 
> Well, if the deadline here is on the order of 1s or thereabouts then the
> health network isn't likely to help much because we're not going to get
> sub-second dead node detection.  (Well, if we jack up the ping rate,
> lower the time-to-declare-death enough, and make sure that HN
> threads and messaging are suitably prioritized, then we might be able to
> get sub-second dead node detection, but my gut feeling is that any
> heuristic approach should wait for longer than 1s.)

The point is that the server can legitimately dedicate a thread to retrying
communications with the client until it discovers the client is dead.
Currently the bulk timeout is the sole, yet unreliable, indication of
this.  A health network that provided reliable notification within tens
of seconds would be a considerable improvement.
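
As a minimal sketch of that policy: keep retrying the bulk until it
completes or a health check reports the client dead. retry_until_dead,
try_bulk and client_is_dead are hypothetical names, and the health
network is reduced to a callable; no such API is implied.

```python
def retry_until_dead(try_bulk, client_is_dead, max_attempts=100):
    """Retry the bulk until it succeeds or the client is reported dead."""
    for attempt in range(1, max_attempts + 1):
        if try_bulk():
            return ("completed", attempt)
        if client_is_dead():
            return ("client_dead", attempt)
    return ("gave_up", max_attempts)


# Router hiccup: two attempts time out, the third gets through.
outcomes = iter([False, False, True])
status = retry_until_dead(lambda: next(outcomes), lambda: False)

# Client crash: the health check reports death after the first timeout.
dead = retry_until_dead(lambda: False, lambda: True)
print(status, dead)
```

The max_attempts bound is only there to keep the sketch finite.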

    Cheers,
              Eric

* [Lustre-devel] server-side resending & bulk transfer
  2010-02-05 17:12 ` [Lustre-devel] server-side resending & bulk transfer Eric Barton
  2010-02-05 20:20   ` Nicolas Williams
@ 2010-02-09 19:21   ` Nathan Rutman
  1 sibling, 0 replies; 4+ messages in thread
From: Nathan Rutman @ 2010-02-09 19:21 UTC
  To: lustre-devel


On Feb 5, 2010, at 9:12 AM, Eric Barton wrote:

> Johann,
> 
> cc-ing lustre-devel.
> 
> Yes, the server could retry the bulk if it times out and this
> will be safe for the client since its bulk buffer is auto-unlinked,
> so only 1 bulk PUT/GET can match it.  But if the problem happens
> on the way back to the server rather than the way out to the client,
> you're hosed since the bulk has completed from the client's POV.
> 
> This should be an exceptional circumstance - i.e. a router has
> actually failed - so I think it's better just to stick with the
> client retrying from scratch rather than tying down a server thread
> until it has decided whether there was a router failure or the
> client really crashed.
> 
> Roll on the health network! :)
> 


Eric - this sounds like we can retry the LNetGet/Put whenever we want with
no ill effects (even if the bulk has completed from the client's point of view,
it will just ignore a signal with unmatched matchbits, right?).  So it's "free"
for us to try that every time we e.g. send an early reply?
For any LND, LNetGet/Put does somehow indicate across the wire that the server
is ready for the bulk, so I'm making the bold assumption that calling it again
will re-indicate server readiness (in particular in the case where the original
signal got lost).
