* [Cluster-devel] SCTP versus OpenAIS/corosync time-outs
@ 2009-10-31  0:20 Lars Marowsky-Bree
  2009-11-02  8:41 ` Christine Caulfield
  0 siblings, 1 reply; 5+ messages in thread
From: Lars Marowsky-Bree @ 2009-10-31  0:20 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi all, David,

I'm contemplating SCTP versus OpenAIS/corosync. Is dlm_controld(.pcmk)
proactively informed if a single ring/link goes down, so as to trigger
faster SCTP recovery - or is it left for SCTP to time out on its own and
proceed?

If the latter - is there a way to auto-tune the SCTP time-outs to make
sure the DLM doesn't stall longer than that? I'm wondering whether
there's any chance for higher-level time-outs, i.e. a monitor operation
on a filesystem-using service.
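For reference, Linux exposes the relevant SCTP timers as sysctls (net.sctp.rto_min, net.sctp.rto_max, net.sctp.path_max_retrans; the RTO values in milliseconds). A back-of-the-envelope sketch of the auto-tuning idea - the sizing rule below is my own assumption, not anything cman or dlm_controld does today:

```python
# Sketch: derive SCTP tunables from the corosync/cman token timeout so that
# SCTP declares a path dead before the cluster-level timeout fires.
# The sysctl names are the real Linux ones; the sizing rule is an assumption.

def suggest_sctp_tunables(token_ms, retrans=3, margin=0.5):
    """Pick rto_max/path_max_retrans so that worst-case path failure
    detection (roughly path_max_retrans * rto_max) stays under a
    fraction ('margin') of the token timeout."""
    budget = int(token_ms * margin)       # time we allow SCTP to burn
    rto_max = budget // retrans           # worst case: each retry waits rto_max
    return {
        "net.sctp.rto_min": min(1000, rto_max // 4),   # ms
        "net.sctp.rto_max": rto_max,                   # ms
        "net.sctp.path_max_retrans": retrans,
    }

t = suggest_sctp_tunables(10000)   # e.g. a 10s token timeout
assert t["net.sctp.rto_max"] * t["net.sctp.path_max_retrans"] <= 10000
```

The real worst case also depends on rto_initial and heartbeat settings, so treat this as an upper-bound argument, not a tested recipe.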

RFC 5061 seems to support dynamic reconfiguration in such a fashion. If
I'm reading http://tools.ietf.org/html/rfc4960#page-87 correctly, SCTP
multi-homing is "active/passive", so there's some latency on the
fail-over at least. If several links go down at once, SCTP might try
them in sequence and pick the one surviving link last, incurring a large
latency.

There is no concurrently active transmission (as with "rrp_mode
active") - I wonder if it is possible to put SCTP into such a mode, or,
vice versa, whether this means the DLM might be better off directly
opening several TCP connections on its own (and using them all at once,
simply discarding duplicate messages)?
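The "use them all at once, discard duplicates" idea reduces to tagging every message with a per-sender sequence number and suppressing repeats at the receiver. A minimal sketch of that receive-side logic (the class and windowing scheme are mine, not anything in the DLM):

```python
# Sketch: receive-side duplicate suppression when the same message is sent
# over every available link at once. Each message carries a per-sender
# sequence number; the receiver delivers the first copy and drops the rest.

class DedupReceiver:
    def __init__(self):
        self.delivered = {}   # sender -> set of seqnos already delivered

    def receive(self, sender, seqno, payload):
        seen = self.delivered.setdefault(sender, set())
        if seqno in seen:
            return None       # duplicate from a slower link: discard
        seen.add(seqno)
        return payload        # first copy wins, whichever link it arrived on

rx = DedupReceiver()
# The same message arrives on two links; only the first copy is delivered.
assert rx.receive("nodeA", 1, "lock-request") == "lock-request"
assert rx.receive("nodeA", 1, "lock-request") is None
```

A production version would need bounded state (a sliding window rather than an ever-growing set) and per-connection ordering, which is exactly the sort of complexity SCTP already handles.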


I'm not sure what kind of problems exist, if any, but this may be a
worthwhile thing to consider or at least contemplate. I welcome
feedback ;-)


Regards,
    Lars

-- 
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde




* [Cluster-devel] SCTP versus OpenAIS/corosync time-outs
  2009-10-31  0:20 [Cluster-devel] SCTP versus OpenAIS/corosync time-outs Lars Marowsky-Bree
@ 2009-11-02  8:41 ` Christine Caulfield
  2009-11-02 16:37   ` David Teigland
  2009-11-04 21:29   ` Lars Marowsky-Bree
  0 siblings, 2 replies; 5+ messages in thread
From: Christine Caulfield @ 2009-11-02  8:41 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On 31/10/09 00:20, Lars Marowsky-Bree wrote:
> Hi all, David,
>
> I'm contemplating SCTP versus OpenAIS/corosync. Is dlm_controld(.pcmk)
> proactively informed if a single ring/link goes down, so as to trigger
> faster SCTP recovery - or is it left for SCTP to time out on its own and
> proceed?

Corosync tells no-one, apart from syslog, if a link goes down. I imagine 
it's possible for the CFG subsystem to inform applications of link state 
changes, but it doesn't currently do so.


> If the latter - is there a way to auto-tune the SCTP time-outs to make
> sure the DLM doesn't stall longer than that? I'm wondering whether
> there's any chance for higher-level time-outs, i.e. a monitor operation
> on a filesystem-using service.

I imagine it's possible to tell SCTP the cman values for timeouts. It 
doesn't happen at the moment, but perhaps it should. There is a lot of 
scope for more auto-configuration of things in clustering, I think.


> RFC 5061 seems to support dynamic reconfiguration in such a fashion. If
> I'm reading http://tools.ietf.org/html/rfc4960#page-87 correctly, SCTP
> multi-homing is "active/passive", so there's some latency on the
> fail-over at least. If several links go down at once, SCTP might try
> them in sequence and pick the one surviving link last, incurring a large
> latency.
>
> There is no concurrently active transmission (as with "rrp_mode
> active") - I wonder if it is possible to put SCTP into such a mode, or,
> vice versa, whether this means the DLM might be better off directly
> opening several TCP connections on its own (and using them all at once,
> simply discarding duplicate messages)?

If you want to add TCP multi-homing code to the DLM, feel free. But 
it'll be complicated and messy, I promise. And it seems pointless to 
reimplement all the failover code that's already in SCTP for free.

> I'm not sure what kind of problems exist, if any, but this may be a
> worthwhile thing to consider or at least contemplate. I welcome
> feedback ;-)

To be honest, RRP & DLM/SCTP is not well tested or used. There are 
probably lots of things that could be done to improve it. In particular 
the failover aspect of it (the most important part, of course) has 
probably not been tried under any sort of serious load ... though I 
could be wrong.


Chrissie




* [Cluster-devel] SCTP versus OpenAIS/corosync time-outs
  2009-11-02  8:41 ` Christine Caulfield
@ 2009-11-02 16:37   ` David Teigland
  2009-11-04 21:29   ` Lars Marowsky-Bree
  1 sibling, 0 replies; 5+ messages in thread
From: David Teigland @ 2009-11-02 16:37 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Mon, Nov 02, 2009 at 08:41:43AM +0000, Christine Caulfield wrote:
> To be honest, RRP & DLM/SCTP is not well tested or used. There are 
> probably lots of things that could be done to improve it. In particular 
> the failover aspect of it (the most important part, of course) has 
> probably not been tried under any sort of serious load ... though I 
> could be wrong.

Yep, I dusted off dlm/sctp a few weeks ago to see what kind of condition
it was in.  Fixed a couple of bugs, and then found that it's noticeably
slower than tcp (using single sctp connections.)  I spent a bit of time
trying to understand why, but didn't.  Someone more familiar with those
networking layers could probably diagnose things much faster... not sure
when I'll get back to looking at it again.

Dave




* [Cluster-devel] SCTP versus OpenAIS/corosync time-outs
  2009-11-02  8:41 ` Christine Caulfield
  2009-11-02 16:37   ` David Teigland
@ 2009-11-04 21:29   ` Lars Marowsky-Bree
  2009-11-05  7:23     ` Fabio M. Di Nitto
  1 sibling, 1 reply; 5+ messages in thread
From: Lars Marowsky-Bree @ 2009-11-04 21:29 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On 2009-11-02T08:41:43, Christine Caulfield <ccaulfie@redhat.com> wrote:

>> There is no concurrently active transmission (as with "rrp_mode
>> active") - I wonder if it is possible to put SCTP into such a mode, or,
>> vice versa, whether this means the DLM might be better off directly
>> opening several TCP connections on its own (and using them all at once,
>> simply discarding duplicate messages)?
> If you want to add TCP multi-homing code to the DLM, feel free. But it'll 
> be complicated and messy, I promise. And it seems pointless to reimplement 
> all the failover code that's already in SCTP for free.

Well, the thing is that active/active doesn't seem to be something SCTP
actually _can_ do, while quite obviously being the thing we'd want. I'd
love to be proven wrong about SCTP, of course, that'd make things
easier.

>> I'm not sure what kind of problems exist, if any, but this may be a
>> worthwhile thing to consider or at least contemplate. I welcome
>> feedback ;-)
> To be honest, RRP & DLM/SCTP is not well tested or used. There are probably 
> lots of things that could be done to improve it. In particular the failover 
> aspect of it (the most important part, of course) has probably not been 
> tried under any sort of serious load ... though I could be wrong.

Yeah, I'm trying to scope what needs to be tested and improved.

(Having OpenAIS+DLM run over bonding is mostly fine, but bonding is
sometimes the problem, and doesn't support all topologies; hence the
need to explore SCTP, or if SCTP can't do that, some alternative.)

A quite different trick for redundant networking would be to assign
static addresses to lo:X, run OSPF over all links, and have the DLM
connect to the static IPs. That's quite trivial to set up and would give
us "resilient" TCP (without needing to mess with SCTP, bonding, or
anything else).

Comments?


Regards,
    Lars

-- 
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde




* [Cluster-devel] SCTP versus OpenAIS/corosync time-outs
  2009-11-04 21:29   ` Lars Marowsky-Bree
@ 2009-11-05  7:23     ` Fabio M. Di Nitto
  0 siblings, 0 replies; 5+ messages in thread
From: Fabio M. Di Nitto @ 2009-11-05  7:23 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Lars Marowsky-Bree wrote:
> On 2009-11-02T08:41:43, Christine Caulfield <ccaulfie@redhat.com> wrote:
> 
> A quite different trick for redundant networking would be to assign
> static addresses to lo:X, run OSPF over all links, and have the DLM
> connect to the static IPs. That's quite trivial to set up and would give
> us "resilient" TCP (without needing to mess with SCTP, bonding, or
> anything else).
> 
> Comments?
> 

OSPF convergence times can be very long, as convergence involves link
UP/DOWN events, flapping protection and so on and so forth... IMHO,
while the idea is valid, it introduces a whole new set of timeouts to
take into account.

IMHO it's a lot simpler to do something like this:

monitor the link status; while a link is UP, install a static route for
it with a distinct metric, so that one link is always preferred over
another. On a link DOWN event, remove that static route and flush the
route cache (to speed up kernel lookups via the remaining route with the
higher or lower metric), and traffic will very quickly flow again over
the new link.
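Modelled as plain logic (the route handling here is a simulation - on a real system the operations would be `ip route add`/`ip route del` plus a route-cache flush), the scheme amounts to keeping one candidate route per link and letting the kernel always use the lowest-metric route still installed:

```python
# Sketch of the metric-based failover described above: one static route per
# link, each with its own metric; on link DOWN the route is withdrawn and
# traffic falls through to the next-lowest metric. Pure simulation, no netlink.

class RouteTable:
    def __init__(self):
        self.routes = {}            # link name -> route metric

    def link_up(self, link, metric):
        self.routes[link] = metric  # "ip route add ... metric N"

    def link_down(self, link):
        self.routes.pop(link, None) # "ip route del ..." + flush route cache

    def active(self):
        # The kernel forwards via the installed route with the lowest metric.
        return min(self.routes, key=self.routes.get) if self.routes else None

rt = RouteTable()
rt.link_up("eth0", 10)     # preferred link
rt.link_up("eth1", 20)     # backup link
assert rt.active() == "eth0"
rt.link_down("eth0")       # link DOWN event: withdraw route
assert rt.active() == "eth1"
```

The interesting part on a real system is the link monitor itself (ethtool/netlink carrier events) and how fast the DOWN event is detected, which is where the couple-of-seconds response time comes from.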

I use a very similar setup with a patched version of vtun (we clearly
don't want or need that), and the response time is on the order of a
couple of seconds (it could be a lot lower with proper tuning of the
setup).

Fabio



