From: Vlad Yasevich <vyasevich@gmail.com>
To: David Laight <David.Laight@ACULAB.COM>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: Re: SCTP seems to lose its socket state.
Date: Wed, 28 May 2014 16:18:54 -0400
Message-ID: <538644AE.90807@gmail.com>
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D1724E53D@AcuExch.aculab.com>

On 05/27/2014 11:10 AM, David Laight wrote:
> I've been looking at an ethernet trace from one of our customers.
> They seem to have got an SCTP socket into a rather confused state.
> 
> There seem to be a significant number of transmit ethernet frames
> that don't reach the far end.
> This shouldn't cause a real problem, but we end up with the following:
> This trace was taken on the linux system:
> 
> 39964   0.304473        ->      SCTP    INIT
> 39965   0.292669        <-      SCTP    INIT  (I think this has an invalid checksum)
> 39968   0.467935        <-      SCTP    INIT
> 39969   0.000093        ->      SCTP    INIT_ACK
> 39970   0.003947        <-      SCTP    COOKIE_ECHO
> 39971   0.000072        ->      SCTP    COOKIE_ACK
> 39972   0.000337        ->      M3UA    ASPUP
> 39979   0.809659        <-      SCTP    COOKIE_ECHO

Was the cookie_ack dropped for some reason?

> 39980   0.000058        ->      SCTP    COOKIE_ACK
> shutdown() called here - seems to be ignored
> 39983   0.949471        <-      SCTP    COOKIE_ECHO

Cookie timer fired and resent the cookie_echo.

> 39984   0.000053        ->      SCTP    COOKIE_ACK
> 39986   0.730072        ->      M3UA    ASPUP           Same TSN as above
> 40002   0.270589        ->      M3UA    ASPUP           Same TSN as above

Hmm... looks like more retransmissions.

> 40008   3.689088        <-      SCTP    HEARTBEAT

This probably means that the cookie_ack was finally accepted and
we are now heart-beating...

The output of 'cat /proc/net/sctp/assocs' might help.  If the local
system is running a recent enough kernel, then turning on dynamic
debug in sctp will also help.
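
For reference, a minimal Python sketch of collecting both pieces of
information.  The helper names parse_assocs() and enable_sctp_debug()
are made up for illustration.  It assumes the address columns in
/proc/net/sctp/assocs are separated by "<->" (as the header line
suggests), that debugfs is mounted at /sys/kernel/debug, and that
the kernel was built with dynamic debug support.  Writing to the
control file needs root.

```python
from pathlib import Path

ASSOCS = Path("/proc/net/sctp/assocs")
# Dynamic debug control file; assumes debugfs is mounted in the usual place.
DYNDBG = Path("/sys/kernel/debug/dynamic_debug/control")


def parse_assocs(text):
    """Split each association row at the '<->' between local and remote addresses.

    Returns a list of dicts: 'left' holds the whitespace-separated fields up to
    and including the local addresses, 'right' the remote addresses and the
    remaining counters. Column meanings are deliberately not interpreted here.
    """
    rows = []
    for line in text.splitlines()[1:]:  # the first line is the column header
        if "<->" not in line:
            continue
        left, right = line.split("<->", 1)
        rows.append({"left": left.split(), "right": right.split()})
    return rows


def enable_sctp_debug():
    """Same effect as: echo 'module sctp +p' > /sys/kernel/debug/dynamic_debug/control"""
    DYNDBG.write_text("module sctp +p\n")


def dump_assocs():
    """Print one parsed row per association, if the sctp module is loaded."""
    if ASSOCS.exists():
        for row in parse_assocs(ASSOCS.read_text()):
            print(row)
```

Calling dump_assocs() while the socket is stuck, then
enable_sctp_debug() before the next connection attempt, should show
the association state and put the sctp debug messages in dmesg.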

> 40009   0.000027        ->      SCTP    HEARTBEAT_ACK
> 40014   0.261152        <-      SCTP    HEARTBEAT
> 40015   0.000033        ->      SCTP    HEARTBEAT_ACK
> 40026   0.123048        <-      SCTP    HEARTBEAT
> 40027   0.000030        ->      SCTP    HEARTBEAT_ACK
> 40036   1.615048        ->      M3UA    ASPUP           Same TSN as above
> 
> There are no signs of any SACKs for the ASPUP, I think they have the
> correct TSN (the same value as in the INIT_ACK).

Make sure that verification tags match what was negotiated in
init/init_ack, and the SSN starts at 0.
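
A rough sketch of that check in Python, working on raw SCTP packets
(IP header already stripped) taken from the capture.  The function
names are hypothetical.  The field offsets follow the SCTP common
header and the INIT and DATA chunk layouts; the rule applied is that
a packet sent to a peer carries the Initiate Tag that peer put in its
own INIT or INIT_ACK.

```python
import struct

DATA, INIT, INIT_ACK = 0, 1, 2  # SCTP chunk type values


def parse_sctp(packet):
    """Return (sport, dport, vtag, chunks) for one raw SCTP packet."""
    sport, dport, vtag, _cksum = struct.unpack_from("!HHII", packet, 0)
    chunks, off = [], 12
    while off + 4 <= len(packet):
        ctype, flags, length = struct.unpack_from("!BBH", packet, off)
        if length < 4:
            break  # malformed chunk: stop rather than loop forever
        chunks.append((ctype, flags, packet[off + 4:off + length]))
        off += (length + 3) & ~3  # chunks are padded to a 4-byte boundary
    return sport, dport, vtag, chunks


def initiate_tag(body):
    """INIT and INIT_ACK bodies start with the 32-bit Initiate Tag."""
    return struct.unpack_from("!I", body, 0)[0]


def initial_tsn(body):
    """Initial TSN follows tag (4), a_rwnd (4) and the two stream counts (2+2)."""
    return struct.unpack_from("!I", body, 12)[0]


def data_fields(body):
    """DATA body: TSN (32 bits), stream id (16), SSN (16), payload protocol id (32)."""
    tsn, sid, ssn, ppid = struct.unpack_from("!IHHI", body, 0)
    return {"tsn": tsn, "sid": sid, "ssn": ssn, "ppid": ppid}


def check_data_packet(packet, receiver_initiate_tag):
    """Return (tag_matches, list of DATA fields) for one outgoing packet."""
    _, _, vtag, chunks = parse_sctp(packet)
    datas = [data_fields(body) for ctype, _, body in chunks if ctype == DATA]
    return vtag == receiver_initiate_tag, datas
```

For the ASPUP frames, receiver_initiate_tag would be the tag from the
remote's INIT, the TSN should equal initial_tsn() of our INIT_ACK,
and the first DATA chunk on each stream should have ssn == 0.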


> No signs of any shutdowns or aborts from either system.
> 

What's strange is that some frames are simply not accepted.
Are the NICs by any chance ixgbe with checksum offload enabled, and
are the checksums corrupt for some reason?
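
If you can export the raw SCTP packets from the capture, something
like the following Python sketch can check the checksums offline.
SCTP uses CRC32c computed over the whole packet with the checksum
field zeroed.  One assumption to verify against a known-good frame:
the code reads the stored value as little-endian, which is how I
believe the CRC ends up on the wire.  Keep in mind that with transmit
checksum offload, frames captured on the sending host may show a
checksum that the NIC has not filled in yet.

```python
import struct


def crc32c(data):
    """Bitwise CRC32c (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def sctp_checksum_ok(packet):
    """True if bytes 8..11 match CRC32c over the packet with that field zeroed."""
    if len(packet) < 12:
        return False  # shorter than the SCTP common header
    # Assumption: the CRC32c value is stored little-endian in the packet.
    stored = struct.unpack_from("<I", packet, 8)[0]
    zeroed = packet[:8] + b"\x00\x00\x00\x00" + packet[12:]
    return crc32c(zeroed) == stored
```

Running sctp_checksum_ok() over the frames that got no response
(39965, and the COOKIE_ACKs that were apparently ignored) would tell
us whether bad checksums explain the drops.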

-vlad

> As seems to be typical for M3UA the source and destination ports are
> the same. No additional IP addresses appear in the INIT (etc) messages.
> 
> Some 80 seconds after the start of the above the remote sends us another INIT.
> This is responded to (with new verification tags from both ends), but only
> SCTP heartbeats get sent/received (both ways).
> 
> The remote sends a few heartbeats with the old verification tag; they are
> ignored.
> 
> The application is repeatedly trying to connect() - but the requests fail
> immediately (errno unknown).
> I think the system is RHEL 6.4, kernel: 2.6.32-358.el6.x86_64.
> 
> Does this 'ring any bells' ?
> I think I've asked a similar question before - and 2.6.32 was thought
> to be a late enough kernel.
> It is, of course, possible they are running RHEL 5 on this system.
> 
> I can't think of an easy way to repeat the above sequence to verify
> on a much more recent kernel.
> 
> 	David
