linux-sctp.vger.kernel.org archive mirror
* EAGAIN
@ 2020-06-07 12:18 Andreas Fink
  2020-06-07 12:47 ` EAGAIN Michael Tuexen
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: Andreas Fink @ 2020-06-07 12:18 UTC (permalink / raw)
  To: linux-sctp

Hello folks,

I ran into a strange issue with SCTP under Linux and I'm not sure what the right approach is to fix it.

I have a listener thread which listens on a port for multiple inbound connections.
I have a sender thread which sends packets to peers by using the same socket and calling sctp_sendv.
Sockets are always in non-blocking mode.

When the remote side gets stopped (process killed), sctp_sendv starts returning 0 with errno set to EAGAIN, and we constantly retry.
We somehow didn't catch that this socket was down and ended up in a busy loop (which we now abort after a certain number of retries).

My question to you is: what's the correct way to deal with EAGAIN here? Why are we not catching the assoc being down?
We have a sender thread and a receiver thread. The receiver thread is listening to events and should know.
Furthermore, if the assoc is down, sctp_sendv could simply return something like a "socket not connected" error instead of EAGAIN.

Or what am I missing here?


Andreas

PS: above was under Linux Debian Kernel 5.4.0-0.bpo.4.


* Re: EAGAIN
  2020-06-07 12:18 EAGAIN Andreas Fink
@ 2020-06-07 12:47 ` Michael Tuexen
  2020-06-07 12:59 ` EAGAIN Andreas Fink
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Michael Tuexen @ 2020-06-07 12:47 UTC (permalink / raw)
  To: linux-sctp

> On 7. Jun 2020, at 14:18, Andreas Fink <afink@list.fink.org> wrote:
> 
> Hello folks,
> 
> I run into a strange issue with SCTP under Linux and I'm not sure whats the right approach to fix this.
> 
> I have a listener thread which listens on a port for multiple inbound connections
> I have a sender thread which sends packets to peers by using the same socket and doing a sctp_sendv call.
> Sockets are always in non blocking mode.
So a single SOCK_SEQPACKET socket for sending and receiving, right?
> 
> When the remote side gets stopped (process killed),  the sctp_sendv starts returning 0 and errno is set to EAGAIN and we constantly retry.
When it returns 0, you can't look at errno. errno is only set to a meaningful value if -1 is returned.
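
A minimal sketch of that rule, assuming a hypothetical helper named classify_send_result (it is not part of any SCTP API): the saved errno value is consulted only when the send call returned -1, and EAGAIN is kept apart from real failures.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

/* Hypothetical outcome categories, introduced for illustration only. */
enum send_outcome { SEND_OK, SEND_WOULD_BLOCK, SEND_FAILED };

/*
 * ret is the return value of a send call such as sctp_sendv();
 * saved_errno is errno copied immediately after that call.
 */
enum send_outcome classify_send_result(ssize_t ret, int saved_errno)
{
    assert(ret >= -1);          /* a send call returns -1 or a byte count */

    if (ret >= 0)
        return SEND_OK;         /* errno carries no information here */

    if (saved_errno == EAGAIN || saved_errno == EWOULDBLOCK)
        return SEND_WOULD_BLOCK; /* nothing was queued; try again later */

    return SEND_FAILED;         /* e.g. ECONNRESET, ETIMEDOUT, EPIPE */
}
```

A caller would copy errno into a local variable right after the send call, before any other library call can overwrite it, and pass both values in.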

If you killed the peer, I would assume that there is an SCTP message containing an
ABORT chunk on the wire. Is that true? If that is true, you could subscribe to
the SCTP_ASSOC_CHANGE notification, which should tell you.
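
A rough sketch of what subscribing and decoding could look like on Linux. It assumes the kernel UAPI header <linux/sctp.h> is installed (applications linked against lksctp-tools would normally include <netinet/sctp.h> instead) and uses the older SCTP_EVENTS socket option. The helper names and the assoc_event enum are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/sctp.h>

/* Ask the kernel to deliver SCTP_ASSOC_CHANGE notifications on fd. */
int subscribe_assoc_change(int fd)
{
    struct sctp_event_subscribe ev;

    memset(&ev, 0, sizeof(ev));
    ev.sctp_association_event = 1;
    return setsockopt(fd, IPPROTO_SCTP, SCTP_EVENTS, &ev, sizeof(ev));
}

/* Hypothetical summary of what a notification means for the sender. */
enum assoc_event { NOT_ASSOC_CHANGE, ASSOC_UP, ASSOC_RESTARTED, ASSOC_GONE };

/*
 * buf/len describe a message returned by recvmsg() whose msg_flags had
 * MSG_NOTIFICATION set; the caller is expected to check that flag first.
 */
enum assoc_event classify_notification(const void *buf, size_t len)
{
    struct sctp_assoc_change sac;

    assert(buf != NULL);
    if (len < sizeof(sac))
        return NOT_ASSOC_CHANGE;
    memcpy(&sac, buf, sizeof(sac));   /* copy to avoid alignment problems */

    if (sac.sac_type != SCTP_ASSOC_CHANGE)
        return NOT_ASSOC_CHANGE;

    switch (sac.sac_state) {
    case SCTP_COMM_UP:
        return ASSOC_UP;
    case SCTP_RESTART:
        return ASSOC_RESTARTED;       /* peer restarted, association kept */
    case SCTP_COMM_LOST:
    case SCTP_SHUTDOWN_COMP:
    case SCTP_CANT_STR_ASSOC:
        return ASSOC_GONE;
    default:
        return NOT_ASSOC_CHANGE;
    }
}
```

On a one-to-many socket the sac_assoc_id field of the same structure identifies which association changed, so a receiver thread could use it to mark that peer as down for the sender.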

Best regards
Michael
> We somehow didn't catch that this socket was down and end up in a busy loop (which we now fail after a certain amount of retries).
> 
> My question to you is whats the correct way to deal with EAGAIN here? Why are we not catching the assoc being down?
> We have a sender thread and a receiver thread. The receiver thread is listening to events and should know.
> Furthermore if the assoc is down the sctp_sendv could simply return something like socket not connected error instead of EAGAIN.
> 
> Or what am I missing here?
> 
> 
> Andreas
> 
> PS: above was under Linux Debian Kernel 5.4.0-0.bpo.4.


* Re: EAGAIN
  2020-06-07 12:18 EAGAIN Andreas Fink
  2020-06-07 12:47 ` EAGAIN Michael Tuexen
@ 2020-06-07 12:59 ` Andreas Fink
  2020-06-07 13:46 ` EAGAIN Michael Tuexen
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Andreas Fink @ 2020-06-07 12:59 UTC (permalink / raw)
  To: linux-sctp



> On 7 Jun 2020, at 14:47, Michael Tuexen <Michael.Tuexen@lurchi.franken.de> wrote:
> 
>> On 7. Jun 2020, at 14:18, Andreas Fink <afink@list.fink.org> wrote:
>> 
>> Hello folks,
>> 
>> I run into a strange issue with SCTP under Linux and I'm not sure whats the right approach to fix this.
>> 
>> I have a listener thread which listens on a port for multiple inbound connections
>> I have a sender thread which sends packets to peers by using the same socket and doing a sctp_sendv call.
>> Sockets are always in non blocking mode.
> So a single SOCK_SEQPACKET socket for sending and receiving, right?

correct

>> 
>> When the remote side gets stopped (process killed),  the sctp_sendv starts returning 0 and errno is set to EAGAIN and we constantly retry.
> When it returns 0, you can't look at errno. errno is only set to a correct value, if -1 is returned.


I actually check whether the return value is > 0, so probably -1 applies here. Returning 0 doesn't make any sense anyway.

> 
> If you killed the peer, I would assume that there is an SCTP message containing an
> ABORT chunk in the wire. Is that true?

I cannot currently verify that. But we have seen this happen when the remote application (which uses the same mechanism) got killed or crashed.
So the operating system's SCTP driver should have sent an ABORT, I believe. We noticed that when the remote application restarts, it somehow cannot reestablish the connection, probably because the main application is still busy looping, sending old data in the queue.


> If that is true, you could subscribe to
> SCTP_ASSOC_CHANGE notification, which should tell you.


I am subscribed to SCTP_ASSOC_CHANGE but I didn't catch anything there.
(Or I caught it in the receiver thread and the sender thread is not checking the new status in its tight sending loop.)

My question is, what is the exact meaning of EAGAIN here? Does it mean that the send buffer is full?
Why am I not getting a simple error because the specified assoc is down?


* Re: EAGAIN
  2020-06-07 12:18 EAGAIN Andreas Fink
  2020-06-07 12:47 ` EAGAIN Michael Tuexen
  2020-06-07 12:59 ` EAGAIN Andreas Fink
@ 2020-06-07 13:46 ` Michael Tuexen
  2020-06-07 14:04 ` EAGAIN David Laight
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Michael Tuexen @ 2020-06-07 13:46 UTC (permalink / raw)
  To: linux-sctp

> On 7. Jun 2020, at 14:59, Andreas Fink <afink@list.fink.org> wrote:
> 
> 
> 
>> On 7 Jun 2020, at 14:47, Michael Tuexen <Michael.Tuexen@lurchi.franken.de> wrote:
>> 
>>> On 7. Jun 2020, at 14:18, Andreas Fink <afink@list.fink.org> wrote:
>>> 
>>> Hello folks,
>>> 
>>> I run into a strange issue with SCTP under Linux and I'm not sure whats the right approach to fix this.
>>> 
>>> I have a listener thread which listens on a port for multiple inbound connections
>>> I have a sender thread which sends packets to peers by using the same socket and doing a sctp_sendv call.
>>> Sockets are always in non blocking mode.
>> So a single SOCK_SEQPACKET socket for sending and receiving, right?
> 
> correct
> 
>>> 
>>> When the remote side gets stopped (process killed),  the sctp_sendv starts returning 0 and errno is set to EAGAIN and we constantly retry.
>> When it returns 0, you can't look at errno. errno is only set to a correct value, if -1 is returned.
> 
> 
> I actually check if return value is > 0. So probably -1 applies here. Returning 0 doesnt make any sense anyway.
> 
>> 
>> If you killed the peer, I would assume that there is an SCTP message containing an
>> ABORT chunk in the wire. Is that true?
> 
> I can not currently verify that. But we have seen this happening when the remote application (which uses the same mechanism) got killed or has crashed.
> So the operating system's sctp driver should have sent ABORT I believe. We noticed that when the remote application restarts, it can not reestablish the connection somehow, probably because the main application is still busy looping sending old data in the queue.
> 
> 
>> If that is true, you could subscribe to
>> SCTP_ASSOC_CHANGE notification, which should tell you.
> 
> 
> I am subscribed to SCTP_ASSOC_CHANGE but I didnt catch anything there.
> (or I catched it in the receiver thread and the sender thread is not checking the new status in its tight sending loop)
OK.
> 
> My question is, what is the exact meaning of EAGAIN here? Does it mean that the send buffer is full?
My answer is not specific to the Linux implementation, since I don't know it. But EAGAIN is signalled
if a request can't be fulfilled right now but might work at some later time. Just hammering
on it in a busy loop might not be the best idea.
If you used a SOCK_STREAM (1-to-1) socket, I would suggest using select/poll to check
for writability.
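
As an illustration of waiting for writability instead of retrying in a tight loop, here is a small sketch built on poll(). The function name wait_writable and its return convention are assumptions made for this example; whether POLLOUT behaves usefully on a one-to-many SCTP socket is exactly the open question above, so this is only known to apply to the 1-to-1 case.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>

/*
 * Wait until fd can accept more data.
 * timeout_ms: milliseconds to wait, 0 to return at once, -1 to block.
 * Returns 1 if writable, 0 on timeout, -1 on error or hang-up.
 */
int wait_writable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLOUT, .revents = 0 };
    int rc;

    assert(timeout_ms >= -1);

    do {
        /* Restarting after a signal re-arms the full timeout. */
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc <= 0)
        return rc;              /* 0 = timed out, -1 = poll() failed */

    if (pfd.revents & (POLLERR | POLLHUP | POLLNVAL))
        return -1;              /* socket is in an error state */

    return (pfd.revents & POLLOUT) ? 1 : 0;
}
```

A sender would call this after a send attempt fails with EAGAIN and only retry once it returns 1, treating -1 as a reason to stop and inspect the socket.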

So I'm wondering if the following actually works, maybe you can test it:
1. Let an association be up. Use a one-to-many style socket.
2. Call sctp_sendv() continuously.
3. Kill the peer and restart it.
4. Does the association get killed?
5. Does a new association get established, triggered by the sctp_sendv() calls?

In addition: What happens if the association times out instead of being killed by an ABORT?

Best regards
Michael
> Why am I not getting a simple error because the specified assoc is down?
> 
> 


* RE: EAGAIN
  2020-06-07 12:18 EAGAIN Andreas Fink
                   ` (2 preceding siblings ...)
  2020-06-07 13:46 ` EAGAIN Michael Tuexen
@ 2020-06-07 14:04 ` David Laight
  2020-06-07 15:17 ` EAGAIN Michael Tuexen
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: David Laight @ 2020-06-07 14:04 UTC (permalink / raw)
  To: linux-sctp

From: Michael Tuexen
> Sent: 07 June 2020 13:48
...
> If you killed the peer, I would assume that there is an SCTP message containing an
> ABORT chunk in the wire. Is that true? If that is true, you could subscribe to
> SCTP_ASSOC_CHANGE notification, which should tell you.

Actually for TCP-style 1-1 connections you must subscribe to
SCTP_ASSOC_CHANGE.

If you don't, and the far end is killed (e.g. kernel crash) and
restarts with the same port number pair, then you get no
indication that everything has disconnected and reconnected.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


* Re: EAGAIN
  2020-06-07 12:18 EAGAIN Andreas Fink
                   ` (3 preceding siblings ...)
  2020-06-07 14:04 ` EAGAIN David Laight
@ 2020-06-07 15:17 ` Michael Tuexen
  2020-06-07 17:05 ` EAGAIN David Laight
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Michael Tuexen @ 2020-06-07 15:17 UTC (permalink / raw)
  To: linux-sctp

> On 7. Jun 2020, at 16:04, David Laight <David.Laight@ACULAB.COM> wrote:
> 
> From: Michael Tuexen
>> Sent: 07 June 2020 13:48
> ...
>> If you killed the peer, I would assume that there is an SCTP message containing an
>> ABORT chunk in the wire. Is that true? If that is true, you could subscribe to
>> SCTP_ASSOC_CHANGE notification, which should tell you.
> 
> Actually for TCP-style 1-1 connections you must subscribe to
> SCTP_ASSOC_CHANGE.
I guess you are referring to UDP (1-to-many) style sockets.
For 1-to-1 style sockets, the normal error handling should
work, like it does for TCP (returning -1 in a system call
and errno being ETIMEDOUT or ECONNRESET). At least this is
the way intended by the specification and I think Linux
does it that way.

Best regards
Michael
> 
> If you don't and the far end is killed (eg kernel crash) and
> restarts with the same port number pair then you get no
> indication that everything has disconnected and reconnected.
> 
> 	David
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
> 


* RE: EAGAIN
  2020-06-07 12:18 EAGAIN Andreas Fink
                   ` (4 preceding siblings ...)
  2020-06-07 15:17 ` EAGAIN Michael Tuexen
@ 2020-06-07 17:05 ` David Laight
  2020-06-07 17:31 ` EAGAIN Michael Tuexen
  2020-06-08  6:13 ` EAGAIN Andreas Fink
  7 siblings, 0 replies; 9+ messages in thread
From: David Laight @ 2020-06-07 17:05 UTC (permalink / raw)
  To: linux-sctp

From: Michael Tuexen
> Sent: 07 June 2020 16:18
> > On 7. Jun 2020, at 16:04, David Laight <David.Laight@ACULAB.COM> wrote:
> >
> > From: Michael Tuexen
> >> Sent: 07 June 2020 13:48
> > ...
> >> If you killed the peer, I would assume that there is an SCTP message containing an
> >> ABORT chunk in the wire. Is that true? If that is true, you could subscribe to
> >> SCTP_ASSOC_CHANGE notification, which should tell you.
> >
> > Actually for TCP-style 1-1 connections you must subscribe to
> > SCTP_ASSOC_CHANGE.
>
> I guess you are referring to UDP (1-to-many) style sockets.
> For 1-to-1 style sockets, the normal error handling should
> work, like it does for TCP (returning -1 in a system call
> and errno being ETIMEDOUT or ECONNRESET). At least this is
> the way intended by the specification and I think Linux
> does it that way.

Nope, if you take a program that will run over TCP or SCTP
then receipt of an INIT chunk (with matching ports etc)
goes through the connection handshake sequence and the
application isn't given any indication.

You might expect the incoming INIT to cause a disconnect
indication on the old socket and a new 'listen' event.
But that isn't what happens.

We have to enable the SCTP_ASSOC_CHANGE indication - and disconnect
in response to it!

No idea what the standard implies should happen.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


* Re: EAGAIN
  2020-06-07 12:18 EAGAIN Andreas Fink
                   ` (5 preceding siblings ...)
  2020-06-07 17:05 ` EAGAIN David Laight
@ 2020-06-07 17:31 ` Michael Tuexen
  2020-06-08  6:13 ` EAGAIN Andreas Fink
  7 siblings, 0 replies; 9+ messages in thread
From: Michael Tuexen @ 2020-06-07 17:31 UTC (permalink / raw)
  To: linux-sctp



> On 7. Jun 2020, at 19:05, David Laight <David.Laight@ACULAB.COM> wrote:
> 
> From: Michael Tuexen
>> Sent: 07 June 2020 16:18
>>> On 7. Jun 2020, at 16:04, David Laight <David.Laight@ACULAB.COM> wrote:
>>> 
>>> From: Michael Tuexen
>>>> Sent: 07 June 2020 13:48
>>> ...
>>>> If you killed the peer, I would assume that there is an SCTP message containing an
>>>> ABORT chunk in the wire. Is that true? If that is true, you could subscribe to
>>>> SCTP_ASSOC_CHANGE notification, which should tell you.
>>> 
>>> Actually for TCP-style 1-1 connections you must subscribe to
>>> SCTP_ASSOC_CHANGE.
>> 
>> I guess you are referring to UDP (1-to-many) style sockets.
>> For 1-to-1 style sockets, the normal error handling should
>> work, like it does for TCP (returning -1 in a system call
>> and errno being ETIMEDOUT or ECONNRESET). At least this is
>> the way intended by the specification and I think Linux
>> does it that way.
> 
> Nope, if you take a program that will run over TCP or SCTP
> then receipt of an INIT chunk (with matching ports etc)
> goes through the connection handshake sequence and the
> application isn't given any indication.
Right. But once the association is established and you have
subscribed to SCTP_ASSOC_CHANGE on the listener, the listener
should become readable, then you call accept() and the accepted
socket should become readable, because you can read an SCTP_ASSOC_CHANGE.
> 
> You might expect the incoming INIT to cause a disconnect
> indication on the old socket and a new 'listen' event. 
No, I don't expect that.
> But that isn't what happens.
I'm not sure if you are talking about a restart event. That would
actually be given (after the handshake) in an SCTP_ASSOC_CHANGE event.

But Andreas is using a 1-to-many style socket. Assume he is constantly sending
to a peer. I would assume that an sctp_sendv() call triggers the sending
of an INIT, an ABORT comes back, you clear all buffered data for that
association, and the next sctp_sendv() would trigger the sending of the next INIT.
So he should observe a lot of sctp_sendv() calls failing, but some of them
should succeed. Andreas, are you seeing such a pattern? How does it look on the wire?

Best regards
Michael 
> 
> We have to enable the SCTP_ASSOC_CHANGE indication - and disconnect
> in response to it!
> 
> No idea what the standard implies should happen.
> 
> 	David
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
> 


* Re: EAGAIN
  2020-06-07 12:18 EAGAIN Andreas Fink
                   ` (6 preceding siblings ...)
  2020-06-07 17:31 ` EAGAIN Michael Tuexen
@ 2020-06-08  6:13 ` Andreas Fink
  7 siblings, 0 replies; 9+ messages in thread
From: Andreas Fink @ 2020-06-08  6:13 UTC (permalink / raw)
  To: linux-sctp



> On 7 Jun 2020, at 19:31, Michael Tuexen <michael.tuexen@lurchi.franken.de> wrote:
> 
> 
> 
>> On 7. Jun 2020, at 19:05, David Laight <David.Laight@ACULAB.COM> wrote:
>> 
>> From: Michael Tuexen
>>> Sent: 07 June 2020 16:18
>>>> On 7. Jun 2020, at 16:04, David Laight <David.Laight@ACULAB.COM> wrote:
>>>> 
>>>> From: Michael Tuexen
>>>>> Sent: 07 June 2020 13:48
>>>> ...
>>>>> If you killed the peer, I would assume that there is an SCTP message containing an
>>>>> ABORT chunk in the wire. Is that true? If that is true, you could subscribe to
>>>>> SCTP_ASSOC_CHANGE notification, which should tell you.
>>>> 
>>>> Actually for TCP-style 1-1 connections you must subscribe to
>>>> SCTP_ASSOC_CHANGE.
>>> 
>>> I guess you are referring to UDP (1-to-many) style sockets.
>>> For 1-to-1 style sockets, the normal error handling should
>>> work, like it does for TCP (returning -1 in a system call
>>> and errno being ETIMEDOUT or ECONNRESET). At least this is
>>> the way intended by the specification and I think Linux
>>> does it that way.
>> 
>> Nope, if you take a program that will run over TCP or SCTP
>> then receipt of an INIT chunk (with matching ports etc)
>> goes through the connection handshake sequence and the
>> application isn't given any indication.
> Right. But once the association is established and you
> subscribed the SCTP_ASSOC_CHANGE on the listener, the listener
> should become readable, then you call accept() and the accepted
> socket should become readable, because you can read a SCTP_ASSOC_CHANGE.
>> 
>> You might expect the incoming INIT to cause a disconnect
>> indication on the old socket and a new 'listen' event. 
> No, I don't expect that.
>> But that isn't what happens.
> I'm not sure if you are talking about a restart event. That would
> actually be given (after the handshake) in an SCTP_ASSOC_CHANGE event.
> 
> But Andreas is using a 1-to-many style socket. Assume he is constantly sending
> to a peer. I would assume that an sctp_sendv() call triggers the sending
> of an INIT, an ABORT comes back, you clear all buffered data for that
> association and the next sctp_sendv() would trigger the sending of the next INIT.
> So he should observe a lot of sctp_sendv() failing, but some of them
> should succeed. Andreas are you seeing such a pattern? How does it look on the wire?

I have to reproduce this in my lab to get a trace.

As far as I remember, we ended up with the server side trying to send old data while the client side tried to establish a new connection and got the association up, while the old side had no trace of this new connection. I definitely catch all SCTP_ASSOC_CHANGE events, but I did not process them until after the send loop sending one single packet.
The connections are nailed-down ones, meaning the sender uses the same source port to connect. I guess this might be important to know.

I have currently worked around the issue by breaking out of my send loop if we get an assoc change (which is read in a different thread) or when EAGAIN is received more than 100 times in a row.
I will try to simulate this with a small test program to see how it looks on the wire.
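
The workaround described above could be sketched roughly as follows. This is not the actual application code: the names retry_state and after_send, and the use of a C11 atomic flag shared between the receiver and sender threads, are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <sys/types.h>

/* What the sender loop should do after one send attempt. */
enum next_step { SEND_NEXT, TRY_AGAIN, STOP_SENDING };

struct retry_state {
    int eagain_in_a_row;   /* consecutive EAGAIN results so far */
    int max_eagain;        /* stop once more than this many occur in a row */
};

/*
 * ret and saved_errno come from the last send call; assoc_down is set by
 * the receiver thread when it reads an SCTP_ASSOC_CHANGE reporting a loss.
 */
enum next_step after_send(struct retry_state *st, ssize_t ret,
                          int saved_errno, atomic_bool *assoc_down)
{
    assert(st->max_eagain > 0);

    if (atomic_load(assoc_down))
        return STOP_SENDING;          /* receiver saw the assoc go away */

    if (ret >= 0) {
        st->eagain_in_a_row = 0;      /* progress resets the counter */
        return SEND_NEXT;
    }

    if (saved_errno != EAGAIN && saved_errno != EWOULDBLOCK)
        return STOP_SENDING;          /* a real error, not back-pressure */

    if (++st->eagain_in_a_row > st->max_eagain)
        return STOP_SENDING;          /* too many retries without progress */

    return TRY_AGAIN;
}
```

With max_eagain set to 100 this matches the limit mentioned above; the receiver thread would call atomic_store() on the flag as soon as it processes the notification, so the sender notices on its next iteration.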

