linux-sctp.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* 200ms delays with SCTP streaming data
@ 2020-07-13 21:59 Corey Minyard
  2020-07-13 22:11 ` Marcelo Leitner
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: Corey Minyard @ 2020-07-13 21:59 UTC (permalink / raw)
  To: linux-sctp

Hi, it's me again with another strange issue.  In case you didn't figure
it out before, I'm working on a library that supports all different
types of stream I/O, and SCTP is one supported building block.  I
noticed when I stacked a multiplexer layer on top of SCTP I started
getting timeouts occasionally.  It took a bit, but I finally realized
that I was getting 200ms delays occasionally between sending a packet
and receiving a packet.  I verified this with a trace right at the
sctp_send() and sctp_recvmsg() calls.  It doesn't seem to be regular
in any way I can see, but it happens often enough to cause issues.

If I replace the SCTP block with a TCP block, it works fine, and pretty
much all the code is the same except where it does the read and write
calls (including the epoll() usage, and I have also switched to select()
and it has the same issue).  The write calls don't seem to be the issue,
I see two back-to-back writes a few microseconds apart and see a 200ms
delay between the messages on the receive side.

The test in question sets up two connections and does a big simultaneous
bidirectional transfer.  The test app has 4 threads waiting on epoll()
handling data and writing data.

And the delay is always ~200ms.  Which sounds suspicious.

It's not using sctp_sendv() at the moment, as the systems I'm running on
don't have that yet.  But the library does have support if it sees it is
available.

So I don't think it's my library; I've stared at it a bunch (and found a
few other bugs) but I can't reconcile this one.  There are no timers
that would cause this in the code in question.  Just basically an
epoll() call waiting on data and receive processing that is comparing
data, along with write processing that is sending the same data.

Anyway, I haven't tried to create a small reproducer; I thought I would
report it first and see if anything rang a bell.  I tried this on a
recent kernel and got the same issue.

The library is at https://github.com/cminyard/gensio.  I'd need to
provide a patch for the tracing.

-corey

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 200ms delays with SCTP streaming data
  2020-07-13 21:59 200ms delays with SCTP streaming data Corey Minyard
@ 2020-07-13 22:11 ` Marcelo Leitner
  2020-07-14 12:12 ` Corey Minyard
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Marcelo Leitner @ 2020-07-13 22:11 UTC (permalink / raw)
  To: linux-sctp

On Mon, Jul 13, 2020 at 04:59:07PM -0500, Corey Minyard wrote:
> Hi, it's me again with another strange issue.  In case you didn't figure
> it out before, I'm working on a library that supports all different
> types of stream I/O, and SCTP is one supported building block.  I
> noticed when I stacked a multiplexer layer on top of SCTP I started
> getting timeouts occasionally.  It took a bit, but I finally realized
> that I was getting 200ms delays occasionally between sending a packet
> and receiving a packet.  I verified this with a trace right at the
> sctp_send() and sctp_recvmsg() calls.  It doesn't seem to be regular
> in any way I can see, but it happens often enough to cause issues.
> 
> If I replace the SCTP block with a TCP block, it works fine, and pretty
> much all the code is the same except where it does the read and write
> calls (including the epoll() usage, and I have also switched to select()
> and it has the same issue).  The write calls don't seem to be the issue,
> I see two back-to-back writes a few microseconds apart and see a 200ms
> delay between the messages on the receive side.
> 
> The test in question sets up two connections and does a big simultaneous
> bidirectional transfer.  The test app has 4 threads waiting on epoll()
> handling data and writing data.
> 
> And the delay is always ~200ms.  Which sounds suspicious.

That can be the delayed sack timer, in kernel.
/* Delayed sack timer - 200ms */
#define SCTP_DEFAULT_TIMEOUT_SACK       (200)

You may tweak the sysctl net.sctp.sack_timeout and see if changes
accordingly, or via SCTP_PEER_ADDR_PARAMS or even enable immediate ack
(by setting SPP_SACKDELAY_DISABLE)

> 
> It's not using sctp_sendv() at the moment, as the systems I'm running on
> don't have that yet.  But the library does have support if it sees it is
> available.
> 
> So I don't think it's my library; I've stared at it a bunch (and found a
> few other bugs) but I can't reconcile this one.  There are no timers

Nice.

> that would cause this in the code in question.  Just basically an
> epoll() call waiting on data and receive processing that is comparing
> data, along with write processing that is sending the same data.
> 
> Anyway, I haven't tried to create a small reproducer; I thought I would
> report it first and see if anything rang a bell.  I tried this on a
> recent kernel and got the same issue.

I guess a combination of xmit rate, msg and buffer sizes and packet
drops can lead to this delay. I've seen it happening, but didn't have
the time to track it down back then.

That said, remember that Linux SCTP doesn't support buffer
auto-tuning. So considering you're running a stress test, you probably
want to adjust them accordingly manually to avoid packet drops.

  Marcelo

> 
> The library is at https://github.com/cminyard/gensio.  I'd need to
> provide a patch for the tracing.
> 
> -corey

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 200ms delays with SCTP streaming data
  2020-07-13 21:59 200ms delays with SCTP streaming data Corey Minyard
  2020-07-13 22:11 ` Marcelo Leitner
@ 2020-07-14 12:12 ` Corey Minyard
  2020-07-14 13:10 ` Corey Minyard
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Corey Minyard @ 2020-07-14 12:12 UTC (permalink / raw)
  To: linux-sctp

On Mon, Jul 13, 2020 at 07:11:04PM -0300, Marcelo Leitner wrote:
> On Mon, Jul 13, 2020 at 04:59:07PM -0500, Corey Minyard wrote:
> > Hi, it's me again with another strange issue.  In case you didn't figure
> > it out before, I'm working on a library that supports all different
> > types of stream I/O, and SCTP is one supported building block.  I
> > noticed when I stacked a multiplexer layer on top of SCTP I started
> > getting timeouts occasionally.  It took a bit, but I finally realized
> > that I was getting 200ms delays occasionally between sending a packet
> > and receiving a packet.  I verified this with a trace right at the
> > sctp_send() and sctp_recvmsg() calls.  It doesn't seem to be regular
> > in any way I can see, but it happens often enough to cause issues.
> > 
> > If I replace the SCTP block with a TCP block, it works fine, and pretty
> > much all the code is the same except where it does the read and write
> > calls (including the epoll() usage, and I have also switched to select()
> > and it has the same issue).  The write calls don't seem to be the issue,
> > I see two back-to-back writes a few microseconds apart and see a 200ms
> > delay between the messages on the receive side.
> > 
> > The test in question sets up two connections and does a big simultaneous
> > bidirectional transfer.  The test app has 4 threads waiting on epoll()
> > handling data and writing data.
> > 
> > And the delay is always ~200ms.  Which sounds suspicious.
> 
> That can be the delayed sack timer, in kernel.
> /* Delayed sack timer - 200ms */
> #define SCTP_DEFAULT_TIMEOUT_SACK       (200)
> 
> You may tweak the sysctl net.sctp.sack_timeout and see if changes
> accordingly, or via SCTP_PEER_ADDR_PARAMS or even enable immediate ack
> (by setting SPP_SACKDELAY_DISABLE)

Ok, setting SPP_SACKDELAY_DISABLE does make the problem go away.

> 
> > 
> > It's not using sctp_sendv() at the moment, as the systems I'm running on
> > don't have that yet.  But the library does have support if it sees it is
> > available.
> > 
> > So I don't think it's my library; I've stared at it a bunch (and found a
> > few other bugs) but I can't reconcile this one.  There are no timers
> 
> Nice.
> 
> > that would cause this in the code in question.  Just basically an
> > epoll() call waiting on data and receive processing that is comparing
> > data, along with write processing that is sending the same data.
> > 
> > Anyway, I haven't tried to create a small reproducer; I thought I would
> > report it first and see if anything rang a bell.  I tried this on a
> > recent kernel and got the same issue.
> 
> I guess a combination of xmit rate, msg and buffer sizes and packet
> drops can lead to this delay. I've seen it happening, but didn't have
> the time to track it down back then.

There should be no packet drops.  It's running across localhost, and
flow control in the multiplex layer as it's set up for the tests limits
the maximum outstanding data to 1024 bytes.  With overhead and flow
control messages it's maybe 1050 bytes of data that would ever be
unacked.  (It's not really testing throughput, it's testing the inner
workings of the multiplexing protocol.)

If I understand this correctly per the RFCs, by default if there are two
messages outstanding, it will send an sack immediately.  Otherwise it
waits 200ms.  I'm guessing what is happening is that that SCTP sends a
sack and then receives one more message and the upper layer stops
because of flow control.  Then the sack comes out in 200ms and things
continue.

So I think I can figure out how to make this work smoothly.  I assume
this doesn't happen in TCP because all packets carry an ack, and there
is data flowing both ways all the time.

Thanks,

-corey

> 
> That said, remember that Linux SCTP doesn't support buffer
> auto-tuning. So considering you're running a stress test, you probably
> want to adjust them accordingly manually to avoid packet drops.
> 
>   Marcelo
> 
> > 
> > The library is at https://github.com/cminyard/gensio.  I'd need to
> > provide a patch for the tracing.
> > 
> > -corey

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 200ms delays with SCTP streaming data
  2020-07-13 21:59 200ms delays with SCTP streaming data Corey Minyard
  2020-07-13 22:11 ` Marcelo Leitner
  2020-07-14 12:12 ` Corey Minyard
@ 2020-07-14 13:10 ` Corey Minyard
  2020-07-14 13:51 ` Michael Tuexen
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Corey Minyard @ 2020-07-14 13:10 UTC (permalink / raw)
  To: linux-sctp

On Tue, Jul 14, 2020 at 07:12:58AM -0500, Corey Minyard wrote:
> On Mon, Jul 13, 2020 at 07:11:04PM -0300, Marcelo Leitner wrote:
> > On Mon, Jul 13, 2020 at 04:59:07PM -0500, Corey Minyard wrote:
> > > Hi, it's me again with another strange issue.  In case you didn't figure
> > > it out before, I'm working on a library that supports all different
> > > types of stream I/O, and SCTP is one supported building block.  I
> > > noticed when I stacked a multiplexer layer on top of SCTP I started
> > > getting timeouts occasionally.  It took a bit, but I finally realized
> > > that I was getting 200ms delays occasionally between sending a packet
> > > and receiving a packet.  I verified this with a trace right at the
> > > sctp_send() and sctp_recvmsg() calls.  It doesn't seem to be regular
> > > in any way I can see, but it happens often enough to cause issues.
> > > 
> > > If I replace the SCTP block with a TCP block, it works fine, and pretty
> > > much all the code is the same except where it does the read and write
> > > calls (including the epoll() usage, and I have also switched to select()
> > > and it has the same issue).  The write calls don't seem to be the issue,
> > > I see two back-to-back writes a few microseconds apart and see a 200ms
> > > delay between the messages on the receive side.
> > > 
> > > The test in question sets up two connections and does a big simultaneous
> > > bidirectional transfer.  The test app has 4 threads waiting on epoll()
> > > handling data and writing data.
> > > 
> > > And the delay is always ~200ms.  Which sounds suspicious.
> > 
> > That can be the delayed sack timer, in kernel.
> > /* Delayed sack timer - 200ms */
> > #define SCTP_DEFAULT_TIMEOUT_SACK       (200)
> > 
> > You may tweak the sysctl net.sctp.sack_timeout and see if changes
> > accordingly, or via SCTP_PEER_ADDR_PARAMS or even enable immediate ack
> > (by setting SPP_SACKDELAY_DISABLE)
> 
> Ok, setting SPP_SACKDELAY_DISABLE does make the problem go away.
> 
> > 
> > > 
> > > It's not using sctp_sendv() at the moment, as the systems I'm running on
> > > don't have that yet.  But the library does have support if it sees it is
> > > available.
> > > 
> > > So I don't think it's my library; I've stared at it a bunch (and found a
> > > few other bugs) but I can't reconcile this one.  There are no timers
> > 
> > Nice.
> > 
> > > that would cause this in the code in question.  Just basically an
> > > epoll() call waiting on data and receive processing that is comparing
> > > data, along with write processing that is sending the same data.
> > > 
> > > Anyway, I haven't tried to create a small reproducer; I thought I would
> > > report it first and see if anything rang a bell.  I tried this on a
> > > recent kernel and got the same issue.
> > 
> > I guess a combination of xmit rate, msg and buffer sizes and packet
> > drops can lead to this delay. I've seen it happening, but didn't have
> > the time to track it down back then.
> 
> There should be no packet drops.  It's running across localhost, and
> flow control in the multiplex layer as it's set up for the tests limits
> the maximum outstanding data to 1024 bytes.  With overhead and flow
> control messages it's maybe 1050 bytes of data that would ever be
> unacked.  (It's not really testing throughput, it's testing the inner
> workings of the multiplexing protocol.)
> 
> If I understand this correctly per the RFCs, by default if there are two
> messages outstanding, it will send an sack immediately.  Otherwise it
> waits 200ms.  I'm guessing what is happening is that that SCTP sends a
> sack and then receives one more message and the upper layer stops
> because of flow control.  Then the sack comes out in 200ms and things
> continue.

Actually, that still doesn't make sense.  The lack of a sack shouldn't
keep anything from sending unless the congestion window is closed, which
shouldn't happen in this case.  Am I correct?

-corey

> 
> So I think I can figure out how to make this work smoothly.  I assume
> this doesn't happen in TCP because all packets carry an ack, and there
> is data flowing both ways all the time.
> 
> Thanks,
> 
> -corey
> 
> > 
> > That said, remember that Linux SCTP doesn't support buffer
> > auto-tuning. So considering you're running a stress test, you probably
> > want to adjust them accordingly manually to avoid packet drops.
> > 
> >   Marcelo
> > 
> > > 
> > > The library is at https://github.com/cminyard/gensio.  I'd need to
> > > provide a patch for the tracing.
> > > 
> > > -corey

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 200ms delays with SCTP streaming data
  2020-07-13 21:59 200ms delays with SCTP streaming data Corey Minyard
                   ` (2 preceding siblings ...)
  2020-07-14 13:10 ` Corey Minyard
@ 2020-07-14 13:51 ` Michael Tuexen
  2020-07-14 16:23 ` Corey Minyard
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Michael Tuexen @ 2020-07-14 13:51 UTC (permalink / raw)
  To: linux-sctp

> On 14. Jul 2020, at 15:10, Corey Minyard <minyard@acm.org> wrote:
> 
> On Tue, Jul 14, 2020 at 07:12:58AM -0500, Corey Minyard wrote:
>> On Mon, Jul 13, 2020 at 07:11:04PM -0300, Marcelo Leitner wrote:
>>> On Mon, Jul 13, 2020 at 04:59:07PM -0500, Corey Minyard wrote:
>>>> Hi, it's me again with another strange issue.  In case you didn't figure
>>>> it out before, I'm working on a library that supports all different
>>>> types of stream I/O, and SCTP is one supported building block.  I
>>>> noticed when I stacked a multiplexer layer on top of SCTP I started
>>>> getting timeouts occasionally.  It took a bit, but I finally realized
>>>> that I was getting 200ms delays occasionally between sending a packet
>>>> and receiving a packet.  I verified this with a trace right at the
>>>> sctp_send() and sctp_recvmsg() calls.  It doesn't seem to be regular
>>>> in any way I can see, but it happens often enough to cause issues.
>>>> 
>>>> If I replace the SCTP block with a TCP block, it works fine, and pretty
>>>> much all the code is the same except where it does the read and write
>>>> calls (including the epoll() usage, and I have also switched to select()
>>>> and it has the same issue).  The write calls don't seem to be the issue,
>>>> I see two back-to-back writes a few microseconds apart and see a 200ms
>>>> delay between the messages on the receive side.
>>>> 
>>>> The test in question sets up two connections and does a big simultaneous
>>>> bidirectional transfer.  The test app has 4 threads waiting on epoll()
>>>> handling data and writing data.
>>>> 
>>>> And the delay is always ~200ms.  Which sounds suspicious.
>>> 
>>> That can be the delayed sack timer, in kernel.
>>> /* Delayed sack timer - 200ms */
>>> #define SCTP_DEFAULT_TIMEOUT_SACK       (200)
>>> 
>>> You may tweak the sysctl net.sctp.sack_timeout and see if changes
>>> accordingly, or via SCTP_PEER_ADDR_PARAMS or even enable immediate ack
>>> (by setting SPP_SACKDELAY_DISABLE)
>> 
>> Ok, setting SPP_SACKDELAY_DISABLE does make the problem go away.
>> 
>>> 
>>>> 
>>>> It's not using sctp_sendv() at the moment, as the systems I'm running on
>>>> don't have that yet.  But the library does have support if it sees it is
>>>> available.
>>>> 
>>>> So I don't think it's my library; I've stared at it a bunch (and found a
>>>> few other bugs) but I can't reconcile this one.  There are no timers
>>> 
>>> Nice.
>>> 
>>>> that would cause this in the code in question.  Just basically an
>>>> epoll() call waiting on data and receive processing that is comparing
>>>> data, along with write processing that is sending the same data.
>>>> 
>>>> Anyway, I haven't tried to create a small reproducer; I thought I would
>>>> report it first and see if anything rang a bell.  I tried this on a
>>>> recent kernel and got the same issue.
>>> 
>>> I guess a combination of xmit rate, msg and buffer sizes and packet
>>> drops can lead to this delay. I've seen it happening, but didn't have
>>> the time to track it down back then.
>> 
>> There should be no packet drops.  It's running across localhost, and
>> flow control in the multiplex layer as it's set up for the tests limits
>> the maximum outstanding data to 1024 bytes.  With overhead and flow
>> control messages it's maybe 1050 bytes of data that would ever be
>> unacked.  (It's not really testing throughput, it's testing the inner
>> workings of the multiplexing protocol.)
>> 
>> If I understand this correctly per the RFCs, by default if there are two
>> messages outstanding, it will send an sack immediately.  Otherwise it
>> waits 200ms.  I'm guessing what is happening is that that SCTP sends a
>> sack and then receives one more message and the upper layer stops
>> because of flow control.  Then the sack comes out in 200ms and things
>> continue.
> 
> Actually, that still doesn't make sense.  The lack of a sack shouldn't
> keep anything from sending unless the congestion window is closed, which
> shouldn't happen in this case.  Am I correct?
I guess you still have the Nagle algorithm enabled. Try enabled the SCTP_NODELAY
socket option: https://tools.ietf.org/html/rfc6458#section-8.1.5 at the sender side.

It is enabled by default and will delay the sending of packet if they are
not large enough (an implementation decision) and there is outstanding data.

Best regards
Michael
> 
> -corey
> 
>> 
>> So I think I can figure out how to make this work smoothly.  I assume
>> this doesn't happen in TCP because all packets carry an ack, and there
>> is data flowing both ways all the time.
>> 
>> Thanks,
>> 
>> -corey
>> 
>>> 
>>> That said, remember that Linux SCTP doesn't support buffer
>>> auto-tuning. So considering you're running a stress test, you probably
>>> want to adjust them accordingly manually to avoid packet drops.
>>> 
>>>  Marcelo
>>> 
>>>> 
>>>> The library is at https://github.com/cminyard/gensio.  I'd need to
>>>> provide a patch for the tracing.
>>>> 
>>>> -corey

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 200ms delays with SCTP streaming data
  2020-07-13 21:59 200ms delays with SCTP streaming data Corey Minyard
                   ` (3 preceding siblings ...)
  2020-07-14 13:51 ` Michael Tuexen
@ 2020-07-14 16:23 ` Corey Minyard
  2020-07-14 17:30 ` Michael Tuexen
  2020-07-15 13:08 ` David Laight
  6 siblings, 0 replies; 8+ messages in thread
From: Corey Minyard @ 2020-07-14 16:23 UTC (permalink / raw)
  To: linux-sctp

On Tue, Jul 14, 2020 at 03:51:49PM +0200, Michael Tuexen wrote:
> > On 14. Jul 2020, at 15:10, Corey Minyard <minyard@acm.org> wrote:
> > 
> > On Tue, Jul 14, 2020 at 07:12:58AM -0500, Corey Minyard wrote:
> >> On Mon, Jul 13, 2020 at 07:11:04PM -0300, Marcelo Leitner wrote:
> >>> On Mon, Jul 13, 2020 at 04:59:07PM -0500, Corey Minyard wrote:
> >>>> Hi, it's me again with another strange issue.  In case you didn't figure
> >>>> it out before, I'm working on a library that supports all different
> >>>> types of stream I/O, and SCTP is one supported building block.  I
> >>>> noticed when I stacked a multiplexer layer on top of SCTP I started
> >>>> getting timeouts occasionally.  It took a bit, but I finally realized
> >>>> that I was getting 200ms delays occasionally between sending a packet
> >>>> and receiving a packet.  I verified this with a trace right at the
> >>>> sctp_send() and sctp_recvmsg() calls.  It doesn't seem to be regular
> >>>> in any way I can see, but it happens often enough to cause issues.
> >>>> 
> >>>> If I replace the SCTP block with a TCP block, it works fine, and pretty
> >>>> much all the code is the same except where it does the read and write
> >>>> calls (including the epoll() usage, and I have also switched to select()
> >>>> and it has the same issue).  The write calls don't seem to be the issue,
> >>>> I see two back-to-back writes a few microseconds apart and see a 200ms
> >>>> delay between the messages on the receive side.
> >>>> 
> >>>> The test in question sets up two connections and does a big simultaneous
> >>>> bidirectional transfer.  The test app has 4 threads waiting on epoll()
> >>>> handling data and writing data.
> >>>> 
> >>>> And the delay is always ~200ms.  Which sounds suspicious.
> >>> 
> >>> That can be the delayed sack timer, in kernel.
> >>> /* Delayed sack timer - 200ms */
> >>> #define SCTP_DEFAULT_TIMEOUT_SACK       (200)
> >>> 
> >>> You may tweak the sysctl net.sctp.sack_timeout and see if changes
> >>> accordingly, or via SCTP_PEER_ADDR_PARAMS or even enable immediate ack
> >>> (by setting SPP_SACKDELAY_DISABLE)
> >> 
> >> Ok, setting SPP_SACKDELAY_DISABLE does make the problem go away.
> >> 
> >>> 
> >>>> 
> >>>> It's not using sctp_sendv() at the moment, as the systems I'm running on
> >>>> don't have that yet.  But the library does have support if it sees it is
> >>>> available.
> >>>> 
> >>>> So I don't think it's my library; I've stared at it a bunch (and found a
> >>>> few other bugs) but I can't reconcile this one.  There are no timers
> >>> 
> >>> Nice.
> >>> 
> >>>> that would cause this in the code in question.  Just basically an
> >>>> epoll() call waiting on data and receive processing that is comparing
> >>>> data, along with write processing that is sending the same data.
> >>>> 
> >>>> Anyway, I haven't tried to create a small reproducer; I thought I would
> >>>> report it first and see if anything rang a bell.  I tried this on a
> >>>> recent kernel and got the same issue.
> >>> 
> >>> I guess a combination of xmit rate, msg and buffer sizes and packet
> >>> drops can lead to this delay. I've seen it happening, but didn't have
> >>> the time to track it down back then.
> >> 
> >> There should be no packet drops.  It's running across localhost, and
> >> flow control in the multiplex layer as it's set up for the tests limits
> >> the maximum outstanding data to 1024 bytes.  With overhead and flow
> >> control messages it's maybe 1050 bytes of data that would ever be
> >> unacked.  (It's not really testing throughput, it's testing the inner
> >> workings of the multiplexing protocol.)
> >> 
> >> If I understand this correctly per the RFCs, by default if there are two
> >> messages outstanding, it will send an sack immediately.  Otherwise it
> >> waits 200ms.  I'm guessing what is happening is that that SCTP sends a
> >> sack and then receives one more message and the upper layer stops
> >> because of flow control.  Then the sack comes out in 200ms and things
> >> continue.
> > 
> > Actually, that still doesn't make sense.  The lack of a sack shouldn't
> > keep anything from sending unless the congestion window is closed, which
> > shouldn't happen in this case.  Am I correct?
> I guess you still have the Nagle algorithm enabled. Try enabled the SCTP_NODELAY
> socket option: https://tools.ietf.org/html/rfc6458#section-8.1.5 at the sender side.
> 
> It is enabled by default and will delay the sending of packet if they are
> not large enough (an implementation decision) and there is outstanding data.

Well, that was a surprise, disabling Nagle caused the problem to go
away.  Nagle generally doesn't make a difference when transferring lots
of data.

I guess this is a bad interaction between Nagle and the SCTP
sack algorithms.  With TCP in my test, data is flowing both ways so data
is always being acked, and Nagle is never significantly involved.

That's happening with SCTP, too, but in some situations a sack could be
sent, you get one more packet sent, and that packet won't be acked until
another packet is sent.  So you have unacked data, and Nagle will hold
any new data until it receives an ack for the outstanding packet.  So you
get stuck until the sack delay elapses.  Bah.

This is sort of like the interaction between Nagle and TCP delayed ack.
Which is sort of a bug, I guess, but well known.  I have a number of
ways to work around this issue, and I can document it so users can know.

Thanks for your help.

-corey

> 
> Best regards
> Michael
> > 
> > -corey
> > 
> >> 
> >> So I think I can figure out how to make this work smoothly.  I assume
> >> this doesn't happen in TCP because all packets carry an ack, and there
> >> is data flowing both ways all the time.
> >> 
> >> Thanks,
> >> 
> >> -corey
> >> 
> >>> 
> >>> That said, remember that Linux SCTP doesn't support buffer
> >>> auto-tuning. So considering you're running a stress test, you probably
> >>> want to adjust them accordingly manually to avoid packet drops.
> >>> 
> >>>  Marcelo
> >>> 
> >>>> 
> >>>> The library is at https://github.com/cminyard/gensio.  I'd need to
> >>>> provide a patch for the tracing.
> >>>> 
> >>>> -corey
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 200ms delays with SCTP streaming data
  2020-07-13 21:59 200ms delays with SCTP streaming data Corey Minyard
                   ` (4 preceding siblings ...)
  2020-07-14 16:23 ` Corey Minyard
@ 2020-07-14 17:30 ` Michael Tuexen
  2020-07-15 13:08 ` David Laight
  6 siblings, 0 replies; 8+ messages in thread
From: Michael Tuexen @ 2020-07-14 17:30 UTC (permalink / raw)
  To: linux-sctp

> On 14. Jul 2020, at 18:23, Corey Minyard <minyard@acm.org> wrote:
> 
> On Tue, Jul 14, 2020 at 03:51:49PM +0200, Michael Tuexen wrote:
>>> On 14. Jul 2020, at 15:10, Corey Minyard <minyard@acm.org> wrote:
>>> 
>>> On Tue, Jul 14, 2020 at 07:12:58AM -0500, Corey Minyard wrote:
>>>> On Mon, Jul 13, 2020 at 07:11:04PM -0300, Marcelo Leitner wrote:
>>>>> On Mon, Jul 13, 2020 at 04:59:07PM -0500, Corey Minyard wrote:
>>>>>> Hi, it's me again with another strange issue.  In case you didn't figure
>>>>>> it out before, I'm working on a library that supports all different
>>>>>> types of stream I/O, and SCTP is one supported building block.  I
>>>>>> noticed when I stacked a multiplexer layer on top of SCTP I started
>>>>>> getting timeouts occasionally.  It took a bit, but I finally realized
>>>>>> that I was getting 200ms delays occasionally between sending a packet
>>>>>> and receiving a packet.  I verified this with a trace right at the
>>>>>> sctp_send() and sctp_recvmsg() calls.  It doesn't seem to be regular
>>>>>> in any way I can see, but it happens often enough to cause issues.
>>>>>> 
>>>>>> If I replace the SCTP block with a TCP block, it works fine, and pretty
>>>>>> much all the code is the same except where it does the read and write
>>>>>> calls (including the epoll() usage, and I have also switched to select()
>>>>>> and it has the same issue).  The write calls don't seem to be the issue,
>>>>>> I see two back-to-back writes a few microseconds apart and see a 200ms
>>>>>> delay between the messages on the receive side.
>>>>>> 
>>>>>> The test in question sets up two connections and does a big simultaneous
>>>>>> bidirectional transfer.  The test app has 4 threads waiting on epoll()
>>>>>> handling data and writing data.
>>>>>> 
>>>>>> And the delay is always ~200ms.  Which sounds suspicious.
>>>>> 
>>>>> That can be the delayed sack timer, in kernel.
>>>>> /* Delayed sack timer - 200ms */
>>>>> #define SCTP_DEFAULT_TIMEOUT_SACK       (200)
>>>>> 
>>>>> You may tweak the sysctl net.sctp.sack_timeout and see if changes
>>>>> accordingly, or via SCTP_PEER_ADDR_PARAMS or even enable immediate ack
>>>>> (by setting SPP_SACKDELAY_DISABLE)
>>>> 
>>>> Ok, setting SPP_SACKDELAY_DISABLE does make the problem go away.
>>>> 
>>>>> 
>>>>>> 
>>>>>> It's not using sctp_sendv() at the moment, as the systems I'm running on
>>>>>> don't have that yet.  But the library does have support if it sees it is
>>>>>> available.
>>>>>> 
>>>>>> So I don't think it's my library; I've stared at it a bunch (and found a
>>>>>> few other bugs) but I can't reconcile this one.  There are no timers
>>>>> 
>>>>> Nice.
>>>>> 
>>>>>> that would cause this in the code in question.  Just basically an
>>>>>> epoll() call waiting on data and receive processing that is comparing
>>>>>> data, along with write processing that is sending the same data.
>>>>>> 
>>>>>> Anyway, I haven't tried to create a small reproducer; I thought I would
>>>>>> report it first and see if anything rang a bell.  I tried this on a
>>>>>> recent kernel and got the same issue.
>>>>> 
>>>>> I guess a combination of xmit rate, msg and buffer sizes and packet
>>>>> drops can lead to this delay. I've seen it happening, but didn't have
>>>>> the time to track it down back then.
>>>> 
>>>> There should be no packet drops.  It's running across localhost, and
>>>> flow control in the multiplex layer as it's set up for the tests limits
>>>> the maximum outstanding data to 1024 bytes.  With overhead and flow
>>>> control messages it's maybe 1050 bytes of data that would ever be
>>>> unacked.  (It's not really testing throughput, it's testing the inner
>>>> workings of the multiplexing protocol.)
>>>> 
>>>> If I understand this correctly per the RFCs, by default if there are two
>>>> messages outstanding, it will send an sack immediately.  Otherwise it
>>>> waits 200ms.  I'm guessing what is happening is that that SCTP sends a
>>>> sack and then receives one more message and the upper layer stops
>>>> because of flow control.  Then the sack comes out in 200ms and things
>>>> continue.
>>> 
>>> Actually, that still doesn't make sense.  The lack of a sack shouldn't
>>> keep anything from sending unless the congestion window is closed, which
>>> shouldn't happen in this case.  Am I correct?
>> I guess you still have the Nagle algorithm enabled. Try enabled the SCTP_NODELAY
>> socket option: https://tools.ietf.org/html/rfc6458#section-8.1.5 at the sender side.
>> 
>> It is enabled by default and will delay the sending of packet if they are
>> not large enough (an implementation decision) and there is outstanding data.
> 
> Well, that was a surprise, disabling Nagle caused the problem to go
> away.  Nagle generally doesn't make a difference when transferring lots
> of data.
Correct.
> 
> I guess this is a bad interaction between Nagle and the SCTP
> sack algorithms.  With TCP in my test, data is flowing both ways so data
> is always being acked, and Nagle is never significantly involved.
TCP also uses delayed ACKs. However, bidirectional transfers are different
from unidirectional ones.
> 
> That's happening with SCTP, too, but in some situations a sack could be
> sent, you get one more packet sent, and that packet won't be acked until
> another packet is sent.  So you have unacked data, and Nagle will hold
> any new data until it receives an ack for the outstanding packet.  So you
> get stuck until the sack delay elapses.  Bah.
The same applies to TCP...
> 
> This is sort of like the interaction between Nagle and TCP delayed ack.
> Which is sort of a bug, I guess, but well known.  I have a number of
> ways to work around this issue, and I can document it so users can know.
> 
> Thanks for your help.
You are welcome.

Best regards
Michael
> 
> -corey
> 
>> 
>> Best regards
>> Michael
>>> 
>>> -corey
>>> 
>>>> 
>>>> So I think I can figure out how to make this work smoothly.  I assume
>>>> this doesn't happen in TCP because all packets carry an ack, and there
>>>> is data flowing both ways all the time.
>>>> 
>>>> Thanks,
>>>> 
>>>> -corey
>>>> 
>>>>> 
>>>>> That said, remember that Linux SCTP doesn't support buffer
>>>>> auto-tuning. So considering you're running a stress test, you probably
>>>>> want to adjust them accordingly manually to avoid packet drops.
>>>>> 
>>>>> Marcelo
>>>>> 
>>>>>> 
>>>>>> The library is at https://github.com/cminyard/gensio.  I'd need to
>>>>>> provide a patch for the tracing.
>>>>>> 
>>>>>> -corey
>> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: 200ms delays with SCTP streaming data
  2020-07-13 21:59 200ms delays with SCTP streaming data Corey Minyard
                   ` (5 preceding siblings ...)
  2020-07-14 17:30 ` Michael Tuexen
@ 2020-07-15 13:08 ` David Laight
  6 siblings, 0 replies; 8+ messages in thread
From: David Laight @ 2020-07-15 13:08 UTC (permalink / raw)
  To: linux-sctp

From: Corey Minyard
> Sent: 14 July 2020 17:24
...
> Well, that was a surprise, disabling Nagle caused the problem to go
> away.  Nagle generally doesn't make a difference when transferring lots
> of data.
> 
> I guess this is a bad interaction between Nagle and the SCTP
> sack algorithms.  With TCP in my test, data is flowing both ways so data
> is always being acked, and Nagle is never significantly involved.
> 
> That's happening with SCTP, too, but in some situations a sack could be
> sent, you get one more packet sent, and that packet won't be acked until
> another packet is sent.  So you have unacked data, and Nagle will hold
> any new data until it receives an ack for the outstanding packet.  So you
> get stuck until the sack delay elapses.  Bah.
> 
> This is sort of like the interaction between Nagle and TCP delayed ack.
> Which is sort of a bug, I guess, but well known.  I have a number of
> ways to work around this issue, and I can document it so users can know.

If you have an indefinite amount of data to send then even with Nagle
enabled data should get sent when enough is queued to fill an ethernet
frame.
The only proviso is that there is enough transmit window space.
Doing that in SCTP is much more horrid than TCP.

I have seen issues caused by 0 RTT, slow start and delayed acks.
But that was TCP with Nagle enabled.

If you want a 'full hands on debugging experience' enable ftrace
on 'useful' function entry points (all syscall entry/exit events
is a good start) and get the application to stop it on the 200ms
timeout.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-07-15 13:08 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-13 21:59 200ms delays with SCTP streaming data Corey Minyard
2020-07-13 22:11 ` Marcelo Leitner
2020-07-14 12:12 ` Corey Minyard
2020-07-14 13:10 ` Corey Minyard
2020-07-14 13:51 ` Michael Tuexen
2020-07-14 16:23 ` Corey Minyard
2020-07-14 17:30 ` Michael Tuexen
2020-07-15 13:08 ` David Laight

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).