linux-sctp.vger.kernel.org archive mirror
* sctp discarding received data chunks
@ 2020-10-08 21:46 David Laight
  2020-10-08 21:46 ` David Laight
                   ` (8 more replies)
  0 siblings, 9 replies; 18+ messages in thread
From: David Laight @ 2020-10-08 21:46 UTC (permalink / raw)
  To: linux-sctp

One of my local tests (that I run quite often) failed in a 'new and exciting way'.
I sent 510 messages through an M3UA-MTP2-M3UA link and only got ~350 at the far end.
(This is part of my SS7+TCAP regression test - gets run a lot.)
Usually I can find 'lost' messages logged as discarded due to my own flow control.
In this case there is no sign of any error traces.
I expect to have each message traced 6 times (on each send and receive)
but the missing messages are only traced 5 times.

Now /proc/net/sctp/snmp has SctpInDataChunksDiscards set to 163,
this matches the number of messages I'm missing.
Any idea how I can find out why one (or more) of the SCTP connections (which are still
connected - unless there is a power cut) has discarded a lot of received packets?
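
A minimal sketch of how the counter could be watched from user space: read /proc/net/sctp/snmp before and after a test run and report what changed. The helper names are hypothetical, and the file is assumed to hold one 'Name value' pair per line. The counters are system-wide, so this only shows that chunks were discarded, not on which association or why.

```python
from pathlib import Path

# Assumption: the file exists only while the sctp module is loaded.
SNMP_PATH = Path("/proc/net/sctp/snmp")


def parse_counters(text):
    """Turn 'Name <whitespace> value' lines into a {name: int} dict."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1].isdigit():
            counters[fields[0]] = int(fields[1])
    return counters


def changed_counters(before, after):
    """Return {name: increase} for every counter that differs between snapshots."""
    return {
        name: value - before.get(name, 0)
        for name, value in after.items()
        if value != before.get(name, 0)
    }


def main():
    if not SNMP_PATH.exists():
        print(f"{SNMP_PATH} not found (is the sctp module loaded?)")
        return
    counters = parse_counters(SNMP_PATH.read_text())
    # Print only the discard counters from a single snapshot.
    for name, value in sorted(counters.items()):
        if "Discard" in name:
            print(f"{name}: {value}")


if __name__ == "__main__":
    main()
```

Calling parse_counters() once before and once after the test, then passing both results to changed_counters(), lists only the counters that moved during the run.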

Each M3UA link is actually 4 SCTP connections (TCP style 1-1).
All are loopback connected to 127.0.0.1 or ::1 (the assocs print is 'interesting').
The local port is 'random'; the listening port is 2905 or 2906.
I'd expect the data to loadshare evenly between them but I've not checked the actual
distribution.
The packet data contains a sequence number, I'm missing all the x1, x2, x9, xa
and half the x6 and xe packets - so I think at least one of the sctp connections
is just discarding the receive chunks.
(I'll sort out which one tomorrow.)
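
The pattern in the missing sequence numbers can be tabulated mechanically. The sketch below assumes that 'x1, x2, x9, xa' refers to the low hex digit of the sequence number and that the sent and received sequence numbers have already been pulled out of the traces; the function names are hypothetical.

```python
from collections import Counter


def missing_by_low_digit(sent, received):
    """Count missing sequence numbers per low hex digit (seq & 0xF)."""
    missing = set(sent) - set(received)
    return Counter(seq & 0xF for seq in missing)


def loss_fraction_by_low_digit(sent, received):
    """Fraction of messages lost for each low hex digit that lost anything."""
    sent_counts = Counter(seq & 0xF for seq in set(sent))
    lost = missing_by_low_digit(sent, received)
    return {
        digit: lost[digit] / sent_counts[digit]
        for digit in sorted(sent_counts)
        if lost[digit]
    }


if __name__ == "__main__":
    # Illustrative data only: drop every message whose low digit is 1, 2, 9 or a.
    sent = list(range(0x40))
    received = [s for s in sent if (s & 0xF) not in (0x1, 0x2, 0x9, 0xA)]
    for digit, fraction in loss_fraction_by_low_digit(sent, received).items():
        print(f"x{digit:x}: {fraction:.0%} lost")
```

A digit that shows 100% loss points at a path that drops everything, while one near 50% suggests traffic with that digit is split across a good and a bad path.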

This is a 5.6.0-rc7 kernel.
I've not seen anything like this before - I've run this same test for
over 10 years, probably going back to at least 2.6.28.

Data chunks will have gone through all the connections when they were
initialised.

Is there anything anywhere that indicates why a data chunk was dropped?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

* Re: sctp discarding received data chunks
  2020-10-08 21:46 sctp discarding received data chunks David Laight
  2020-10-08 21:46 ` David Laight
@ 2020-10-09  7:24 ` Andreas Fink
  2020-10-09  7:24   ` Andreas Fink
  2020-10-09  7:57 ` David Laight
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 18+ messages in thread
From: Andreas Fink @ 2020-10-09  7:24 UTC (permalink / raw)
  To: linux-sctp

Can you see this issue with the 5.4 kernel too?

Yesterday I did some testing by upgrading the kernel from 5.4 to 5.7 and ran into all sorts of links going off after a while, so I had to revert.
5.4 is stable for me; 5.7 is not. And I have lots of M2PA and M3UA connections, like you.


* RE: sctp discarding received data chunks
  2020-10-08 21:46 sctp discarding received data chunks David Laight
  2020-10-08 21:46 ` David Laight
  2020-10-09  7:24 ` Andreas Fink
@ 2020-10-09  7:57 ` David Laight
  2020-10-09  7:57   ` David Laight
  2020-10-09 11:13 ` David Laight
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 18+ messages in thread
From: David Laight @ 2020-10-09  7:57 UTC (permalink / raw)
  To: linux-sctp

From: Andreas Fink
> Sent: 09 October 2020 08:25
> 
> Can you see this issue with the 5.4 kernel too?

I've never seen anything like it before.
I've probably run the same tests quite a few times on a post 5.4 kernel.
I'll have run them on the same 5.6.0-rc7 (from April 2020) a few times as well.
 
> I did yesterday some testing by upgrading kernel from 5.4 to 5.7 and I run
> into all sorts of links going off after a while so I had to revert back.
> 5.4 is stable for me. 5.7 is not.
> And I have lots of M2PA and M3UA connections like you

Glad it's not just me...

I've only had it go horribly wrong once.
I've kept it in the failed state to do further tests.
Regressing is hard, my SCCP and MTP3 code is in a big driver
and I'd need to find the correct kernel headers to rebuild it.

If it is still broken now I'll push some single packets through
while running ftrace.

Actual test is TCAP test app connected to a 'local' dual pair of
ss7 protocol stacks (MTP3 has a MTPA-like link to its peer).
The traffic from SCCP is directed to M3UA (not MTP3) into a pair
of M3UA connections (from each half the dual) to a second MTP3
dual, each of which has an MTP2 signalling link to the other pointcode.
The other system is the same.
So I have 4 SS7 protocol stacks on each of 2 pointcodes.
Eight M3UA connections - 16 SCTP sockets.
Normally this would require 8 computers in 4 geographically
separated locations - but they are all piled up on a single system!
Oh - and the tcap app connects by TCP - so that could be on yet
another system!

	David


* RE: sctp discarding received data chunks
  2020-10-08 21:46 sctp discarding received data chunks David Laight
                   ` (2 preceding siblings ...)
  2020-10-09  7:57 ` David Laight
@ 2020-10-09 11:13 ` David Laight
  2020-10-09 11:13   ` David Laight
  2020-10-09 13:03 ` David Laight
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 18+ messages in thread
From: David Laight @ 2020-10-09 11:13 UTC (permalink / raw)
  To: linux-sctp

From: Andreas Fink
> Sent: 09 October 2020 08:25
> 
> Can you see this issue with the 5.4 kernel too?
> 
> I did yesterday some testing by upgrading kernel from 5.4 to 5.7 and I run into all sorts of links
> going off after a while so I had to revert back.
> 5.4 is stable for me. 5.7 is not. And I have lots of M2PA and M3UA connections like you

I've just spent hours digging through my traces.
It is only some messages through the connection that get lost!

Now SCTP_MIB_IN_DATA_CHUNK_DISCARDS is only incremented in two
adjacent places in sm_statefuns.c.

Either for a bad TSN (unlikely when everything is using "lo")
or for a bad STREAM.
I suspect it is the 'bad stream' case.
I've not double-checked but I bet the discarded packets
all have a large stream number.

So it is likely that the addition of 'sctp streams' broke
the negotiation of the maximum stream number, and the reporting
of that value back to the application in getsockopt().

I've probably recently changed my test to request 17 streams
(not 5). Since the default number of streams is 10 that may
be why it worked on this kernel before.
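
For reference, the rule being described: after the handshake an endpoint should use no more outbound streams than the peer offered as inbound streams, and a DATA chunk whose stream id is at or above the peer's inbound count is treated as a bad stream. The sketch below is illustrative only. The numbers 17 and 10 are taken from the paragraph above, but which side advertised which value is an assumption, and the function names are hypothetical.

```python
def negotiated_out_streams(requested_out, peer_max_in):
    """Outbound stream count after the handshake: the smaller of the two."""
    return min(requested_out, peer_max_in)


def rejected_stream_ids(out_streams_in_use, peer_max_in):
    """Stream ids the sender may pick that the peer has no inbound stream for."""
    return list(range(peer_max_in, out_streams_in_use))


if __name__ == "__main__":
    requested, peer_in = 17, 10  # illustrative values
    print("should use:", negotiated_out_streams(requested, peer_in))
    print("ids the peer would discard if the count is not reduced:",
          rejected_stream_ids(requested, peer_in))
```

If the application spreads traffic over the unreduced count, only the messages that land on the rejected ids disappear, which would fit a loss that hits some messages on a connection and not others.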

There was a similar bug that got fixed very recently.
Ah yes, I wrote this on 14th August:

    At some point the negotiation of the number of SCTP streams
    seems to have got broken.
    I've definitely tested it in the past (probably 10 years ago!)
    but on a 5.8.0 kernel getsockopt(SCTP_INFO) seems to be
    returning the 'num_ostreams' set by setsockopt(SCTP_INIT)
    rather than the smaller of that value and that configured
    at the other end of the connection.

Although I thought that stopped packets being sent, not received.

	David


* RE: sctp discarding received data chunks
  2020-10-08 21:46 sctp discarding received data chunks David Laight
                   ` (3 preceding siblings ...)
  2020-10-09 11:13 ` David Laight
@ 2020-10-09 13:03 ` David Laight
  2020-10-09 13:03   ` David Laight
  2020-10-10  2:35 ` Xin Long
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 18+ messages in thread
From: David Laight @ 2020-10-09 13:03 UTC (permalink / raw)
  To: linux-sctp

From: David Laight
> Sent: 09 October 2020 12:14
> 
> From: Andreas Fink
> > Sent: 09 October 2020 08:25
> >
> > Can you see this issue with the 5.4 kernel too?
> >
> > I did yesterday some testing by upgrading kernel from 5.4 to 5.7 and I run into all sorts of links
> > going off after a while so I had to revert back.
> > 5.4 is stable for me. 5.7 is not. And I have lots of M2PA and M3UA connections like you
> 
> I've just spent hours digging through my traces.
> It is only some messages through the connection that get lost!
> 
> Now SCTP_MIN_IN_DATA_CHUNK_DISCARDS is only incremented in two
> adjacent places in sm_statefuncs.c.
> 
> Either for bad TSN (unlikely when everything is using "lo")
> and bad STREAM.
> I suspect it is the 'bad stream' case.
> I've not double-checked but I bet the discarded packets
> all have a large stream number.
...

If I dump out /proc/net/sctp/assocs and look way over to the right
(on the next monitor but 1) there are two columns INS and OUTS.
I've just realised that these are the number of streams.
Now all my connections are loopback - so I see both sockets for each.
So I'd expect the INS to match the OUTS of the peer.
This isn't true.
When the value should be negotiated down the OUTS value is unchanged.
So the kernel is sending packets with illegal stream numbers.
These are acked and then silently discarded.
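
The INS/OUTS cross-check can be automated once the values are extracted. This sketch assumes each association has already been reduced to local port, remote port, INS and OUTS (parsing /proc/net/sctp/assocs itself is left out because the column layout is not shown here), and that on loopback the two ends of a connection can be paired by swapped port numbers. The Assoc type and function name are hypothetical.

```python
from typing import NamedTuple


class Assoc(NamedTuple):
    lport: int  # local port
    rport: int  # remote port
    ins: int    # inbound stream count (INS column)
    outs: int   # outbound stream count (OUTS column)


def mismatched_pairs(assocs):
    """Return (sender, peer) pairs where the sender's OUTS exceeds the peer's INS."""
    by_ports = {(a.lport, a.rport): a for a in assocs}
    problems = []
    for a in assocs:
        # On loopback the peer socket shows up with the ports swapped.
        peer = by_ports.get((a.rport, a.lport))
        if peer is not None and a.outs > peer.ins:
            problems.append((a, peer))
    return problems


if __name__ == "__main__":
    # Illustrative values only, not taken from a real system.
    table = [
        Assoc(lport=40000, rport=2905, ins=10, outs=17),
        Assoc(lport=2905, rport=40000, ins=10, outs=10),
    ]
    for sender, peer in mismatched_pairs(table):
        print(f"port {sender.lport} -> {sender.rport}: "
              f"OUTS {sender.outs} > peer INS {peer.ins}")
```

Any pair it reports is a sender that may emit stream ids its peer will discard.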

I've not checked a HEAD kernel.

	David


* Re: sctp discarding received data chunks
  2020-10-08 21:46 sctp discarding received data chunks David Laight
                   ` (4 preceding siblings ...)
  2020-10-09 13:03 ` David Laight
@ 2020-10-10  2:35 ` Xin Long
  2020-10-10  2:35   ` Xin Long
  2020-10-10 15:10 ` David Laight
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 18+ messages in thread
From: Xin Long @ 2020-10-10  2:35 UTC (permalink / raw)
  To: linux-sctp

On Fri, Oct 9, 2020 at 9:03 PM David Laight <David.Laight@aculab.com> wrote:
>
> From: David Laight
> > Sent: 09 October 2020 12:14
> >
> > From: Andreas Fink
> > > Sent: 09 October 2020 08:25
> > >
> > > Can you see this issue with the 5.4 kernel too?
> > >
> > > I did yesterday some testing by upgrading kernel from 5.4 to 5.7 and I run into all sorts of links
> > > going off after a while so I had to revert back.
> > > 5.4 is stable for me. 5.7 is not. And I have lots of M2PA and M3UA connections like you
> >
> > I've just spent hours digging through my traces.
> > It is only some messages through the connection that get lost!
> >
> > Now SCTP_MIN_IN_DATA_CHUNK_DISCARDS is only incremented in two
> > adjacent places in sm_statefuncs.c.
> >
> > Either for bad TSN (unlikely when everything is using "lo")
> > and bad STREAM.
> > I suspect it is the 'bad stream' case.
> > I've not double-checked but I bet the discarded packets
> > all have a large stream number.
> ...
>
> If I dump out /proc/net/sctp/assocs and look way over to the right
> (on the next monitor but 1) there are two columns INS and OUTS.
> I've just realised that these are the number of streams.
> Now all my connections are loopback - so I see both sockets for each.
> So I'd expect the INS to match the OUTS of the peer.
> This isn't true.
> When the value should be negotiated down the OUTS value is unchanged.
> So the kernel is sending packets with illegal stream numbers.
> These are acked and then silently discarded.
Did it do an addstream reconfig, or receive any duplicate COOKIE-ECHO, in your case?

* RE: sctp discarding received data chunks
  2020-10-08 21:46 sctp discarding received data chunks David Laight
                   ` (5 preceding siblings ...)
  2020-10-10  2:35 ` Xin Long
@ 2020-10-10 15:10 ` David Laight
  2020-10-10 15:10   ` David Laight
  2020-10-11  8:33 ` Andreas Fink
  2020-10-11 15:28 ` David Laight
  8 siblings, 1 reply; 18+ messages in thread
From: David Laight @ 2020-10-10 15:10 UTC (permalink / raw)
  To: linux-sctp

From: Xin Long
> Sent: 10 October 2020 03:35
> On Fri, Oct 9, 2020 at 9:03 PM David Laight <David.Laight@aculab.com> wrote:
> >
> > From: David Laight
> > > Sent: 09 October 2020 12:14
> > >
> > > From: Andreas Fink
> > > > Sent: 09 October 2020 08:25
> > > >
> > > > Can you see this issue with the 5.4 kernel too?
> > > >
> > > > I did some testing yesterday, upgrading the kernel from 5.4 to 5.7, and ran into all sorts of
> > > > links going off after a while, so I had to revert.
> > > > 5.4 is stable for me; 5.7 is not. And I have lots of M2PA and M3UA connections, like you.
> > >
> > > I've just spent hours digging through my traces.
> > > It is only some messages through the connection that get lost!
> > >
> > > Now SCTP_MIB_IN_DATA_CHUNK_DISCARDS is only incremented in two
> > > adjacent places in sm_statefuns.c.
> > >
> > > Either for a bad TSN (unlikely when everything is using "lo")
> > > or for a bad STREAM.
> > > I suspect it is the 'bad stream' case.
> > > I've not double-checked but I bet the discarded packets
> > > all have a large stream number.
> > ...
> >
> > If I dump out /proc/net/sctp/assocs and look way over to the right
> > (on the next monitor but 1) there are two columns INS and OUTS.
> > I've just realised that these are the number of streams.
> > Now all my connections are loopback - so I see both sockets for each.
> > So I'd expect the INS to match the OUTS of the peer.
> > This isn't true.
> > When the value should be negotiated down the OUTS value is unchanged.
> > So the kernel is sending packets with illegal stream numbers.
> > These are acked and then silently discarded.

> did it do addstream reconfig or receive any duplicate COOKIE-ECHO in your case?

Extremely unlikely.

Looking at the latest version of my driver code
(which I wasn't using) I wrote the following:

 * Since the code that negotiates the number of streams got broken
 * in version 5.1 we need to extract the correct value from the
 * internal structures to avoid SCTP sending messages the remote
 * system will discard.

    /* stream.outcnt is the value we should be using.
     * But kernels 5.1 to 5.8 fail to reduce it based on the number
     * received from the remote system.
     * So bound here so that transmitted messages don't get discarded. */
    outcnt = asoc->stream.outcnt;
    num_ostreams = asoc->c.sinit_num_ostreams;

I think there was a patch done for 5.9.
It needs back-porting.

Although Andreas said 5.4 worked for him.
So maybe he has a different problem.

	David


* RE: sctp discarding received data chunks
  2020-10-10 15:10 ` David Laight
@ 2020-10-10 15:10   ` David Laight
  0 siblings, 0 replies; 18+ messages in thread
From: David Laight @ 2020-10-10 15:10 UTC (permalink / raw)
  To: 'Xin Long'
  Cc: Andreas Fink, Marcelo Ricardo Leitner, Neil Horman, linux-sctp

From: Xin Long
> Sent: 10 October 2020 03:35
> On Fri, Oct 9, 2020 at 9:03 PM David Laight <David.Laight@aculab.com> wrote:
> >
> > From: David Laight
> > > Sent: 09 October 2020 12:14
> > >
> > > From: Andreas Fink
> > > > Sent: 09 October 2020 08:25
> > > >
> > > > Can you see this issue with the 5.4 kernel too?
> > > >
> > > > I did some testing yesterday, upgrading the kernel from 5.4 to 5.7, and ran into all sorts of
> > > > links going off after a while, so I had to revert.
> > > > 5.4 is stable for me; 5.7 is not. And I have lots of M2PA and M3UA connections, like you.
> > >
> > > I've just spent hours digging through my traces.
> > > It is only some messages through the connection that get lost!
> > >
> > > Now SCTP_MIB_IN_DATA_CHUNK_DISCARDS is only incremented in two
> > > adjacent places in sm_statefuns.c.
> > >
> > > Either for a bad TSN (unlikely when everything is using "lo")
> > > or for a bad STREAM.
> > > I suspect it is the 'bad stream' case.
> > > I've not double-checked but I bet the discarded packets
> > > all have a large stream number.
> > ...
> >
> > If I dump out /proc/net/sctp/assocs and look way over to the right
> > (on the next monitor but 1) there are two columns INS and OUTS.
> > I've just realised that these are the number of streams.
> > Now all my connections are loopback - so I see both sockets for each.
> > So I'd expect the INS to match the OUTS of the peer.
> > This isn't true.
> > When the value should be negotiated down the OUTS value is unchanged.
> > So the kernel is sending packets with illegal stream numbers.
> > These are acked and then silently discarded.
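
The INS/OUTS comparison described in the quoted text can be written down as a small check. This is only a sketch: `struct assoc_streams` and both function names are hypothetical, and the struct models nothing but the two stream-count columns of /proc/net/sctp/assocs. Parsing the proc file is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one association: only the two stream-count
 * columns (INS and OUTS) mentioned in the thread are modelled. */
struct assoc_streams {
    unsigned int ins;   /* INS: inbound streams this end accepts */
    unsigned int outs;  /* OUTS: outbound streams this end may send on */
};

/* Number of outbound stream ids the sender could pick that the receiver
 * would treat as out of range, i.e. ids in [receiver->ins, sender->outs). */
unsigned int rejected_stream_ids(const struct assoc_streams *sender,
                                 const struct assoc_streams *receiver)
{
    assert(sender != NULL && receiver != NULL);
    return sender->outs > receiver->ins ? sender->outs - receiver->ins : 0;
}

/* A loopback pair is consistent when neither end can send on a stream
 * the other end does not accept. Returns 1 if consistent, 0 otherwise. */
int pair_is_consistent(const struct assoc_streams *a,
                       const struct assoc_streams *b)
{
    return rejected_stream_ids(a, b) == 0 && rejected_stream_ids(b, a) == 0;
}
```

With made-up counts, a sender showing OUTS 16 against a peer showing INS 8 gives 8 stream ids that would be dropped at the far end, which is the kind of mismatch reported above.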

> did it do addstream reconfig or receive any duplicate COOKIE-ECHO in your case?

Extremely unlikely.

Looking at the latest version of my driver code
(which I wasn't using) I wrote the following:

 * Since the code that negotiates the number of streams got broken
 * in version 5.1 we need to extract the correct value from the
 * internal structures to avoid SCTP sending messages the remote
 * system will discard.

    /* stream.outcnt is the value we should be using.
     * But kernels 5.1 to 5.8 fail to reduce it based on the number
     * received from the remote system.
     * So bound here so that transmitted messages don't get discarded. */
    outcnt = asoc->stream.outcnt;
    num_ostreams = asoc->c.sinit_num_ostreams;
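
The excerpt stops before the bound itself is applied. A minimal user-space sketch of that step follows, assuming (as the comment implies, but the excerpt does not show) that the smaller of the two values is the one that is safe to use. `struct assoc_view` and `usable_out_streams` are hypothetical names; the real `struct sctp_association` is kernel-internal and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two fields read in the excerpt. */
struct assoc_view {
    uint16_t stream_outcnt;       /* mirrors asoc->stream.outcnt */
    uint16_t sinit_num_ostreams;  /* mirrors asoc->c.sinit_num_ostreams */
};

/* Return the number of outbound streams to use: never more than either
 * field reports, so an unreduced outcnt cannot yield a stream id the
 * peer will discard. */
uint16_t usable_out_streams(const struct assoc_view *v)
{
    assert(v != NULL);
    return v->stream_outcnt < v->sinit_num_ostreams
               ? v->stream_outcnt
               : v->sinit_num_ostreams;
}
```

A caller would then keep every stream id it sends on strictly below the returned count.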

I think there was a patch done for 5.9.
It needs back-porting.

Although Andreas said 5.4 worked for him.
So maybe he has a different problem.

	David



* Re: sctp discarding received data chunks
  2020-10-08 21:46 sctp discarding received data chunks David Laight
                   ` (6 preceding siblings ...)
  2020-10-10 15:10 ` David Laight
@ 2020-10-11  8:33 ` Andreas Fink
  2020-10-11  8:33   ` Andreas Fink
  2020-10-11 15:28 ` David Laight
  8 siblings, 1 reply; 18+ messages in thread
From: Andreas Fink @ 2020-10-11  8:33 UTC (permalink / raw)
  To: linux-sctp


> 
> Extremely unlikely.
> 
> Looking at the latest version of my driver code
> (which I wasn't using) I wrote the following:
> 
> * Since the code that negotiates the number of streams got broken
> * in version 5.1 we need to extract the correct value from the
> * internal structures to avoid SCTP sending messages the remote
> * system will discard.
> 
>    /* stream.outcnt is the value we should be using.
>     * But kernels 5.1 to 5.8 fail to reduce it based on the number
>     * received from the remote system.
>     * So bound here so that transmitted messages don't get discarded. */
>    outcnt = asoc->stream.outcnt;
>    num_ostreams = asoc->c.sinit_num_ostreams;
> 
> I think there was a patch done for 5.9.
> It needs back-porting.
> 
> Although Andreas said 5.4 worked for him.
> So maybe he has a different problem.
> 

In my application code I never use anything other than streams 0 and 1,
so I must be seeing some other issue in kernel 5.8 that makes it go haywire.


* Re: sctp discarding received data chunks
  2020-10-11  8:33 ` Andreas Fink
@ 2020-10-11  8:33   ` Andreas Fink
  0 siblings, 0 replies; 18+ messages in thread
From: Andreas Fink @ 2020-10-11  8:33 UTC (permalink / raw)
  To: David Laight; +Cc: Xin Long, Marcelo Ricardo Leitner, Neil Horman, linux-sctp


> 
> Extremely unlikely.
> 
> Looking at the latest version of my driver code
> (which I wasn't using) I wrote the following:
> 
> * Since the code that negotiates the number of streams got broken
> * in version 5.1 we need to extract the correct value from the
> * internal structures to avoid SCTP sending messages the remote
> * system will discard.
> 
>    /* stream.outcnt is the value we should be using.
>     * But kernels 5.1 to 5.8 fail to reduce it based on the number
>     * received from the remote system.
>     * So bound here so that transmitted messages don't get discarded. */
>    outcnt = asoc->stream.outcnt;
>    num_ostreams = asoc->c.sinit_num_ostreams;
> 
> I think there was a patch done for 5.9.
> It needs back-porting.
> 
> Although Andreas said 5.4 worked for him.
> So maybe he has a different problem.
> 

In my application code I never use anything other than streams 0 and 1,
so I must be seeing some other issue in kernel 5.8 that makes it go haywire.




* RE: sctp discarding received data chunks
  2020-10-08 21:46 sctp discarding received data chunks David Laight
                   ` (7 preceding siblings ...)
  2020-10-11  8:33 ` Andreas Fink
@ 2020-10-11 15:28 ` David Laight
  2020-10-11 15:28   ` David Laight
  8 siblings, 1 reply; 18+ messages in thread
From: David Laight @ 2020-10-11 15:28 UTC (permalink / raw)
  To: linux-sctp

...
> > > If I dump out /proc/net/sctp/assocs and look way over to the right
> > > (on the next monitor but 1) there are two columns INS and OUTS.
> > > I've just realised that these are the number of streams.
> > > Now all my connections are loopback - so I see both sockets for each.
> > > So I'd expect the INS to match the OUTS of the peer.
> > > This isn't true.
> > > When the value should be negotiated down the OUTS value is unchanged.
> > > So the kernel is sending packets with illegal stream numbers.
> > > These are acked and then silently discarded.
>
> > did it do addstream reconfig or receive any duplicate COOKIE-ECHO in your case?
>
> Extremely unlikely.
>
> Looking at the latest version of my driver code
> (which I wasn't using) I wrote the following:
>
>  * Since the code that negotiates the number of streams got broken
>  * in version 5.1 we need to extract the correct value from the
>  * internal structures to avoid SCTP sending messages the remote
>  * system will discard.
>
>     /* stream.outcnt is the value we should be using.
>      * But kernels 5.1 to 5.8 fail to reduce it based on the number
>      * received from the remote system.
>      * So bound here so that transmitted messages don't get discarded. */
>     outcnt = asoc->stream.outcnt;
>     num_ostreams = asoc->c.sinit_num_ostreams;
>
> I think there was a patch done for 5.9.
> It needs back-porting.

Yes, I wrote the patch.
Applied to net-next.

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/net/sctp/stream.c?id=ab921f3cdbec01c68705a7ade8bec628d541fc2b

	David


* RE: sctp discarding received data chunks
  2020-10-11 15:28 ` David Laight
@ 2020-10-11 15:28   ` David Laight
  0 siblings, 0 replies; 18+ messages in thread
From: David Laight @ 2020-10-11 15:28 UTC (permalink / raw)
  To: David Laight, 'Xin Long'
  Cc: Andreas Fink, Marcelo Ricardo Leitner, Neil Horman, linux-sctp

...
> > > If I dump out /proc/net/sctp/assocs and look way over to the right
> > > (on the next monitor but 1) there are two columns INS and OUTS.
> > > I've just realised that these are the number of streams.
> > > Now all my connections are loopback - so I see both sockets for each.
> > > So I'd expect the INS to match the OUTS of the peer.
> > > This isn't true.
> > > When the value should be negotiated down the OUTS value is unchanged.
> > > So the kernel is sending packets with illegal stream numbers.
> > > These are acked and then silently discarded.
> 
> > did it do addstream reconfig or receive any duplicate COOKIE-ECHO in your case?
> 
> Extremely unlikely.
> 
> Looking at the latest version of my driver code
> (which I wasn't using) I wrote the following:
> 
>  * Since the code that negotiates the number of streams got broken
>  * in version 5.1 we need to extract the correct value from the
>  * internal structures to avoid SCTP sending messages the remote
>  * system will discard.
> 
>     /* stream.outcnt is the value we should be using.
>      * But kernels 5.1 to 5.8 fail to reduce it based on the number
>      * received from the remote system.
>      * So bound here so that transmitted messages don't get discarded. */
>     outcnt = asoc->stream.outcnt;
>     num_ostreams = asoc->c.sinit_num_ostreams;
> 
> I think there was a patch done for 5.9.
> It needs back-porting.

Yes, I wrote the patch.
Applied to net-next.

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/net/sctp/stream.c?id=ab921f3cdbec01c68705a7ade8bec628d541fc2b

	David


