All of lore.kernel.org
 help / color / mirror / Atom feed
* kernel ax25: lost and out-of-order data
@ 2016-02-12 18:12 Thomas Osterried
  2016-02-13 12:41 ` Thomas Osterried
  0 siblings, 1 reply; 3+ messages in thread
From: Thomas Osterried @ 2016-02-12 18:12 UTC (permalink / raw)
  To: linux-hams

Hello,

this is a bug report for the kernel ax25 outqueue:

- some packets with different content are transmitted with the same AX.25 sequence-number.
- some data is sent twice; we also observe out-of-order data

I generated a file with testpatterns, which consists of a line number (aligned to 256 bytes) in the form "007811 deadbeefde...\n". The file size is 2MB.

I downloaded the file via an AX.25 connection from db0fhn's linux to db0fhn's linux via db0fhn's xnet ( call -r -s dl9sau bpq1 db0fhn-9 db0fhn ).

I traced the packets with listen -a and stored the trace (for being able to see what happened).

The files differ (/tmp/testfile.raw is the original file,  /var/ax25/testfile.raw is the file downloaded with call):
diff /tmp/testfile.raw /var/ax25/testfile.raw
< 000699 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdea
dbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbee
fdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
...


# grep 000699 /tmp/testfile.raw /var/ax25/testfile.raw
/tmp/testfile.raw:000699 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdea
dbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbee
fdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef

=> Line number 000699 was not received.



listen -a shows the packets from linux to the xnet program:

bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I50^ pid=F0(Text) len 256
0000  000697 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0040  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0080  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
00C0  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I51^ pid=F0(Text) len 256
0000  000698 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0040  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0080  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
00C0  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I51^ pid=F0(Text) len 256
0000  000699 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0040  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0080  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
00C0  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I52^ pid=F0(Text) len 256
0000  000700 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
..

You see the different content (000698 und 00069) in the SAME AX.25 sequenze-Number (I51)?


This came back from the xnet digi:

bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN* ctl I46^ pid=F0(Text) len 256
0000  000697 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0040  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0080  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
00C0  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN* ctl I47^ pid=F0(Text) len 256
0000  000698 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0040  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0080  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
00C0  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN* ctl I40^ pid=F0(Text) len 256
0000  000700 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0040  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0080  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
00C0  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN* ctl I41^ pid=F0(Text) len 256
0000  000702 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd

=> Packet 000699 is missing.

xnet did not complain about the packets with the different data and same sequence number (and silently dropped the second packet).


A verification if data corruption has happened (loss of characters):


root@db0fhn:/home/thomas# grep ^0 /var/ax25/testfile.raw  | awk '{print $2}' |sort -u
deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
root@db0fhn:/home/thomas# grep ^0 /tmp/testfile.raw  | awk '{print $2}' |sort -u
deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
root@db0fhn:/home/thomas# 

=> We do not have shortened lines or wrong characters.




The losses of our user content lines occur blockwise:

Looking for missing "lines" in the download (because it had the same sequence number than its predecessor):

$ for i in $(grep ^0 /tmp/testfile.raw |awk '{print $1}'); do grep -q "^$i " /var/ax25/testfile.raw || echo $i ; done

[When an error occurs, it mostly occurs more times again, and then becomes healthy again. I make it more readable by writing empty lines after a "block"]:

000699
000701
000703
000705

001230
001232
001234
001236
001238
001240
001242

001783
001785

002160
002162
002164
002166
002168
002170
002172

002952
002954
002956
002958
002960
002962
002964

003336
003338
003340
003342
003344
003346

003890
003892
003894
003896

004815
004817
004819
004821
004823
004825
004827

005901
005903
005905
005907
005909
005911

006091
006093
006095
006097
006099
006101
006103


The testfile ends with line number 007812.


One thing is very obvious: after one affected packet, the next one is ok again. Most of the time, this repeats, up to 7 times.
The largest blocks contain 7 lost lines, which means, that out of 14 packets every second packet was sent with the same sequence-number.



We also had out-of-order-receiption. line numbers, where the previous packet had a higher line number than the current:


$ LAST=0; for i in $(grep ^0 /var/ax25/testfile.raw |awk '{print $1}'); do if [ $LAST -gt $i ]; then echo $LAST $i ;fi ; LAST=$i  ; done|less
000891 000862
001789 001787
001799 001787
002179 002173
002971 002965
003909 003897
$ 

In the example of 001789 we look at the trace file of listen -a:

bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I50^ pid=F0(Text) len 256 
0000  001786 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0040  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0080  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
00C0  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.

^ 001787 is missing
v

bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I51^ pid=F0(Text) len 256 
0000  001788 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0040  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0080  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
00C0  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I52+ pid=F0(Text) len 256 
0000  001789 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0040  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0080  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
00C0  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I53+ pid=F0(Text) len 256 
0000  001787 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0040  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0080  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
00C0  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.

^ here is 001787, which comes after 1789.
v [be aware of the next packet for line 001790 has the same sequence-number as the previous-previous-packet]

bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I52^ pid=F0(Text) len 256 
0000  001790 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0040  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0080  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
00C0  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I53+ pid=F0(Text) len 256 
0000  001791 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0040  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd


Unfortunately, it's not enough. Line 001787 was received 2 times: shown here with context +/- one line:

$ grep -B1 -A1 001787 /var/ax25/testfile.raw 
001789 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
001787 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
001791 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
--
001799 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
001787 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
001790 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef

Furthermore, the second occurence is completely out of order ( 001799 , then 001787 a second time, then 001790 ).



===========

Consistency verify of my test environment: axspawn works ok:

axspwan reads from pty of his child (login->shell->bget) and writes the output to the kernel

strace -p 8312 -e write -s 2048 2> /tmp/axspawn-write-trace

Original file:
$ wc /tmp/testfile.raw 
   7813   15626 2000128 /tmp/testfile.raw

$ grep 'write(1, "0' /tmp/axspawn-write-trace |grep -v "= -1 EAGAIN (" |cut -d'"' -f2-|awk '{print $1}' |wc
   7813    7813   54691

=> every line of the file testfile.raw was written to the kernel.

Every line is a direct successor of the previous line:
$ p=-1; for i in $(grep 'write(1, "0' /tmp/axspawn-write-trace |grep -v "= -1 EAGAIN (" |cut -d'"' -f2-|awk '{print $1}'); do if [ $(echo $p +1 -$i |bc) != 0 ] ; then echo $i $p ;fi; p=$i ; done
$ 

76 times an -1 EAGAIN occured and the line was written again (because the kernel buffers were full) until it succeeded:
$ grep 'write(1, "0' /tmp/axspawn-write-trace |grep  "= -1 EAGAIN ("|wc
     76     760   24472
$ 



The testfile was generated with generate-testfile.c :

#include <stdio.h>

unsigned long datasize = 2*1000*1000L;

char * payload="deadbeef";

int main()
{
  unsigned long num = 0;

  for (num = 0; (num *256) < datasize; num++) {
    int i;
    printf("%6.6ld ", num);
    for (i = 0; i < 31; i++)
      printf(payload);
    printf("\n");
  }
}



vy 73,
	- Thomas  dl9sau

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: kernel ax25: lost and out-of-order data
  2016-02-12 18:12 kernel ax25: lost and out-of-order data Thomas Osterried
@ 2016-02-13 12:41 ` Thomas Osterried
  2016-02-13 22:23   ` David Ranch
  0 siblings, 1 reply; 3+ messages in thread
From: Thomas Osterried @ 2016-02-13 12:41 UTC (permalink / raw)
  To: linux-hams

Hello,

1. New findings:
  if the kernel runs on a single CPU machine, everything is ok. I transferred the 2MB testfile three times and it has been always received ok.
  => I suppose it's an SMP issue.  

2. The affected system is debian wheezy with standard kernel 
   Linux db0fhn 3.2.0-4-amd64 #1 SMP Debian 3.2.73-2+deb7u2 x86_64 GNU/Linux

vy 73,
	- Thomas  dl9sau


On Fri, Feb 12, 2016 at 07:12:12PM +0100, Thomas Osterried wrote:
> Hello,
> 
> this is a bug report for the kernel ax25 outqueue:
> 
> - some packets with different content are transmitted with the same AX.25 sequence-number.
> - some data is sent twice; we also observe out-of-order data
> 
> I generated a file with testpatterns, which consists of a line number (aligned to 256 bytes) in the form "007811 deadbeefde...\n". The file size is 2MB.
> 
> I downloaded the file via an AX.25 connection from db0fhn's linux to db0fhn's linux via db0fhn's xnet ( call -r -s dl9sau bpq1 db0fhn-9 db0fhn ).
> 
> I traced the packets with listen -a and stored the trace (for being able to see what happened).
> 
> The files differ (/tmp/testfile.raw is the original file,  /var/ax25/testfile.raw is the file downloaded with call):
> diff /tmp/testfile.raw /var/ax25/testfile.raw
> < 000699 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdea
> dbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbee
> fdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
> ...
> 
> 
> # grep 000699 /tmp/testfile.raw /var/ax25/testfile.raw
> /tmp/testfile.raw:000699 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdea
> dbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbee
> fdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
> 
> => Line number 000699 was not received.
> 
> 
> 
> listen -a shows the packets from linux to the xnet program:
> 
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I50^ pid=F0(Text) len 256
> 0000  000697 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0040  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0080  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 00C0  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I51^ pid=F0(Text) len 256
> 0000  000698 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0040  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0080  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 00C0  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I51^ pid=F0(Text) len 256
> 0000  000699 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0040  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0080  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 00C0  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I52^ pid=F0(Text) len 256
> 0000  000700 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> ..
> 
> You see the different content (000698 und 00069) in the SAME AX.25 sequenze-Number (I51)?
> 
> 
> This came back from the xnet digi:
> 
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN* ctl I46^ pid=F0(Text) len 256
> 0000  000697 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0040  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0080  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 00C0  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN* ctl I47^ pid=F0(Text) len 256
> 0000  000698 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0040  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0080  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 00C0  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN* ctl I40^ pid=F0(Text) len 256
> 0000  000700 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0040  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0080  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 00C0  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN* ctl I41^ pid=F0(Text) len 256
> 0000  000702 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 
> => Packet 000699 is missing.
> 
> xnet did not complain about the packets with the different data and same sequence number (and silently dropped the second packet).
> 
> 
> A verification if data corruption has happened (loss of characters):
> 
> 
> root@db0fhn:/home/thomas# grep ^0 /var/ax25/testfile.raw  | awk '{print $2}' |sort -u
> deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
> root@db0fhn:/home/thomas# grep ^0 /tmp/testfile.raw  | awk '{print $2}' |sort -u
> deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
> root@db0fhn:/home/thomas# 
> 
> => We do not have shortened lines or wrong characters.
> 
> 
> 
> 
> The losses of our user content lines occur blockwise:
> 
> Looking for missing "lines" in the download (because it had the same sequence number than its predecessor):
> 
> $ for i in $(grep ^0 /tmp/testfile.raw |awk '{print $1}'); do grep -q "^$i " /var/ax25/testfile.raw || echo $i ; done
> 
> [When an error occurs, it mostly occurs more times again, and then becomes healthy again. I make it more readable by writing empty lines after a "block"]:
> 
> 000699
> 000701
> 000703
> 000705
> 
> 001230
> 001232
> 001234
> 001236
> 001238
> 001240
> 001242
> 
> 001783
> 001785
> 
> 002160
> 002162
> 002164
> 002166
> 002168
> 002170
> 002172
> 
> 002952
> 002954
> 002956
> 002958
> 002960
> 002962
> 002964
> 
> 003336
> 003338
> 003340
> 003342
> 003344
> 003346
> 
> 003890
> 003892
> 003894
> 003896
> 
> 004815
> 004817
> 004819
> 004821
> 004823
> 004825
> 004827
> 
> 005901
> 005903
> 005905
> 005907
> 005909
> 005911
> 
> 006091
> 006093
> 006095
> 006097
> 006099
> 006101
> 006103
> 
> 
> The testfile ends with line number 007812.
> 
> 
> One thing is very obvious: after one affected packet, the next one is ok again. Most of the time, this repeats, up to 7 times.
> The largest blocks contain 7 lost lines, which means, that out of 14 packets every second packet was sent with the same sequence-number.
> 
> 
> 
> We also had out-of-order-receiption. line numbers, where the previous packet had a higher line number than the current:
> 
> 
> $ LAST=0; for i in $(grep ^0 /var/ax25/testfile.raw |awk '{print $1}'); do if [ $LAST -gt $i ]; then echo $LAST $i ;fi ; LAST=$i  ; done|less
> 000891 000862
> 001789 001787
> 001799 001787
> 002179 002173
> 002971 002965
> 003909 003897
> $ 
> 
> In the example of 001789 we look at the trace file of listen -a:
> 
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I50^ pid=F0(Text) len 256 
> 0000  001786 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0040  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0080  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 00C0  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
> 
> ^ 001787 is missing
> v
> 
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I51^ pid=F0(Text) len 256 
> 0000  001788 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0040  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0080  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 00C0  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I52+ pid=F0(Text) len 256 
> 0000  001789 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0040  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0080  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 00C0  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I53+ pid=F0(Text) len 256 
> 0000  001787 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0040  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0080  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 00C0  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
> 
> ^ here is 001787, which comes after 1789.
> v [be aware of the next packet for line 001790 has the same sequence-number as the previous-previous-packet]
> 
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I52^ pid=F0(Text) len 256 
> 0000  001790 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0040  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0080  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 00C0  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I53+ pid=F0(Text) len 256 
> 0000  001791 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0040  eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 
> 
> Unfortunately, it's not enough. Line 001787 was received 2 times: shown here with context +/- one line:
> 
> $ grep -B1 -A1 001787 /var/ax25/testfile.raw 
> 001789 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
> 001787 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
> 001791 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
> --
> 001799 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
> 001787 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
> 001790 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
> 
> Furthermore, the second occurence is completely out of order ( 001799 , then 001787 a second time, then 001790 ).
> 
> 
> 
> ===========
> 
> Consistency verify of my test environment: axspawn works ok:
> 
> axspwan reads from pty of his child (login->shell->bget) and writes the output to the kernel
> 
> strace -p 8312 -e write -s 2048 2> /tmp/axspawn-write-trace
> 
> Original file:
> $ wc /tmp/testfile.raw 
>    7813   15626 2000128 /tmp/testfile.raw
> 
> $ grep 'write(1, "0' /tmp/axspawn-write-trace |grep -v "= -1 EAGAIN (" |cut -d'"' -f2-|awk '{print $1}' |wc
>    7813    7813   54691
> 
> => every line of the file testfile.raw was written to the kernel.
> 
> Every line is a direct successor of the previous line:
> $ p=-1; for i in $(grep 'write(1, "0' /tmp/axspawn-write-trace |grep -v "= -1 EAGAIN (" |cut -d'"' -f2-|awk '{print $1}'); do if [ $(echo $p +1 -$i |bc) != 0 ] ; then echo $i $p ;fi; p=$i ; done
> $ 
> 
> 76 times an -1 EAGAIN occured and the line was written again (because the kernel buffers were full) until it succeeded:
> $ grep 'write(1, "0' /tmp/axspawn-write-trace |grep  "= -1 EAGAIN ("|wc
>      76     760   24472
> $ 
> 
> 
> 
> The testfile was generated with generate-testfile.c :
> 
> #include <stdio.h>
> 
> unsigned long datasize = 2*1000*1000L;
> 
> char * payload="deadbeef";
> 
> int main()
> {
>   unsigned long num = 0;
> 
>   for (num = 0; (num *256) < datasize; num++) {
>     int i;
>     printf("%6.6ld ", num);
>     for (i = 0; i < 31; i++)
>       printf(payload);
>     printf("\n");
>   }
> }
> 
> 
> 
> vy 73,
> 	- Thomas  dl9sau
> --
> To unsubscribe from this list: send the line "unsubscribe linux-hams" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: kernel ax25: lost and out-of-order data
  2016-02-13 12:41 ` Thomas Osterried
@ 2016-02-13 22:23   ` David Ranch
  0 siblings, 0 replies; 3+ messages in thread
From: David Ranch @ 2016-02-13 22:23 UTC (permalink / raw)
  To: linux-hams


Hello Thomas,

> 1. New findings:
>    if the kernel runs on a single CPU machine, everything is ok. I transferred the 2MB testfile three times and it has been always received ok.
>    => I suppose it's an SMP issue.

Wow.. that's a key find!  Nice work!  Now that you narrowed this down to 
a threading issue with a easy way to reproduce it, I hope you'll get 
some traction here from the local kernel experts.  If you don't see much 
movement, I imagine taking this up to the main kernel list would be 
warranted.

--David
KI6ZHD


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2016-02-13 22:23 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-12 18:12 kernel ax25: lost and out-of-order data Thomas Osterried
2016-02-13 12:41 ` Thomas Osterried
2016-02-13 22:23   ` David Ranch

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.