* kernel ax25: lost and out-of-order data
@ 2016-02-12 18:12 Thomas Osterried
2016-02-13 12:41 ` Thomas Osterried
0 siblings, 1 reply; 3+ messages in thread
From: Thomas Osterried @ 2016-02-12 18:12 UTC (permalink / raw)
To: linux-hams
Hello,
this is a bug report for the kernel ax25 outqueue:
- some packets with different content are transmitted with the same AX.25 sequence-number.
- some data is sent twice; we also observe out-of-order data
I generated a file with testpatterns, which consists of a line number (aligned to 256 bytes) in the form "007811 deadbeefde...\n". The file size is 2MB.
I downloaded the file via an AX.25 connection from db0fhn's linux to db0fhn's linux via db0fhn's xnet ( call -r -s dl9sau bpq1 db0fhn-9 db0fhn ).
I traced the packets with listen -a and stored the trace (for being able to see what happened).
The files differ (/tmp/testfile.raw is the original file, /var/ax25/testfile.raw is the file downloaded with call):
diff /tmp/testfile.raw /var/ax25/testfile.raw
< 000699 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdea
dbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbee
fdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
...
# grep 000699 /tmp/testfile.raw /var/ax25/testfile.raw
/tmp/testfile.raw:000699 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdea
dbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbee
fdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
=> Line number 000699 was not received.
listen -a shows the packets from linux to the xnet program:
bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I50^ pid=F0(Text) len 256
0000 000697 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0040 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0080 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
00C0 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I51^ pid=F0(Text) len 256
0000 000698 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0040 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0080 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
00C0 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I51^ pid=F0(Text) len 256
0000 000699 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0040 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0080 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
00C0 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I52^ pid=F0(Text) len 256
0000 000700 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
..
You see the different content (000698 und 00069) in the SAME AX.25 sequenze-Number (I51)?
This came back from the xnet digi:
bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN* ctl I46^ pid=F0(Text) len 256
0000 000697 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0040 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0080 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
00C0 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN* ctl I47^ pid=F0(Text) len 256
0000 000698 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0040 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0080 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
00C0 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN* ctl I40^ pid=F0(Text) len 256
0000 000700 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0040 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0080 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
00C0 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN* ctl I41^ pid=F0(Text) len 256
0000 000702 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
=> Packet 000699 is missing.
xnet did not complain about the packets with the different data and same sequence number (and silently dropped the second packet).
A verification if data corruption has happened (loss of characters):
root@db0fhn:/home/thomas# grep ^0 /var/ax25/testfile.raw | awk '{print $2}' |sort -u
deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
root@db0fhn:/home/thomas# grep ^0 /tmp/testfile.raw | awk '{print $2}' |sort -u
deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
root@db0fhn:/home/thomas#
=> We do not have shortened lines or wrong characters.
The losses of our user content lines occur blockwise:
Looking for missing "lines" in the download (because it had the same sequence number than its predecessor):
$ for i in $(grep ^0 /tmp/testfile.raw |awk '{print $1}'); do grep -q "^$i " /var/ax25/testfile.raw || echo $i ; done
[When an error occurs, it mostly occurs more times again, and then becomes healthy again. I make it more readable by writing empty lines after a "block"]:
000699
000701
000703
000705
001230
001232
001234
001236
001238
001240
001242
001783
001785
002160
002162
002164
002166
002168
002170
002172
002952
002954
002956
002958
002960
002962
002964
003336
003338
003340
003342
003344
003346
003890
003892
003894
003896
004815
004817
004819
004821
004823
004825
004827
005901
005903
005905
005907
005909
005911
006091
006093
006095
006097
006099
006101
006103
The testfile ends with line number 007812.
One thing is very obvious: after one affected packet, the next one is ok again. Most of the time, this repeats, up to 7 times.
The largest blocks contain 7 lost lines, which means, that out of 14 packets every second packet was sent with the same sequence-number.
We also had out-of-order-receiption. line numbers, where the previous packet had a higher line number than the current:
$ LAST=0; for i in $(grep ^0 /var/ax25/testfile.raw |awk '{print $1}'); do if [ $LAST -gt $i ]; then echo $LAST $i ;fi ; LAST=$i ; done|less
000891 000862
001789 001787
001799 001787
002179 002173
002971 002965
003909 003897
$
In the example of 001789 we look at the trace file of listen -a:
bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I50^ pid=F0(Text) len 256
0000 001786 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0040 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0080 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
00C0 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
^ 001787 is missing
v
bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I51^ pid=F0(Text) len 256
0000 001788 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0040 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0080 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
00C0 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I52+ pid=F0(Text) len 256
0000 001789 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0040 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0080 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
00C0 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I53+ pid=F0(Text) len 256
0000 001787 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0040 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0080 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
00C0 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
^ here is 001787, which comes after 1789.
v [be aware of the next packet for line 001790 has the same sequence-number as the previous-previous-packet]
bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I52^ pid=F0(Text) len 256
0000 001790 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0040 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0080 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
00C0 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I53+ pid=F0(Text) len 256
0000 001791 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
0040 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
Unfortunately, it's not enough. Line 001787 was received 2 times: shown here with context +/- one line:
$ grep -B1 -A1 001787 /var/ax25/testfile.raw
001789 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
001787 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
001791 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
--
001799 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
001787 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
001790 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
Furthermore, the second occurence is completely out of order ( 001799 , then 001787 a second time, then 001790 ).
===========
Consistency verify of my test environment: axspawn works ok:
axspwan reads from pty of his child (login->shell->bget) and writes the output to the kernel
strace -p 8312 -e write -s 2048 2> /tmp/axspawn-write-trace
Original file:
$ wc /tmp/testfile.raw
7813 15626 2000128 /tmp/testfile.raw
$ grep 'write(1, "0' /tmp/axspawn-write-trace |grep -v "= -1 EAGAIN (" |cut -d'"' -f2-|awk '{print $1}' |wc
7813 7813 54691
=> every line of the file testfile.raw was written to the kernel.
Every line is a direct successor of the previous line:
$ p=-1; for i in $(grep 'write(1, "0' /tmp/axspawn-write-trace |grep -v "= -1 EAGAIN (" |cut -d'"' -f2-|awk '{print $1}'); do if [ $(echo $p +1 -$i |bc) != 0 ] ; then echo $i $p ;fi; p=$i ; done
$
76 times an -1 EAGAIN occured and the line was written again (because the kernel buffers were full) until it succeeded:
$ grep 'write(1, "0' /tmp/axspawn-write-trace |grep "= -1 EAGAIN ("|wc
76 760 24472
$
The testfile was generated with generate-testfile.c :
#include <stdio.h>
unsigned long datasize = 2*1000*1000L;
char * payload="deadbeef";
int main()
{
unsigned long num = 0;
for (num = 0; (num *256) < datasize; num++) {
int i;
printf("%6.6ld ", num);
for (i = 0; i < 31; i++)
printf(payload);
printf("\n");
}
}
vy 73,
- Thomas dl9sau
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: kernel ax25: lost and out-of-order data
2016-02-12 18:12 kernel ax25: lost and out-of-order data Thomas Osterried
@ 2016-02-13 12:41 ` Thomas Osterried
2016-02-13 22:23 ` David Ranch
0 siblings, 1 reply; 3+ messages in thread
From: Thomas Osterried @ 2016-02-13 12:41 UTC (permalink / raw)
To: linux-hams
Hello,
1. New findings:
if the kernel runs on a single CPU machine, everything is ok. I transferred the 2MB testfile three times and it has been always received ok.
=> I suppose it's an SMP issue.
2. The affected system is debian wheezy with standard kernel
Linux db0fhn 3.2.0-4-amd64 #1 SMP Debian 3.2.73-2+deb7u2 x86_64 GNU/Linux
vy 73,
- Thomas dl9sau
On Fri, Feb 12, 2016 at 07:12:12PM +0100, Thomas Osterried wrote:
> Hello,
>
> this is a bug report for the kernel ax25 outqueue:
>
> - some packets with different content are transmitted with the same AX.25 sequence-number.
> - some data is sent twice; we also observe out-of-order data
>
> I generated a file with testpatterns, which consists of a line number (aligned to 256 bytes) in the form "007811 deadbeefde...\n". The file size is 2MB.
>
> I downloaded the file via an AX.25 connection from db0fhn's linux to db0fhn's linux via db0fhn's xnet ( call -r -s dl9sau bpq1 db0fhn-9 db0fhn ).
>
> I traced the packets with listen -a and stored the trace (for being able to see what happened).
>
> The files differ (/tmp/testfile.raw is the original file, /var/ax25/testfile.raw is the file downloaded with call):
> diff /tmp/testfile.raw /var/ax25/testfile.raw
> < 000699 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdea
> dbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbee
> fdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
> ...
>
>
> # grep 000699 /tmp/testfile.raw /var/ax25/testfile.raw
> /tmp/testfile.raw:000699 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdea
> dbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbee
> fdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
>
> => Line number 000699 was not received.
>
>
>
> listen -a shows the packets from linux to the xnet program:
>
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I50^ pid=F0(Text) len 256
> 0000 000697 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0040 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0080 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 00C0 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I51^ pid=F0(Text) len 256
> 0000 000698 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0040 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0080 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 00C0 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I51^ pid=F0(Text) len 256
> 0000 000699 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0040 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0080 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 00C0 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I52^ pid=F0(Text) len 256
> 0000 000700 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> ..
>
> You see the different content (000698 und 00069) in the SAME AX.25 sequenze-Number (I51)?
>
>
> This came back from the xnet digi:
>
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN* ctl I46^ pid=F0(Text) len 256
> 0000 000697 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0040 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0080 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 00C0 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN* ctl I47^ pid=F0(Text) len 256
> 0000 000698 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0040 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0080 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 00C0 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN* ctl I40^ pid=F0(Text) len 256
> 0000 000700 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0040 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0080 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 00C0 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN* ctl I41^ pid=F0(Text) len 256
> 0000 000702 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
>
> => Packet 000699 is missing.
>
> xnet did not complain about the packets with the different data and same sequence number (and silently dropped the second packet).
>
>
> A verification if data corruption has happened (loss of characters):
>
>
> root@db0fhn:/home/thomas# grep ^0 /var/ax25/testfile.raw | awk '{print $2}' |sort -u
> deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
> root@db0fhn:/home/thomas# grep ^0 /tmp/testfile.raw | awk '{print $2}' |sort -u
> deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
> root@db0fhn:/home/thomas#
>
> => We do not have shortened lines or wrong characters.
>
>
>
>
> The losses of our user content lines occur blockwise:
>
> Looking for missing "lines" in the download (because it had the same sequence number than its predecessor):
>
> $ for i in $(grep ^0 /tmp/testfile.raw |awk '{print $1}'); do grep -q "^$i " /var/ax25/testfile.raw || echo $i ; done
>
> [When an error occurs, it mostly occurs more times again, and then becomes healthy again. I make it more readable by writing empty lines after a "block"]:
>
> 000699
> 000701
> 000703
> 000705
>
> 001230
> 001232
> 001234
> 001236
> 001238
> 001240
> 001242
>
> 001783
> 001785
>
> 002160
> 002162
> 002164
> 002166
> 002168
> 002170
> 002172
>
> 002952
> 002954
> 002956
> 002958
> 002960
> 002962
> 002964
>
> 003336
> 003338
> 003340
> 003342
> 003344
> 003346
>
> 003890
> 003892
> 003894
> 003896
>
> 004815
> 004817
> 004819
> 004821
> 004823
> 004825
> 004827
>
> 005901
> 005903
> 005905
> 005907
> 005909
> 005911
>
> 006091
> 006093
> 006095
> 006097
> 006099
> 006101
> 006103
>
>
> The testfile ends with line number 007812.
>
>
> One thing is very obvious: after one affected packet, the next one is ok again. Most of the time, this repeats, up to 7 times.
> The largest blocks contain 7 lost lines, which means, that out of 14 packets every second packet was sent with the same sequence-number.
>
>
>
> We also had out-of-order-receiption. line numbers, where the previous packet had a higher line number than the current:
>
>
> $ LAST=0; for i in $(grep ^0 /var/ax25/testfile.raw |awk '{print $1}'); do if [ $LAST -gt $i ]; then echo $LAST $i ;fi ; LAST=$i ; done|less
> 000891 000862
> 001789 001787
> 001799 001787
> 002179 002173
> 002971 002965
> 003909 003897
> $
>
> In the example of 001789 we look at the trace file of listen -a:
>
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I50^ pid=F0(Text) len 256
> 0000 001786 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0040 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0080 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 00C0 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
>
> ^ 001787 is missing
> v
>
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I51^ pid=F0(Text) len 256
> 0000 001788 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0040 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0080 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 00C0 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I52+ pid=F0(Text) len 256
> 0000 001789 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0040 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0080 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 00C0 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I53+ pid=F0(Text) len 256
> 0000 001787 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0040 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0080 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 00C0 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
>
> ^ here is 001787, which comes after 1789.
> v [be aware of the next packet for line 001790 has the same sequence-number as the previous-previous-packet]
>
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I52^ pid=F0(Text) len 256
> 0000 001790 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0040 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0080 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 00C0 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef.
> bpq1: fm DB0FHN-9 to DL9SAU via DB0FHN ctl I53+ pid=F0(Text) len 256
> 0000 001791 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> 0040 eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
>
>
> Unfortunately, it's not enough. Line 001787 was received 2 times: shown here with context +/- one line:
>
> $ grep -B1 -A1 001787 /var/ax25/testfile.raw
> 001789 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefd
> eadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
> 001787 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
> 001791 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
> --
> 001799 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
> 001787 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
> 001790 deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
>
> Furthermore, the second occurence is completely out of order ( 001799 , then 001787 a second time, then 001790 ).
>
>
>
> ===========
>
> Consistency verify of my test environment: axspawn works ok:
>
> axspwan reads from pty of his child (login->shell->bget) and writes the output to the kernel
>
> strace -p 8312 -e write -s 2048 2> /tmp/axspawn-write-trace
>
> Original file:
> $ wc /tmp/testfile.raw
> 7813 15626 2000128 /tmp/testfile.raw
>
> $ grep 'write(1, "0' /tmp/axspawn-write-trace |grep -v "= -1 EAGAIN (" |cut -d'"' -f2-|awk '{print $1}' |wc
> 7813 7813 54691
>
> => every line of the file testfile.raw was written to the kernel.
>
> Every line is a direct successor of the previous line:
> $ p=-1; for i in $(grep 'write(1, "0' /tmp/axspawn-write-trace |grep -v "= -1 EAGAIN (" |cut -d'"' -f2-|awk '{print $1}'); do if [ $(echo $p +1 -$i |bc) != 0 ] ; then echo $i $p ;fi; p=$i ; done
> $
>
> 76 times an -1 EAGAIN occured and the line was written again (because the kernel buffers were full) until it succeeded:
> $ grep 'write(1, "0' /tmp/axspawn-write-trace |grep "= -1 EAGAIN ("|wc
> 76 760 24472
> $
>
>
>
> The testfile was generated with generate-testfile.c :
>
> #include <stdio.h>
>
> unsigned long datasize = 2*1000*1000L;
>
> char * payload="deadbeef";
>
> int main()
> {
> unsigned long num = 0;
>
> for (num = 0; (num *256) < datasize; num++) {
> int i;
> printf("%6.6ld ", num);
> for (i = 0; i < 31; i++)
> printf(payload);
> printf("\n");
> }
> }
>
>
>
> vy 73,
> - Thomas dl9sau
> --
> To unsubscribe from this list: send the line "unsubscribe linux-hams" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: kernel ax25: lost and out-of-order data
2016-02-13 12:41 ` Thomas Osterried
@ 2016-02-13 22:23 ` David Ranch
0 siblings, 0 replies; 3+ messages in thread
From: David Ranch @ 2016-02-13 22:23 UTC (permalink / raw)
To: linux-hams
Hello Thomas,
> 1. New findings:
> if the kernel runs on a single CPU machine, everything is ok. I transferred the 2MB testfile three times and it has been always received ok.
> => I suppose it's an SMP issue.
Wow.. that's a key find! Nice work! Now that you narrowed this down to
a threading issue with a easy way to reproduce it, I hope you'll get
some traction here from the local kernel experts. If you don't see much
movement, I imagine taking this up to the main kernel list would be
warranted.
--David
KI6ZHD
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2016-02-13 22:23 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-12 18:12 kernel ax25: lost and out-of-order data Thomas Osterried
2016-02-13 12:41 ` Thomas Osterried
2016-02-13 22:23 ` David Ranch
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.