* 2.6.29 & network stack strangeness
@ 2009-06-05 15:15 Matthew Lear
2009-06-05 15:49 ` Finn Thain
0 siblings, 1 reply; 8+ messages in thread
From: Matthew Lear @ 2009-06-05 15:15 UTC (permalink / raw)
To: linux-m68k
Hello all,
I'm running a 2.6.29 kernel on an MMU enabled m68k coldfire mcf54455 platform
and I'm having some throughput problems when running network tests.
The kernel boots and mounts its rootfs from flash (jffs2). udhcpc runs, obtains
a lease from the dhcp server and configures eth0. Network connectivity is ok. I
can ping the target from the host and vice versa.
1/
If I run ping -s 1500 -i 0.0001 <target ip address> on the host pc, after
several mins, the kernel reports 'unexpected interrupt from 24' which is the
vector for a spurious interrupt. This message will repeat randomly (from what I
saw it appeared ~ 20 times when running the ping test above for 40 mins). The
mcf54455 reference manual describes a possible cause for spurious interrupts.
However, this test very rarely reports any packet loss, although the max time to
receive a packet can be very large indeed.
2/
If I reboot, start again and run a ping flood test (ping -f) from host pc ->
target, all icmp requests are acknowledged - for a while. Before the target
begins to fail to respond to the icmp requests, running top shows that the
ksoftirq daemon is running at ~ 5% cpu load. This is normal as it is involved in
processing the deferred tasks of processing data fired up to the network stack.
So when the target beings to stop responding to icmp, if I then stop the ping
flood and try to ping the host from the target, there is no reply indicated by
ping. However, if you do this with a packet sniffer running (eg wireshark) you
can see that data is still being transmitted from the target -> host and you can
see the icmp reply, only the reply from the host appears to be received ok by
the fec driver but is processed by the network stack target.
When in this state, a proc entry that I added to the fec driver shows that the
last return value from netif_rx() (called in the fec rx interrupt handling
routine) is 1, indicating that the last packet was dropped by the network stack,
e.g.
~ # cat /proc/driver/fec
total interrupts: 1421619
last interrupt type: 2 [1=tx, 2=rx, 3=mii]
total tx interrupts: 709148
total rx interrupts: 712472
total mii interrupts: 1
last interrupt event: 0x2000000
total eberr interrupts: 0
total hberr interrupts: 0
tx loop current count: 0
tx loop last count: 1
rx loop current count: 0
rx loop last count: 1
rx last cbd ctrl/status: 0x800
rx last cbd len: 346
rx last cbd buff addr: 0x40410000
rx last netif_rx status: 1
Strangely, wireshark still shows data being transmitted from the target
-> host. I can see ARP requests and I can also see DHCP discovery packets being
sent by the target when its DHCP lease expires. This all looks ok, only the
reply from host -> target is never processed by the target as the network stack
is in a state where it is dropping all incoming data provided to it by the driver.
I believe udhcpc utilises the network device directly, ie it does not require an
intermediate network protocol being implemented in the kernel (tcpdump is
similar).
The fec driver still seems to be running ok because I can see the ring buffer
address changing when data is received. Everything seems to be ok apart from the
network stack. Very strange indeed.
Running network throughput tests between host and target with netcat or netperf
only run for a few seconds before activity ceases.
Has anybody experienced anything similar? Why does the network stack appear to
be stuck and constantly dropping packets?
Any feedback appreciated.
Rgds,
-- Matt
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.29 & network stack strangeness
2009-06-05 15:15 2.6.29 & network stack strangeness Matthew Lear
@ 2009-06-05 15:49 ` Finn Thain
2009-06-05 16:17 ` Matthew Lear
0 siblings, 1 reply; 8+ messages in thread
From: Finn Thain @ 2009-06-05 15:49 UTC (permalink / raw)
To: Matthew Lear; +Cc: linux-m68k
Does the problem manifest only when the DHCP lease expires?
Can you reproduce the problem with a static IP?
Finn
On Fri, 5 Jun 2009, Matthew Lear wrote:
> Hello all,
>
> I'm running a 2.6.29 kernel on an MMU enabled m68k coldfire mcf54455 platform
> and I'm having some throughput problems when running network tests.
>
> The kernel boots and mounts its rootfs from flash (jffs2). udhcpc runs, obtains
> a lease from the dhcp server and configures eth0. Network connectivity is ok. I
> can ping the target from the host and vice versa.
>
> 1/
> If I run ping -s 1500 -i 0.0001 <target ip address> on the host pc, after
> several mins, the kernel reports 'unexpected interrupt from 24' which is the
> vector for a spurious interrupt. This message will repeat randomly (from what I
> saw it appeared ~ 20 times when running the ping test above for 40 mins). The
> mcf54455 reference manual describes a possible cause for spurious interrupts.
> However, this test very rarely reports any packet loss, although the max time to
> receive a packet can be very large indeed.
>
> 2/
> If I reboot, start again and run a ping flood test (ping -f) from host pc ->
> target, all icmp requests are acknowledged - for a while. Before the target
> begins to fail to respond to the icmp requests, running top shows that the
> ksoftirq daemon is running at ~ 5% cpu load. This is normal as it is involved in
> processing the deferred tasks of processing data fired up to the network stack.
> So when the target beings to stop responding to icmp, if I then stop the ping
> flood and try to ping the host from the target, there is no reply indicated by
> ping. However, if you do this with a packet sniffer running (eg wireshark) you
> can see that data is still being transmitted from the target -> host and you can
> see the icmp reply, only the reply from the host appears to be received ok by
> the fec driver but is processed by the network stack target.
>
> When in this state, a proc entry that I added to the fec driver shows that the
> last return value from netif_rx() (called in the fec rx interrupt handling
> routine) is 1, indicating that the last packet was dropped by the network stack,
> e.g.
>
> ~ # cat /proc/driver/fec
> total interrupts: 1421619
> last interrupt type: 2 [1=tx, 2=rx, 3=mii]
> total tx interrupts: 709148
> total rx interrupts: 712472
> total mii interrupts: 1
> last interrupt event: 0x2000000
> total eberr interrupts: 0
> total hberr interrupts: 0
> tx loop current count: 0
> tx loop last count: 1
> rx loop current count: 0
> rx loop last count: 1
> rx last cbd ctrl/status: 0x800
> rx last cbd len: 346
> rx last cbd buff addr: 0x40410000
> rx last netif_rx status: 1
>
> Strangely, wireshark still shows data being transmitted from the target
> -> host. I can see ARP requests and I can also see DHCP discovery packets being
> sent by the target when its DHCP lease expires. This all looks ok, only the
> reply from host -> target is never processed by the target as the network stack
> is in a state where it is dropping all incoming data provided to it by the driver.
>
> I believe udhcpc utilises the network device directly, ie it does not require an
> intermediate network protocol being implemented in the kernel (tcpdump is
> similar).
>
> The fec driver still seems to be running ok because I can see the ring buffer
> address changing when data is received. Everything seems to be ok apart from the
> network stack. Very strange indeed.
>
> Running network throughput tests between host and target with netcat or netperf
> only run for a few seconds before activity ceases.
>
> Has anybody experienced anything similar? Why does the network stack appear to
> be stuck and constantly dropping packets?
>
> Any feedback appreciated.
>
> Rgds,
> -- Matt
> --
> To unsubscribe from this list: send the line "unsubscribe linux-m68k" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.29 & network stack strangeness
2009-06-05 15:49 ` Finn Thain
@ 2009-06-05 16:17 ` Matthew Lear
2009-06-05 16:37 ` Finn Thain
0 siblings, 1 reply; 8+ messages in thread
From: Matthew Lear @ 2009-06-05 16:17 UTC (permalink / raw)
To: Finn Thain; +Cc: linux-m68k
Hi - thanks for your reply.
The problem doesn't manifest only when the DHCP lease expires and I can still
reproduce the problem with a static IP. With or without DHCP makes no difference.
It seems to effect socket comms quite seriously (and quickly). If I run a simple
server program on the host that listens on a socket and writes a response string
to the socket when it receives data, and on the target I run a simple client
program which writes a string to the socket, reads and prints the response sent
the server, I only have to send data from client to server with a delay of 1ms
between transmissions for a few seconds and the client program hangs on calling
read() on the socket fd.
If I run a simple netcat test, eg
on target: nc -l -p 3333 > /dev/null
on host: dd if=/dev/zero | nc <target-ip> 3333
...strangely, once activity on the ethernet link as a result of the netcat test
ceases, running netstat -a on the target hangs for several seconds, eg:
~ # nc -l -p 3333 > /dev/null &
~ # netstat -a
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 *:login *:* LISTEN
tcp 0 0 *:shell *:* LISTEN
tcp 0 0 *:sunrpc *:* LISTEN
tcp 0 0 *:finger *:* LISTEN
tcp 0 0 *:auth *:* LISTEN
tcp 0 0 *:ftp *:* LISTEN
tcp 0 0 *:telnet *:* LISTEN
<system hangs for several seconds here>
tcp 0 0 192.168.0.11:3333 gateway0:45645
ESTABLISHED
udp 0 0 *:ntalk *:*
udp 0 0 *:sunrpc *:*
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags Type State I-Node Path
unix 4 [ ] DGRAM 111 /dev/log
unix 3 [ ] STREAM CONNECTED 123
unix 3 [ ] STREAM CONNECTED 122
unix 2 [ ] DGRAM 120
unix 2 [ ] DGRAM 114
~ #
I thought this was interesting. Also, after this, I have trouble entering
characters over the serial port / console. It seems like interrupts may having
trouble getting serviced but this may be a side-effect...
If you run the same netstat command with strace, you can see that the delay is
caused by polling the socket following calling send:
...
...
gettimeofday({366, 470000}, NULL) = 0
poll([{fd=4, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
send(4, "lJ\1\0\0\1\0\0\0\0\0\0\00211\0010\003168\003192\7in-ad"..., 43,
0x4000) = 43
poll(
<delay is here>
[{fd=4, events=POLLIN}], 1, 5000) = 0
...
...
-- Matt
Finn Thain wrote:
> Does the problem manifest only when the DHCP lease expires?
> Can you reproduce the problem with a static IP?
>
> Finn
>
>
> On Fri, 5 Jun 2009, Matthew Lear wrote:
>
>> Hello all,
>>
>> I'm running a 2.6.29 kernel on an MMU enabled m68k coldfire mcf54455 platform
>> and I'm having some throughput problems when running network tests.
>>
>> The kernel boots and mounts its rootfs from flash (jffs2). udhcpc runs, obtains
>> a lease from the dhcp server and configures eth0. Network connectivity is ok. I
>> can ping the target from the host and vice versa.
>>
>> 1/
>> If I run ping -s 1500 -i 0.0001 <target ip address> on the host pc, after
>> several mins, the kernel reports 'unexpected interrupt from 24' which is the
>> vector for a spurious interrupt. This message will repeat randomly (from what I
>> saw it appeared ~ 20 times when running the ping test above for 40 mins). The
>> mcf54455 reference manual describes a possible cause for spurious interrupts.
>> However, this test very rarely reports any packet loss, although the max time to
>> receive a packet can be very large indeed.
>>
>> 2/
>> If I reboot, start again and run a ping flood test (ping -f) from host pc ->
>> target, all icmp requests are acknowledged - for a while. Before the target
>> begins to fail to respond to the icmp requests, running top shows that the
>> ksoftirq daemon is running at ~ 5% cpu load. This is normal as it is involved in
>> processing the deferred tasks of processing data fired up to the network stack.
>> So when the target beings to stop responding to icmp, if I then stop the ping
>> flood and try to ping the host from the target, there is no reply indicated by
>> ping. However, if you do this with a packet sniffer running (eg wireshark) you
>> can see that data is still being transmitted from the target -> host and you can
>> see the icmp reply, only the reply from the host appears to be received ok by
>> the fec driver but is processed by the network stack target.
>>
>> When in this state, a proc entry that I added to the fec driver shows that the
>> last return value from netif_rx() (called in the fec rx interrupt handling
>> routine) is 1, indicating that the last packet was dropped by the network stack,
>> e.g.
>>
>> ~ # cat /proc/driver/fec
>> total interrupts: 1421619
>> last interrupt type: 2 [1=tx, 2=rx, 3=mii]
>> total tx interrupts: 709148
>> total rx interrupts: 712472
>> total mii interrupts: 1
>> last interrupt event: 0x2000000
>> total eberr interrupts: 0
>> total hberr interrupts: 0
>> tx loop current count: 0
>> tx loop last count: 1
>> rx loop current count: 0
>> rx loop last count: 1
>> rx last cbd ctrl/status: 0x800
>> rx last cbd len: 346
>> rx last cbd buff addr: 0x40410000
>> rx last netif_rx status: 1
>>
>> Strangely, wireshark still shows data being transmitted from the target
>> -> host. I can see ARP requests and I can also see DHCP discovery packets being
>> sent by the target when its DHCP lease expires. This all looks ok, only the
>> reply from host -> target is never processed by the target as the network stack
>> is in a state where it is dropping all incoming data provided to it by the driver.
>>
>> I believe udhcpc utilises the network device directly, ie it does not require an
>> intermediate network protocol being implemented in the kernel (tcpdump is
>> similar).
>>
>> The fec driver still seems to be running ok because I can see the ring buffer
>> address changing when data is received. Everything seems to be ok apart from the
>> network stack. Very strange indeed.
>>
>> Running network throughput tests between host and target with netcat or netperf
>> only run for a few seconds before activity ceases.
>>
>> Has anybody experienced anything similar? Why does the network stack appear to
>> be stuck and constantly dropping packets?
>>
>> Any feedback appreciated.
>>
>> Rgds,
>> -- Matt
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-m68k" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.29 & network stack strangeness
2009-06-05 16:17 ` Matthew Lear
@ 2009-06-05 16:37 ` Finn Thain
2009-06-05 16:44 ` Matthew Lear
0 siblings, 1 reply; 8+ messages in thread
From: Finn Thain @ 2009-06-05 16:37 UTC (permalink / raw)
To: Matthew Lear; +Cc: linux-m68k
My only guess would be that the network stack delayed work queues depend
upon working timer interrupts...
But since I have no knowledge of your hardware, I don't think I'll be a
lot of help with that.
Finn
On Fri, 5 Jun 2009, Matthew Lear wrote:
> Hi - thanks for your reply.
>
> The problem doesn't manifest only when the DHCP lease expires and I can still
> reproduce the problem with a static IP. With or without DHCP makes no difference.
>
> It seems to effect socket comms quite seriously (and quickly). If I run a simple
> server program on the host that listens on a socket and writes a response string
> to the socket when it receives data, and on the target I run a simple client
> program which writes a string to the socket, reads and prints the response sent
> the server, I only have to send data from client to server with a delay of 1ms
> between transmissions for a few seconds and the client program hangs on calling
> read() on the socket fd.
>
> If I run a simple netcat test, eg
>
> on target: nc -l -p 3333 > /dev/null
> on host: dd if=/dev/zero | nc <target-ip> 3333
>
> ...strangely, once activity on the ethernet link as a result of the netcat test
> ceases, running netstat -a on the target hangs for several seconds, eg:
>
>
> ~ # nc -l -p 3333 > /dev/null &
> ~ # netstat -a
> Active Internet connections (servers and established)
> Proto Recv-Q Send-Q Local Address Foreign Address State
> tcp 0 0 *:login *:* LISTEN
> tcp 0 0 *:shell *:* LISTEN
> tcp 0 0 *:sunrpc *:* LISTEN
> tcp 0 0 *:finger *:* LISTEN
> tcp 0 0 *:auth *:* LISTEN
> tcp 0 0 *:ftp *:* LISTEN
> tcp 0 0 *:telnet *:* LISTEN
>
> <system hangs for several seconds here>
>
> tcp 0 0 192.168.0.11:3333 gateway0:45645
> ESTABLISHED
> udp 0 0 *:ntalk *:*
> udp 0 0 *:sunrpc *:*
> Active UNIX domain sockets (servers and established)
> Proto RefCnt Flags Type State I-Node Path
> unix 4 [ ] DGRAM 111 /dev/log
> unix 3 [ ] STREAM CONNECTED 123
> unix 3 [ ] STREAM CONNECTED 122
> unix 2 [ ] DGRAM 120
> unix 2 [ ] DGRAM 114
> ~ #
>
> I thought this was interesting. Also, after this, I have trouble entering
> characters over the serial port / console. It seems like interrupts may having
> trouble getting serviced but this may be a side-effect...
>
> If you run the same netstat command with strace, you can see that the delay is
> caused by polling the socket following calling send:
>
> ...
> ...
> gettimeofday({366, 470000}, NULL) = 0
> poll([{fd=4, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
> send(4, "lJ\1\0\0\1\0\0\0\0\0\0\00211\0010\003168\003192\7in-ad"..., 43,
> 0x4000) = 43
> poll(
>
>
> <delay is here>
>
>
> [{fd=4, events=POLLIN}], 1, 5000) = 0
> ...
> ...
>
> -- Matt
>
>
> Finn Thain wrote:
> > Does the problem manifest only when the DHCP lease expires?
> > Can you reproduce the problem with a static IP?
> >
> > Finn
> >
> >
> > On Fri, 5 Jun 2009, Matthew Lear wrote:
> >
> >> Hello all,
> >>
> >> I'm running a 2.6.29 kernel on an MMU enabled m68k coldfire mcf54455 platform
> >> and I'm having some throughput problems when running network tests.
> >>
> >> The kernel boots and mounts its rootfs from flash (jffs2). udhcpc runs, obtains
> >> a lease from the dhcp server and configures eth0. Network connectivity is ok. I
> >> can ping the target from the host and vice versa.
> >>
> >> 1/
> >> If I run ping -s 1500 -i 0.0001 <target ip address> on the host pc, after
> >> several mins, the kernel reports 'unexpected interrupt from 24' which is the
> >> vector for a spurious interrupt. This message will repeat randomly (from what I
> >> saw it appeared ~ 20 times when running the ping test above for 40 mins). The
> >> mcf54455 reference manual describes a possible cause for spurious interrupts.
> >> However, this test very rarely reports any packet loss, although the max time to
> >> receive a packet can be very large indeed.
> >>
> >> 2/
> >> If I reboot, start again and run a ping flood test (ping -f) from host pc ->
> >> target, all icmp requests are acknowledged - for a while. Before the target
> >> begins to fail to respond to the icmp requests, running top shows that the
> >> ksoftirq daemon is running at ~ 5% cpu load. This is normal as it is involved in
> >> processing the deferred tasks of processing data fired up to the network stack.
> >> So when the target beings to stop responding to icmp, if I then stop the ping
> >> flood and try to ping the host from the target, there is no reply indicated by
> >> ping. However, if you do this with a packet sniffer running (eg wireshark) you
> >> can see that data is still being transmitted from the target -> host and you can
> >> see the icmp reply, only the reply from the host appears to be received ok by
> >> the fec driver but is processed by the network stack target.
> >>
> >> When in this state, a proc entry that I added to the fec driver shows that the
> >> last return value from netif_rx() (called in the fec rx interrupt handling
> >> routine) is 1, indicating that the last packet was dropped by the network stack,
> >> e.g.
> >>
> >> ~ # cat /proc/driver/fec
> >> total interrupts: 1421619
> >> last interrupt type: 2 [1=tx, 2=rx, 3=mii]
> >> total tx interrupts: 709148
> >> total rx interrupts: 712472
> >> total mii interrupts: 1
> >> last interrupt event: 0x2000000
> >> total eberr interrupts: 0
> >> total hberr interrupts: 0
> >> tx loop current count: 0
> >> tx loop last count: 1
> >> rx loop current count: 0
> >> rx loop last count: 1
> >> rx last cbd ctrl/status: 0x800
> >> rx last cbd len: 346
> >> rx last cbd buff addr: 0x40410000
> >> rx last netif_rx status: 1
> >>
> >> Strangely, wireshark still shows data being transmitted from the target
> >> -> host. I can see ARP requests and I can also see DHCP discovery packets being
> >> sent by the target when its DHCP lease expires. This all looks ok, only the
> >> reply from host -> target is never processed by the target as the network stack
> >> is in a state where it is dropping all incoming data provided to it by the driver.
> >>
> >> I believe udhcpc utilises the network device directly, ie it does not require an
> >> intermediate network protocol being implemented in the kernel (tcpdump is
> >> similar).
> >>
> >> The fec driver still seems to be running ok because I can see the ring buffer
> >> address changing when data is received. Everything seems to be ok apart from the
> >> network stack. Very strange indeed.
> >>
> >> Running network throughput tests between host and target with netcat or netperf
> >> only run for a few seconds before activity ceases.
> >>
> >> Has anybody experienced anything similar? Why does the network stack appear to
> >> be stuck and constantly dropping packets?
> >>
> >> Any feedback appreciated.
> >>
> >> Rgds,
> >> -- Matt
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-m68k" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at http://vger.kernel.org/majordomo-info.html
> >>
> >
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-m68k" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.29 & network stack strangeness
2009-06-05 16:37 ` Finn Thain
@ 2009-06-05 16:44 ` Matthew Lear
[not found] ` <4A2DC70F.7080401@freescale.com>
0 siblings, 1 reply; 8+ messages in thread
From: Matthew Lear @ 2009-06-05 16:44 UTC (permalink / raw)
To: Finn Thain; +Cc: linux-m68k
Yes. I was suspecting that all may not be well in that area... Current set up is
a 10ms tick with CONFIG_HZ set to 100. Further investigation is required I think.
-- Matt
Finn Thain wrote:
> My only guess would be that the network stack delayed work queues depend
> upon working timer interrupts...
>
> But since I have no knowledge of your hardware, I don't think I'll be a
> lot of help with that.
>
> Finn
>
>
> On Fri, 5 Jun 2009, Matthew Lear wrote:
>
>> Hi - thanks for your reply.
>>
>> The problem doesn't manifest only when the DHCP lease expires and I can still
>> reproduce the problem with a static IP. With or without DHCP makes no difference.
>>
>> It seems to effect socket comms quite seriously (and quickly). If I run a simple
>> server program on the host that listens on a socket and writes a response string
>> to the socket when it receives data, and on the target I run a simple client
>> program which writes a string to the socket, reads and prints the response sent
>> the server, I only have to send data from client to server with a delay of 1ms
>> between transmissions for a few seconds and the client program hangs on calling
>> read() on the socket fd.
>>
>> If I run a simple netcat test, eg
>>
>> on target: nc -l -p 3333 > /dev/null
>> on host: dd if=/dev/zero | nc <target-ip> 3333
>>
>> ...strangely, once activity on the ethernet link as a result of the netcat test
>> ceases, running netstat -a on the target hangs for several seconds, eg:
>>
>>
>> ~ # nc -l -p 3333 > /dev/null &
>> ~ # netstat -a
>> Active Internet connections (servers and established)
>> Proto Recv-Q Send-Q Local Address Foreign Address State
>> tcp 0 0 *:login *:* LISTEN
>> tcp 0 0 *:shell *:* LISTEN
>> tcp 0 0 *:sunrpc *:* LISTEN
>> tcp 0 0 *:finger *:* LISTEN
>> tcp 0 0 *:auth *:* LISTEN
>> tcp 0 0 *:ftp *:* LISTEN
>> tcp 0 0 *:telnet *:* LISTEN
>>
>> <system hangs for several seconds here>
>>
>> tcp 0 0 192.168.0.11:3333 gateway0:45645
>> ESTABLISHED
>> udp 0 0 *:ntalk *:*
>> udp 0 0 *:sunrpc *:*
>> Active UNIX domain sockets (servers and established)
>> Proto RefCnt Flags Type State I-Node Path
>> unix 4 [ ] DGRAM 111 /dev/log
>> unix 3 [ ] STREAM CONNECTED 123
>> unix 3 [ ] STREAM CONNECTED 122
>> unix 2 [ ] DGRAM 120
>> unix 2 [ ] DGRAM 114
>> ~ #
>>
>> I thought this was interesting. Also, after this, I have trouble entering
>> characters over the serial port / console. It seems like interrupts may having
>> trouble getting serviced but this may be a side-effect...
>>
>> If you run the same netstat command with strace, you can see that the delay is
>> caused by polling the socket following calling send:
>>
>> ...
>> ...
>> gettimeofday({366, 470000}, NULL) = 0
>> poll([{fd=4, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
>> send(4, "lJ\1\0\0\1\0\0\0\0\0\0\00211\0010\003168\003192\7in-ad"..., 43,
>> 0x4000) = 43
>> poll(
>>
>>
>> <delay is here>
>>
>>
>> [{fd=4, events=POLLIN}], 1, 5000) = 0
>> ...
>> ...
>>
>> -- Matt
>>
>>
>> Finn Thain wrote:
>>> Does the problem manifest only when the DHCP lease expires?
>>> Can you reproduce the problem with a static IP?
>>>
>>> Finn
>>>
>>>
>>> On Fri, 5 Jun 2009, Matthew Lear wrote:
>>>
>>>> Hello all,
>>>>
>>>> I'm running a 2.6.29 kernel on an MMU enabled m68k coldfire mcf54455 platform
>>>> and I'm having some throughput problems when running network tests.
>>>>
>>>> The kernel boots and mounts its rootfs from flash (jffs2). udhcpc runs, obtains
>>>> a lease from the dhcp server and configures eth0. Network connectivity is ok. I
>>>> can ping the target from the host and vice versa.
>>>>
>>>> 1/
>>>> If I run ping -s 1500 -i 0.0001 <target ip address> on the host pc, after
>>>> several mins, the kernel reports 'unexpected interrupt from 24' which is the
>>>> vector for a spurious interrupt. This message will repeat randomly (from what I
>>>> saw it appeared ~ 20 times when running the ping test above for 40 mins). The
>>>> mcf54455 reference manual describes a possible cause for spurious interrupts.
>>>> However, this test very rarely reports any packet loss, although the max time to
>>>> receive a packet can be very large indeed.
>>>>
>>>> 2/
>>>> If I reboot, start again and run a ping flood test (ping -f) from host pc ->
>>>> target, all icmp requests are acknowledged - for a while. Before the target
>>>> begins to fail to respond to the icmp requests, running top shows that the
>>>> ksoftirq daemon is running at ~ 5% cpu load. This is normal as it is involved in
>>>> processing the deferred tasks of processing data fired up to the network stack.
>>>> So when the target beings to stop responding to icmp, if I then stop the ping
>>>> flood and try to ping the host from the target, there is no reply indicated by
>>>> ping. However, if you do this with a packet sniffer running (eg wireshark) you
>>>> can see that data is still being transmitted from the target -> host and you can
>>>> see the icmp reply, only the reply from the host appears to be received ok by
>>>> the fec driver but is processed by the network stack target.
>>>>
>>>> When in this state, a proc entry that I added to the fec driver shows that the
>>>> last return value from netif_rx() (called in the fec rx interrupt handling
>>>> routine) is 1, indicating that the last packet was dropped by the network stack,
>>>> e.g.
>>>>
>>>> ~ # cat /proc/driver/fec
>>>> total interrupts: 1421619
>>>> last interrupt type: 2 [1=tx, 2=rx, 3=mii]
>>>> total tx interrupts: 709148
>>>> total rx interrupts: 712472
>>>> total mii interrupts: 1
>>>> last interrupt event: 0x2000000
>>>> total eberr interrupts: 0
>>>> total hberr interrupts: 0
>>>> tx loop current count: 0
>>>> tx loop last count: 1
>>>> rx loop current count: 0
>>>> rx loop last count: 1
>>>> rx last cbd ctrl/status: 0x800
>>>> rx last cbd len: 346
>>>> rx last cbd buff addr: 0x40410000
>>>> rx last netif_rx status: 1
>>>>
>>>> Strangely, wireshark still shows data being transmitted from the target
>>>> -> host. I can see ARP requests and I can also see DHCP discovery packets being
>>>> sent by the target when its DHCP lease expires. This all looks ok, only the
>>>> reply from host -> target is never processed by the target as the network stack
>>>> is in a state where it is dropping all incoming data provided to it by the driver.
>>>>
>>>> I believe udhcpc utilises the network device directly, ie it does not require an
>>>> intermediate network protocol being implemented in the kernel (tcpdump is
>>>> similar).
>>>>
>>>> The fec driver still seems to be running ok because I can see the ring buffer
>>>> address changing when data is received. Everything seems to be ok apart from the
>>>> network stack. Very strange indeed.
>>>>
>>>> Running network throughput tests between host and target with netcat or netperf
>>>> only run for a few seconds before activity ceases.
>>>>
>>>> Has anybody experienced anything similar? Why does the network stack appear to
>>>> be stuck and constantly dropping packets?
>>>>
>>>> Any feedback appreciated.
>>>>
>>>> Rgds,
>>>> -- Matt
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-m68k" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-m68k" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.29 & network stack strangeness
[not found] ` <4A2DC70F.7080401@freescale.com>
@ 2009-06-09 10:04 ` Matthew Lear
[not found] ` <4A2F1C74.1010800@freescale.com>
0 siblings, 1 reply; 8+ messages in thread
From: Matthew Lear @ 2009-06-09 10:04 UTC (permalink / raw)
To: Lanttor; +Cc: linux-m68k
> Could you run "top" command and print out its information?
> "cat /proc/stat" && "cat /proc/interrupts" information are also
> appreciated.
Sure. Info below.
Before running netcat test:
top - 00:02:37 up 2 min, 1 user, load average: 0.07, 0.07, 0.02
Tasks: 24 total, 1 running, 23 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.7%us, 0.0%sy, 0.0%ni, 98.7%id, 0.0%wa, 0.7%hi, 0.0%si,
0.0%st
Mem: 254912k total, 6680k used, 248232k free, 0k buffers
Swap: 0k total, 0k used, 0k free, 3008k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
359 root 20 0 2856 1160 944 R 1.0 0.5 0:01.49 top
3 root 15 -5 0 0 0 S 0.3 0.0 0:00.11 ksoftirqd/0
1 root 20 0 3280 776 656 S 0.0 0.3 0:01.46 init
2 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kthreadd
4 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/0
5 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 events/0
6 root 15 -5 0 0 0 S 0.0 0.0 0:00.06 khelper
56 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kblockd/0
62 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kseriod
80 root 20 0 0 0 0 S 0.0 0.0 0:00.00 pdflush
81 root 20 0 0 0 0 S 0.0 0.0 0:00.00 pdflush
82 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kswapd0
83 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 aio/0
84 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 nfsiod
215 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 mtdblockd
227 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 hid_compat
230 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 rpciod/0
~ # cat /proc/stat
cpu 209 0 0 16892 0 757 0 0 0
cpu0 209 0 0 16892 0 757 0 0 0
intr 79221 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 61353 0 0 0 0 0 17858 0 0 5 2 0 0 0 2 0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
ctxt 72511
btime 0
processes 360
procs_running 1
procs_blocked 0
~ #
~ # cat /proc/interrupts
M5445X 90: 62109 UART
M5445X 96: 19793 timer
M5445X 99: 6 timer
M5445X 100: 2 fec(TXF)
M5445X 101: 0 fec(TXB)
M5445X 102: 0 fec(TXFIFO)
M5445X 103: 0 fec(TXCR)
M5445X 104: 2 fec(RXF)
M5445X 105: 0 fec(RXB)
M5445X 106: 1 fec(MII)
M5445X 107: 0 fec(LC)
M5445X 108: 0 fec(HBERR)
M5445X 109: 0 fec(GRA)
M5445X 110: 0 fec(EBERR)
M5445X 111: 0 fec(BABT)
M5445X 112: 0 fec(BABR)
~ #
Run netcat test:
target: nc -l -p 3333 > /dev/null &
host: dd if=/dev/zero | nc <target-ip> 3333
Netcat test activity on ethernet ceases after only a few seconds.
Wireshark concurs.
top - 00:05:04 up 5 min, 1 user, load average: 0.00, 0.04, 0.01
Tasks: 25 total, 1 running, 24 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.7%us, 0.0%sy, 0.0%ni, 98.7%id, 0.0%wa, 0.7%hi, 0.0%si,
0.0%st
Mem: 254912k total, 6904k used, 248008k free, 0k buffers
Swap: 0k total, 0k used, 0k free, 3008k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
363 root 20 0 2856 1160 944 R 1.0 0.5 0:00.27 top
362 root 20 0 3280 688 576 S 0.3 0.3 0:00.05 nc
1 root 20 0 3280 776 656 S 0.0 0.3 0:01.46 init
2 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 15 -5 0 0 0 S 0.0 0.0 0:00.12 ksoftirqd/0
4 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/0
5 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 events/0
6 root 15 -5 0 0 0 S 0.0 0.0 0:00.06 khelper
56 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kblockd/0
62 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kseriod
80 root 20 0 0 0 0 S 0.0 0.0 0:00.00 pdflush
81 root 20 0 0 0 0 S 0.0 0.0 0:00.00 pdflush
82 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kswapd0
83 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 aio/0
84 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 nfsiod
215 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 mtdblockd
227 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 hid_compat
~ # cat /proc/stat
cpu 228 0 0 31924 0 778 0 0 0
cpu0 228 0 0 31924 0 778 0 0 0
intr 106918 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 73732 0 0 0 0 0 32930 0 0 10 96 0 0 0 149 0 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0
ctxt 110292
btime 0
processes 364
procs_running 1
procs_blocked 0
~ #
~ # cat /proc/interrupts
M5445X 90: 74523 UART
M5445X 96: 34522 timer
M5445X 99: 10 timer
M5445X 100: 100 fec(TXF)
M5445X 101: 0 fec(TXB)
M5445X 102: 0 fec(TXFIFO)
M5445X 103: 0 fec(TXCR)
M5445X 104: 155 fec(RXF)
M5445X 105: 0 fec(RXB)
M5445X 106: 1 fec(MII)
M5445X 107: 0 fec(LC)
M5445X 108: 0 fec(HBERR)
M5445X 109: 0 fec(GRA)
M5445X 110: 0 fec(EBERR)
M5445X 111: 0 fec(BABT)
M5445X 112: 0 fec(BABR)
~ #
At this point, my proc entry in the fec driver does not indicate any problem:
~ # cat /proc/driver/fec
total interrupts: 284
last interrupt type: 2 [1=tx, 2=rx, 3=mii]
total tx interrupts: 111
total rx interrupts: 172
total mii interrupts: 1
last interrupt event: 0x2000000
total eberr interrupts: 0
total hberr interrupts: 0
tx loop current count: 0
tx loop last count: 1
rx loop current count: 0
rx loop last count: 1
rx last cbd ctrl/status: 0x800
rx last cbd len: 64
rx last cbd buff addr: 0x40407800
rx last netif_rx status: 0
~ #
When starting the netcat test, activity on the ethernet ceases after only
a few seconds. After stopping the dd/nc on the host, I cannot ping the
host from the target or vice versa and the rs-232 console running on the
target does not respond to every key press (usually every second key
press).
I instrumented some of the raw packet handling in the network stack to
attempt to glean a little more info about ping. Running ping from target
to host shows:
~ # ping 192.168.0.2
PING 192.168.0.2 (192.168.0.2): 56 data bytes
[ 490.489407] ping:367 /mnt/nht/linux-2.6.29/net/core/datagram.c:99
wait_for_packet - interrupted
[ 490.500812] ping:367 /mnt/nht/linux-2.6.29/net/core/datagram.c:110
wait_for_packet - timeo=2147483647, err=-512
[ 490.513637] ping:367 /mnt/nht/linux-2.6.29/net/ipv4/raw.c:668
raw_recvmsg - skb_recv_datagram
[ 490.524790] ping:367 /mnt/nht/linux-2.6.29/net/ipv4/raw.c:701
raw_recvmsg - returning err -512
[ 490.536035] ping:367 /mnt/nht/linux-2.6.29/net/core/sock.c:1869
sock_common_recvmsg - recvmsg returned -512
[ 490.548497] ping:367
/mnt/nht/linux-2.6.29/arch/m68k/coldfire/signal.c:804 ERESTART
Other tests which induce similar problems in the network stack include
using wget to pull a large file onto the target.
I can also force netif_rx() to eventually constantly return 1 (pkt
dropped) by using a flood ping at high frequency (see previous post). This
is a different scenario that netcat/wget but possible related to same
cause.
-- Matt
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.29 & network stack strangeness
[not found] ` <4A2F1C74.1010800@freescale.com>
@ 2009-06-10 9:36 ` Matthew Lear
[not found] ` <4A2F8318.5080206@freescale.com>
0 siblings, 1 reply; 8+ messages in thread
From: Matthew Lear @ 2009-06-10 9:36 UTC (permalink / raw)
To: Lanttor; +Cc: linux-m68k
Lanttor wrote:
> I met the interrupt issue before on mcf5445x platform, following is my fixed patch, but I am not sure it could help resolve your problem, just have a try :-)
Thanks but this makes no difference. From what I can tell, the patch to entry.S
is essentially benign because ret_from_interrupt() is not invoked anywhere.
Also, the patch to ints.c adds invocations of irq_enter() and irq_exit()
to process_int(). These macros really just force entry/exit to/from hard
interrupt context and increment/decrement a count (used for tracking
nested interrupts). Given that process_int() is called by inthandler() in
entry.S, I think this is always going to be in interrupt context anyway, so I
can't see the advantage of this. Maybe I'm missing something?
>
> For your mcf54455 platform, if you use one ether net port, try to disable second ether net port (FEC2) on the menuconfig to see whether it would affect the net working.
Yes. I've already got the second net port disabled.
I have had some level of success by calling flush_cache_all() immediately after
calling netif_rx() in the rx part of the fec interrupt handler - fec_enet_rx()
in fec.c.
This plus a hack to netif_rx() seems to vastly improve things in that I can wget
large files, run netcat tests although in the cast of netcat there is still a
high number of rx overruns in the fec driver.
Hack:
--- a/net/core/dev.c 2009-05-12 07:02:24.000000000 +0100
+++ b/net/core/dev.c 2009-06-10 10:26:47.000000000 +0100
@@ -1969,16 +1969,18 @@
__skb_queue_tail(&queue->input_pkt_queue, skb);
local_irq_restore(flags);
return NET_RX_SUCCESS;
}
napi_schedule(&queue->backlog);
goto enqueue;
}
+ else
+ napi_schedule(&queue->backlog);
__get_cpu_var(netdev_rx_stat).dropped++;
local_irq_restore(flags);
kfree_skb(skb);
return NET_RX_DROP;
}
This effectively ignores the backlog check and schedules the receive function to
run.
I think all this points to is that there is a timing problem in the system. That
said, I can believe the cache coherency issue as I've read some postings on the
net about similar issues with ethernet tx on ppc...
-- Matt
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.29 & network stack strangeness
[not found] ` <4A2F8318.5080206@freescale.com>
@ 2009-06-10 10:59 ` Matthew Lear
0 siblings, 0 replies; 8+ messages in thread
From: Matthew Lear @ 2009-06-10 10:59 UTC (permalink / raw)
To: Lanttor; +Cc: linux-m68k
Lanttor wrote:
>> Thanks but this makes no difference. From what I can tell, the patch to entry.S
>> is essentially benign because ret_from_interrupt() is not invoked anywhere.
>
> ret_from_interrupt() is the part of inthandler(), and it's called by inthandler() at end.
>
Indeed it is. My apologies.
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2009-06-10 10:59 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-06-05 15:15 2.6.29 & network stack strangeness Matthew Lear
2009-06-05 15:49 ` Finn Thain
2009-06-05 16:17 ` Matthew Lear
2009-06-05 16:37 ` Finn Thain
2009-06-05 16:44 ` Matthew Lear
[not found] ` <4A2DC70F.7080401@freescale.com>
2009-06-09 10:04 ` Matthew Lear
[not found] ` <4A2F1C74.1010800@freescale.com>
2009-06-10 9:36 ` Matthew Lear
[not found] ` <4A2F8318.5080206@freescale.com>
2009-06-10 10:59 ` Matthew Lear
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.