* MPC5200B, many FEC_IEVENT_RFIFO_ERRORs until link down
@ 2010-03-30 10:00 Roman Fietze
From: Roman Fietze @ 2010-03-30 10:00 UTC (permalink / raw)
To: linuxppc-dev
Hello,
I think this is a never-ending story. This error still happens under
higher load every few seconds, until I get a "PHY: f0003000:00 - Link
is Down"; on my box this is easily reproducible after maybe 15 to 30 seconds.
I can recover using "ip link set down/up dev eth0".
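That manual recovery can be automated with a small watchdog. The sketch
below is hypothetical helper code (the interface name, poll interval,
and reliance on sysfs/"ip" are my assumptions, not part of the driver):

```python
import subprocess
import time
from pathlib import Path

def link_is_up(iface: str, sysfs: Path = Path("/sys/class/net")) -> bool:
    """Return True if the kernel reports the carrier as up for iface."""
    return (sysfs / iface / "operstate").read_text().strip() == "up"

def bounce_link(iface: str) -> None:
    """Recover exactly as described above: ip link set down/up dev <iface>."""
    subprocess.run(["ip", "link", "set", "dev", iface, "down"], check=True)
    subprocess.run(["ip", "link", "set", "dev", iface, "up"], check=True)

def watchdog(iface: str = "eth0", interval: float = 1.0) -> None:
    """Poll the link state and bounce the interface whenever it drops."""
    while True:
        if not link_is_up(iface):
            bounce_link(iface)
        time.sleep(interval)
```

This papers over the symptom only; the RFIFO errors themselves are
untouched.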
I double-checked that I'm using the most recent version of this driver
(checked against DENX and benh master/next, using Wolfgang Denk's
version of 2.6.33); this includes the locking patches from Asier Llano
and, of course, the hard-setting of mii_speed in the PHY MDIO transfer
routine.
I tried all 8 combinations of PLDIS, BSDIS and SE, with and without
CONFIG_NOT_COHERENT_CACHE.
As some of you probably remember, I'm running this controller under
high load on FEC, ATA and LPC. As soon as the load goes above a
certain level I get those FEC RFIFO errors, sometimes ATA errors
(MWDMA2) and sometimes even lost SDMA interrupts using BestComm with
the SCLPC (now switched back to simple PIO). I'm quite sure almost all
of this is BestComm's fault.
Has somebody already tried the latest NAPI patches, which might give me
a slight chance of a workaround? Any idea or upcoming patch to address
this problem once more, even if it's just recovering, e.g. within
mpc52xx_fec_mdio_transfer's timeout, using some other dirty workaround?
Roman
-- 
Roman Fietze                 Telemotive AG Büro Mühlhausen
Breitwiesen                  73347 Mühlhausen
Tel.: +49(0)7335/18493-45    http://www.telemotive.de
* Re: MPC5200B, many FEC_IEVENT_RFIFO_ERRORs until link down
From: Wolfgang Grandegger @ 2010-03-30 10:46 UTC (permalink / raw)
To: Roman Fietze; +Cc: linuxppc-dev
Roman Fietze wrote:
> Hello,
>
> I think this is a never-ending story. This error still happens under
> higher load every few seconds, until I get a "PHY: f0003000:00 - Link
> is Down"; on my box this is easily reproducible after maybe 15 to 30 seconds.
> I can recover using "ip link set down/up dev eth0".
>
> I double-checked that I'm using the most recent version of this driver
> (checked against DENX and benh master/next, using Wolfgang Denk's
> version of 2.6.33); this includes the locking patches from Asier Llano
> and, of course, the hard-setting of mii_speed in the PHY MDIO transfer
> routine.
> I tried all 8 combinations of PLDIS, BSDIS and SE, with and without
> CONFIG_NOT_COHERENT_CACHE.
>
> As some of you probably remember, I'm running this controller under
> high load on FEC, ATA and LPC. As soon as the load goes above a
> certain level I get those FEC RFIFO errors, sometimes ATA errors
> (MWDMA2) and sometimes even lost SDMA interrupts using BestComm with
> the SCLPC (now switched back to simple PIO). I'm quite sure almost all
> of this is BestComm's fault.
This problem shows up quickly with NAPI, but I have never observed it
with the current version. The error occurs when the software is not able
to read out the messages in time. Unfortunately, dealing with BestComm
is a pain.
> Has somebody already tried the latest NAPI patches, which might give me
> a slight chance of a workaround? Any idea or upcoming patch to address
> this problem once more, even if it's just recovering, e.g. within
> mpc52xx_fec_mdio_transfer's timeout, using some other dirty workaround?
Yes, I have a NAPI version ready for testing. I will roll it out as RFC
today or tomorrow.
Wolfgang.
* Re: MPC5200B, many FEC_IEVENT_RFIFO_ERRORs until link down
From: Wolfgang Grandegger @ 2010-03-31 10:15 UTC (permalink / raw)
To: Roman Fietze; +Cc: linuxppc-dev
Hi Roman,
Wolfgang Grandegger wrote:
> Roman Fietze wrote:
>> Hello,
>>
>> I think this is a never-ending story. This error still happens under
>> higher load every few seconds, until I get a "PHY: f0003000:00 - Link
>> is Down"; on my box this is easily reproducible after maybe 15 to 30 seconds.
>> I can recover using "ip link set down/up dev eth0".
>>
>> I double-checked that I'm using the most recent version of this driver
>> (checked against DENX and benh master/next, using Wolfgang Denk's
>> version of 2.6.33); this includes the locking patches from Asier Llano
>> and, of course, the hard-setting of mii_speed in the PHY MDIO transfer
>> routine.
>> I tried all 8 combinations of PLDIS, BSDIS and SE, with and without
>> CONFIG_NOT_COHERENT_CACHE.
>>
>> As some of you probably remember, I'm running this controller under
>> high load on FEC, ATA and LPC. As soon as the load goes above a
>> certain level I get those FEC RFIFO errors, sometimes ATA errors
>> (MWDMA2) and sometimes even lost SDMA interrupts using BestComm with
>> the SCLPC (now switched back to simple PIO). I'm quite sure almost all
>> of this is BestComm's fault.
>
> This problem shows up quickly with NAPI, but I have never observed it
> with the current version. The error occurs when the software is not able
> to read out the messages in time. Unfortunately, dealing with BestComm
> is a pain.
>
>> Has somebody already tried the latest NAPI patches, which might give me
>> a slight chance of a workaround? Any idea or upcoming patch to address
>> this problem once more, even if it's just recovering, e.g. within
>> mpc52xx_fec_mdio_transfer's timeout, using some other dirty workaround?
>
> Yes, I have a NAPI version ready for testing. I will roll it out as RFC
> today or tomorrow.
I just sent out the patch. Would be nice if you, or somebody else, could
do some testing and provide some feedback. FYI, I will be out of office
next week.
Thanks,
Wolfgang.
* Re: MPC5200B, many FEC_IEVENT_RFIFO_ERRORs until link down
From: Roman Fietze @ 2010-04-01 10:04 UTC (permalink / raw)
To: Wolfgang Grandegger; +Cc: linuxppc-dev
Hello Wolfgang,
On Wednesday 31 March 2010 12:15:47 Wolfgang Grandegger wrote:
> I just sent out the patch.
Thanks a lot.
> Would be nice if you, or somebody else, could do some testing and
> provide some feedback.
I tested the patches with the following setup:
- DENX 2.6.33 plus NAPI patch, kernel config with and w/o NAPI enabled

- Own Icecube-based board using MPC5200B

- Two different hard drives (because the Toshiba gave me headaches),
  ext3 default settings of mkfs.ext3, MWDMA2

- FPGA on LPC receiving high-bandwidth MOST150 data in PIO mode (for
  the test: generating them internally), small app writing the data to
  disk. Why PIO? SCLPC FIFO gave

- netcat receiving data, optionally writing the data to HD; sender is a
  Gigabit Intel NIC fed using netcat (and /dev/zero) as well, via a
  100MBit/s switch
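For reference, the netcat leg of this setup can be approximated on
loopback. The sketch below is a hedged Python stand-in for
"dd if=/dev/zero | nc <target> <port>"; buffer sizes, the loopback
address, and the function name are illustrative, not from the original
test:

```python
import socket
import threading
import time

def throughput_test(megabytes: int = 8) -> tuple[int, float]:
    """Loopback stand-in for the netcat test: a receiver thread counts
    bytes while the sender streams zero-filled buffers. Returns
    (bytes received, MB/s)."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))          # let the kernel pick a free port
    srv.listen(1)
    port = srv.getsockname()[1]
    stats = {}

    def sink() -> None:
        conn, _ = srv.accept()
        total, t0 = 0, time.monotonic()
        while chunk := conn.recv(1 << 16):
            total += len(chunk)
        stats["bytes"] = total
        stats["mbps"] = total / (time.monotonic() - t0) / 1e6
        conn.close()

    rx = threading.Thread(target=sink)
    rx.start()
    tx = socket.create_connection(("127.0.0.1", port))
    buf = bytes(1 << 20)                # 1 MiB of zeros per send
    for _ in range(megabytes):
        tx.sendall(buf)
    tx.close()                          # EOF ends the receiver's loop
    rx.join()
    srv.close()
    return stats["bytes"], stats["mbps"]
```

On loopback this measures the host stack only, of course; the FEC
numbers below came over a real 100MBit/s link.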
And now the first and preliminary results of the tests (see legend and
description of the results below the table):
NAPI   MOST    HD              load            bw      rx_irq          rfifo
------+-------+---------------+---------------+-------+---------------+-------
                               nc      most
======+=======+===============+===============+=======+===============+=======
on     off     MK4036GA        93              5.15    32000-35000
               -               99              10.5    72000-74000
       on      MK4036GA        49      46      crash   15000-17500     none seen
       on      HEJ421010G9AT00 48      47              15000-17500     ~100-500, recovers
------+-------+---------------+---------------+-------+---------------+-------
off    off     MK4036GA        90              5.15    34000-36000
               -               99              10.5    76000-77000
       on      MK4036GA        48      47      crash   17500-19000     ~200, network down
Legend:
-------

MOST:      PIO mode access to FPGA receiving generated MOST150 data;
           very high data rates possible
HD:        used disk type
load/nc:   load of netcat, %
load/most: load of MOST receiver app, %
load/idle: was always 0%
bw:        netcat network bandwidth, MB/s
rx_irq:    FEC RX IRQ rate, Hz
rfifo:     RX FIFO errors, time in between in seconds
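The rx_irq column can be derived by sampling /proc/interrupts. The
sketch below is a rough illustration; the function and the
line-matching string are my assumptions, not the original measurement
method:

```python
import time

def irq_rate(name: str, interval: float = 1.0,
             proc: str = "/proc/interrupts") -> float:
    """Sample the counter for the interrupt line matching `name` twice
    and return its rate in Hz."""
    def read_count() -> int:
        with open(proc) as f:
            for line in f:
                if name in line:
                    fields = line.split()
                    # fields[0] is the IRQ number (e.g. "129:"),
                    # followed by one counter per CPU; sum the
                    # numeric columns
                    return sum(int(x) for x in fields[1:] if x.isdigit())
        raise LookupError(f"no interrupt line matching {name!r}")

    first = read_count()
    time.sleep(interval)
    return (read_count() - first) / interval
```

A one-second interval matched the granularity of the table above well
enough.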
Results:
--------
Using the MK4036GA HD always crashes IDE after a few seconds. A reboot
does not recover the disk, I always need a power cycle. That's why I
switched to a HEJ421010G9AT00.
NAPI reduces the FEC RX interrupt rate (/proc/interrupts) "somewhat".
Could not detect an increase of the maximum bandwidth, but that's not
the "problem" of NAPI.
NAPI recovers more or less nicely from link down (link down to up in
about 1 second); without NAPI I have to do that manually (e.g. ip link
set down/up). That's something I've been looking for since the modular
PHY drivers.
Some network applications (e.g. our Car Head Unit GN Protocol Logger)
drop their connections when the link goes down (internal timeouts?
probably fixable). SSH and netcat connections stay up.
Transferred many GiB of data to the MPC w/o any problems except those
recoverable FEC_IEVENT_RFIFO_ERRORs.
This patch really looks good to me.
I will run some additional tests e.g. with mixed RX and TX, different
and varying data rates, etc.
> FYI, I will be out of office next week.
Lucky Guy
Roman
-- 
Roman Fietze                 Telemotive AG Büro Mühlhausen
Breitwiesen                  73347 Mühlhausen
Tel.: +49(0)7335/18493-45    http://www.telemotive.de
* Re: MPC5200B, many FEC_IEVENT_RFIFO_ERRORs until link down
From: Wolfgang Grandegger @ 2010-04-05 12:21 UTC (permalink / raw)
To: Roman Fietze; +Cc: linuxppc-dev
Hello Roman,
Roman Fietze wrote:
> Hello Wolfgang,
>
> On Wednesday 31 March 2010 12:15:47 Wolfgang Grandegger wrote:
>
>> I just sent out the patch.
>
> Thanks a lot.
>
>> Would be nice if you, or somebody else, could do some testing and
>> provide some feedback.
>
> I tested the patches with the following setup:
>
> - DENX 2.6.33 plus NAPI patch, kernel config with and w/o NAPI enabled
>
> - Own Icecube based board using MPC5200B
>
> - Two different hard drives (because the Toshiba gave me headaches),
> ext3 default settings of mkfs.ext3, MWDMA2
>
> - FPGA on LPC receiving high bandwidth MOST150 data in PIO mode (for
> the test: generating them internally), small app writing the data to
> disk. Why PIO? SCLPC FIFO gave
>
> - netcat receiving data, optionally writing the data to HD; sender is a
>   Gigabit Intel NIC fed using netcat (and /dev/zero) as well, via a
> 100MBit/s switch
>
>
> And now the first and preliminary results of the tests (see legend and
> description of the results below the table):
>
> NAPI MOST HD load bw rx_irq rfifo
> ------+-------+---------------+---------------+-------+---------------+-------
> nc most
> ======+=======+===============+===============+=======+===============+=======
>
> on off MK4036GA 93 5.15 32000-35000
> - 99 10.5 72000-74000
>
> on MK4036GA 49 46 crash 15000-17500 none seen
> on HEJ421010G9AT00 48 47 15000-17500 ~100-500, recovers
>
> ------+-------+---------------+-----------------------+-------+---------------+-------
>
> off off MK4036GA 90 5.15 34000-36000
> - 99 10.5 76000-77000
>
> on MK4036GA 48 47 crash 17500-19000 ~200, network down
>
> Legend:
> -------
>
> MOST: PIO mode access to FPGA receiving generated MOST150 data
> very high data rates possible
> HD: used disk type
> load/nc: load netcat, %
> load/most: load MOST receiver app, %
> load/idle: was always 0%
> bw: netcat network bandwidth, MB/s
> rx_irq: FEC RX IRQ rate, Hz
> rfifo: RX FIFO errors, time in between in seconds
>
> Results:
> --------
>
> Using the MK4036GA HD always crashes IDE after a few seconds. A reboot
> does not recover the disk, I always need a power cycle. That's why I
> switched to a HEJ421010G9AT00.
That might be a different issue.
> NAPI reduces the FEC RX interrupt rate (/proc/interrupts) "somewhat".
> Could not detect an increase of the maximum bandwidth, but that's not
> the "problem" of NAPI.
I realized the same behavior.
> NAPI recovers more or less nicely from link down (link down to up in
> about 1 second); without NAPI I have to do that manually (e.g. ip link
> set down/up). That's something I've been looking for since the
> modular PHY drivers.
Recovering from link down takes a while, unfortunately. I also do not
yet have an explanation why the link goes down at all. The rather long
recovery time does no harm for my use case of "heavy packet storms",
but it is rather annoying at high but not yet critical traffic levels.
> Some network applications (e.g. our Car Head Unit GN Protocol Logger)
> drop their connections when the link goes down (internal timeouts?
> probably fixable). SSH and netcat connections stay up.
That's a problem due to the overloaded network.
> Transferred many GiB of data to the MPC w/o any problems except those
> recoverable FEC_IEVENT_RFIFO_ERRORs.
>
> This patch really looks good to me.
Doesn't sound too bad...
> I will run some additional tests e.g. with mixed RX and TX, different
> and varying data rates, etc.
OK.
Wolfgang.