linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* TG3 network data corruption regression 2.6.24/2.6.23.4
@ 2008-02-18 22:41 Tony Battersby
  2008-02-19  0:32 ` Michael Chan
  0 siblings, 1 reply; 25+ messages in thread
From: Tony Battersby @ 2008-02-18 22:41 UTC (permalink / raw)
  To: Michael Chan, Herbert Xu, David S. Miller, netdev
  Cc: Greg Kroah-Hartman, linux-kernel

I am experiencing network data corruption with a 3Com 3C996B-T NIC
(Broadcom NetXtreme BCM5701; driver tg3.ko).  I have identified the
following patch as the trigger:

commit fb93134dfc2a6e6fbedc7c270a31da03fce88db9
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Wed Nov 14 15:45:21 2007 -0800

    [TCP]: Fix size calculation in sk_stream_alloc_pskb
   
    We round up the header size in sk_stream_alloc_pskb so that
    TSO packets get zero tail room.  Unfortunately this rounding
    up is not coordinated with the select_size() function used by
    TCP to calculate the second parameter of sk_stream_alloc_pskb.
   
    As a result, we may allocate more than a page of data in the
    non-TSO case when exactly one page is desired.
   
    In fact, rounding up the head room is detrimental in the non-TSO
    case because it makes memory that would otherwise be available to
    the payload head room.  TSO doesn't need this either, all it wants
    is the guarantee that there is no tail room.
   
    So this patch fixes this by adjusting the skb_reserve call so that
    exactly the requested amount (which all callers have calculated in
    a precise way) is made available as tail room.
   
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: David S. Miller <davem@davemloft.net>

This patch was included in 2.6.24 and 2.6.23.4 -stable.  I am
experiencing data corruption with kernels 2.6.23.4 - 2.6.23.16, 2.6.24 -
2.6.24.2, and 2.6.25-rc2-git1.  I have verified that reverting the above
patch (by hand) makes the data corruption go away on all affected
kernels (note that in 2.6.25 the function is sk_stream_alloc_skb() in
net/ipv4/tcp.c rather than sk_stream_alloc_pskb() in include/net/sock.h).

(Also note that when testing 2.6.23 - 2.6.23.4, I had to apply the
individual patch "TG3: Fix performance regression on 5705." from 2.6.23.5.)

I do not get data corruption when substituting a SysKonnect 9D21 NIC
(which also uses the tg3.ko driver) or a Intel PRO/1000 82546GB NIC
(which uses the e1000.ko driver).

In addition to the 3Com NIC, my computer has a SCSI HBA with an attached
tape drive.  The network data corruption happens only when reading from
or writing to the tape drive.  I have tried both a LSI MPT Fusion
Ultra320 SCSI HBA (mptspi.ko) and a LSI 53c1010 Ultra160 HBA
(sym53c8xx.ko) with the same results.  The NIC and SCSI HBA are on
separate PCI-X buses and do not share IRQs.  I am using two completely
separate test programs to access the SCSI tape drive and test network
data integrity, so one would expect no interaction between the two tests
other than CPU scheduling and DMA bandwidth.  There is no disk I/O
generated by either test program.

The test program that I am using to debug this problem does the following:

Computer A (kernel 2.6.24.2; 3Com 3C996B-T NIC):
   malloc a 64 KB buf aligned to a 4 KB boundary
   loop {
      fill 64 KB buf with count data pattern
      send(64 KB, MSG_MORE) <--- eventually sends corrupted data
   }
   (SCSI tape drive test program runs separately in the background)

Computer B (kernel 2.6.12):
   malloc a 64 KB buf aligned to a 4 KB boundary
   loop {
      recv(64 KB, MSG_WAITALL)
      verify count data pattern in 64 KB buf
   }

After running for a few seconds, the verify on computer B detects data
corruption in the last 4 bytes of the 64 KB buffer.  The last 48 bytes
of the corrupted 64 KB buffer look like this:

D0 D1 D2 D3 | D4 D5 D6 D7 | D8 D9 DA DB | DC DD DE DF
E0 E1 E2 E3 | E4 E5 E6 E7 | E8 E9 EA EB | EC ED EE EF
F0 F1 F2 F3 | F4 F5 F6 F7 | F8 F9 FA FB | F4 F5 F6 F7

The last 4 bytes should be "FC FD FE FF" but instead are corrupted to
"F4 F5 F6 F7", a sequence which came earlier in the data stream.  The
data corruption always occurs at this same buffer offset and with the
same 4 earlier bytes duplicated.  However, it occurs on a different
iteration of the send()/recv() loop each time the test is run.

When I reverse the test so that Computer A does recv() and Computer B
does send(), the test passes with no data corruption.  Therefore, it
appears that the data corruption happens on send() but not recv().

The motherboard that I am using is a Commell LV-672.  This motherboard
has a PCI-express x16 slot but no PCI-X slots.  To plug in the PCI-X NIC
and SCSI HBA, I am using a SuperMicro CSE-RR2UE-AX riser card which
plugs into the PCI-express slot on the motherboard and provides 3 PCI-X
slots (two slots together on one PCI-X bus and one slot on its own PCI-X
bus).  The data corruption happens with every combination of the 2 cards
in the 3 slots.

I assume that the above patch is just exposing some way in which the tg3
driver or the BCM5701 chip are broken.  For now, I am just reverting the
above patch for kernels that I use until a better solution is
forthcoming.  I expect that this problem will be difficult for other
developers to reproduce, but I can test any patches that anyone wants to
send me.  [ In the meantime, should we revert the patch for 2.6.23.x and
2.6.24.x -stable, or wait for a fix to tg3? ]

I am not sure if it is relevant, but I am also getting the following
messages sometimes during testing:

Clocksource tsc unstable (delta = 64002086 ns)
Time: pit clocksource has been installed.

This seems bogus because the CPU is a Intel Pentium 4 with
HyperThreading, so the tsc should be reliable.

---

network MTU = 1500

lspci
00:00.0 Host bridge: Intel Corporation 82915G/P/GV/GL/PL/910GL Memory
Controller Hub (rev 0e)
00:01.0 PCI bridge: Intel Corporation 82915G/P/GV/GL/PL/910GL PCI
Express Root Port (rev 0e)
00:02.0 VGA compatible controller: Intel Corporation 82915G/GV/910GL
Integrated Graphics Controller (rev 0e)
00:1d.0 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6
Family) USB UHCI #1 (rev 04)
00:1d.1 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6
Family) USB UHCI #2 (rev 04)
00:1d.2 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6
Family) USB UHCI #3 (rev 04)
00:1d.3 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6
Family) USB UHCI #4 (rev 04)
00:1d.7 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6
Family) USB2 EHCI Controller (rev 04)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d4)
00:1f.0 ISA bridge: Intel Corporation 82801FB/FR (ICH6/ICH6R) LPC
Interface Bridge (rev 04)
00:1f.1 IDE interface: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6
Family) IDE Controller (rev 04)
00:1f.3 SMBus: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family)
SMBus Controller (rev 04)
01:00.0 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge
A (rev 09)
01:00.2 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge
B (rev 09)
02:02.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X
Fusion-MPT Dual Ultra320 SCSI (rev 08)
03:01.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5701
Gigabit Ethernet (rev 15)
04:0d.0 FireWire (IEEE 1394): Agere Systems FW323 (rev 61)

cat /proc/interrupts
           CPU0       CPU1      
  0:         89          0   IO-APIC-edge      timer
  1:         78          0   IO-APIC-edge      i8042
  3:         17          0   IO-APIC-edge      serial
  8:          0          0   IO-APIC-edge      rtc
  9:          0          0   IO-APIC-fasteoi   acpi
 12:          5          0   IO-APIC-edge      i8042
 14:        465          0   IO-APIC-edge      ide0
 16:          0          0   IO-APIC-fasteoi   uhci_hcd:usb5
 17:     149220          0   IO-APIC-fasteoi   eth0
 18:      10007          0   IO-APIC-fasteoi   uhci_hcd:usb4, ioc0
 19:         29          0   IO-APIC-fasteoi   uhci_hcd:usb3
 23:          2          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2
NMI:          0          0   Non-maskable interrupts
LOC:       7457      10023   Local timer interrupts
RES:       1962      14316   Rescheduling interrupts
CAL:         40         49   function call interrupts
TLB:         39         76   TLB shootdowns
TRM:          0          0   Thermal event interrupts
SPU:          0          0   Spurious interrupts
ERR:          0
MIS:          0
(eth0 == tg3; ioc0 == LSI SCSI HBA)

ifconfig
eth0      Link encap:Ethernet  HWaddr 00:02:A5:E7:3C:2D 
          inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:77198 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3488350 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:5403873 (5.1 MiB)  TX bytes:1000276920 (953.9 MiB)
          Interrupt:17

lo        Link encap:Local Loopback 
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

information from ethtool

driver: tg3
version: 3.87
firmware-version:
bus-info: 0000:03:01.0

Settings for eth0:
    Supported ports: [ TP ]
    Supported link modes:   10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Half 1000baseT/Full
    Supports auto-negotiation: Yes
    Advertised link modes:  10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Half 1000baseT/Full
    Advertised auto-negotiation: Yes
    Speed: 1000Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 1
    Transceiver: internal
    Auto-negotiation: on
    Supports Wake-on: g
    Wake-on: g
    Current message level: 0x000000ff (255)
    Link detected: yes

Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off

NIC statistics:
     rx_octets: 5403873
     rx_fragments: 0
     rx_ucast_packets: 77197
     rx_mcast_packets: 0
     rx_bcast_packets: 1
     rx_fcs_errors: 0
     rx_align_errors: 0
     rx_xon_pause_rcvd: 0
     rx_xoff_pause_rcvd: 0
     rx_mac_ctrl_rcvd: 0
     rx_xoff_entered: 0
     rx_frame_too_long_errors: 0
     rx_jabbers: 0
     rx_undersize_packets: 0
     rx_in_length_errors: 0
     rx_out_length_errors: 0
     rx_64_or_less_octet_packets: 2
     rx_65_to_127_octet_packets: 77196
     rx_128_to_255_octet_packets: 0
     rx_256_to_511_octet_packets: 0
     rx_512_to_1023_octet_packets: 0
     rx_1024_to_1522_octet_packets: 0
     rx_1523_to_2047_octet_packets: 0
     rx_2048_to_4095_octet_packets: 0
     rx_4096_to_8191_octet_packets: 0
     rx_8192_to_9022_octet_packets: 0
     tx_octets: 1000276920
     tx_collisions: 0
     tx_xon_sent: 0
     tx_xoff_sent: 0
     tx_flow_control: 0
     tx_mac_errors: 0
     tx_single_collisions: 0
     tx_mult_collisions: 0
     tx_deferred: 0
     tx_excessive_collisions: 0
     tx_late_collisions: 0
     tx_collide_2times: 0
     tx_collide_3times: 0
     tx_collide_4times: 0
     tx_collide_5times: 0
     tx_collide_6times: 0
     tx_collide_7times: 0
     tx_collide_8times: 0
     tx_collide_9times: 0
     tx_collide_10times: 0
     tx_collide_11times: 0
     tx_collide_12times: 0
     tx_collide_13times: 0
     tx_collide_14times: 0
     tx_collide_15times: 0
     tx_ucast_packets: 3488350
     tx_mcast_packets: 0
     tx_bcast_packets: 0
     tx_carrier_sense_errors: 0
     tx_discards: 0
     tx_errors: 0
     dma_writeq_full: 0
     dma_write_prioq_full: 0
     rxbds_empty: 0
     rx_discards: 0
     rx_errors: 0
     rx_threshold_hit: 11
     dma_readq_full: 2188114
     dma_read_prioq_full: 162588
     tx_comp_queue_full: 0
     ring_set_send_prod_index: 2901128
     ring_status_update: 218885
     nic_irqs: 146494
     nic_avoided_irqs: 72391
     nic_tx_threshold_hit: 103584

Tony Battersby
Cybernetics


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: TG3 network data corruption regression 2.6.24/2.6.23.4
  2008-02-18 22:41 TG3 network data corruption regression 2.6.24/2.6.23.4 Tony Battersby
@ 2008-02-19  0:32 ` Michael Chan
  2008-02-19  0:35   ` David Miller
  0 siblings, 1 reply; 25+ messages in thread
From: Michael Chan @ 2008-02-19  0:32 UTC (permalink / raw)
  To: Tony Battersby
  Cc: Herbert Xu, David Miller, netdev, Greg Kroah-Hartman, linux-kernel

On Mon, 2008-02-18 at 17:41 -0500, Tony Battersby wrote:
> I am experiencing network data corruption with a 3Com 3C996B-T NIC
> (Broadcom NetXtreme BCM5701; driver tg3.ko).  I have identified the
> following patch as the trigger:

Assuming this problem is unique to the 5701, I'm not sure how it is
exposed by Herbert's patch.  One thing unique on the 5701 is that it
double-copies all RX packets so that the data starts at offset 2, but
that's quite unrelated to the patch below.

> 
> commit fb93134dfc2a6e6fbedc7c270a31da03fce88db9
> Author: Herbert Xu <herbert@gondor.apana.org.au>
> Date:   Wed Nov 14 15:45:21 2007 -0800
> 
>     [TCP]: Fix size calculation in sk_stream_alloc_pskb
>    
> 

> I do not get data corruption when substituting a SysKonnect 9D21 NIC
> (which also uses the tg3.ko driver)

What Broadcom chip is on the Syskonnect card?




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: TG3 network data corruption regression 2.6.24/2.6.23.4
  2008-02-19  0:32 ` Michael Chan
@ 2008-02-19  0:35   ` David Miller
  2008-02-19  1:04     ` Michael Chan
  0 siblings, 1 reply; 25+ messages in thread
From: David Miller @ 2008-02-19  0:35 UTC (permalink / raw)
  To: mchan; +Cc: tonyb, herbert, netdev, gregkh, linux-kernel

From: "Michael Chan" <mchan@broadcom.com>
Date: Mon, 18 Feb 2008 16:32:00 -0800

> On Mon, 2008-02-18 at 17:41 -0500, Tony Battersby wrote:
> > I am experiencing network data corruption with a 3Com 3C996B-T NIC
> > (Broadcom NetXtreme BCM5701; driver tg3.ko).  I have identified the
> > following patch as the trigger:
> 
> Assuming this problem is unique to the 5701, I'm not sure how it is
> exposed by Herbert's patch.  One thing unique on the 5701 is that it
> double-copies all RX packets so that the data starts at offset 2, but
> that's quite unrelated to the patch below.

One consequence of Herbert's change is that the chip will see a
different datastream.  The initial skb->data linear area will be
smaller, and the transition to the fragmented area of pages will be
quicker.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: TG3 network data corruption regression 2.6.24/2.6.23.4
  2008-02-19  0:35   ` David Miller
@ 2008-02-19  1:04     ` Michael Chan
  2008-02-19 16:16       ` Tony Battersby
  0 siblings, 1 reply; 25+ messages in thread
From: Michael Chan @ 2008-02-19  1:04 UTC (permalink / raw)
  To: David Miller; +Cc: tonyb, herbert, netdev, gregkh, linux-kernel

On Mon, 2008-02-18 at 16:35 -0800, David Miller wrote:

> One consequence of Herbert's change is that the chip will see a
> different datastream.  The initial skb->data linear area will be
> smaller, and the transition to the fragmented area of pages will be
> quicker.
> 

I see.  Perhaps when we get to the end of the data-stream, there is a
tiny frag that the chip cannot handle.  That's the only thing I can
think of.

Please try this patch to see if the problem goes away.  This will
disable SG on 5701 so we always get linear SKBs.

diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index db606b6..bb37e76 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -12717,6 +12717,9 @@ static int __devinit tg3_init_one(struct pci_dev *pdev,
 	} else
 		tp->tg3_flags &= ~TG3_FLAG_RX_CHECKSUMS;
 
+	if (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5701)
+		dev->features &= ~(NETIF_F_IP_CSUM | NETIF_F_SG);
+
 	/* flow control autonegotiation is default behavior */
 	tp->tg3_flags |= TG3_FLAG_PAUSE_AUTONEG;
 	tp->link_config.flowctrl = TG3_FLOW_CTRL_TX | TG3_FLOW_CTRL_RX;



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: TG3 network data corruption regression 2.6.24/2.6.23.4
  2008-02-19  1:04     ` Michael Chan
@ 2008-02-19 16:16       ` Tony Battersby
  2008-02-19 19:11         ` Michael Chan
  0 siblings, 1 reply; 25+ messages in thread
From: Tony Battersby @ 2008-02-19 16:16 UTC (permalink / raw)
  To: Michael Chan; +Cc: David Miller, herbert, netdev, gregkh, linux-kernel

Michael Chan wrote:
> On Mon, 2008-02-18 at 16:35 -0800, David Miller wrote:
>
>   
>> One consequence of Herbert's change is that the chip will see a
>> different datastream.  The initial skb->data linear area will be
>> smaller, and the transition to the fragmented area of pages will be
>> quicker.
>>
>>     
>
> I see.  Perhaps when we get to the end of the data-stream, there is a
> tiny frag that the chip cannot handle.  That's the only thing I can
> think of.
>
> Please try this patch to see if the problem goes away.  This will
> disable SG on 5701 so we always get linear SKBs.
>
> diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
> index db606b6..bb37e76 100644
> --- a/drivers/net/tg3.c
> +++ b/drivers/net/tg3.c
> @@ -12717,6 +12717,9 @@ static int __devinit tg3_init_one(struct pci_dev *pdev,
>  	} else
>  		tp->tg3_flags &= ~TG3_FLAG_RX_CHECKSUMS;
>  
> +	if (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5701)
> +		dev->features &= ~(NETIF_F_IP_CSUM | NETIF_F_SG);
> +
>  	/* flow control autonegotiation is default behavior */
>  	tp->tg3_flags |= TG3_FLAG_PAUSE_AUTONEG;
>  	tp->link_config.flowctrl = TG3_FLOW_CTRL_TX | TG3_FLOW_CTRL_RX;
>
>
>
>   
This patch does appear to fix the data corruption (tested with
2.6.24.2).  However, it results in performance problems with the iSCSI
application that I am trying to run on this machine.

The test program that I described in the previous message still gets
good performance in both directions.  "iperf -r" gets good performance
in both directions (940 Mbits/s or 117 MB/s).  However, my target-mode
iSCSI application (which obviously generates rx/tx traffic patterns more
complicated than the synthetic tests) gets very poor performance in one
direction but good performance in the other direction.  iSCSI
performance drops to 6 - 15 MB/s when the 3Com NIC is doing heavy rx
with light tx, but remains at a decent 115 MB/s when the 3Com NIC is
doing heavy tx with light rx.  When I revert Herbert's patch instead of
applying the patch above, I get 115 MB/s in both cases.  (With a stock
unpatched kernel, the test fails almost immediately because the iSCSI
control PDUs are corrupted, causing the TCP connection to be dropped.)

The SysKonnect NIC that does not exhibit this problem has a chip that
says "BCM5411KQM" "TT0128 P2Q" and "56975E".

Tony


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: TG3 network data corruption regression 2.6.24/2.6.23.4
  2008-02-19 16:16       ` Tony Battersby
@ 2008-02-19 19:11         ` Michael Chan
  2008-02-19 19:26           ` Tony Battersby
  2008-02-19 22:14           ` Tony Battersby
  0 siblings, 2 replies; 25+ messages in thread
From: Michael Chan @ 2008-02-19 19:11 UTC (permalink / raw)
  To: Tony Battersby; +Cc: David Miller, herbert, netdev, gregkh, linux-kernel

On Tue, 2008-02-19 at 11:16 -0500, Tony Battersby wrote:
> iSCSI
> performance drops to 6 - 15 MB/s when the 3Com NIC is doing heavy rx
> with light tx,

That's strange.  The patch should only affect TX performance slightly
since we are just turning off SG for TX.  Please take an ethereal trace
to see what's happening and compare with a good trace.

> but remains at a decent 115 MB/s when the 3Com NIC is
> doing heavy tx with light rx.  When I revert Herbert's patch instead
> of
> applying the patch above, I get 115 MB/s in both cases.  (With a stock
> unpatched kernel, the test fails almost immediately because the iSCSI
> control PDUs are corrupted, causing the TCP connection to be dropped.)
> 
> The SysKonnect NIC that does not exhibit this problem has a chip that
> says "BCM5411KQM" "TT0128 P2Q" and "56975E".
> 

I think this is the 5700, but please send me the tg3 output that
identifies the chip and the revision.  Something like this:

eth2: Tigon3 [partno(BCM95705) rev 3003 PHY(5705)] (PCI:66MHz:32-bit) 10/100/1000Base-T Ethernet 00:10:18:04:57:0d
eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] WireSpeed[0] TSOcap[1]






^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: TG3 network data corruption regression 2.6.24/2.6.23.4
  2008-02-19 19:11         ` Michael Chan
@ 2008-02-19 19:26           ` Tony Battersby
  2008-02-19 22:14           ` Tony Battersby
  1 sibling, 0 replies; 25+ messages in thread
From: Tony Battersby @ 2008-02-19 19:26 UTC (permalink / raw)
  To: Michael Chan; +Cc: David Miller, herbert, netdev, gregkh, linux-kernel

Michael Chan wrote:
>> The SysKonnect NIC that does not exhibit this problem has a chip that
>> says "BCM5411KQM" "TT0128 P2Q" and "56975E".
> I think this is the 5700, but please send me the tg3 output that
> identifies the chip and the revision.  Something like this:
>
> eth2: Tigon3 [partno(BCM95705) rev 3003 PHY(5705)] (PCI:66MHz:32-bit) 10/100/1000Base-T Ethernet 00:10:18:04:57:0d
> eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] WireSpeed[0] TSOcap[1]
>   
Here is the dmesg output for the SysKonnect NIC:

eth0: Tigon3 [partno(SK-9D21) rev 7104 PHY(5411)] (PCI:66MHz:64-bit)
10/100/1000Base-T Ethernet 00:00:5a:9d:0c:4a
eth0: RXcsums[1] LinkChgREG[1] MIirq[1] ASF[0] WireSpeed[0] TSOcap[0]
eth0: dma_rwctrl[76ff000f] dma_mask[64-bit]

Tony


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: TG3 network data corruption regression 2.6.24/2.6.23.4
  2008-02-19 19:11         ` Michael Chan
  2008-02-19 19:26           ` Tony Battersby
@ 2008-02-19 22:14           ` Tony Battersby
  2008-02-19 23:52             ` Michael Chan
                               ` (2 more replies)
  1 sibling, 3 replies; 25+ messages in thread
From: Tony Battersby @ 2008-02-19 22:14 UTC (permalink / raw)
  To: Michael Chan; +Cc: David Miller, herbert, netdev, gregkh, linux-kernel

Michael Chan wrote:
> On Tue, 2008-02-19 at 11:16 -0500, Tony Battersby wrote:
>   
>> iSCSI
>> performance drops to 6 - 15 MB/s when the 3Com NIC is doing heavy rx
>> with light tx,
>>     
>
> That's strange.  The patch should only affect TX performance slightly
> since we are just turning off SG for TX.  Please take an ethereal trace
> to see what's happening and compare with a good trace.
>
>   

Update: when I revert Herbert's patch in addition to applying your
patch, the iSCSI performance goes back up to 115 MB/s again in both
directions.  So it looks like turning off SG for TX didn't itself cause
the performance drop, but rather that the performance drop is just
another manifestation of whatever bug is causing the data corruption.

I do not regularly use wireshark or look at network packet dumps, so I
am not really sure what to look for.  Given the above information, do
you still believe that there is value in examining the packet dump?

Tony


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: TG3 network data corruption regression 2.6.24/2.6.23.4
  2008-02-19 22:14           ` Tony Battersby
@ 2008-02-19 23:52             ` Michael Chan
  2008-02-20 15:01               ` Tony Battersby
  2008-02-20  1:38             ` Matt Carlson
  2008-02-20  3:45             ` Herbert Xu
  2 siblings, 1 reply; 25+ messages in thread
From: Michael Chan @ 2008-02-19 23:52 UTC (permalink / raw)
  To: Tony Battersby; +Cc: David Miller, herbert, netdev, gregkh, linux-kernel

On Tue, 2008-02-19 at 17:14 -0500, Tony Battersby wrote:

> 
> Update: when I revert Herbert's patch in addition to applying your
> patch, the iSCSI performance goes back up to 115 MB/s again in both
> directions.  So it looks like turning off SG for TX didn't itself cause
> the performance drop, but rather that the performance drop is just
> another manifestation of whatever bug is causing the data corruption.
> 
> I do not regularly use wireshark or look at network packet dumps, so I
> am not really sure what to look for.  Given the above information, do
> you still believe that there is value in examining the packet dump?
> 

Can you confirm whether you're getting TCP checksum errors on the other
side that is receiving packets from the 5701?  You can just check
statistics using netstat -s.  I suspect that after we turn off SG,
checksum is no longer offloaded and we are getting lots of TCP checksum
errors instead that are slowing the performance.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: TG3 network data corruption regression 2.6.24/2.6.23.4
  2008-02-19 22:14           ` Tony Battersby
  2008-02-19 23:52             ` Michael Chan
@ 2008-02-20  1:38             ` Matt Carlson
  2008-02-20 16:13               ` Tony Battersby
                                 ` (2 more replies)
  2008-02-20  3:45             ` Herbert Xu
  2 siblings, 3 replies; 25+ messages in thread
From: Matt Carlson @ 2008-02-20  1:38 UTC (permalink / raw)
  To: Tony Battersby
  Cc: Michael Chan, David Miller, herbert, netdev, gregkh, linux-kernel

On Tue, Feb 19, 2008 at 05:14:26PM -0500, Tony Battersby wrote:
> Michael Chan wrote:
> > On Tue, 2008-02-19 at 11:16 -0500, Tony Battersby wrote:
> >   
> >> iSCSI
> >> performance drops to 6 - 15 MB/s when the 3Com NIC is doing heavy rx
> >> with light tx,
> >>     
> >
> > That's strange.  The patch should only affect TX performance slightly
> > since we are just turning off SG for TX.  Please take an ethereal trace
> > to see what's happening and compare with a good trace.
> >
> >   
> 
> Update: when I revert Herbert's patch in addition to applying your
> patch, the iSCSI performance goes back up to 115 MB/s again in both
> directions.  So it looks like turning off SG for TX didn't itself cause
> the performance drop, but rather that the performance drop is just
> another manifestation of whatever bug is causing the data corruption.
> 
> I do not regularly use wireshark or look at network packet dumps, so I
> am not really sure what to look for.  Given the above information, do
> you still believe that there is value in examining the packet dump?
> 
> Tony

Hi Tony.  Can you give us the output of :

sudo lspci -vvv -xxxx -s 03:01.0'

(assuming that is still the correct address of the 3Com NIC.)

Also, after some digging, I found that the 5701 can run into trouble if
a 64-bit DMA read terminates early and then completes as a 32-bit transfer.
The problem is reportedly very rare, but the failure mode looks like a
match.  Can you apply the following patch and see if it helps your
performance / corruption problems?


diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index db606b6..7ad08ce 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -11409,6 +11409,8 @@ static int __devinit tg3_get_invariants(struct tg3 *tp)
 		tp->tg3_flags |= TG3_FLAG_PCI_HIGH_SPEED;
 	if ((pci_state_reg & PCISTATE_BUS_32BIT) != 0)
 		tp->tg3_flags |= TG3_FLAG_PCI_32BIT;
+	else if (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5701)
+		tp->grc_mode |= GRC_MODE_FORCE_PCI32BIT;
 
 	/* Chip-specific fixup from Broadcom driver */
 	if ((tp->pci_chip_rev_id == CHIPREV_ID_5704_A0) &&



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: TG3 network data corruption regression 2.6.24/2.6.23.4
  2008-02-19 22:14           ` Tony Battersby
  2008-02-19 23:52             ` Michael Chan
  2008-02-20  1:38             ` Matt Carlson
@ 2008-02-20  3:45             ` Herbert Xu
  2008-02-20 15:18               ` Tony Battersby
  2 siblings, 1 reply; 25+ messages in thread
From: Herbert Xu @ 2008-02-20  3:45 UTC (permalink / raw)
  To: Tony Battersby; +Cc: Michael Chan, David Miller, netdev, gregkh, linux-kernel

On Tue, Feb 19, 2008 at 05:14:26PM -0500, Tony Battersby wrote:
>
> Update: when I revert Herbert's patch in addition to applying your
> patch, the iSCSI performance goes back up to 115 MB/s again in both
> directions.  So it looks like turning off SG for TX didn't itself cause
> the performance drop, but rather that the performance drop is just
> another manifestation of whatever bug is causing the data corruption.

Interesting.  So the workload that regressed is mostly RX with a
little TX traffic? Can you try to reproduce this with something
like netperf to eliminate other variables?

This is all very puzzling since the patch in question shouldn't
change an RX load at all.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: TG3 network data corruption regression 2.6.24/2.6.23.4
  2008-02-19 23:52             ` Michael Chan
@ 2008-02-20 15:01               ` Tony Battersby
  0 siblings, 0 replies; 25+ messages in thread
From: Tony Battersby @ 2008-02-20 15:01 UTC (permalink / raw)
  To: Michael Chan; +Cc: David Miller, herbert, netdev, gregkh, linux-kernel

Michael Chan wrote:
> On Tue, 2008-02-19 at 17:14 -0500, Tony Battersby wrote:
>
>   
>> Update: when I revert Herbert's patch in addition to applying your
>> patch, the iSCSI performance goes back up to 115 MB/s again in both
>> directions.  So it looks like turning off SG for TX didn't itself cause
>> the performance drop, but rather that the performance drop is just
>> another manifestation of whatever bug is causing the data corruption.
>>
>> I do not regularly use wireshark or look at network packet dumps, so I
>> am not really sure what to look for.  Given the above information, do
>> you still believe that there is value in examining the packet dump?
>>
>>     
>
> Can you confirm whether you're getting TCP checksum errors on the other
> side that is receiving packets from the 5701?  You can just check
> statistics using netstat -s.  I suspect that after we turn off SG,
> checksum is no longer offloaded and we are getting lots of TCP checksum
> errors instead that are slowing the performance.
>
>
>   
Confirmed.  With a 100 MB read/write test, netstat -s shows 75 bad
segments received, and performance in the one direction is about 5
MB/s.  When I switch to the SysKonnect NIC, netstat -s shows 0 bad
segments received, and performance is 115 MB/s.  So that solves that
mystery - there is still data corruption, but the software-computed TCP
checksum causes the bad packets to be retransmitted rather than being
passed on to the application.

Tony


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: TG3 network data corruption regression 2.6.24/2.6.23.4
  2008-02-20  3:45             ` Herbert Xu
@ 2008-02-20 15:18               ` Tony Battersby
  2008-04-15  0:12                 ` Matt Carlson
  0 siblings, 1 reply; 25+ messages in thread
From: Tony Battersby @ 2008-02-20 15:18 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Michael Chan, David Miller, netdev, gregkh, linux-kernel

Herbert Xu wrote:
> On Tue, Feb 19, 2008 at 05:14:26PM -0500, Tony Battersby wrote:
>   
>> Update: when I revert Herbert's patch in addition to applying your
>> patch, the iSCSI performance goes back up to 115 MB/s again in both
>> directions.  So it looks like turning off SG for TX didn't itself cause
>> the performance drop, but rather that the performance drop is just
>> another manifestation of whatever bug is causing the data corruption.
>>     
>
> Interesting.  So the workload that regressed is mostly RX with a
> little TX traffic? Can you try to reproduce this with something
> like netperf to eliminate other variables?
>
> This is all very puzzling since the patch in question shouldn't
> change an RX load at all.
>
> Thanks,
>   
We have established that the slowdown was caused by TCP checksum errors
and retransmits.  I assume that the slowdown in my test was due to the
light TX rather than the heavy RX.  I am no TCP protocol expert, but
perhaps heavy TX (such as iperf) might not be affected as much because
the wire stays busy while waiting for the retransmit, whereas with my
light TX iSCSI load, the wire goes idle while waiting for the retransmit
because the iSCSI state machine is stalled.

Tony


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: TG3 network data corruption regression 2.6.24/2.6.23.4
  2008-02-20  1:38             ` Matt Carlson
@ 2008-02-20 16:13               ` Tony Battersby
  2008-02-20 21:29               ` Tony Battersby
  2008-02-20 23:04               ` Tony Battersby
  2 siblings, 0 replies; 25+ messages in thread
From: Tony Battersby @ 2008-02-20 16:13 UTC (permalink / raw)
  To: Matt Carlson
  Cc: Michael Chan, David Miller, herbert, netdev, gregkh, linux-kernel

Matt Carlson wrote:
> Hi Tony.  Can you give us the output of :
>
> sudo lspci -vvv -xxxx -s 03:01.0'
>   
03:01.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5701 Gigabit Ethernet (rev 15)
	Subsystem: Compaq Computer Corporation NC7770 Gigabit Server Adapter (PCI-X, 10/100/1000-T)
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 64 (16000ns min), Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 17
	Region 0: Memory at df7f0000 (64-bit, non-prefetchable) [size=64K]
	[virtual] Expansion ROM at dfc00000 [disabled] [size=64K]
	Capabilities: [40] PCI-X non-bridge device
		Command: DPERE- ERO- RBC=512 OST=1
		Status: Dev=03:01.1 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=512 DMOST=1 DMCRS=8 RSCEM- 266MHz- 533MHz-
	Capabilities: [48] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
		Status: D0 PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [50] Vital Product Data <?>
	Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+ Queue=0/3 Enable-
		Address: 063000119b608000  Data: 0423
00: e4 14 45 16 06 00 b0 02 15 00 00 02 10 40 00 00
10: 04 00 7f df 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 11 0e 7c 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 0b 01 40 00
40: 07 48 00 00 09 03 03 00 01 50 02 c0 00 20 00 64
50: 03 58 00 00 08 10 21 08 05 00 86 00 00 80 60 9b
60: 11 00 30 06 23 04 00 00 98 02 05 01 0f 00 db 76
70: 8a 10 00 00 c7 00 00 80 50 00 00 00 00 00 00 00
80: 03 58 00 00 00 00 00 00 34 80 13 04 82 10 00 00
90: 09 06 00 01 00 00 00 00 00 00 00 00 c6 01 00 00
a0: 00 00 00 00 fe 02 00 00 00 00 00 00 af 01 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00



> Also, after some digging, I found that the 5701 can run into trouble if
> a 64-bit DMA read terminates early and then completes as a 32-bit transfer.
> The problem is reportedly very rare, but the failure mode looks like a
> match.  Can you apply the following patch and see if it helps your
> performance / corruption problems?
>
>
> diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
> index db606b6..7ad08ce 100644
> --- a/drivers/net/tg3.c
> +++ b/drivers/net/tg3.c
> @@ -11409,6 +11409,8 @@ static int __devinit tg3_get_invariants(struct tg3 *tp)
>  		tp->tg3_flags |= TG3_FLAG_PCI_HIGH_SPEED;
>  	if ((pci_state_reg & PCISTATE_BUS_32BIT) != 0)
>  		tp->tg3_flags |= TG3_FLAG_PCI_32BIT;
> +	else if (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5701)
> +		tp->grc_mode |= GRC_MODE_FORCE_PCI32BIT;
>  
>  	/* Chip-specific fixup from Broadcom driver */
>  	if ((tp->pci_chip_rev_id == CHIPREV_ID_5704_A0) &&
>
>   
Sorry, this didn't help.  I still get data corruption with hardware
checksumming or poor performance with software checksumming.

Tony


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: TG3 network data corruption regression 2.6.24/2.6.23.4
  2008-02-20  1:38             ` Matt Carlson
  2008-02-20 16:13               ` Tony Battersby
@ 2008-02-20 21:29               ` Tony Battersby
  2008-02-20 23:04               ` Tony Battersby
  2 siblings, 0 replies; 25+ messages in thread
From: Tony Battersby @ 2008-02-20 21:29 UTC (permalink / raw)
  To: Matt Carlson
  Cc: Michael Chan, David Miller, herbert, netdev, gregkh, linux-kernel

Update:

Herbert's patch alters the arguments to alloc_skb_fclone() and
skb_reserve() from within sk_stream_alloc_pskb().  This changes the
skb_headroom() and skb_tailroom() of the returned skb.  I decided to see
if I could detect the precise point at which data corruption started to
happen.  The result is this table:

(sk_stream_alloc_pskb() called with size == 1448;
sk->sk_prot->max_header == 160)

skb_headroom  skb_tailroom  test result  note
216           1448          fail         [1]
344           1448          fail
340           1452          pass
336           1456          pass
332           1460          pass
328           1464          fail
324           1468          pass
320           1472          pass
316           1476          pass
312           1480          fail
308           1484          pass
304           1488          pass
300           1492          pass
296           1496          fail
292           1500          pass
288           1504          pass
284           1508          pass
280           1512          fail
276           1516          pass
272           1520          pass
268           1524          pass
264           1528          fail
260           1532          pass
256           1536          pass         [2]

Notes:
[1] Kernels 2.6.23.4 - 2.6.23.16 and 2.6.24 - current with Herbert's patch
[2] Kernels 2.6.23.3 and before without Herbert's patch

Note that the first row has skb_headroom + skb_tailroom == 1664; the
remaining rows have skb_headroom + skb_tailroom == 1792.

>From these results, it looks like a data alignment issue.  Herbert's
patch unfortunately just happened to change the alignment in a way that
made it break.

Tony


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: TG3 network data corruption regression 2.6.24/2.6.23.4
  2008-02-20  1:38             ` Matt Carlson
  2008-02-20 16:13               ` Tony Battersby
  2008-02-20 21:29               ` Tony Battersby
@ 2008-02-20 23:04               ` Tony Battersby
  2008-02-20 23:08                 ` David Miller
  2 siblings, 1 reply; 25+ messages in thread
From: Tony Battersby @ 2008-02-20 23:04 UTC (permalink / raw)
  To: Matt Carlson
  Cc: Michael Chan, David Miller, herbert, netdev, gregkh, linux-kernel

The following patch fixes the problem for me.  Do we want to accept this
patch and call it a day or continue investigating the source of the problem?

Patch applies to 2.6.24.2, but doesn't apply to 2.6.25-rc.  If everyone
agrees that this is the right solution, I will resubmit with a proper
subject line and description.

Tony

--- linux-2.6.24.2/include/net/sock.h.orig	2008-02-20 17:19:20.000000000 -0500
+++ linux-2.6.24.2/include/net/sock.h	2008-02-20 17:25:55.000000000 -0500
@@ -1236,8 +1236,10 @@ static inline struct sk_buff *sk_stream_
 {
 	struct sk_buff *skb;
 
-	/* The TCP header must be at least 32-bit aligned.  */
-	size = ALIGN(size, 4);
+	/* The TCP header must be at least 32-bit aligned, but some chipsets
+	 * such as Broadcom BCM5701 require at least 16-byte alignment.
+	 */
+	size = ALIGN(size, 16);
 
 	skb = alloc_skb_fclone(size + sk->sk_prot->max_header, gfp);
 	if (skb) {



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: TG3 network data corruption regression 2.6.24/2.6.23.4
  2008-02-20 23:04               ` Tony Battersby
@ 2008-02-20 23:08                 ` David Miller
  2008-02-20 23:17                   ` Michael Chan
  0 siblings, 1 reply; 25+ messages in thread
From: David Miller @ 2008-02-20 23:08 UTC (permalink / raw)
  To: tonyb; +Cc: mcarlson, mchan, herbert, netdev, gregkh, linux-kernel

From: Tony Battersby <tonyb@cybernetics.com>
Date: Wed, 20 Feb 2008 18:04:09 -0500

> The following patch fixes the problem for me.  Do we want to accept this
> patch and call it a day or continue investigating the source of the problem?
> 
> Patch applies to 2.6.24.2, but doesn't apply to 2.6.25-rc.  If everyone
> agrees that this is the right solution, I will resubmit with a proper
> subject line and description.

A chipset bug, if it even exists, should be worked around in the
driver for that hardware.  We shouldn't make every other piece
of hardware in the world suffer too.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: TG3 network data corruption regression 2.6.24/2.6.23.4
  2008-02-20 23:08                 ` David Miller
@ 2008-02-20 23:17                   ` Michael Chan
  0 siblings, 0 replies; 25+ messages in thread
From: Michael Chan @ 2008-02-20 23:17 UTC (permalink / raw)
  To: David Miller; +Cc: tonyb, mcarlson, herbert, netdev, gregkh, linux-kernel

On Wed, 2008-02-20 at 15:08 -0800, David Miller wrote:
> From: Tony Battersby <tonyb@cybernetics.com>
> Date: Wed, 20 Feb 2008 18:04:09 -0500
> 
> > The following patch fixes the problem for me.  Do we want to accept this
> > patch and call it a day or continue investigating the source of the problem?
> > 
> > Patch applies to 2.6.24.2, but doesn't apply to 2.6.25-rc.  If everyone
> > agrees that this is the right solution, I will resubmit with a proper
> > subject line and description.
> 
> A chipset bug, if it even exists, should be worked around in the
> driver for that hardware.  We shouldn't make every other piece
> of hardware in the world suffer too.
> 
> 

Yes, we should workaround this in the TG3 driver once we understand what
the problem is and how to workaround it.  We are still looking through
the errata list to sort this out.  It looks like it is the starting DMA
address of the TX buffer that is causing the problem.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: TG3 network data corruption regression 2.6.24/2.6.23.4
  2008-02-20 15:18               ` Tony Battersby
@ 2008-04-15  0:12                 ` Matt Carlson
  2008-04-15 15:39                   ` Tony Battersby
  0 siblings, 1 reply; 25+ messages in thread
From: Matt Carlson @ 2008-04-15  0:12 UTC (permalink / raw)
  To: Tony Battersby
  Cc: Herbert Xu, Michael Chan, David Miller, netdev, gregkh, linux-kernel

Hi Tony.  Sorry for the radio silence.

Michael and I have discussed this problem a bit.  Another possibility is
that the chip may be having difficulty with non-dword aligned TX buffers.
Since we already know the RX side has the same problem, it isn't so
far-fetched to think that perhaps it can affect the TX side too.  Can
you give the following patch a try and see if the corruption still
happens?


diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index 96043c5..810c711 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -4135,11 +4135,20 @@ static int tigon3_dma_hwbug_workaround(struct tg3 *tp, struct sk_buff *skb,
 				       u32 last_plus_one, u32 *start,
 				       u32 base_flags, u32 mss)
 {
-	struct sk_buff *new_skb = skb_copy(skb, GFP_ATOMIC);
+	struct sk_buff *new_skb;
 	dma_addr_t new_addr = 0;
 	u32 entry = *start;
 	int i, ret = 0;
 
+	if (GET_ASIC_REV(tp->pci_chip_rev_id) != ASIC_REV_5701)
+		new_skb = skb_copy(skb, GFP_ATOMIC);
+	else {
+		int more_headroom = 4 - (skb->mac_header & 3);
+
+		new_skb = skb_copy_expand(skb, skb_headroom(skb) + more_headroom,
+					  skb_tailroom(skb), GFP_ATOMIC);
+	}
+
 	if (!new_skb) {
 		ret = -1;
 	} else {
@@ -4465,6 +4474,10 @@ static int tg3_start_xmit_dma_bug(struct sk_buff *skb, struct net_device *dev)
 	if (tg3_4g_overflow_test(mapping, len))
 		would_hit_hwbug = 1;
 
+	/* Force the 5701 into the double copy path. */
+	if (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5701)
+		would_hit_hwbug = 1;
+
 	tg3_set_txd(tp, entry, mapping, len, base_flags,
 		    (skb_shinfo(skb)->nr_frags == 0) | (mss << 1));
 


On Wed, Feb 20, 2008 at 10:18:58AM -0500, Tony Battersby wrote:
> Herbert Xu wrote:
> > On Tue, Feb 19, 2008 at 05:14:26PM -0500, Tony Battersby wrote:
> >   
> >> Update: when I revert Herbert's patch in addition to applying your
> >> patch, the iSCSI performance goes back up to 115 MB/s again in both
> >> directions.  So it looks like turning off SG for TX didn't itself cause
> >> the performance drop, but rather that the performance drop is just
> >> another manifestation of whatever bug is causing the data corruption.
> >>     
> >
> > Interesting.  So the workload that regressed is mostly RX with a
> > little TX traffic? Can you try to reproduce this with something
> > like netperf to eliminate other variables?
> >
> > This is all very puzzling since the patch in question shouldn't
> > change an RX load at all.
> >
> > Thanks,
> >   
> We have established that the slowdown was caused by TCP checksum errors
> and retransmits.  I assume that the slowdown in my test was due to the
> light TX rather than the heavy RX.  I am no TCP protocol expert, but
> perhaps heavy TX (such as iperf) might not be affected as much because
> the wire stays busy while waiting for the retransmit, whereas with my
> light TX iSCSI load, the wire goes idle while waiting for the retransmit
> because the iSCSI state machine is stalled.
> 
> Tony
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: TG3 network data corruption regression 2.6.24/2.6.23.4
  2008-04-15  0:12                 ` Matt Carlson
@ 2008-04-15 15:39                   ` Tony Battersby
  2008-04-16  3:31                     ` David Miller
  0 siblings, 1 reply; 25+ messages in thread
From: Tony Battersby @ 2008-04-15 15:39 UTC (permalink / raw)
  To: Matt Carlson
  Cc: Herbert Xu, Michael Chan, David Miller, netdev, gregkh, linux-kernel

Matt Carlson wrote:
> Hi Tony.  Sorry for the radio silence.
>
> Michael and I have discussed this problem a bit.  Another possibility is
> that the chip may be having difficulty with non-dword aligned TX buffers.
> Since we already know the RX side has the same problem, it isn't so
> far-fetched to think that perhaps it can affect the TX side too.  Can
> you give the following patch a try and see if the corruption still
> happens?
>
>   

Thanks, your patch fixes the problem (tested on 2.6.24.4).  However, I
had to change "(skb->mac_header & 3)" in your patch to "((long)
skb->mac_header & 3)" since mac_header is a pointer rather than an int
on 32-bit systems.

Tested-by: Tony Battersby <tonyb@cybernetics.com>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: TG3 network data corruption regression 2.6.24/2.6.23.4
  2008-04-15 15:39                   ` Tony Battersby
@ 2008-04-16  3:31                     ` David Miller
  2008-04-16 15:40                       ` Michael Chan
  0 siblings, 1 reply; 25+ messages in thread
From: David Miller @ 2008-04-16  3:31 UTC (permalink / raw)
  To: tonyb; +Cc: mcarlson, herbert, mchan, netdev, gregkh, linux-kernel

From: Tony Battersby <tonyb@cybernetics.com>
Date: Tue, 15 Apr 2008 11:39:27 -0400

> Thanks, your patch fixes the problem (tested on 2.6.24.4).  However, I
> had to change "(skb->mac_header & 3)" in your patch to "((long)
> skb->mac_header & 3)" since mac_header is a pointer rather than an int
> on 32-bit systems.
> 
> Tested-by: Tony Battersby <tonyb@cybernetics.com>

Thanks for testing.

Matt, skb->mac_header is either a pointer or an integer offset
depending upon whether we are building 32-bit or 64-bit.

Testing skb->mac_header is therefore wrong, because it's an
offset from a pointer in the 64-bit case and therefore it's
alignment does not indicate correctly the actual final alignment
of skb->head + skb->max_header.

Therefore you should test skb_mac_header(skb) and cast it with
(unsigned long).

Please respin this fix with that correction so I can apply it
and get this bug fixed, thanks!

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: TG3 network data corruption regression 2.6.24/2.6.23.4
  2008-04-16  3:31                     ` David Miller
@ 2008-04-16 15:40                       ` Michael Chan
  2008-04-16 20:17                         ` Matt Carlson
  2008-04-18  6:20                         ` David Miller
  0 siblings, 2 replies; 25+ messages in thread
From: Michael Chan @ 2008-04-16 15:40 UTC (permalink / raw)
  To: David Miller, tonyb
  Cc: Matthew Carlson, herbert, netdev, gregkh, linux-kernel

David Miller wrote:

> Matt, skb->mac_header is either a pointer or an integer offset
> depending upon whether we are building 32-bit or 64-bit.
> 
> Testing skb->mac_header is therefore wrong, because it's an
> offset from a pointer in the 64-bit case and therefore it's
> alignment does not indicate correctly the actual final alignment
> of skb->head + skb->max_header.
> 
> Therefore you should test skb_mac_header(skb) and cast it with
> (unsigned long).

Isn't it better to test for skb->data?  That's where we tell
the hardware to start transmitting.

> 
> Please respin this fix with that correction so I can apply it
> and get this bug fixed, thanks!
> 
> 

We think that this problem is unique in Tony's environment because
of the PCIE-to-PCI bridge that he is using.  We therefore want to
test for that bridge and apply the workaround only when it's present.
We've never seen this problem in the last 6 or 7 years during the
lifetime of the 5701.

We'll try to get this done ASAP.

Thanks.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: TG3 network data corruption regression 2.6.24/2.6.23.4
  2008-04-16 15:40                       ` Michael Chan
@ 2008-04-16 20:17                         ` Matt Carlson
  2008-04-16 21:00                           ` Tony Battersby
  2008-04-18  6:20                         ` David Miller
  1 sibling, 1 reply; 25+ messages in thread
From: Matt Carlson @ 2008-04-16 20:17 UTC (permalink / raw)
  To: tonyb
  Cc: David Miller, Michael Chan, Matthew Carlson, herbert, netdev,
	gregkh, linux-kernel

On Wed, Apr 16, 2008 at 08:40:25AM -0700, Michael Chan wrote:
> David Miller wrote:
> 
> > Matt, skb->mac_header is either a pointer or an integer offset
> > depending upon whether we are building 32-bit or 64-bit.
> > 
> > Testing skb->mac_header is therefore wrong, because it's an
> > offset from a pointer in the 64-bit case and therefore it's
> > alignment does not indicate correctly the actual final alignment
> > of skb->head + skb->max_header.
> > 
> > Therefore you should test skb_mac_header(skb) and cast it with
> > (unsigned long).
> 
> Isn't it better to test for skb->data?  That's where we tell
> the hardware to start transmitting.
> 
> > 
> > Please respin this fix with that correction so I can apply it
> > and get this bug fixed, thanks!
> > 
> > 
> 
> We think that this problem is unique in Tony's environment because
> of the PCIE-to-PCI bridge that he is using.  We therefore want to
> test for that bridge and apply the workaround only when it's present.
> We've never seen this problem in the last 6 or 7 years during the
> lifetime of the 5701.
> 
> We'll try to get this done ASAP.
> 
> Thanks.

Tony,

Below is a patch that attempts to limit the workaround to the bridge you
have on your system.  Can you test it and verify that the workaround is
still enabled?


diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index 96043c5..52a44c6 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -4135,11 +4135,21 @@ static int tigon3_dma_hwbug_workaround(struct tg3 *tp, struct sk_buff *skb,
 				       u32 last_plus_one, u32 *start,
 				       u32 base_flags, u32 mss)
 {
-	struct sk_buff *new_skb = skb_copy(skb, GFP_ATOMIC);
+	struct sk_buff *new_skb;
 	dma_addr_t new_addr = 0;
 	u32 entry = *start;
 	int i, ret = 0;
 
+	if (GET_ASIC_REV(tp->pci_chip_rev_id) != ASIC_REV_5701)
+		new_skb = skb_copy(skb, GFP_ATOMIC);
+	else {
+		int more_headroom = 4 - ((unsigned long)skb->data & 3);
+
+		new_skb = skb_copy_expand(skb,
+					  skb_headroom(skb) + more_headroom,
+					  skb_tailroom(skb), GFP_ATOMIC);
+	}
+
 	if (!new_skb) {
 		ret = -1;
 	} else {
@@ -4462,7 +4472,9 @@ static int tg3_start_xmit_dma_bug(struct sk_buff *skb, struct net_device *dev)
 
 	would_hit_hwbug = 0;
 
-	if (tg3_4g_overflow_test(mapping, len))
+	if (tp->tg3_flags3 & TG3_FLG3_5701_DMA_BUG)
+		would_hit_hwbug = 1;
+	else if (tg3_4g_overflow_test(mapping, len))
 		would_hit_hwbug = 1;
 
 	tg3_set_txd(tp, entry, mapping, len, base_flags,
@@ -11339,6 +11351,41 @@ static int __devinit tg3_get_invariants(struct tg3 *tp)
 		}
 	}
 
+	if ((GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5701)) {
+		static struct tg3_dev_id {
+			u32	vendor;
+			u32	device;
+		} bridge_chipsets[] = {
+			{ PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_PXH_0 },
+			{ PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_PXH_1 },
+			{ },
+		};
+		struct tg3_dev_id *pci_id = &bridge_chipsets[0];
+		struct pci_dev *bridge = NULL;
+
+		while (pci_id->vendor != 0 &&
+		       !(tp->tg3_flags3 & TG3_FLG3_5701_DMA_BUG)) {
+			while (1) {
+				bridge = pci_get_device(pci_id->vendor,
+							pci_id->device,
+							bridge);
+				if (!bridge) {
+					pci_id++;
+					break;
+				}
+				if (bridge->subordinate &&
+				    (bridge->subordinate->number <=
+				     tp->pdev->bus->number) &&
+				    (bridge->subordinate->subordinate >=
+				     tp->pdev->bus->number)) {
+					tp->tg3_flags3 |= TG3_FLG3_5701_DMA_BUG;
+					pci_dev_put(bridge);
+					break;
+				}
+			}
+		}
+	}
+
 	/* The EPB bridge inside 5714, 5715, and 5780 cannot support
 	 * DMA addresses > 40-bit. This bridge may have other additional
 	 * 57xx devices behind it in some 4-port NIC designs for example.
diff --git a/drivers/net/tg3.h b/drivers/net/tg3.h
index c1075a7..c688c3a 100644
--- a/drivers/net/tg3.h
+++ b/drivers/net/tg3.h
@@ -2476,6 +2476,7 @@ struct tg3 {
 #define TG3_FLG3_NO_NVRAM_ADDR_TRANS	0x00000001
 #define TG3_FLG3_ENABLE_APE		0x00000002
 #define TG3_FLG3_5761_5784_AX_FIXES	0x00000004
+#define TG3_FLG3_5701_DMA_BUG		0x00000008
 
 	struct timer_list		timer;
 	u16				timer_counter;



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: TG3 network data corruption regression 2.6.24/2.6.23.4
  2008-04-16 20:17                         ` Matt Carlson
@ 2008-04-16 21:00                           ` Tony Battersby
  0 siblings, 0 replies; 25+ messages in thread
From: Tony Battersby @ 2008-04-16 21:00 UTC (permalink / raw)
  To: Matt Carlson
  Cc: David Miller, Michael Chan, herbert, netdev, gregkh, linux-kernel

Matt Carlson wrote:
> On Wed, Apr 16, 2008 at 08:40:25AM -0700, Michael Chan wrote:
>   
>> David Miller wrote:
>>
>>     
>>> Matt, skb->mac_header is either a pointer or an integer offset
>>> depending upon whether we are building 32-bit or 64-bit.
>>>
>>> Testing skb->mac_header is therefore wrong, because it's an
>>> offset from a pointer in the 64-bit case and therefore it's
>>> alignment does not indicate correctly the actual final alignment
>>> of skb->head + skb->max_header.
>>>
>>> Therefore you should test skb_mac_header(skb) and cast it with
>>> (unsigned long).
>>>       
>> Isn't it better to test for skb->data?  That's where we tell
>> the hardware to start transmitting.
>>
>>     
>>> Please respin this fix with that correction so I can apply it
>>> and get this bug fixed, thanks!
>>>
>>>
>>>       
>> We think that this problem is unique in Tony's environment because
>> of the PCIE-to-PCI bridge that he is using.  We therefore want to
>> test for that bridge and apply the workaround only when it's present.
>> We've never seen this problem in the last 6 or 7 years during the
>> lifetime of the 5701.
>>
>> We'll try to get this done ASAP.
>>
>> Thanks.
>>     
>
> Tony,
>
> Below is a patch that attempts to limit the workaround to the bridge you
> have on your system.  Can you test it and verify that the workaround is
> still enabled?
>
>
>   

This new patch also passes the test. Thumbs up!

Tested-by: Tony Battersby <tonyb@cybernetics.com>

Tony


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: TG3 network data corruption regression 2.6.24/2.6.23.4
  2008-04-16 15:40                       ` Michael Chan
  2008-04-16 20:17                         ` Matt Carlson
@ 2008-04-18  6:20                         ` David Miller
  1 sibling, 0 replies; 25+ messages in thread
From: David Miller @ 2008-04-18  6:20 UTC (permalink / raw)
  To: mchan; +Cc: tonyb, mcarlson, herbert, netdev, gregkh, linux-kernel

From: "Michael Chan" <mchan@broadcom.com>
Date: Wed, 16 Apr 2008 08:40:25 -0700

> Isn't it better to test for skb->data?  That's where we tell
> the hardware to start transmitting.

That's correct.

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2008-04-18  6:21 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-02-18 22:41 TG3 network data corruption regression 2.6.24/2.6.23.4 Tony Battersby
2008-02-19  0:32 ` Michael Chan
2008-02-19  0:35   ` David Miller
2008-02-19  1:04     ` Michael Chan
2008-02-19 16:16       ` Tony Battersby
2008-02-19 19:11         ` Michael Chan
2008-02-19 19:26           ` Tony Battersby
2008-02-19 22:14           ` Tony Battersby
2008-02-19 23:52             ` Michael Chan
2008-02-20 15:01               ` Tony Battersby
2008-02-20  1:38             ` Matt Carlson
2008-02-20 16:13               ` Tony Battersby
2008-02-20 21:29               ` Tony Battersby
2008-02-20 23:04               ` Tony Battersby
2008-02-20 23:08                 ` David Miller
2008-02-20 23:17                   ` Michael Chan
2008-02-20  3:45             ` Herbert Xu
2008-02-20 15:18               ` Tony Battersby
2008-04-15  0:12                 ` Matt Carlson
2008-04-15 15:39                   ` Tony Battersby
2008-04-16  3:31                     ` David Miller
2008-04-16 15:40                       ` Michael Chan
2008-04-16 20:17                         ` Matt Carlson
2008-04-16 21:00                           ` Tony Battersby
2008-04-18  6:20                         ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).