All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] r8169 : why SG / TX checksum are default disabled
@ 2012-07-17 22:39 Eric Dumazet
  2012-07-17 23:40 ` Francois Romieu
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2012-07-17 22:39 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev, Hayes Wang

Hi Francois

I was wondering why Scatter Gather (NETIF_F_SG) and TX checksum
(NETIF_F_IP_CSUM) were disabled by default on r8169.

Couldnt we enable them by default, maybe for a whitelist of "good"
nics ?

(I found that activating them with ethtool automatically enables GSO,
 and performance with GSO is not good)

Thanks

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC] r8169 : why SG / TX checksum are default disabled
  2012-07-17 22:39 [RFC] r8169 : why SG / TX checksum are default disabled Eric Dumazet
@ 2012-07-17 23:40 ` Francois Romieu
  2012-07-18  6:45   ` hayeswang
  2012-07-18  8:55   ` Eric Dumazet
  0 siblings, 2 replies; 20+ messages in thread
From: Francois Romieu @ 2012-07-17 23:40 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Hayes Wang

Eric Dumazet <eric.dumazet@gmail.com> :
[...]
> I was wondering why Scatter Gather (NETIF_F_SG) and TX checksum
> (NETIF_F_IP_CSUM) were disabled by default on r8169.

It was not unconditionally stable. For a number of reasons:
- incompatibility with jumbo frames.
  rtl_chip_infos now allows to take care of it
- lack of support for chipset dependent Tx descriptor 
  rtl_tx_desc_info handles it. There are yet a few unused ipv6 bits. No idea
  how useable they are.
- some false alarms due to different issues and probably some others
  I could hunt for in the archives.

> Couldnt we enable them by default, maybe for a whitelist of "good"
> nics ?

Sure.

Hayes, should we not add into the kernel driver something similar to
the rtl8168_start_xmit::skb_checksum_help stuff in Realtek's 8168 driver ?
There seems to be a bug for (skb->len < 60 && RTL_GIGA_MAC_VER_34.

> (I found that activating them with ethtool automatically enables GSO,
>  and performance with GSO is not good)

It's still an improvement though, isn't it ?

-- 
Ueimor

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [RFC] r8169 : why SG / TX checksum are default disabled
  2012-07-17 23:40 ` Francois Romieu
@ 2012-07-18  6:45   ` hayeswang
  2012-07-18 16:23     ` David Miller
  2012-07-18  8:55   ` Eric Dumazet
  1 sibling, 1 reply; 20+ messages in thread
From: hayeswang @ 2012-07-18  6:45 UTC (permalink / raw)
  To: 'Francois Romieu', 'Eric Dumazet'; +Cc: netdev

Francois Romieu [mailto:romieu@fr.zoreil.com] 
[...]

> Hayes, should we not add into the kernel driver something similar to
> the rtl8168_start_xmit::skb_checksum_help stuff in Realtek's 
> 8168 driver ?
> There seems to be a bug for (skb->len < 60 && RTL_GIGA_MAC_VER_34.

For RTL8168E-VL (RTL_GIGA_MAC_VER_34), the hardware wouldn't send the packet
with the length less than 60 bytes. The hardware should pad this kind of packet
to 60 bytes, but it wouldn't. Therefore, the software has to pad the packet to
60 bytes. However, the hw checksum would be incorrect for the modified packet,
so the software checksum is necessary. That is, for the packet less than 60
bytes, the software has to pad the packet and calculate the checksum, and the hw
checksum has to be disabled.
 
Best Regards,
Hayes

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC] r8169 : why SG / TX checksum are default disabled
  2012-07-17 23:40 ` Francois Romieu
  2012-07-18  6:45   ` hayeswang
@ 2012-07-18  8:55   ` Eric Dumazet
  1 sibling, 0 replies; 20+ messages in thread
From: Eric Dumazet @ 2012-07-18  8:55 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev, Hayes Wang

On Wed, 2012-07-18 at 01:40 +0200, Francois Romieu wrote:

> > (I found that activating them with ethtool automatically enables GSO,
> >  and performance with GSO is not good)
> 
> It's still an improvement though, isn't it ?
> 

On an old AMD machine, I can get line rate with default conf, but using
nearly all cpu cycles.

Following test is only partial, a real one should use forwarding for
example...


# perf stat netperf -H eric -C -c -t OMNI
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to eric () port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1000 tcpi_rttvar 750 tcpi_snd_ssthresh 16 tpci_snd_cwnd 62
tcpi_reordering 3 tcpi_total_retrans 0
Local       Remote      Local  Elapsed Throughput Throughput  Local  Local  Remote Remote Local   Remote  Service  
Send Socket Recv Socket Send   Time               Units       CPU    CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util   Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %      Method %      Method                          
290160      549032      16384  10.00   915.44     10^6bits/s  44.93  S      3.61   S      8.042   7.755   usec/KB  

 Performance counter stats for 'netperf -H eric -C -c -t OMNI':

       5206,301186 task-clock                #    0,520 CPUs utilized          
            16 568 context-switches          #    0,003 M/sec                  
                 2 CPU-migrations            #    0,000 K/sec                  
               366 page-faults               #    0,070 K/sec                  
    12 362 775 266 cycles                    #    2,375 GHz                     [66,99%]
     2 529 275 760 stalled-cycles-frontend   #   20,46% frontend cycles idle    [67,00%]
     6 878 915 080 stalled-cycles-backend    #   55,64% backend  cycles idle    [66,24%]
     5 272 222 150 instructions              #    0,43  insns per cycle        
                                             #    1,30  stalled cycles per insn [66,85%]
       819 922 185 branches                  #  157,487 M/sec                   [66,79%]
        50 135 423 branch-misses             #    6,11% of all branches         [66,15%]

      10,019141027 seconds time elapsed


If I switch to SG+TX (GSO is automatically enabled), bandwidth is lower.

# ethtool -K eth1 tx on sg on
Actual changes:
tx-checksumming: on
	tx-checksum-ipv4: on
scatter-gather: on
	tx-scatter-gather: on
generic-segmentation-offload: on

# perf stat netperf -H eric -C -c -t OMNI
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to eric () port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 21 tpci_snd_cwnd 169
tcpi_reordering 3 tcpi_total_retrans 0
Local       Remote      Local  Elapsed Throughput Throughput  Local  Local  Remote Remote Local   Remote  Service  
Send Socket Recv Socket Send   Time               Units       CPU    CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util   Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %      Method %      Method                          
790920      704640      16384  10.01   762.29     10^6bits/s  38.00  S      3.38   S      8.167   8.720   usec/KB  

 Performance counter stats for 'netperf -H eric -C -c -t OMNI':

       4526,838736 task-clock                #    0,452 CPUs utilized          
             2 031 context-switches          #    0,449 K/sec                  
                 3 CPU-migrations            #    0,001 K/sec                  
               366 page-faults               #    0,081 K/sec                  
     4 476 876 825 cycles                    #    0,989 GHz                     [66,41%]
       899 080 378 stalled-cycles-frontend   #   20,08% frontend cycles idle    [66,56%]
     2 430 763 937 stalled-cycles-backend    #   54,30% backend  cycles idle    [66,87%]
     1 685 481 163 instructions              #    0,38  insns per cycle        
                                             #    1,44  stalled cycles per insn [66,93%]
       280 404 977 branches                  #   61,943 M/sec                   [66,73%]
        15 608 497 branch-misses             #    5,57% of all branches         [66,54%]

      10,025486268 seconds time elapsed

Since most frames need between 2 and 3 segments
(one for the ip/tcp headers, and one or two frags for the payload), this
might be a MMIO issue, that Alexander tried to solve recently...

If I only switch to SG+TX its ok

# ethtool -K eth1 gso off

# perf stat netperf -H eric -C -c -t OMNI
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to eric () port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1000 tcpi_rttvar 750 tcpi_snd_ssthresh 18 tpci_snd_cwnd 60
tcpi_reordering 3 tcpi_total_retrans 0
Local       Remote      Local  Elapsed Throughput Throughput  Local  Local  Remote Remote Local   Remote  Service  
Send Socket Recv Socket Send   Time               Units       CPU    CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util   Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %      Method %      Method                          
280800      549032      16384  10.00   916.61     10^6bits/s  40.05  S      3.62   S      7.159   7.774   usec/KB  

 Performance counter stats for 'netperf -H eric -C -c -t OMNI':

       4827,259625 task-clock                #    0,482 CPUs utilized          
            17 988 context-switches          #    0,004 M/sec                  
                 3 CPU-migrations            #    0,001 K/sec                  
               366 page-faults               #    0,076 K/sec                  
    11 448 148 411 cycles                    #    2,372 GHz                     [66,57%]
     2 278 563 777 stalled-cycles-frontend   #   19,90% frontend cycles idle    [66,38%]
     6 420 123 655 stalled-cycles-backend    #   56,08% backend  cycles idle    [66,38%]
     4 471 468 064 instructions              #    0,39  insns per cycle        
                                             #    1,44  stalled cycles per insn [67,48%]
       757 302 269 branches                  #  156,880 M/sec                   [67,08%]
        44 320 435 branch-misses             #    5,85% of all branches         [66,16%]

      10,020331031 seconds time elapsed

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC] r8169 : why SG / TX checksum are default disabled
  2012-07-18  6:45   ` hayeswang
@ 2012-07-18 16:23     ` David Miller
  2012-07-18 20:12       ` Francois Romieu
  0 siblings, 1 reply; 20+ messages in thread
From: David Miller @ 2012-07-18 16:23 UTC (permalink / raw)
  To: hayeswang; +Cc: romieu, eric.dumazet, netdev

From: hayeswang <hayeswang@realtek.com>
Date: Wed, 18 Jul 2012 14:45:55 +0800

> Francois Romieu [mailto:romieu@fr.zoreil.com] 
> [...]
> 
>> Hayes, should we not add into the kernel driver something similar to
>> the rtl8168_start_xmit::skb_checksum_help stuff in Realtek's 
>> 8168 driver ?
>> There seems to be a bug for (skb->len < 60 && RTL_GIGA_MAC_VER_34.
> 
> For RTL8168E-VL (RTL_GIGA_MAC_VER_34), the hardware wouldn't send the packet
> with the length less than 60 bytes. The hardware should pad this kind of packet
> to 60 bytes, but it wouldn't. Therefore, the software has to pad the packet to
> 60 bytes. However, the hw checksum would be incorrect for the modified packet,
> so the software checksum is necessary.

I wonder how the hardware checksum can be incorrectly calculated if the padding
is done with zeros?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC] r8169 : why SG / TX checksum are default disabled
  2012-07-18 16:23     ` David Miller
@ 2012-07-18 20:12       ` Francois Romieu
  2012-07-18 20:28         ` David Miller
  2012-07-20  2:11         ` hayeswang
  0 siblings, 2 replies; 20+ messages in thread
From: Francois Romieu @ 2012-07-18 20:12 UTC (permalink / raw)
  To: David Miller; +Cc: hayeswang, eric.dumazet, netdev

David Miller <davem@davemloft.net> :
> From: hayeswang <hayeswang@realtek.com>
> > Francois Romieu [mailto:romieu@fr.zoreil.com] 
> > [...]
> > 
> >> Hayes, should we not add into the kernel driver something similar to
> >> the rtl8168_start_xmit::skb_checksum_help stuff in Realtek's 
> >> 8168 driver ?
> >> There seems to be a bug for (skb->len < 60 && RTL_GIGA_MAC_VER_34.
> > 
> > For RTL8168E-VL (RTL_GIGA_MAC_VER_34), the hardware wouldn't send the packet
> > with the length less than 60 bytes. The hardware should pad this kind of packet
> > to 60 bytes, but it wouldn't. Therefore, the software has to pad the packet to
> > 60 bytes. However, the hw checksum would be incorrect for the modified packet,
> > so the software checksum is necessary.
> 
> I wonder how the hardware checksum can be incorrectly calculated if the padding
> is done with zeros?

A part of the apparent problem may stem from the fact that Realtek's 8168
driver claims a modified length but it does not really skb_padto... 

Hayes, would the patch below fix the original problem ?

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index be4e00f..a463697 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -5740,7 +5740,7 @@ err_out:
 	return -EIO;
 }
 
-static inline void rtl8169_tso_csum(struct rtl8169_private *tp,
+static inline bool rtl8169_tso_csum(struct rtl8169_private *tp,
 				    struct sk_buff *skb, u32 *opts)
 {
 	const struct rtl_tx_desc_info *info = tx_desc_info + tp->txd_version;
@@ -5753,6 +5753,12 @@ static inline void rtl8169_tso_csum(struct rtl8169_private *tp,
 	} else if (skb->ip_summed == CHECKSUM_PARTIAL) {
 		const struct iphdr *ip = ip_hdr(skb);
 
+		if (unlikely(skb->len < 60 &&
+		    (tp->mac_version == RTL_GIGA_MAC_VER_34) &&
+		    skb_padto(skb, ETH_ZLEN))) {
+			return false;
+		}
+
 		if (ip->protocol == IPPROTO_TCP)
 			opts[offset] |= info->checksum.tcp;
 		else if (ip->protocol == IPPROTO_UDP)
@@ -5760,6 +5766,7 @@ static inline void rtl8169_tso_csum(struct rtl8169_private *tp,
 		else
 			WARN_ON_ONCE(1);
 	}
+	return true;
 }
 
 static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
@@ -5797,7 +5804,8 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 	opts[1] = cpu_to_le32(rtl8169_tx_vlan_tag(tp, skb));
 	opts[0] = DescOwn;
 
-	rtl8169_tso_csum(tp, skb, opts);
+	if (!rtl8169_tso_csum(tp, skb, opts))
+		goto err_update_stats;
 
 	frags = rtl8169_xmit_frags(tp, skb, opts);
 	if (frags < 0)
@@ -5853,6 +5861,7 @@ err_dma_1:
 	rtl8169_unmap_tx_skb(d, tp->tx_skb + entry, txd);
 err_dma_0:
 	dev_kfree_skb(skb);
+err_update_stats:
 	dev->stats.tx_dropped++;
 	return NETDEV_TX_OK;
 

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [RFC] r8169 : why SG / TX checksum are default disabled
  2012-07-18 20:12       ` Francois Romieu
@ 2012-07-18 20:28         ` David Miller
  2012-07-18 21:44           ` Francois Romieu
  2012-07-20  2:11         ` hayeswang
  1 sibling, 1 reply; 20+ messages in thread
From: David Miller @ 2012-07-18 20:28 UTC (permalink / raw)
  To: romieu; +Cc: hayeswang, eric.dumazet, netdev

From: Francois Romieu <romieu@fr.zoreil.com>
Date: Wed, 18 Jul 2012 22:12:01 +0200

> David Miller <davem@davemloft.net> :
>> From: hayeswang <hayeswang@realtek.com>
>> > Francois Romieu [mailto:romieu@fr.zoreil.com] 
>> > [...]
>> > 
>> >> Hayes, should we not add into the kernel driver something similar to
>> >> the rtl8168_start_xmit::skb_checksum_help stuff in Realtek's 
>> >> 8168 driver ?
>> >> There seems to be a bug for (skb->len < 60 && RTL_GIGA_MAC_VER_34.
>> > 
>> > For RTL8168E-VL (RTL_GIGA_MAC_VER_34), the hardware wouldn't send the packet
>> > with the length less than 60 bytes. The hardware should pad this kind of packet
>> > to 60 bytes, but it wouldn't. Therefore, the software has to pad the packet to
>> > 60 bytes. However, the hw checksum would be incorrect for the modified packet,
>> > so the software checksum is necessary.
>> 
>> I wonder how the hardware checksum can be incorrectly calculated if the padding
>> is done with zeros?
> 
> A part of the apparent problem may stem from the fact that Realtek's 8168
> driver claims a modified length but it does not really skb_padto... 
> 
> Hayes, would the patch below fix the original problem ?

A NETDEV_TX_OK return means we accepted the SKB, it doesn't look like
that's what you are doing in the skb_padto() failure path.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC] r8169 : why SG / TX checksum are default disabled
  2012-07-18 20:28         ` David Miller
@ 2012-07-18 21:44           ` Francois Romieu
  2012-07-18 22:05             ` Eric Dumazet
  0 siblings, 1 reply; 20+ messages in thread
From: Francois Romieu @ 2012-07-18 21:44 UTC (permalink / raw)
  To: David Miller; +Cc: hayeswang, eric.dumazet, netdev

David Miller <davem@davemloft.net> :
[...]
> A NETDEV_TX_OK return means we accepted the SKB, it doesn't look like
> that's what you are doing in the skb_padto() failure path.

?

- skb_padto fails
  (original skb is implicitely freed)
- skb_padto returns error status (!= 0)
- rtl8169_tso_csum returns false
- start_xmit returns NETDEV_TX_OK. 

I'll search the missing "!" after some sleep if that's what you are talking
about. Otherwise than that, I don't get it.

-- 
Ueimor

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC] r8169 : why SG / TX checksum are default disabled
  2012-07-18 21:44           ` Francois Romieu
@ 2012-07-18 22:05             ` Eric Dumazet
  2012-07-18 22:24               ` David Miller
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2012-07-18 22:05 UTC (permalink / raw)
  To: Francois Romieu; +Cc: David Miller, hayeswang, netdev

On Wed, 2012-07-18 at 23:44 +0200, Francois Romieu wrote:
> David Miller <davem@davemloft.net> :
> [...]
> > A NETDEV_TX_OK return means we accepted the SKB, it doesn't look like
> > that's what you are doing in the skb_padto() failure path.
> 
> ?
> 
> - skb_padto fails
>   (original skb is implicitely freed)
> - skb_padto returns error status (!= 0)
> - rtl8169_tso_csum returns false
> - start_xmit returns NETDEV_TX_OK. 
> 
> I'll search the missing "!" after some sleep if that's what you are talking
> about. Otherwise than that, I don't get it.
> 


Yes, I believe your patch is fine.

In fact many drivers dont account the error in their stats.

Thanks

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC] r8169 : why SG / TX checksum are default disabled
  2012-07-18 22:05             ` Eric Dumazet
@ 2012-07-18 22:24               ` David Miller
  2012-07-20  7:14                 ` hayeswang
  0 siblings, 1 reply; 20+ messages in thread
From: David Miller @ 2012-07-18 22:24 UTC (permalink / raw)
  To: eric.dumazet; +Cc: romieu, hayeswang, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 19 Jul 2012 00:05:36 +0200

> On Wed, 2012-07-18 at 23:44 +0200, Francois Romieu wrote:
>> David Miller <davem@davemloft.net> :
>> [...]
>> > A NETDEV_TX_OK return means we accepted the SKB, it doesn't look like
>> > that's what you are doing in the skb_padto() failure path.
>> 
>> ?
>> 
>> - skb_padto fails
>>   (original skb is implicitely freed)
>> - skb_padto returns error status (!= 0)
>> - rtl8169_tso_csum returns false
>> - start_xmit returns NETDEV_TX_OK. 
>> 
>> I'll search the missing "!" after some sleep if that's what you are talking
>> about. Otherwise than that, I don't get it.
>> 
> 
> 
> Yes, I believe your patch is fine.
> 
> In fact many drivers dont account the error in their stats.

My bad, I forgot that skb_padto() frees the SKB on failure.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [RFC] r8169 : why SG / TX checksum are default disabled
  2012-07-18 20:12       ` Francois Romieu
  2012-07-18 20:28         ` David Miller
@ 2012-07-20  2:11         ` hayeswang
  1 sibling, 0 replies; 20+ messages in thread
From: hayeswang @ 2012-07-20  2:11 UTC (permalink / raw)
  To: 'Francois Romieu', 'David Miller'; +Cc: eric.dumazet, netdev

 Francois Romieu [mailto:romieu@fr.zoreil.com] 
[...]
> A part of the apparent problem may stem from the fact that 
> Realtek's 8168
> driver claims a modified length but it does not really skb_padto... 
> 
> Hayes, would the patch below fix the original problem ?

According to the response from our hw engineer, it still has the problem even
though you pad the packet to 60 bytes with zeroes. I would still test this patch
to verify it.

>  static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
> @@ -5797,7 +5804,8 @@ static netdev_tx_t 
> rtl8169_start_xmit(struct sk_buff *skb,
>  	opts[1] = cpu_to_le32(rtl8169_tx_vlan_tag(tp, skb));
>  	opts[0] = DescOwn;
>  
> -	rtl8169_tso_csum(tp, skb, opts);
> +	if (!rtl8169_tso_csum(tp, skb, opts))
> +		goto err_update_stats;
>  

I think you should check the length of opts1 of the descriptor, too. Besides,
how about the length of dma_map_xxx? Should it use the original length or the
modified length?

>  	frags = rtl8169_xmit_frags(tp, skb, opts);
>  	if (frags < 0)
> @@ -5853,6 +5861,7 @@ err_dma_1:
>  	rtl8169_unmap_tx_skb(d, tp->tx_skb + entry, txd);
>  err_dma_0:
>  	dev_kfree_skb(skb);
> +err_update_stats:
>  	dev->stats.tx_dropped++;
>  	return NETDEV_TX_OK;
>  
 
Best Regards,
Hayes

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [RFC] r8169 : why SG / TX checksum are default disabled
  2012-07-18 22:24               ` David Miller
@ 2012-07-20  7:14                 ` hayeswang
  2012-07-20 10:08                   ` Francois Romieu
  2012-07-20 16:17                   ` David Miller
  0 siblings, 2 replies; 20+ messages in thread
From: hayeswang @ 2012-07-20  7:14 UTC (permalink / raw)
  To: 'David Miller', eric.dumazet; +Cc: romieu, netdev

 Francois Romieu <romieu@fr.zoreil.com>

> >> > A NETDEV_TX_OK return means we accepted the SKB, it 
> doesn't look like
> >> > that's what you are doing in the skb_padto() failure path.
> >> 
> >> ?
> >> 
> >> - skb_padto fails
> >>   (original skb is implicitely freed)
> >> - skb_padto returns error status (!= 0)
> >> - rtl8169_tso_csum returns false
> >> - start_xmit returns NETDEV_TX_OK. 
> >> 
> >> I'll search the missing "!" after some sleep if that's 
> what you are talking
> >> about. Otherwise than that, I don't get it.
> >> 
> > 
> > 
> > Yes, I believe your patch is fine.
> > 
> > In fact many drivers dont account the error in their stats.
> 
> My bad, I forgot that skb_padto() frees the SKB on failure.
> 

I find that the total length field of IP header would be modified if the hw
checksum is enabled. Therefore, skb_padto + hw checksum wouldn't work. The
software checksum is necessary.
 
Best Regards,
Hayes

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC] r8169 : why SG / TX checksum are default disabled
  2012-07-20  7:14                 ` hayeswang
@ 2012-07-20 10:08                   ` Francois Romieu
  2012-07-20 16:01                     ` hayeswang
  2012-07-20 16:17                   ` David Miller
  1 sibling, 1 reply; 20+ messages in thread
From: Francois Romieu @ 2012-07-20 10:08 UTC (permalink / raw)
  To: hayeswang; +Cc: 'David Miller', eric.dumazet, netdev

hayeswang <hayeswang@realtek.com> :
[...]
> I find that the total length field of IP header would be modified if the hw
> checksum is enabled. Therefore, skb_padto + hw checksum wouldn't work.

Ok, my patch completely ignored the fact that skb_padto does not change the
length.

However skb_padto + length adjustement + hw checksum should work (at least in
theory if not in the patch below) ?

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index be4e00f..8d0cc09 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -5740,7 +5740,7 @@ err_out:
 	return -EIO;
 }
 
-static inline void rtl8169_tso_csum(struct rtl8169_private *tp,
+static inline bool rtl8169_tso_csum(struct rtl8169_private *tp,
 				    struct sk_buff *skb, u32 *opts)
 {
 	const struct rtl_tx_desc_info *info = tx_desc_info + tp->txd_version;
@@ -5753,6 +5753,13 @@ static inline void rtl8169_tso_csum(struct rtl8169_private *tp,
 	} else if (skb->ip_summed == CHECKSUM_PARTIAL) {
 		const struct iphdr *ip = ip_hdr(skb);
 
+		if (unlikely(skb->len < ETH_ZLEN &&
+		    (tp->mac_version == RTL_GIGA_MAC_VER_34))) {
+			if (skb_padto(skb, ETH_ZLEN))
+				return false;
+			skb_put(skb, ETH_ZLEN - skb->len);
+		}
+
 		if (ip->protocol == IPPROTO_TCP)
 			opts[offset] |= info->checksum.tcp;
 		else if (ip->protocol == IPPROTO_UDP)
@@ -5760,6 +5767,7 @@ static inline void rtl8169_tso_csum(struct rtl8169_private *tp,
 		else
 			WARN_ON_ONCE(1);
 	}
+	return true;
 }
 
 static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
@@ -5783,25 +5791,26 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 	if (unlikely(le32_to_cpu(txd->opts1) & DescOwn))
 		goto err_stop_0;
 
+	opts[1] = cpu_to_le32(rtl8169_tx_vlan_tag(tp, skb));
+	opts[0] = DescOwn;
+
+	if (!rtl8169_tso_csum(tp, skb, opts))
+		goto err_update_stats_0;
+
 	len = skb_headlen(skb);
 	mapping = dma_map_single(d, skb->data, len, DMA_TO_DEVICE);
 	if (unlikely(dma_mapping_error(d, mapping))) {
 		if (net_ratelimit())
 			netif_err(tp, drv, dev, "Failed to map TX DMA!\n");
-		goto err_dma_0;
+		goto err_free_skb_1;
 	}
 
 	tp->tx_skb[entry].len = len;
 	txd->addr = cpu_to_le64(mapping);
 
-	opts[1] = cpu_to_le32(rtl8169_tx_vlan_tag(tp, skb));
-	opts[0] = DescOwn;
-
-	rtl8169_tso_csum(tp, skb, opts);
-
 	frags = rtl8169_xmit_frags(tp, skb, opts);
 	if (frags < 0)
-		goto err_dma_1;
+		goto err_unmap_2;
 	else if (frags)
 		opts[0] |= FirstFrag;
 	else {
@@ -5849,10 +5858,11 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 
 	return NETDEV_TX_OK;
 
-err_dma_1:
+err_unmap_2:
 	rtl8169_unmap_tx_skb(d, tp->tx_skb + entry, txd);
-err_dma_0:
+err_free_skb_1:
 	dev_kfree_skb(skb);
+err_update_stats_0:
 	dev->stats.tx_dropped++;
 	return NETDEV_TX_OK;
 

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* RE: [RFC] r8169 : why SG / TX checksum are default disabled
  2012-07-20 10:08                   ` Francois Romieu
@ 2012-07-20 16:01                     ` hayeswang
  2012-07-20 16:28                       ` David Miller
  2012-07-20 21:01                       ` Francois Romieu
  0 siblings, 2 replies; 20+ messages in thread
From: hayeswang @ 2012-07-20 16:01 UTC (permalink / raw)
  To: 'Francois Romieu'; +Cc: 'David Miller', eric.dumazet, netdev

 Francois Romieu [mailto:romieu@fr.zoreil.com] 

[...]
> > I find that the total length field of IP header would be 
> modified if the hw
> > checksum is enabled. Therefore, skb_padto + hw checksum 
> wouldn't work.
> 
> Ok, my patch completely ignored the fact that skb_padto does 
> not change the
> length.
> 
> However skb_padto + length adjustement + hw checksum should 
> work (at least in
> theory if not in the patch below) ?

If the hw only fills in the checksum fields of IP header, UDP header, and TCP
header, the patch would work. However, the hw would also fill in the total
length field of IP header, so it causes problems. For example, I send a packet
with ethernet header 14 bytes + IP header 20 bytes + data 20 bytes = 54 bytes.

Case 1: Software checksum + pad zeroes to 60 bytes
   Receiver gets this packet and finds the total length in IP header would be 40
bytes. Therefore, the receiver knows the data would be 40 - 20 (IP header) = 20
bytes.

Case 2: pad zeroes to 60 bytes + hw checksum
   Receiver gets this packet and would find the total length in IP header is 40
+ (60-54) = 46 bytes, not 40 bytes. Therefore, the receiver consider the data
would be 46 - 20 = 26 bytes. However, the final 6 bytes should not be the parts
of data.
 
Best Regards,
Hayes

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC] r8169 : why SG / TX checksum are default disabled
  2012-07-20  7:14                 ` hayeswang
  2012-07-20 10:08                   ` Francois Romieu
@ 2012-07-20 16:17                   ` David Miller
  1 sibling, 0 replies; 20+ messages in thread
From: David Miller @ 2012-07-20 16:17 UTC (permalink / raw)
  To: hayeswang; +Cc: eric.dumazet, romieu, netdev

From: hayeswang <hayeswang@realtek.com>
Date: Fri, 20 Jul 2012 15:14:26 +0800

> I find that the total length field of IP header would be modified if the hw
> checksum is enabled. Therefore, skb_padto + hw checksum wouldn't work. The
> software checksum is necessary.

It's really strange that the hardware has any reason to update the IP
header length field when it is asked to compute the checksum.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC] r8169 : why SG / TX checksum are default disabled
  2012-07-20 16:01                     ` hayeswang
@ 2012-07-20 16:28                       ` David Miller
  2012-07-20 21:01                       ` Francois Romieu
  1 sibling, 0 replies; 20+ messages in thread
From: David Miller @ 2012-07-20 16:28 UTC (permalink / raw)
  To: hayeswang; +Cc: romieu, eric.dumazet, netdev

From: hayeswang <hayeswang@realtek.com>
Date: Sat, 21 Jul 2012 00:01:38 +0800

> However, the hw would also fill in the total length field of IP
> header, so it causes problems.

Why does the HW modify fields we did not ask it to?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC] r8169 : why SG / TX checksum are default disabled
  2012-07-20 16:01                     ` hayeswang
  2012-07-20 16:28                       ` David Miller
@ 2012-07-20 21:01                       ` Francois Romieu
  2012-07-24  6:34                         ` hayeswang
  1 sibling, 1 reply; 20+ messages in thread
From: Francois Romieu @ 2012-07-20 21:01 UTC (permalink / raw)
  To: hayeswang; +Cc: 'David Miller', eric.dumazet, netdev

hayeswang <hayeswang@realtek.com> :
[...]
> If the hw only fills in the checksum fields of IP header, UDP header, and TCP
> header, the patch would work. However, the hw would also fill in the total
> length field of IP header, so it causes problems.

Thanks, I missed this part. The patch below should be getting better now.

Btw, would there be any point in setting the TcpHeaderOffset field for
the post-8168c chipsets ?

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index be4e00f..2420af6 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -5740,7 +5740,7 @@ err_out:
 	return -EIO;
 }
 
-static inline void rtl8169_tso_csum(struct rtl8169_private *tp,
+static inline bool rtl8169_tso_csum(struct rtl8169_private *tp,
 				    struct sk_buff *skb, u32 *opts)
 {
 	const struct rtl_tx_desc_info *info = tx_desc_info + tp->txd_version;
@@ -5753,6 +5753,15 @@ static inline void rtl8169_tso_csum(struct rtl8169_private *tp,
 	} else if (skb->ip_summed == CHECKSUM_PARTIAL) {
 		const struct iphdr *ip = ip_hdr(skb);
 
+		if (unlikely(skb->len < ETH_ZLEN &&
+		    (tp->mac_version == RTL_GIGA_MAC_VER_34))) {
+			if (skb_padto(skb, ETH_ZLEN))
+				return false;
+			skb_checksum_help(skb);
+			skb_put(skb, ETH_ZLEN - skb->len);
+			return true;
+		}
+
 		if (ip->protocol == IPPROTO_TCP)
 			opts[offset] |= info->checksum.tcp;
 		else if (ip->protocol == IPPROTO_UDP)
@@ -5760,6 +5769,7 @@ static inline void rtl8169_tso_csum(struct rtl8169_private *tp,
 		else
 			WARN_ON_ONCE(1);
 	}
+	return true;
 }
 
 static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
@@ -5783,25 +5793,26 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 	if (unlikely(le32_to_cpu(txd->opts1) & DescOwn))
 		goto err_stop_0;
 
+	opts[1] = cpu_to_le32(rtl8169_tx_vlan_tag(tp, skb));
+	opts[0] = DescOwn;
+
+	if (!rtl8169_tso_csum(tp, skb, opts))
+		goto err_update_stats_0;
+
 	len = skb_headlen(skb);
 	mapping = dma_map_single(d, skb->data, len, DMA_TO_DEVICE);
 	if (unlikely(dma_mapping_error(d, mapping))) {
 		if (net_ratelimit())
 			netif_err(tp, drv, dev, "Failed to map TX DMA!\n");
-		goto err_dma_0;
+		goto err_free_skb_1;
 	}
 
 	tp->tx_skb[entry].len = len;
 	txd->addr = cpu_to_le64(mapping);
 
-	opts[1] = cpu_to_le32(rtl8169_tx_vlan_tag(tp, skb));
-	opts[0] = DescOwn;
-
-	rtl8169_tso_csum(tp, skb, opts);
-
 	frags = rtl8169_xmit_frags(tp, skb, opts);
 	if (frags < 0)
-		goto err_dma_1;
+		goto err_unmap_2;
 	else if (frags)
 		opts[0] |= FirstFrag;
 	else {
@@ -5849,10 +5860,11 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 
 	return NETDEV_TX_OK;
 
-err_dma_1:
+err_unmap_2:
 	rtl8169_unmap_tx_skb(d, tp->tx_skb + entry, txd);
-err_dma_0:
+err_free_skb_1:
 	dev_kfree_skb(skb);
+err_update_stats_0:
 	dev->stats.tx_dropped++;
 	return NETDEV_TX_OK;
 

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* RE: [RFC] r8169 : why SG / TX checksum are default disabled
  2012-07-20 21:01                       ` Francois Romieu
@ 2012-07-24  6:34                         ` hayeswang
  2012-07-24  6:59                           ` David Miller
  0 siblings, 1 reply; 20+ messages in thread
From: hayeswang @ 2012-07-24  6:34 UTC (permalink / raw)
  To: 'Francois Romieu'; +Cc: 'David Miller', eric.dumazet, netdev

  Francois Romieu [mailto:romieu@fr.zoreil.com] 
> Sent: Saturday, July 21, 2012 5:01 AM
> To: Hayeswang
> Cc: 'David Miller'; eric.dumazet@gmail.com; netdev@vger.kernel.org
> Subject: Re: [RFC] r8169 : why SG / TX checksum are default disabled
> 
[...]
> 
> Btw, would there be any point in setting the TcpHeaderOffset field for
> the post-8168c chipsets ?

It is used to tell the haredware where the TCP header starts from. I think it is
reserved for ipv6.
 
Best Regards,
Hayes

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC] r8169 : why SG / TX checksum are default disabled
  2012-07-24  6:34                         ` hayeswang
@ 2012-07-24  6:59                           ` David Miller
  2012-07-25  2:10                             ` hayeswang
  0 siblings, 1 reply; 20+ messages in thread
From: David Miller @ 2012-07-24  6:59 UTC (permalink / raw)
  To: hayeswang; +Cc: romieu, eric.dumazet, netdev

From: hayeswang <hayeswang@realtek.com>
Date: Tue, 24 Jul 2012 14:34:04 +0800

>   Francois Romieu [mailto:romieu@fr.zoreil.com] 
>> Sent: Saturday, July 21, 2012 5:01 AM
>> To: Hayeswang
>> Cc: 'David Miller'; eric.dumazet@gmail.com; netdev@vger.kernel.org
>> Subject: Re: [RFC] r8169 : why SG / TX checksum are default disabled
>> 
> [...]
>> 
>> Btw, would there be any point in setting the TcpHeaderOffset field for
>> the post-8168c chipsets ?
> 
> It is used to tell the haredware where the TCP header starts from. I think it is
> reserved for ipv6.

I still have not seen a good explanation why the chip modifies fields
in the packet, such as the length, when we ask it only to compute the
checksum?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [RFC] r8169 : why SG / TX checksum are default disabled
  2012-07-24  6:59                           ` David Miller
@ 2012-07-25  2:10                             ` hayeswang
  0 siblings, 0 replies; 20+ messages in thread
From: hayeswang @ 2012-07-25  2:10 UTC (permalink / raw)
  To: 'David Miller'; +Cc: romieu, eric.dumazet, netdev, 'nic_swsd'

 David Miller [mailto:davem@davemloft.net] 
> Sent: Tuesday, July 24, 2012 3:00 PM
> To: Hayeswang
> Cc: romieu@fr.zoreil.com; eric.dumazet@gmail.com; 
> netdev@vger.kernel.org
> Subject: Re: [RFC] r8169 : why SG / TX checksum are default disabled
> 
[...]
> 
> I still have not seen a good explanation why the chip modifies fields
> in the packet, such as the length, when we ask it only to compute the
> checksum?

I don't know the history and the detail about the bug. I think only our designer
could answer your question. I guess it is a careless mistake.
 
Best Regards,
Hayes

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2012-07-25  2:10 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-17 22:39 [RFC] r8169 : why SG / TX checksum are default disabled Eric Dumazet
2012-07-17 23:40 ` Francois Romieu
2012-07-18  6:45   ` hayeswang
2012-07-18 16:23     ` David Miller
2012-07-18 20:12       ` Francois Romieu
2012-07-18 20:28         ` David Miller
2012-07-18 21:44           ` Francois Romieu
2012-07-18 22:05             ` Eric Dumazet
2012-07-18 22:24               ` David Miller
2012-07-20  7:14                 ` hayeswang
2012-07-20 10:08                   ` Francois Romieu
2012-07-20 16:01                     ` hayeswang
2012-07-20 16:28                       ` David Miller
2012-07-20 21:01                       ` Francois Romieu
2012-07-24  6:34                         ` hayeswang
2012-07-24  6:59                           ` David Miller
2012-07-25  2:10                             ` hayeswang
2012-07-20 16:17                   ` David Miller
2012-07-20  2:11         ` hayeswang
2012-07-18  8:55   ` Eric Dumazet

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.