* [PATCH net-next] ibmveth: v1 calculate correct gso_size and set gso_type
From: Jon Maxwell @ 2016-10-26  0:09 UTC (permalink / raw)
  To: tlfalcon
  Cc: benh, paulus, mpe, davem, tom, jarod, hofrat, netdev,
	linuxppc-dev, linux-kernel, mleitner, jmaxwell, Jon Maxwell

We recently encountered a bug where a few customers using ibmveth on the
same LPAR hit an issue where a TCP session hung when large receive was
enabled. Closer analysis revealed that the session was stuck because one
side was repeatedly advertising a zero window.

We narrowed this down to the fact that the ibmveth driver did not set
gso_size, which TCP translates into the MSS later up the stack. The MSS
is used to calculate the TCP window size, and because it was abnormally
large the stack was computing a zero window, even though the socket's
receive buffer was completely empty.
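
To illustrate the mechanism, this is roughly what the receive side does
with gso_size (a simplified sketch modelled on tcp_measure_rcv_mss() and
__tcp_select_window(), not the exact kernel code; the function names
below are illustrative only):

	/* Simplified illustration, not verbatim kernel code. */
	#include <linux/types.h>
	#include <linux/skbuff.h>

	/* The sender's MSS is estimated from gso_size when it is set,
	 * otherwise from the full skb length.  With large receive enabled
	 * and gso_size left at 0, a ~64KB aggregated skb inflates the
	 * estimate to ~64KB.
	 */
	static unsigned int rcv_mss_estimate(const struct sk_buff *skb)
	{
		return skb_shinfo(skb)->gso_size ? : skb->len;
	}

	/* The advertised window is kept a multiple of that estimate, so
	 * an estimate larger than the free receive space rounds the
	 * window down to zero even though the receive buffer is empty.
	 */
	static u32 advertised_window(u32 free_space, u32 rcv_mss)
	{
		return (free_space / rcv_mss) * rcv_mss;
	}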

We were able to reproduce the problem and worked with IBM to fix it.
Thanks to Tom and Marcelo for all your help and review on this.

The patch fixes both our internal reproduction tests and our customers' tests.

Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
---
 drivers/net/ethernet/ibm/ibmveth.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
index 29c05d0..c51717e 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -1182,6 +1182,8 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
 	int frames_processed = 0;
 	unsigned long lpar_rc;
 	struct iphdr *iph;
+	bool large_packet = 0;
+	u16 hdr_len = ETH_HLEN + sizeof(struct tcphdr);
 
 restart_poll:
 	while (frames_processed < budget) {
@@ -1236,10 +1238,28 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
 						iph->check = 0;
 						iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl);
 						adapter->rx_large_packets++;
+						large_packet = 1;
 					}
 				}
 			}
 
+			if (skb->len > netdev->mtu) {
+				iph = (struct iphdr *)skb->data;
+				if (be16_to_cpu(skb->protocol) == ETH_P_IP &&
+				    iph->protocol == IPPROTO_TCP) {
+					hdr_len += sizeof(struct iphdr);
+					skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
+					skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len;
+				} else if (be16_to_cpu(skb->protocol) == ETH_P_IPV6 &&
+					   iph->protocol == IPPROTO_TCP) {
+					hdr_len += sizeof(struct ipv6hdr);
+					skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
+					skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len;
+				}
+				if (!large_packet)
+					adapter->rx_large_packets++;
+			}
+
 			napi_gro_receive(napi, skb);	/* send it up */
 
 			netdev->stats.rx_packets++;
-- 
1.8.3.1

* Re: [PATCH net-next] ibmveth: v1 calculate correct gso_size and set gso_type
From: Thomas Falcon @ 2016-10-27 14:44 UTC (permalink / raw)
  To: Jon Maxwell
  Cc: jmaxwell, hofrat, linux-kernel, jarod, netdev, paulus, tom,
	mleitner, linuxppc-dev, davem

On 10/25/2016 07:09 PM, Jon Maxwell wrote:
> We recently encountered a bug where a few customers using ibmveth on the 
> same LPAR hit an issue where a TCP session hung when large receive was
> enabled. Closer analysis revealed that the session was stuck because the 
> one side was advertising a zero window repeatedly.
>
> We narrowed this down to the fact the ibmveth driver did not set gso_size 
> which is translated by TCP into the MSS later up the stack. The MSS is 
> used to calculate the TCP window size and as that was abnormally large, 
> it was calculating a zero window, even although the sockets receive buffer 
> was completely empty. 
>
> We were able to reproduce this and worked with IBM to fix this. Thanks Tom 
> and Marcelo for all your help and review on this.
>
> The patch fixes both our internal reproduction tests and our customers tests.
>
> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Thanks, Jon.

Acked-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>

> ---
>  drivers/net/ethernet/ibm/ibmveth.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
>
> diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
> index 29c05d0..c51717e 100644
> --- a/drivers/net/ethernet/ibm/ibmveth.c
> +++ b/drivers/net/ethernet/ibm/ibmveth.c
> @@ -1182,6 +1182,8 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
>  	int frames_processed = 0;
>  	unsigned long lpar_rc;
>  	struct iphdr *iph;
> +	bool large_packet = 0;
> +	u16 hdr_len = ETH_HLEN + sizeof(struct tcphdr);
>
>  restart_poll:
>  	while (frames_processed < budget) {
> @@ -1236,10 +1238,28 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
>  						iph->check = 0;
>  						iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl);
>  						adapter->rx_large_packets++;
> +						large_packet = 1;
>  					}
>  				}
>  			}
>
> +			if (skb->len > netdev->mtu) {
> +				iph = (struct iphdr *)skb->data;
> +				if (be16_to_cpu(skb->protocol) == ETH_P_IP &&
> +				    iph->protocol == IPPROTO_TCP) {
> +					hdr_len += sizeof(struct iphdr);
> +					skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
> +					skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len;
> +				} else if (be16_to_cpu(skb->protocol) == ETH_P_IPV6 &&
> +					   iph->protocol == IPPROTO_TCP) {
> +					hdr_len += sizeof(struct ipv6hdr);
> +					skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
> +					skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len;
> +				}
> +				if (!large_packet)
> +					adapter->rx_large_packets++;
> +			}
> +
>  			napi_gro_receive(napi, skb);	/* send it up */
>
>  			netdev->stats.rx_packets++;

* Re: [PATCH net-next] ibmveth: v1 calculate correct gso_size and set gso_type
From: Eric Dumazet @ 2016-10-27 15:26 UTC (permalink / raw)
  To: Jon Maxwell
  Cc: tlfalcon, benh, paulus, mpe, davem, tom, jarod, hofrat, netdev,
	linuxppc-dev, linux-kernel, mleitner, jmaxwell

On Wed, 2016-10-26 at 11:09 +1100, Jon Maxwell wrote:
> We recently encountered a bug where a few customers using ibmveth on the 
> same LPAR hit an issue where a TCP session hung when large receive was
> enabled. Closer analysis revealed that the session was stuck because the 
> one side was advertising a zero window repeatedly.
> 
> We narrowed this down to the fact the ibmveth driver did not set gso_size 
> which is translated by TCP into the MSS later up the stack. The MSS is 
> used to calculate the TCP window size and as that was abnormally large, 
> it was calculating a zero window, even although the sockets receive buffer 
> was completely empty. 
> 
> We were able to reproduce this and worked with IBM to fix this. Thanks Tom 
> and Marcelo for all your help and review on this.
> 
> The patch fixes both our internal reproduction tests and our customers tests.
> 
> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
> ---
>  drivers/net/ethernet/ibm/ibmveth.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
> index 29c05d0..c51717e 100644
> --- a/drivers/net/ethernet/ibm/ibmveth.c
> +++ b/drivers/net/ethernet/ibm/ibmveth.c
> @@ -1182,6 +1182,8 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
>  	int frames_processed = 0;
>  	unsigned long lpar_rc;
>  	struct iphdr *iph;
> +	bool large_packet = 0;
> +	u16 hdr_len = ETH_HLEN + sizeof(struct tcphdr);
>  
>  restart_poll:
>  	while (frames_processed < budget) {
> @@ -1236,10 +1238,28 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
>  						iph->check = 0;
>  						iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl);
>  						adapter->rx_large_packets++;
> +						large_packet = 1;
>  					}
>  				}
>  			}
>  
> +			if (skb->len > netdev->mtu) {
> +				iph = (struct iphdr *)skb->data;
> +				if (be16_to_cpu(skb->protocol) == ETH_P_IP &&
> +				    iph->protocol == IPPROTO_TCP) {
> +					hdr_len += sizeof(struct iphdr);
> +					skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
> +					skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len;
> +				} else if (be16_to_cpu(skb->protocol) == ETH_P_IPV6 &&
> +					   iph->protocol == IPPROTO_TCP) {
> +					hdr_len += sizeof(struct ipv6hdr);
> +					skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
> +					skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len;
> +				}
> +				if (!large_packet)
> +					adapter->rx_large_packets++;
> +			}
> +
>  

This might break forwarding and PMTU discovery.

You force gso_size to device mtu, regardless of real MSS used by the TCP
sender.

Don't you have the MSS provided in RX descriptor, instead of guessing
the value ?
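
Something along these lines, assuming the hypervisor can hand the value
over (the mss variable below stands for whatever the RX descriptor or
buffer metadata would provide; the field is hypothetical):

	/* Hypothetical sketch: "mss" comes from the RX descriptor if the
	 * hypervisor supplies it, instead of being guessed from the MTU.
	 */
	if (skb->protocol == htons(ETH_P_IP))
		skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
	else if (skb->protocol == htons(ETH_P_IPV6))
		skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
	skb_shinfo(skb)->gso_size = mss;	/* real sender MSS */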

* Re: [PATCH net-next] ibmveth: v1 calculate correct gso_size and set gso_type
From: Thomas Falcon @ 2016-10-27 17:54 UTC (permalink / raw)
  To: Eric Dumazet, Jon Maxwell
  Cc: jmaxwell, hofrat, linux-kernel, jarod, netdev, paulus, tom,
	mleitner, linuxppc-dev, davem

On 10/27/2016 10:26 AM, Eric Dumazet wrote:
> On Wed, 2016-10-26 at 11:09 +1100, Jon Maxwell wrote:
>> We recently encountered a bug where a few customers using ibmveth on the 
>> same LPAR hit an issue where a TCP session hung when large receive was
>> enabled. Closer analysis revealed that the session was stuck because the 
>> one side was advertising a zero window repeatedly.
>>
>> We narrowed this down to the fact the ibmveth driver did not set gso_size 
>> which is translated by TCP into the MSS later up the stack. The MSS is 
>> used to calculate the TCP window size and as that was abnormally large, 
>> it was calculating a zero window, even although the sockets receive buffer 
>> was completely empty. 
>>
>> We were able to reproduce this and worked with IBM to fix this. Thanks Tom 
>> and Marcelo for all your help and review on this.
>>
>> The patch fixes both our internal reproduction tests and our customers tests.
>>
>> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
>> ---
>>  drivers/net/ethernet/ibm/ibmveth.c | 20 ++++++++++++++++++++
>>  1 file changed, 20 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
>> index 29c05d0..c51717e 100644
>> --- a/drivers/net/ethernet/ibm/ibmveth.c
>> +++ b/drivers/net/ethernet/ibm/ibmveth.c
>> @@ -1182,6 +1182,8 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
>>  	int frames_processed = 0;
>>  	unsigned long lpar_rc;
>>  	struct iphdr *iph;
>> +	bool large_packet = 0;
>> +	u16 hdr_len = ETH_HLEN + sizeof(struct tcphdr);
>>  
>>  restart_poll:
>>  	while (frames_processed < budget) {
>> @@ -1236,10 +1238,28 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
>>  						iph->check = 0;
>>  						iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl);
>>  						adapter->rx_large_packets++;
>> +						large_packet = 1;
>>  					}
>>  				}
>>  			}
>>  
>> +			if (skb->len > netdev->mtu) {
>> +				iph = (struct iphdr *)skb->data;
>> +				if (be16_to_cpu(skb->protocol) == ETH_P_IP &&
>> +				    iph->protocol == IPPROTO_TCP) {
>> +					hdr_len += sizeof(struct iphdr);
>> +					skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
>> +					skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len;
>> +				} else if (be16_to_cpu(skb->protocol) == ETH_P_IPV6 &&
>> +					   iph->protocol == IPPROTO_TCP) {
>> +					hdr_len += sizeof(struct ipv6hdr);
>> +					skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
>> +					skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len;
>> +				}
>> +				if (!large_packet)
>> +					adapter->rx_large_packets++;
>> +			}
>> +
>>  
> This might break forwarding and PMTU discovery.
>
> You force gso_size to device mtu, regardless of real MSS used by the TCP
> sender.
>
> Don't you have the MSS provided in RX descriptor, instead of guessing
> the value ?
>
>
>
Unfortunately, the MSS is not always available, so this is the best solution we have at the moment.

* Re: [PATCH net-next] ibmveth: v1 calculate correct gso_size and set gso_type
From: Eric Dumazet @ 2016-10-27 18:08 UTC (permalink / raw)
  To: Thomas Falcon
  Cc: Jon Maxwell, jmaxwell, hofrat, linux-kernel, jarod, netdev,
	paulus, tom, mleitner, linuxppc-dev, davem

On Thu, 2016-10-27 at 12:54 -0500, Thomas Falcon wrote:
> On 10/27/2016 10:26 AM, Eric Dumazet wrote:
> > On Wed, 2016-10-26 at 11:09 +1100, Jon Maxwell wrote:
> >> We recently encountered a bug where a few customers using ibmveth on the 
> >> same LPAR hit an issue where a TCP session hung when large receive was
> >> enabled. Closer analysis revealed that the session was stuck because the 
> >> one side was advertising a zero window repeatedly.
> >>
> >> We narrowed this down to the fact the ibmveth driver did not set gso_size 
> >> which is translated by TCP into the MSS later up the stack. The MSS is 
> >> used to calculate the TCP window size and as that was abnormally large, 
> >> it was calculating a zero window, even although the sockets receive buffer 
> >> was completely empty. 
> >>
> >> We were able to reproduce this and worked with IBM to fix this. Thanks Tom 
> >> and Marcelo for all your help and review on this.
> >>
> >> The patch fixes both our internal reproduction tests and our customers tests.
> >>
> >> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
> >> ---
> >>  drivers/net/ethernet/ibm/ibmveth.c | 20 ++++++++++++++++++++
> >>  1 file changed, 20 insertions(+)
> >>
> >> diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
> >> index 29c05d0..c51717e 100644
> >> --- a/drivers/net/ethernet/ibm/ibmveth.c
> >> +++ b/drivers/net/ethernet/ibm/ibmveth.c
> >> @@ -1182,6 +1182,8 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
> >>  	int frames_processed = 0;
> >>  	unsigned long lpar_rc;
> >>  	struct iphdr *iph;
> >> +	bool large_packet = 0;
> >> +	u16 hdr_len = ETH_HLEN + sizeof(struct tcphdr);
> >>  
> >>  restart_poll:
> >>  	while (frames_processed < budget) {
> >> @@ -1236,10 +1238,28 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
> >>  						iph->check = 0;
> >>  						iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl);
> >>  						adapter->rx_large_packets++;
> >> +						large_packet = 1;
> >>  					}
> >>  				}
> >>  			}
> >>  
> >> +			if (skb->len > netdev->mtu) {
> >> +				iph = (struct iphdr *)skb->data;
> >> +				if (be16_to_cpu(skb->protocol) == ETH_P_IP &&
> >> +				    iph->protocol == IPPROTO_TCP) {
> >> +					hdr_len += sizeof(struct iphdr);
> >> +					skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
> >> +					skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len;
> >> +				} else if (be16_to_cpu(skb->protocol) == ETH_P_IPV6 &&
> >> +					   iph->protocol == IPPROTO_TCP) {
> >> +					hdr_len += sizeof(struct ipv6hdr);
> >> +					skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
> >> +					skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len;
> >> +				}
> >> +				if (!large_packet)
> >> +					adapter->rx_large_packets++;
> >> +			}
> >> +
> >>  
> > This might break forwarding and PMTU discovery.
> >
> > You force gso_size to device mtu, regardless of real MSS used by the TCP
> > sender.
> >
> > Don't you have the MSS provided in RX descriptor, instead of guessing
> > the value ?
> >
> >
> >
> The MSS is not always available unfortunately, so this is the best solution there is at the moment. 

Hmm... then what about skb_shinfo(skb)->gso_segs ?

ip_rcv() for example has :

        __IP_ADD_STATS(net,
                       IPSTATS_MIB_NOECTPKTS + (iph->tos & INET_ECN_MASK),
                       max_t(unsigned short, 1, skb_shinfo(skb)->gso_segs));



Also prefer : (skb->protocol == htons(ETH_P_IP)) tests

And the ipv6 test is wrong :

} else if (be16_to_cpu(skb->protocol) == ETH_P_IPV6 &&
           iph->protocol == IPPROTO_TCP) {


Since iph is a pointer to ipv4 iphdr .
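
i.e. something like this for the protocol checks (sketch only; it still
guesses gso_size from the MTU, which is the separate concern above):

	if (skb->protocol == htons(ETH_P_IP)) {
		struct iphdr *iph = (struct iphdr *)skb->data;

		if (iph->protocol == IPPROTO_TCP) {
			hdr_len += sizeof(struct iphdr);
			skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
			skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len;
		}
	} else if (skb->protocol == htons(ETH_P_IPV6)) {
		struct ipv6hdr *ip6h = (struct ipv6hdr *)skb->data;

		/* check nexthdr, not the ipv4 iphdr (extension headers
		 * ignored for brevity)
		 */
		if (ip6h->nexthdr == IPPROTO_TCP) {
			hdr_len += sizeof(struct ipv6hdr);
			skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
			skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len;
		}
	}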

* Re: [PATCH net-next] ibmveth: v1 calculate correct gso_size and set gso_type
From: Jonathan Maxwell @ 2016-10-30  0:21 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Thomas Falcon, Jon Maxwell, hofrat, linux-kernel, jarod, netdev,
	paulus, Tom Herbert, Marcelo Ricardo Leitner, linuxppc-dev,
	David Miller

On Fri, Oct 28, 2016 at 5:08 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2016-10-27 at 12:54 -0500, Thomas Falcon wrote:
>> On 10/27/2016 10:26 AM, Eric Dumazet wrote:
>> > On Wed, 2016-10-26 at 11:09 +1100, Jon Maxwell wrote:
>> >> We recently encountered a bug where a few customers using ibmveth on the
>> >> same LPAR hit an issue where a TCP session hung when large receive was
>> >> enabled. Closer analysis revealed that the session was stuck because the
>> >> one side was advertising a zero window repeatedly.
>> >>
>> >> We narrowed this down to the fact the ibmveth driver did not set gso_size
>> >> which is translated by TCP into the MSS later up the stack. The MSS is
>> >> used to calculate the TCP window size and as that was abnormally large,
>> >> it was calculating a zero window, even although the sockets receive buffer
>> >> was completely empty.
>> >>
>> >> We were able to reproduce this and worked with IBM to fix this. Thanks Tom
>> >> and Marcelo for all your help and review on this.
>> >>
>> >> The patch fixes both our internal reproduction tests and our customers tests.
>> >>
>> >> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
>> >> ---
>> >>  drivers/net/ethernet/ibm/ibmveth.c | 20 ++++++++++++++++++++
>> >>  1 file changed, 20 insertions(+)
>> >>
>> >> diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
>> >> index 29c05d0..c51717e 100644
>> >> --- a/drivers/net/ethernet/ibm/ibmveth.c
>> >> +++ b/drivers/net/ethernet/ibm/ibmveth.c
>> >> @@ -1182,6 +1182,8 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
>> >>    int frames_processed = 0;
>> >>    unsigned long lpar_rc;
>> >>    struct iphdr *iph;
>> >> +  bool large_packet = 0;
>> >> +  u16 hdr_len = ETH_HLEN + sizeof(struct tcphdr);
>> >>
>> >>  restart_poll:
>> >>    while (frames_processed < budget) {
>> >> @@ -1236,10 +1238,28 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
>> >>                                            iph->check = 0;
>> >>                                            iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl);
>> >>                                            adapter->rx_large_packets++;
>> >> +                                          large_packet = 1;
>> >>                                    }
>> >>                            }
>> >>                    }
>> >>
>> >> +                  if (skb->len > netdev->mtu) {
>> >> +                          iph = (struct iphdr *)skb->data;
>> >> +                          if (be16_to_cpu(skb->protocol) == ETH_P_IP &&
>> >> +                              iph->protocol == IPPROTO_TCP) {
>> >> +                                  hdr_len += sizeof(struct iphdr);
>> >> +                                  skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
>> >> +                                  skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len;
>> >> +                          } else if (be16_to_cpu(skb->protocol) == ETH_P_IPV6 &&
>> >> +                                     iph->protocol == IPPROTO_TCP) {
>> >> +                                  hdr_len += sizeof(struct ipv6hdr);
>> >> +                                  skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
>> >> +                                  skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len;
>> >> +                          }
>> >> +                          if (!large_packet)
>> >> +                                  adapter->rx_large_packets++;
>> >> +                  }
>> >> +
>> >>
>> > This might break forwarding and PMTU discovery.
>> >
>> > You force gso_size to device mtu, regardless of real MSS used by the TCP
>> > sender.
>> >
>> > Don't you have the MSS provided in RX descriptor, instead of guessing
>> > the value ?
>> >
>> >
>> >
>> The MSS is not always available unfortunately, so this is the best solution there is at the moment.
>
> Hmm... then what about skb_shinfo(skb)->gso_segs ?
>
> ip_rcv() for example has :
>
>         __IP_ADD_STATS(net,
>                        IPSTATS_MIB_NOECTPKTS + (iph->tos & INET_ECN_MASK),
>                        max_t(unsigned short, 1, skb_shinfo(skb)->gso_segs));
>
>

Okay, thanks Eric. As the MSS is the main concern, let me work with the
team here and see if we can find a better way to do this. I will take care
of the other points you raised at the same time.

>
> Also prefer : (skb->protocol == htons(ETH_P_IP)) tests
>
> And the ipv6 test is wrong :
>
> } else if (be16_to_cpu(skb->protocol) == ETH_P_IPV6 &&
>            iph->protocol == IPPROTO_TCP) {
>
>
> Since iph is a pointer to ipv4 iphdr .
>
>
>

* Re: [PATCH net-next] ibmveth: v1 calculate correct gso_size and set gso_type
From: Brian King @ 2016-11-02 21:40 UTC (permalink / raw)
  To: Eric Dumazet, Jon Maxwell
  Cc: tlfalcon, jmaxwell, hofrat, linux-kernel, jarod, netdev, paulus,
	tom, mleitner, linuxppc-dev, davem

On 10/27/2016 10:26 AM, Eric Dumazet wrote:
> On Wed, 2016-10-26 at 11:09 +1100, Jon Maxwell wrote:
>> We recently encountered a bug where a few customers using ibmveth on the 
>> same LPAR hit an issue where a TCP session hung when large receive was
>> enabled. Closer analysis revealed that the session was stuck because the 
>> one side was advertising a zero window repeatedly.
>>
>> We narrowed this down to the fact the ibmveth driver did not set gso_size 
>> which is translated by TCP into the MSS later up the stack. The MSS is 
>> used to calculate the TCP window size and as that was abnormally large, 
>> it was calculating a zero window, even although the sockets receive buffer 
>> was completely empty. 
>>
>> We were able to reproduce this and worked with IBM to fix this. Thanks Tom 
>> and Marcelo for all your help and review on this.
>>
>> The patch fixes both our internal reproduction tests and our customers tests.
>>
>> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
>> ---
>>  drivers/net/ethernet/ibm/ibmveth.c | 20 ++++++++++++++++++++
>>  1 file changed, 20 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
>> index 29c05d0..c51717e 100644
>> --- a/drivers/net/ethernet/ibm/ibmveth.c
>> +++ b/drivers/net/ethernet/ibm/ibmveth.c
>> @@ -1182,6 +1182,8 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
>>  	int frames_processed = 0;
>>  	unsigned long lpar_rc;
>>  	struct iphdr *iph;
>> +	bool large_packet = 0;
>> +	u16 hdr_len = ETH_HLEN + sizeof(struct tcphdr);
>>  
>>  restart_poll:
>>  	while (frames_processed < budget) {
>> @@ -1236,10 +1238,28 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
>>  						iph->check = 0;
>>  						iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl);
>>  						adapter->rx_large_packets++;
>> +						large_packet = 1;
>>  					}
>>  				}
>>  			}
>>  
>> +			if (skb->len > netdev->mtu) {
>> +				iph = (struct iphdr *)skb->data;
>> +				if (be16_to_cpu(skb->protocol) == ETH_P_IP &&
>> +				    iph->protocol == IPPROTO_TCP) {
>> +					hdr_len += sizeof(struct iphdr);
>> +					skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
>> +					skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len;
>> +				} else if (be16_to_cpu(skb->protocol) == ETH_P_IPV6 &&
>> +					   iph->protocol == IPPROTO_TCP) {
>> +					hdr_len += sizeof(struct ipv6hdr);
>> +					skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
>> +					skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len;
>> +				}
>> +				if (!large_packet)
>> +					adapter->rx_large_packets++;
>> +			}
>> +
>>  
> 
> This might break forwarding and PMTU discovery.
> 
> You force gso_size to device mtu, regardless of real MSS used by the TCP
> sender.
> 
> Don't you have the MSS provided in RX descriptor, instead of guessing
> the value ?

We've had some further discussions on this with the Virtual I/O Server (VIOS)
development team. The large receive aggregation in the VIOS (AIX based) is
actually done in software in the VIOS. What they may be able to do is, when
performing this aggregation, look at the lengths of all the packets being
aggregated, take the largest packet size within the aggregation unit minus the
header length, and return that to the virtual ethernet client, which we could
then stuff into gso_size. They are currently assessing how feasible this would
be and whether it would impact other parts of the code. However, assuming this
does end up being an option, would it address the concerns here, or is it going
to break something else I'm not thinking of?
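
Purely as a sketch of the idea (none of these names correspond to actual
VIOS code):

	/* Hypothetical VIOS-side computation: report the largest on-wire
	 * packet in the aggregation unit, minus headers, so the client
	 * can use it as gso_size.
	 */
	u32 largest = 0, i;

	for (i = 0; i < unit->nr_packets; i++)
		largest = max(largest, unit->pkt_len[i]);

	unit->reported_mss = largest - unit->hdr_len;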

Unfortunately, I don't think we'd have a good way to get gso_segs set correctly as I don't
see how that would get passed back up the interface.

Thanks,

Brian


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

* Re: [PATCH net-next] ibmveth: v1 calculate correct gso_size and set gso_type
From: Jonathan Maxwell @ 2016-11-06 21:22 UTC (permalink / raw)
  To: Brian King
  Cc: Eric Dumazet, Thomas Falcon, Jon Maxwell, hofrat, linux-kernel,
	jarod, netdev, paulus, Tom Herbert, Marcelo Ricardo Leitner,
	linuxppc-dev, David Miller

On Thu, Nov 3, 2016 at 8:40 AM, Brian King <brking@linux.vnet.ibm.com> wrote:
> On 10/27/2016 10:26 AM, Eric Dumazet wrote:
>> On Wed, 2016-10-26 at 11:09 +1100, Jon Maxwell wrote:
>>> We recently encountered a bug where a few customers using ibmveth on the
>>> same LPAR hit an issue where a TCP session hung when large receive was
>>> enabled. Closer analysis revealed that the session was stuck because the
>>> one side was advertising a zero window repeatedly.
>>>
>>> We narrowed this down to the fact the ibmveth driver did not set gso_size
>>> which is translated by TCP into the MSS later up the stack. The MSS is
>>> used to calculate the TCP window size and as that was abnormally large,
>>> it was calculating a zero window, even although the sockets receive buffer
>>> was completely empty.
>>>
>>> We were able to reproduce this and worked with IBM to fix this. Thanks Tom
>>> and Marcelo for all your help and review on this.
>>>
>>> The patch fixes both our internal reproduction tests and our customers tests.
>>>
>>> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
>>> ---
>>>  drivers/net/ethernet/ibm/ibmveth.c | 20 ++++++++++++++++++++
>>>  1 file changed, 20 insertions(+)
>>>
>>> diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
>>> index 29c05d0..c51717e 100644
>>> --- a/drivers/net/ethernet/ibm/ibmveth.c
>>> +++ b/drivers/net/ethernet/ibm/ibmveth.c
>>> @@ -1182,6 +1182,8 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
>>>      int frames_processed = 0;
>>>      unsigned long lpar_rc;
>>>      struct iphdr *iph;
>>> +    bool large_packet = 0;
>>> +    u16 hdr_len = ETH_HLEN + sizeof(struct tcphdr);
>>>
>>>  restart_poll:
>>>      while (frames_processed < budget) {
>>> @@ -1236,10 +1238,28 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
>>>                                              iph->check = 0;
>>>                                              iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl);
>>>                                              adapter->rx_large_packets++;
>>> +                                            large_packet = 1;
>>>                                      }
>>>                              }
>>>                      }
>>>
>>> +                    if (skb->len > netdev->mtu) {
>>> +                            iph = (struct iphdr *)skb->data;
>>> +                            if (be16_to_cpu(skb->protocol) == ETH_P_IP &&
>>> +                                iph->protocol == IPPROTO_TCP) {
>>> +                                    hdr_len += sizeof(struct iphdr);
>>> +                                    skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
>>> +                                    skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len;
>>> +                            } else if (be16_to_cpu(skb->protocol) == ETH_P_IPV6 &&
>>> +                                       iph->protocol == IPPROTO_TCP) {
>>> +                                    hdr_len += sizeof(struct ipv6hdr);
>>> +                                    skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
>>> +                                    skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len;
>>> +                            }
>>> +                            if (!large_packet)
>>> +                                    adapter->rx_large_packets++;
>>> +                    }
>>> +
>>>
>>
>> This might break forwarding and PMTU discovery.
>>
>> You force gso_size to device mtu, regardless of real MSS used by the TCP
>> sender.
>>
>> Don't you have the MSS provided in RX descriptor, instead of guessing
>> the value ?
>
> We've had some further discussions on this with the Virtual I/O Server (VIOS)
> development team. The large receive aggregation in the VIOS (AIX based) is actually
> being done by software in the VIOS. What they may be able to do is when performing
> this aggregation, they could look at the packet lengths of all the packets being
> aggregated and take the largest packet size within the aggregation unit, minus the
> header length and return that to the virtual ethernet client which we could then stuff
> into gso_size. They are currently assessing how feasible this would be to do and whether
> it would impact other bits of the code. However, assuming this does end up being an option,
> would this address the concerns here or is that going to break something else I'm
> not thinking of?

I was discussing this with a colleague, and although this is better than
what we have so far, we wonder if there could be a corner case where it
ends up with a smaller value than the current MSS. For example, if the
application sent a burst of small TCP packets with the PUSH bit set, they
may not be coalesced by GRO. The VIOS could probably be coded to detect
that condition and use the previous MSS, but that may not necessarily be
the current MSS.

The ibmveth driver passes the MSS to the VIOS via gso_size, either as the
3rd argument of ibmveth_send() or via tcp_hdr(skb)->check, which is
presumably overwritten when the VIOS does the TSO. Would it be possible
to keep a copy of this value on the TX side in the VIOS before it is
overwritten, and then somehow pass it back up to the RX side along with
the frame, setting gso_size to that value, which should be the current MSS?

Regards

Jon

>
> Unfortunately, I don't think we'd have a good way to get gso_segs set correctly as I don't
> see how that would get passed back up the interface.
>
> Thanks,
>
> Brian
>
>
> --
> Brian King
> Power Linux I/O
> IBM Linux Technology Center
>

* Re: [PATCH net-next] ibmveth: v1 calculate correct gso_size and set gso_type
From: Brian King @ 2016-11-09 16:02 UTC (permalink / raw)
  To: Jonathan Maxwell
  Cc: Eric Dumazet, Thomas Falcon, Jon Maxwell, hofrat, linux-kernel,
	jarod, netdev, paulus, Tom Herbert, Marcelo Ricardo Leitner,
	linuxppc-dev, David Miller

On 11/06/2016 03:22 PM, Jonathan Maxwell wrote:
> On Thu, Nov 3, 2016 at 8:40 AM, Brian King <brking@linux.vnet.ibm.com> wrote:
>> On 10/27/2016 10:26 AM, Eric Dumazet wrote:
>>> On Wed, 2016-10-26 at 11:09 +1100, Jon Maxwell wrote:
>>>> We recently encountered a bug where a few customers using ibmveth on the
>>>> same LPAR hit an issue where a TCP session hung when large receive was
>>>> enabled. Closer analysis revealed that the session was stuck because the
>>>> one side was advertising a zero window repeatedly.
>>>>
>>>> We narrowed this down to the fact the ibmveth driver did not set gso_size
>>>> which is translated by TCP into the MSS later up the stack. The MSS is
>>>> used to calculate the TCP window size and as that was abnormally large,
>>>> it was calculating a zero window, even although the sockets receive buffer
>>>> was completely empty.
>>>>
>>>> We were able to reproduce this and worked with IBM to fix this. Thanks Tom
>>>> and Marcelo for all your help and review on this.
>>>>
>>>> The patch fixes both our internal reproduction tests and our customers tests.
>>>>
>>>> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
>>>> ---
>>>>  drivers/net/ethernet/ibm/ibmveth.c | 20 ++++++++++++++++++++
>>>>  1 file changed, 20 insertions(+)
>>>>
>>>> diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
>>>> index 29c05d0..c51717e 100644
>>>> --- a/drivers/net/ethernet/ibm/ibmveth.c
>>>> +++ b/drivers/net/ethernet/ibm/ibmveth.c
>>>> @@ -1182,6 +1182,8 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
>>>>      int frames_processed = 0;
>>>>      unsigned long lpar_rc;
>>>>      struct iphdr *iph;
>>>> +    bool large_packet = 0;
>>>> +    u16 hdr_len = ETH_HLEN + sizeof(struct tcphdr);
>>>>
>>>>  restart_poll:
>>>>      while (frames_processed < budget) {
>>>> @@ -1236,10 +1238,28 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
>>>>                                              iph->check = 0;
>>>>                                              iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl);
>>>>                                              adapter->rx_large_packets++;
>>>> +                                            large_packet = 1;
>>>>                                      }
>>>>                              }
>>>>                      }
>>>>
>>>> +                    if (skb->len > netdev->mtu) {
>>>> +                            iph = (struct iphdr *)skb->data;
>>>> +                            if (be16_to_cpu(skb->protocol) == ETH_P_IP &&
>>>> +                                iph->protocol == IPPROTO_TCP) {
>>>> +                                    hdr_len += sizeof(struct iphdr);
>>>> +                                    skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
>>>> +                                    skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len;
>>>> +                            } else if (be16_to_cpu(skb->protocol) == ETH_P_IPV6 &&
>>>> +                                       iph->protocol == IPPROTO_TCP) {
>>>> +                                    hdr_len += sizeof(struct ipv6hdr);
>>>> +                                    skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
>>>> +                                    skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len;
>>>> +                            }
>>>> +                            if (!large_packet)
>>>> +                                    adapter->rx_large_packets++;
>>>> +                    }
>>>> +
>>>>
>>>
>>> This might break forwarding and PMTU discovery.
>>>
>>> You force gso_size to device mtu, regardless of real MSS used by the TCP
>>> sender.
>>>
>>> Don't you have the MSS provided in RX descriptor, instead of guessing
>>> the value ?
>>
>> We've had some further discussions on this with the Virtual I/O Server (VIOS)
>> development team. The large receive aggregation in the VIOS (AIX based) is actually
>> being done by software in the VIOS. What they may be able to do is when performing
>> this aggregation, they could look at the packet lengths of all the packets being
>> aggregated and take the largest packet size within the aggregation unit, minus the
>> header length and return that to the virtual ethernet client which we could then stuff
>> into gso_size. They are currently assessing how feasible this would be to do and whether
>> it would impact other bits of the code. However, assuming this does end up being an option,
>> would this address the concerns here or is that going to break something else I'm
>> not thinking of?
> 
> I was discussing this with a colleague and although this is better than
> what we have so far. We wonder if there could be a corner case where
> it ends up with a smaller value than the current MSS. For example if
> the application sent a burst of small TCP packets with the PUSH
> bit set. In that case they may not be coalesced by GRO. The VIOS could

Would that be a big problem though? Other than a performance degradation
in this specific case, do you see a functional issue with this approach?

> probably be coded to detect that condition and use the previous MSS.
> But that may not necessarily be the current MSS.
> 
> The ibmveth driver passes the MSS via gso_size to the VIOS. Either as the
> 3rd argument of ibmveth_send() or via tcp_hdr(skb)->check which is
> presumably over-written when the VIOS does the TSO. Would it be possible
> to keep a copy of this value on the TX side on the VIOS before it over-written
> and then some how pass that up to the RX side along with frame and set
> gso_size to that which should be the current MSS?

That seems like it might be more difficult. Wouldn't that require the VIOS
to track the MSS on a per-connection basis, since the MSS might differ
based on the source / destination? I discussed this with VIOS development,
and they don't do any connection tracking where they'd be able to insert this.

Unless there is a functional issue with the approach of having the VIOS
insert the MSS based on the size of the packets received off the wire, I
think that might be our best option...

Thanks,

Brian



> 
> Regards
> 
> Jon
> 
>>
>> Unfortunately, I don't think we'd have a good way to get gso_segs set correctly as I don't
>> see how that would get passed back up the interface.
>>
>> Thanks,
>>
>> Brian
>>
>>
>> --
>> Brian King
>> Power Linux I/O
>> IBM Linux Technology Center
>>
> 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] ibmveth: v1 calculate correct gso_size and set gso_type
@ 2016-11-09 16:02         ` Brian King
  0 siblings, 0 replies; 23+ messages in thread
From: Brian King @ 2016-11-09 16:02 UTC (permalink / raw)
  To: Jonathan Maxwell
  Cc: Thomas Falcon, Eric Dumazet, netdev, Jon Maxwell, linux-kernel,
	jarod, paulus, hofrat, Marcelo Ricardo Leitner, Tom Herbert,
	linuxppc-dev, David Miller

On 11/06/2016 03:22 PM, Jonathan Maxwell wrote:
> On Thu, Nov 3, 2016 at 8:40 AM, Brian King <brking@linux.vnet.ibm.com> wrote:
>> On 10/27/2016 10:26 AM, Eric Dumazet wrote:
>>> On Wed, 2016-10-26 at 11:09 +1100, Jon Maxwell wrote:
>>>> We recently encountered a bug where a few customers using ibmveth on the
>>>> same LPAR hit an issue where a TCP session hung when large receive was
>>>> enabled. Closer analysis revealed that the session was stuck because the
>>>> one side was advertising a zero window repeatedly.
>>>>
>>>> We narrowed this down to the fact the ibmveth driver did not set gso_size
>>>> which is translated by TCP into the MSS later up the stack. The MSS is
>>>> used to calculate the TCP window size and as that was abnormally large,
>>>> it was calculating a zero window, even although the sockets receive buffer
>>>> was completely empty.
>>>>
>>>> We were able to reproduce this and worked with IBM to fix this. Thanks Tom
>>>> and Marcelo for all your help and review on this.
>>>>
>>>> The patch fixes both our internal reproduction tests and our customers tests.
>>>>
>>>> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
>>>> ---
>>>>  drivers/net/ethernet/ibm/ibmveth.c | 20 ++++++++++++++++++++
>>>>  1 file changed, 20 insertions(+)
>>>>
>>>> diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
>>>> index 29c05d0..c51717e 100644
>>>> --- a/drivers/net/ethernet/ibm/ibmveth.c
>>>> +++ b/drivers/net/ethernet/ibm/ibmveth.c
>>>> @@ -1182,6 +1182,8 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
>>>>      int frames_processed = 0;
>>>>      unsigned long lpar_rc;
>>>>      struct iphdr *iph;
>>>> +    bool large_packet = 0;
>>>> +    u16 hdr_len = ETH_HLEN + sizeof(struct tcphdr);
>>>>
>>>>  restart_poll:
>>>>      while (frames_processed < budget) {
>>>> @@ -1236,10 +1238,28 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
>>>>                                              iph->check = 0;
>>>>                                              iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl);
>>>>                                              adapter->rx_large_packets++;
>>>> +                                            large_packet = 1;
>>>>                                      }
>>>>                              }
>>>>                      }
>>>>
>>>> +                    if (skb->len > netdev->mtu) {
>>>> +                            iph = (struct iphdr *)skb->data;
>>>> +                            if (be16_to_cpu(skb->protocol) == ETH_P_IP &&
>>>> +                                iph->protocol == IPPROTO_TCP) {
>>>> +                                    hdr_len += sizeof(struct iphdr);
>>>> +                                    skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
>>>> +                                    skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len;
>>>> +                            } else if (be16_to_cpu(skb->protocol) == ETH_P_IPV6 &&
>>>> +                                       iph->protocol == IPPROTO_TCP) {
>>>> +                                    hdr_len += sizeof(struct ipv6hdr);
>>>> +                                    skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
>>>> +                                    skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len;
>>>> +                            }
>>>> +                            if (!large_packet)
>>>> +                                    adapter->rx_large_packets++;
>>>> +                    }
>>>> +
>>>>
>>>
>>> This might break forwarding and PMTU discovery.
>>>
>>> You force gso_size to device mtu, regardless of real MSS used by the TCP
>>> sender.
>>>
>>> Don't you have the MSS provided in RX descriptor, instead of guessing
>>> the value ?
>>
>> We've had some further discussions on this with the Virtual I/O Server (VIOS)
>> development team. The large receive aggregation in the VIOS (AIX based) is actually
>> being done by software in the VIOS. What they may be able to do, when performing
>> this aggregation, is look at the packet lengths of all the packets being
>> aggregated, take the largest packet size within the aggregation unit minus the
>> header length, and return that to the virtual ethernet client, which we could then stuff
>> into gso_size. They are currently assessing how feasible this would be and whether
>> it would impact other bits of the code. However, assuming this does end up being an option,
>> would this address the concerns here, or is that going to break something else I'm
>> not thinking of?
> 
> I was discussing this with a colleague and, although this is better than
> what we have so far, we wonder if there could be a corner case where
> it ends up with a smaller value than the current MSS. For example, if
> the application sent a burst of small TCP packets with the PUSH
> bit set, they may not be coalesced by GRO. The VIOS could

Would that be a big problem though? Other than a performance degradation
in this specific case, do you see a functional issue with this approach?

> probably be coded to detect that condition and use the previous MSS.
> But that may not necessarily be the current MSS.
> 
> The ibmveth driver passes the MSS via gso_size to the VIOS, either as the
> 3rd argument of ibmveth_send() or via tcp_hdr(skb)->check, which is
> presumably overwritten when the VIOS does the TSO. Would it be possible
> to keep a copy of this value on the TX side in the VIOS before it is
> overwritten, and then somehow pass that up to the RX side along with the
> frame and set gso_size to that, which should be the current MSS?

That seems like it might be more difficult. Wouldn't that require the VIOS
to track the MSS on a per-connection basis, since the MSS might differ
based on the source / destination? I discussed this with VIOS development, and
they don't do any connection tracking where they'd be able to insert this.

Unless there is a functional issue with the approach of having the VIOS insert
the MSS based on the size of the packets received off the wire, I think that
might be our best option...

Thanks,

Brian



> 
> Regards
> 
> Jon
> 
>>
>> Unfortunately, I don't think we'd have a good way to get gso_segs set correctly as I don't
>> see how that would get passed back up the interface.
>>
>> Thanks,
>>
>> Brian
>>
>>
>> --
>> Brian King
>> Power Linux I/O
>> IBM Linux Technology Center
>>
> 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center
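
A rough illustration of the aggregation-side estimate Brian describes above:
the aggregator would remember the on-wire length of each segment it coalesces
and report the largest one, minus the protocol headers, as the MSS hint. This
is a sketch only; the function, parameter names and types are made up for the
example and are not VIOS or driver code.

#include <stdint.h>

/* Sketch: estimate an MSS for an aggregated unit from the largest
 * coalesced segment.  seg_lens[] holds the on-wire length of each
 * segment; hdr_len is the Ethernet + IP + TCP header length.
 */
static uint16_t aggr_estimate_mss(const uint32_t *seg_lens, int nsegs,
				  uint16_t hdr_len)
{
	uint32_t largest = 0;
	int i;

	for (i = 0; i < nsegs; i++)
		if (seg_lens[i] > largest)
			largest = seg_lens[i];

	return largest > hdr_len ? (uint16_t)(largest - hdr_len) : 0;
}

As noted in the follow-up, this can under-estimate the current MSS when the
sender emits a burst of small segments (e.g. with the PUSH bit set) that are
not coalesced.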

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] ibmveth: v1 calculate correct gso_size and set gso_type
  2016-10-27 15:26   ` Eric Dumazet
@ 2016-11-11 18:16     ` Brian King
  0 siblings, 0 replies; 23+ messages in thread
From: Brian King @ 2016-11-11 18:16 UTC (permalink / raw)
  To: Eric Dumazet, Jon Maxwell
  Cc: tlfalcon, jmaxwell, hofrat, linux-kernel, jarod, netdev, paulus,
	tom, mleitner, linuxppc-dev, davem

On 10/27/2016 10:26 AM, Eric Dumazet wrote:
> On Wed, 2016-10-26 at 11:09 +1100, Jon Maxwell wrote:
>> We recently encountered a bug where a few customers using ibmveth on the 
>> same LPAR hit an issue where a TCP session hung when large receive was
>> enabled. Closer analysis revealed that the session was stuck because the 
>> one side was advertising a zero window repeatedly.
>>
>> We narrowed this down to the fact the ibmveth driver did not set gso_size 
>> which is translated by TCP into the MSS later up the stack. The MSS is 
>> used to calculate the TCP window size and as that was abnormally large, 
>> it was calculating a zero window, even although the sockets receive buffer 
>> was completely empty. 
>>
>> We were able to reproduce this and worked with IBM to fix this. Thanks Tom 
>> and Marcelo for all your help and review on this.
>>
>> The patch fixes both our internal reproduction tests and our customers tests.
>>
>> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
>> ---
>>  drivers/net/ethernet/ibm/ibmveth.c | 20 ++++++++++++++++++++
>>  1 file changed, 20 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
>> index 29c05d0..c51717e 100644
>> --- a/drivers/net/ethernet/ibm/ibmveth.c
>> +++ b/drivers/net/ethernet/ibm/ibmveth.c
>> @@ -1182,6 +1182,8 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
>>  	int frames_processed = 0;
>>  	unsigned long lpar_rc;
>>  	struct iphdr *iph;
>> +	bool large_packet = 0;
>> +	u16 hdr_len = ETH_HLEN + sizeof(struct tcphdr);
>>  
>>  restart_poll:
>>  	while (frames_processed < budget) {
>> @@ -1236,10 +1238,28 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
>>  						iph->check = 0;
>>  						iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl);
>>  						adapter->rx_large_packets++;
>> +						large_packet = 1;
>>  					}
>>  				}
>>  			}
>>  
>> +			if (skb->len > netdev->mtu) {
>> +				iph = (struct iphdr *)skb->data;
>> +				if (be16_to_cpu(skb->protocol) == ETH_P_IP &&
>> +				    iph->protocol == IPPROTO_TCP) {
>> +					hdr_len += sizeof(struct iphdr);
>> +					skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
>> +					skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len;
>> +				} else if (be16_to_cpu(skb->protocol) == ETH_P_IPV6 &&
>> +					   iph->protocol == IPPROTO_TCP) {
>> +					hdr_len += sizeof(struct ipv6hdr);
>> +					skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
>> +					skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len;
>> +				}
>> +				if (!large_packet)
>> +					adapter->rx_large_packets++;
>> +			}
>> +
>>  
> 
> This might break forwarding and PMTU discovery.
> 
> You force gso_size to device mtu, regardless of real MSS used by the TCP
> sender.
> 
> Don't you have the MSS provided in RX descriptor, instead of guessing
> the value ?

Eric,

We are currently pursuing making changes to the Power Virtual I/O Server to provide
the MSS to the ibmveth driver. However, this will take time to go through test
and ultimately get released. Although imperfect, this patch does help a real customer
hitting this issue right now. Would you object to this patch getting merged as is,
with the understanding that when we get the change in the Virtual I/O Server released,
we will revert this interim change and apply the new method?

Thanks,

Brian


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] ibmveth: set correct gso_size and gso_type
  2016-10-26  0:09 [PATCH net-next] ibmveth: v1 calculate correct gso_size and set gso_type Jon Maxwell
  2016-10-27 14:44   ` Thomas Falcon
  2016-10-27 15:26   ` Eric Dumazet
@ 2016-12-08 22:40 ` Thomas Falcon
  2016-12-10  1:31   ` [PATCH net v2] " Thomas Falcon
  2016-12-10  3:48   ` [PATCH] " David Miller
  2 siblings, 2 replies; 23+ messages in thread
From: Thomas Falcon @ 2016-12-08 22:40 UTC (permalink / raw)
  To: netdev

This patch is based on an earlier one submitted
by Jon Maxwell with the following commit message:

"We recently encountered a bug where a few customers using ibmveth on the
same LPAR hit an issue where a TCP session hung when large receive was
enabled. Closer analysis revealed that the session was stuck because the
one side was advertising a zero window repeatedly.

We narrowed this down to the fact the ibmveth driver did not set gso_size
which is translated by TCP into the MSS later up the stack. The MSS is
used to calculate the TCP window size and as that was abnormally large,
it was calculating a zero window, even although the sockets receive buffer
was completely empty."

We rely on the Virtual I/O Server partition in a pseries
environment to provide the MSS through the TCP header checksum
field. The stipulation is that users should not disable checksum
offloading if rx packet aggregation is enabled through VIOS.

Some firmware offerings provide the MSS in the RX buffer.
This is signalled by a bit in the RX queue descriptor.

Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Jonathan Maxwell <jmaxwell37@gmail.com>
Reviewed-by: David Dai <zdai@us.ibm.com>
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>

---
 drivers/net/ethernet/ibm/ibmveth.c | 65 ++++++++++++++++++++++++++++++++++++--
 drivers/net/ethernet/ibm/ibmveth.h |  1 +
 2 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
index ebe6071..a36022b 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -58,7 +58,7 @@
 
 static const char ibmveth_driver_name[] = "ibmveth";
 static const char ibmveth_driver_string[] = "IBM Power Virtual Ethernet Driver";
-#define ibmveth_driver_version "1.05"
+#define ibmveth_driver_version "1.06"
 
 MODULE_AUTHOR("Santiago Leon <santil@linux.vnet.ibm.com>");
 MODULE_DESCRIPTION("IBM Power Virtual Ethernet Driver");
@@ -137,6 +137,11 @@ static inline int ibmveth_rxq_frame_offset(struct ibmveth_adapter *adapter)
 	return ibmveth_rxq_flags(adapter) & IBMVETH_RXQ_OFF_MASK;
 }
 
+static inline int ibmveth_rxq_large_packet(struct ibmveth_adapter *adapter)
+{
+	return ibmveth_rxq_flags(adapter) & IBMVETH_RXQ_LRG_PKT;
+}
+
 static inline int ibmveth_rxq_frame_length(struct ibmveth_adapter *adapter)
 {
 	return be32_to_cpu(adapter->rx_queue.queue_addr[adapter->rx_queue.index].length);
@@ -1174,6 +1179,45 @@ static netdev_tx_t ibmveth_start_xmit(struct sk_buff *skb,
 	goto retry_bounce;
 }
 
+static void ibmveth_rx_mss_helper(struct sk_buff *skb, u16 mss, int lrg_pkt)
+{
+	int offset = 0;
+
+	/* only TCP packets will be aggregated */
+	if (skb->protocol == htons(ETH_P_IP)) {
+		struct iphdr *iph = (struct iphdr *)skb->data;
+
+		if (iph->protocol == IPPROTO_TCP) {
+			offset = iph->ihl * 4;
+			skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
+		} else {
+			return;
+		}
+	} else if (skb->protocol == htons(ETH_P_IPV6)) {
+		struct ipv6hdr *iph6 = (struct ipv6hdr *)skb->data;
+
+		if (iph6->nexthdr == IPPROTO_TCP) {
+			offset = sizeof(struct ipv6hdr);
+			skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
+		} else {
+			return;
+		}
+	} else {
+		return;
+	}
+	/* if mss is not set through Large Packet bit/mss in rx buffer,
+	 * expect that the mss will be written to the tcp header checksum.
+	 */
+	if (lrg_pkt) {
+		skb_shinfo(skb)->gso_size = mss;
+	} else if (offset) {
+		struct tcphdr *tcph = (struct tcphdr *)(skb->data + offset);
+
+		skb_shinfo(skb)->gso_size = ntohs(tcph->check);
+		tcph->check = 0;
+	}
+}
+
 static int ibmveth_poll(struct napi_struct *napi, int budget)
 {
 	struct ibmveth_adapter *adapter =
@@ -1182,6 +1226,7 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
 	int frames_processed = 0;
 	unsigned long lpar_rc;
 	struct iphdr *iph;
+	u16 mss = 0;
 
 restart_poll:
 	while (frames_processed < budget) {
@@ -1199,9 +1244,21 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
 			int length = ibmveth_rxq_frame_length(adapter);
 			int offset = ibmveth_rxq_frame_offset(adapter);
 			int csum_good = ibmveth_rxq_csum_good(adapter);
+			int lrg_pkt = ibmveth_rxq_large_packet(adapter);
 
 			skb = ibmveth_rxq_get_buffer(adapter);
 
+			/* if the large packet bit is set in the rx queue
+			 * descriptor, the mss will be written by PHYP eight
+			 * bytes from the start of the rx buffer, which is
+			 * skb->data at this stage
+			 */
+			if (lrg_pkt) {
+				__be64 *rxmss = (__be64 *)(skb->data + 8);
+
+				mss = (u16)be64_to_cpu(*rxmss);
+			}
+
 			new_skb = NULL;
 			if (length < rx_copybreak)
 				new_skb = netdev_alloc_skb(netdev, length);
@@ -1235,11 +1292,15 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
 					if (iph->check == 0xffff) {
 						iph->check = 0;
 						iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl);
-						adapter->rx_large_packets++;
 					}
 				}
 			}
 
+			if (length > netdev->mtu + ETH_HLEN) {
+				ibmveth_rx_mss_helper(skb, mss, lrg_pkt);
+				adapter->rx_large_packets++;
+			}
+
 			napi_gro_receive(napi, skb);	/* send it up */
 
 			netdev->stats.rx_packets++;
diff --git a/drivers/net/ethernet/ibm/ibmveth.h b/drivers/net/ethernet/ibm/ibmveth.h
index 4eade67..7acda04 100644
--- a/drivers/net/ethernet/ibm/ibmveth.h
+++ b/drivers/net/ethernet/ibm/ibmveth.h
@@ -209,6 +209,7 @@ struct ibmveth_rx_q_entry {
 #define IBMVETH_RXQ_TOGGLE		0x80000000
 #define IBMVETH_RXQ_TOGGLE_SHIFT	31
 #define IBMVETH_RXQ_VALID		0x40000000
+#define IBMVETH_RXQ_LRG_PKT		0x04000000
 #define IBMVETH_RXQ_NO_CSUM		0x02000000
 #define IBMVETH_RXQ_CSUM_GOOD		0x01000000
 #define IBMVETH_RXQ_OFF_MASK		0x0000FFFF
-- 
1.8.3.1
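
A back-of-the-envelope illustration of the failure mode in the commit message
above, with the window rounding deliberately simplified (this is a model, not
the kernel's actual window logic): when gso_size is left unset on a large
aggregated skb, the receiver's MSS estimate can balloon to the size of that
skb, and a window rounded down to a multiple of the estimate collapses to
zero even though the receive buffer is nearly empty.

/* Simplified model only: advertise the largest multiple of the estimated
 * MSS that fits in the free receive buffer space.
 */
static unsigned int advertised_window(unsigned int free_space,
				      unsigned int mss_estimate)
{
	if (!mss_estimate)
		return free_space;
	return (free_space / mss_estimate) * mss_estimate;
}

/* Example: free_space = 48 * 1024 and mss_estimate = 64 * 1024 (one
 * aggregated skb with gso_size unset) gives advertised_window() == 0,
 * i.e. the zero window seen in the hang, despite 48 KB of free space.
 */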

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH net v2] ibmveth: set correct gso_size and gso_type
  2016-12-08 22:40 ` [PATCH] ibmveth: set correct gso_size and gso_type Thomas Falcon
@ 2016-12-10  1:31   ` Thomas Falcon
  2016-12-10  3:28     ` Eric Dumazet
  2016-12-10 18:39     ` [PATCH net v3] " Thomas Falcon
  2016-12-10  3:48   ` [PATCH] " David Miller
  1 sibling, 2 replies; 23+ messages in thread
From: Thomas Falcon @ 2016-12-10  1:31 UTC (permalink / raw)
  To: netdev; +Cc: brking, pradeeps, marcelo.leitner, jmaxwell37, zdai, eric.dumazet

This patch is based on an earlier one submitted
by Jon Maxwell with the following commit message:

"We recently encountered a bug where a few customers using ibmveth on the
same LPAR hit an issue where a TCP session hung when large receive was
enabled. Closer analysis revealed that the session was stuck because the
one side was advertising a zero window repeatedly.

We narrowed this down to the fact the ibmveth driver did not set gso_size
which is translated by TCP into the MSS later up the stack. The MSS is
used to calculate the TCP window size and as that was abnormally large,
it was calculating a zero window, even although the sockets receive buffer
was completely empty."

We rely on the Virtual I/O Server partition in a pseries
environment to provide the MSS through the TCP header checksum
field. The stipulation is that users should not disable checksum
offloading if rx packet aggregation is enabled through VIOS.

Some firmware offerings provide the MSS in the RX buffer.
This is signalled by a bit in the RX queue descriptor.

Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Jonathan Maxwell <jmaxwell37@gmail.com>
Reviewed-by: David Dai <zdai@us.ibm.com>
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
---
v2: calculate gso_segs after Eric Dumazet's comments on the earlier patch
    and make sure everyone is included on CC
---
 drivers/net/ethernet/ibm/ibmveth.c | 72 ++++++++++++++++++++++++++++++++++++--
 drivers/net/ethernet/ibm/ibmveth.h |  1 +
 2 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
index ebe6071..f0c3ae7 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -58,7 +58,7 @@
 
 static const char ibmveth_driver_name[] = "ibmveth";
 static const char ibmveth_driver_string[] = "IBM Power Virtual Ethernet Driver";
-#define ibmveth_driver_version "1.05"
+#define ibmveth_driver_version "1.06"
 
 MODULE_AUTHOR("Santiago Leon <santil@linux.vnet.ibm.com>");
 MODULE_DESCRIPTION("IBM Power Virtual Ethernet Driver");
@@ -137,6 +137,11 @@ static inline int ibmveth_rxq_frame_offset(struct ibmveth_adapter *adapter)
 	return ibmveth_rxq_flags(adapter) & IBMVETH_RXQ_OFF_MASK;
 }
 
+static inline int ibmveth_rxq_large_packet(struct ibmveth_adapter *adapter)
+{
+	return ibmveth_rxq_flags(adapter) & IBMVETH_RXQ_LRG_PKT;
+}
+
 static inline int ibmveth_rxq_frame_length(struct ibmveth_adapter *adapter)
 {
 	return be32_to_cpu(adapter->rx_queue.queue_addr[adapter->rx_queue.index].length);
@@ -1174,6 +1179,52 @@ static netdev_tx_t ibmveth_start_xmit(struct sk_buff *skb,
 	goto retry_bounce;
 }
 
+static void ibmveth_rx_mss_helper(struct sk_buff *skb, u16 mss, int lrg_pkt)
+{
+	struct tcphdr *tcph;
+	int offset = 0;
+	int hdr_len;
+
+	/* only TCP packets will be aggregated */
+	if (skb->protocol == htons(ETH_P_IP)) {
+		struct iphdr *iph = (struct iphdr *)skb->data;
+
+		if (iph->protocol == IPPROTO_TCP) {
+			offset = iph->ihl * 4;
+			skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
+		} else {
+			return;
+		}
+	} else if (skb->protocol == htons(ETH_P_IPV6)) {
+		struct ipv6hdr *iph6 = (struct ipv6hdr *)skb->data;
+
+		if (iph6->nexthdr == IPPROTO_TCP) {
+			offset = sizeof(struct ipv6hdr);
+			skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
+		} else {
+			return;
+		}
+	} else {
+		return;
+	}
+	/* if mss is not set through Large Packet bit/mss in rx buffer,
+	 * expect that the mss will be written to the tcp header checksum.
+	 */
+	tcph = (struct tcphdr *)(skb->data + offset);
+	hdr_len = offset + tcph->doff * 4;
+	if (lrg_pkt) {
+		skb_shinfo(skb)->gso_size = mss;
+		skb_shinfo(skb)->gso_segs =
+					DIV_ROUND_UP(skb->len - hdr_len, mss);
+	} else if (offset) {
+		skb_shinfo(skb)->gso_size = ntohs(tcph->check);
+		skb_shinfo(skb)->gso_segs =
+				DIV_ROUND_UP(skb->len - hdr_len,
+					     skb_shinfo(skb)->gso_size);
+		tcph->check = 0;
+	}
+}
+
 static int ibmveth_poll(struct napi_struct *napi, int budget)
 {
 	struct ibmveth_adapter *adapter =
@@ -1182,6 +1233,7 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
 	int frames_processed = 0;
 	unsigned long lpar_rc;
 	struct iphdr *iph;
+	u16 mss = 0;
 
 restart_poll:
 	while (frames_processed < budget) {
@@ -1199,9 +1251,21 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
 			int length = ibmveth_rxq_frame_length(adapter);
 			int offset = ibmveth_rxq_frame_offset(adapter);
 			int csum_good = ibmveth_rxq_csum_good(adapter);
+			int lrg_pkt = ibmveth_rxq_large_packet(adapter);
 
 			skb = ibmveth_rxq_get_buffer(adapter);
 
+			/* if the large packet bit is set in the rx queue
+			 * descriptor, the mss will be written by PHYP eight
+			 * bytes from the start of the rx buffer, which is
+			 * skb->data at this stage
+			 */
+			if (lrg_pkt) {
+				__be64 *rxmss = (__be64 *)(skb->data + 8);
+
+				mss = (u16)be64_to_cpu(*rxmss);
+			}
+
 			new_skb = NULL;
 			if (length < rx_copybreak)
 				new_skb = netdev_alloc_skb(netdev, length);
@@ -1235,11 +1299,15 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
 					if (iph->check == 0xffff) {
 						iph->check = 0;
 						iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl);
-						adapter->rx_large_packets++;
 					}
 				}
 			}
 
+			if (length > netdev->mtu + ETH_HLEN) {
+				ibmveth_rx_mss_helper(skb, mss, lrg_pkt);
+				adapter->rx_large_packets++;
+			}
+
 			napi_gro_receive(napi, skb);	/* send it up */
 
 			netdev->stats.rx_packets++;
diff --git a/drivers/net/ethernet/ibm/ibmveth.h b/drivers/net/ethernet/ibm/ibmveth.h
index 4eade67..7acda04 100644
--- a/drivers/net/ethernet/ibm/ibmveth.h
+++ b/drivers/net/ethernet/ibm/ibmveth.h
@@ -209,6 +209,7 @@ struct ibmveth_rx_q_entry {
 #define IBMVETH_RXQ_TOGGLE		0x80000000
 #define IBMVETH_RXQ_TOGGLE_SHIFT	31
 #define IBMVETH_RXQ_VALID		0x40000000
+#define IBMVETH_RXQ_LRG_PKT		0x04000000
 #define IBMVETH_RXQ_NO_CSUM		0x02000000
 #define IBMVETH_RXQ_CSUM_GOOD		0x01000000
 #define IBMVETH_RXQ_OFF_MASK		0x0000FFFF
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH net v2] ibmveth: set correct gso_size and gso_type
  2016-12-10  1:31   ` [PATCH net v2] " Thomas Falcon
@ 2016-12-10  3:28     ` Eric Dumazet
  2016-12-10 18:39     ` [PATCH net v3] " Thomas Falcon
  1 sibling, 0 replies; 23+ messages in thread
From: Eric Dumazet @ 2016-12-10  3:28 UTC (permalink / raw)
  To: Thomas Falcon; +Cc: netdev, brking, pradeeps, marcelo.leitner, jmaxwell37, zdai

On Fri, 2016-12-09 at 19:31 -0600, Thomas Falcon wrote:
> This patch is based on an earlier one submitted
> by Jon Maxwell with the following commit message:
> 

> +					DIV_ROUND_UP(skb->len - hdr_len, mss);
> +	} else if (offset) {
> +		skb_shinfo(skb)->gso_size = ntohs(tcph->check);
> +		skb_shinfo(skb)->gso_segs =
> +				DIV_ROUND_UP(skb->len - hdr_len,
> +					     skb_shinfo(skb)->gso_size);
> +		tcph->check = 0;
> +	}

Are you sure that tcph->check could never be 0 on some cases ?

That would crash on a divide by 0
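
A minimal sketch of the kind of guard being asked for here, assuming the same
skb, tcph and hdr_len as in ibmveth_rx_mss_helper() above; the v3 patch later
in the thread adds an equivalent non-zero check before dividing. The helper
name below is made up for the sketch and is not the literal driver code.

/* Sketch: take the MSS hint from the TCP checksum field and skip the
 * gso_segs division entirely when it is zero, so DIV_ROUND_UP() can
 * never divide by zero.
 */
static void ibmveth_set_gso_from_check(struct sk_buff *skb,
				       struct tcphdr *tcph, int hdr_len)
{
	u16 mss = ntohs(tcph->check);

	tcph->check = 0;
	if (!mss)
		return;

	skb_shinfo(skb)->gso_size = mss;
	skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(skb->len - hdr_len, mss);
}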

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] ibmveth: set correct gso_size and gso_type
  2016-12-08 22:40 ` [PATCH] ibmveth: set correct gso_size and gso_type Thomas Falcon
  2016-12-10  1:31   ` [PATCH net v2] " Thomas Falcon
@ 2016-12-10  3:48   ` David Miller
  1 sibling, 0 replies; 23+ messages in thread
From: David Miller @ 2016-12-10  3:48 UTC (permalink / raw)
  To: tlfalcon; +Cc: netdev

From: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Date: Thu,  8 Dec 2016 16:40:03 -0600

> This patch is based on an earlier one submitted
> by Jon Maxwell with the following commit message:
> 
> "We recently encountered a bug where a few customers using ibmveth on the
> same LPAR hit an issue where a TCP session hung when large receive was
> enabled. Closer analysis revealed that the session was stuck because the
> one side was advertising a zero window repeatedly.
> 
> We narrowed this down to the fact the ibmveth driver did not set gso_size
> which is translated by TCP into the MSS later up the stack. The MSS is
> used to calculate the TCP window size and as that was abnormally large,
> it was calculating a zero window, even although the sockets receive buffer
> was completely empty."
> 
> We rely on the Virtual I/O Server partition in a pseries
> environment to provide the MSS through the TCP header checksum
> field. The stipulation is that users should not disable checksum
> offloading if rx packet aggregation is enabled through VIOS.
> 
> Some firmware offerings provide the MSS in the RX buffer.
> This is signalled by a bit in the RX queue descriptor.
> 
> Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
> Reviewed-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> Reviewed-by: Jonathan Maxwell <jmaxwell37@gmail.com>
> Reviewed-by: David Dai <zdai@us.ibm.com>
> Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>

Applied, although mis-using the TCP checksum field for this is kind of
bogus.  I'm surprised there wasn't some other place you could stick
this value, which wouldn't modify the packet contents.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH net v3] ibmveth: set correct gso_size and gso_type
  2016-12-10  1:31   ` [PATCH net v2] " Thomas Falcon
  2016-12-10  3:28     ` Eric Dumazet
@ 2016-12-10 18:39     ` Thomas Falcon
  2016-12-10 20:56       ` David Miller
  1 sibling, 1 reply; 23+ messages in thread
From: Thomas Falcon @ 2016-12-10 18:39 UTC (permalink / raw)
  To: netdev; +Cc: brking, marcelo.leitner, pradeeps, jmaxwell37, zdai, eric.dumazet

This patch is based on an earlier one submitted
by Jon Maxwell with the following commit message:

"We recently encountered a bug where a few customers using ibmveth on the
same LPAR hit an issue where a TCP session hung when large receive was
enabled. Closer analysis revealed that the session was stuck because the
one side was advertising a zero window repeatedly.

We narrowed this down to the fact the ibmveth driver did not set gso_size
which is translated by TCP into the MSS later up the stack. The MSS is
used to calculate the TCP window size and as that was abnormally large,
it was calculating a zero window, even although the sockets receive buffer
was completely empty."

We rely on the Virtual I/O Server partition in a pseries
environment to provide the MSS through the TCP header checksum
field. The stipulation is that users should not disable checksum
offloading if rx packet aggregation is enabled through VIOS.

Some firmware offerings provide the MSS in the RX buffer.
This is signalled by a bit in the RX queue descriptor.

Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Jonathan Maxwell <jmaxwell37@gmail.com>
Reviewed-by: David Dai <zdai@us.ibm.com>
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
---
v3: include a check for non-zero mss when calculating gso_segs

v2: calculate gso_segs after Eric Dumazet's comments on the earlier patch
    and make sure everyone is included on CC
---
 drivers/net/ethernet/ibm/ibmveth.c | 72 ++++++++++++++++++++++++++++++++++++--
 drivers/net/ethernet/ibm/ibmveth.h |  1 +
 2 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
index ebe6071..6dc24a1 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -58,7 +58,7 @@
 
 static const char ibmveth_driver_name[] = "ibmveth";
 static const char ibmveth_driver_string[] = "IBM Power Virtual Ethernet Driver";
-#define ibmveth_driver_version "1.05"
+#define ibmveth_driver_version "1.06"
 
 MODULE_AUTHOR("Santiago Leon <santil@linux.vnet.ibm.com>");
 MODULE_DESCRIPTION("IBM Power Virtual Ethernet Driver");
@@ -137,6 +137,11 @@ static inline int ibmveth_rxq_frame_offset(struct ibmveth_adapter *adapter)
 	return ibmveth_rxq_flags(adapter) & IBMVETH_RXQ_OFF_MASK;
 }
 
+static inline int ibmveth_rxq_large_packet(struct ibmveth_adapter *adapter)
+{
+	return ibmveth_rxq_flags(adapter) & IBMVETH_RXQ_LRG_PKT;
+}
+
 static inline int ibmveth_rxq_frame_length(struct ibmveth_adapter *adapter)
 {
 	return be32_to_cpu(adapter->rx_queue.queue_addr[adapter->rx_queue.index].length);
@@ -1174,6 +1179,52 @@ static netdev_tx_t ibmveth_start_xmit(struct sk_buff *skb,
 	goto retry_bounce;
 }
 
+static void ibmveth_rx_mss_helper(struct sk_buff *skb, u16 mss, int lrg_pkt)
+{
+	struct tcphdr *tcph;
+	int offset = 0;
+	int hdr_len;
+
+	/* only TCP packets will be aggregated */
+	if (skb->protocol == htons(ETH_P_IP)) {
+		struct iphdr *iph = (struct iphdr *)skb->data;
+
+		if (iph->protocol == IPPROTO_TCP) {
+			offset = iph->ihl * 4;
+			skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
+		} else {
+			return;
+		}
+	} else if (skb->protocol == htons(ETH_P_IPV6)) {
+		struct ipv6hdr *iph6 = (struct ipv6hdr *)skb->data;
+
+		if (iph6->nexthdr == IPPROTO_TCP) {
+			offset = sizeof(struct ipv6hdr);
+			skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
+		} else {
+			return;
+		}
+	} else {
+		return;
+	}
+	/* if mss is not set through Large Packet bit/mss in rx buffer,
+	 * expect that the mss will be written to the tcp header checksum.
+	 */
+	tcph = (struct tcphdr *)(skb->data + offset);
+	hdr_len = offset + tcph->doff * 4;
+	if (lrg_pkt) {
+		skb_shinfo(skb)->gso_size = mss;
+	} else if (offset) {
+		skb_shinfo(skb)->gso_size = ntohs(tcph->check);
+		tcph->check = 0;
+	}
+
+	if (skb_shinfo(skb)->gso_size)
+		skb_shinfo(skb)->gso_segs =
+				DIV_ROUND_UP(skb->len - hdr_len,
+					     skb_shinfo(skb)->gso_size);
+}
+
 static int ibmveth_poll(struct napi_struct *napi, int budget)
 {
 	struct ibmveth_adapter *adapter =
@@ -1182,6 +1233,7 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
 	int frames_processed = 0;
 	unsigned long lpar_rc;
 	struct iphdr *iph;
+	u16 mss = 0;
 
 restart_poll:
 	while (frames_processed < budget) {
@@ -1199,9 +1251,21 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
 			int length = ibmveth_rxq_frame_length(adapter);
 			int offset = ibmveth_rxq_frame_offset(adapter);
 			int csum_good = ibmveth_rxq_csum_good(adapter);
+			int lrg_pkt = ibmveth_rxq_large_packet(adapter);
 
 			skb = ibmveth_rxq_get_buffer(adapter);
 
+			/* if the large packet bit is set in the rx queue
+			 * descriptor, the mss will be written by PHYP eight
+			 * bytes from the start of the rx buffer, which is
+			 * skb->data at this stage
+			 */
+			if (lrg_pkt) {
+				__be64 *rxmss = (__be64 *)(skb->data + 8);
+
+				mss = (u16)be64_to_cpu(*rxmss);
+			}
+
 			new_skb = NULL;
 			if (length < rx_copybreak)
 				new_skb = netdev_alloc_skb(netdev, length);
@@ -1235,11 +1299,15 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
 					if (iph->check == 0xffff) {
 						iph->check = 0;
 						iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl);
-						adapter->rx_large_packets++;
 					}
 				}
 			}
 
+			if (length > netdev->mtu + ETH_HLEN) {
+				ibmveth_rx_mss_helper(skb, mss, lrg_pkt);
+				adapter->rx_large_packets++;
+			}
+
 			napi_gro_receive(napi, skb);	/* send it up */
 
 			netdev->stats.rx_packets++;
diff --git a/drivers/net/ethernet/ibm/ibmveth.h b/drivers/net/ethernet/ibm/ibmveth.h
index 4eade67..7acda04 100644
--- a/drivers/net/ethernet/ibm/ibmveth.h
+++ b/drivers/net/ethernet/ibm/ibmveth.h
@@ -209,6 +209,7 @@ struct ibmveth_rx_q_entry {
 #define IBMVETH_RXQ_TOGGLE		0x80000000
 #define IBMVETH_RXQ_TOGGLE_SHIFT	31
 #define IBMVETH_RXQ_VALID		0x40000000
+#define IBMVETH_RXQ_LRG_PKT		0x04000000
 #define IBMVETH_RXQ_NO_CSUM		0x02000000
 #define IBMVETH_RXQ_CSUM_GOOD		0x01000000
 #define IBMVETH_RXQ_OFF_MASK		0x0000FFFF
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH net v3] ibmveth: set correct gso_size and gso_type
  2016-12-10 18:39     ` [PATCH net v3] " Thomas Falcon
@ 2016-12-10 20:56       ` David Miller
  2016-12-13 15:49         ` Thomas Falcon
  0 siblings, 1 reply; 23+ messages in thread
From: David Miller @ 2016-12-10 20:56 UTC (permalink / raw)
  To: tlfalcon
  Cc: netdev, brking, marcelo.leitner, pradeeps, jmaxwell37, zdai,
	eric.dumazet

From: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Date: Sat, 10 Dec 2016 12:39:48 -0600

> v3: include a check for non-zero mss when calculating gso_segs
> 
> v2: calculate gso_segs after Eric Dumazet's comments on the earlier patch
>     and make sure everyone is included on CC

I already applied v1 which made it all the way even to Linus's
tree.  So you'll have to send me relative fixups if there are
things to fix or change since v1.

You must always generate patches against the current 'net' tree.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net v3] ibmveth: set correct gso_size and gso_type
  2016-12-10 20:56       ` David Miller
@ 2016-12-13 15:49         ` Thomas Falcon
  0 siblings, 0 replies; 23+ messages in thread
From: Thomas Falcon @ 2016-12-13 15:49 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, brking, marcelo.leitner, pradeeps, jmaxwell37, zdai,
	eric.dumazet

On 12/10/2016 02:56 PM, David Miller wrote:
> From: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
> Date: Sat, 10 Dec 2016 12:39:48 -0600
>
>> v3: include a check for non-zero mss when calculating gso_segs
>>
>> v2: calculate gso_segs after Eric Dumazet's comments on the earlier patch
>>     and make sure everyone is included on CC
> I already applied v1 which made it all the way even to Linus's
> tree.  So you'll have to send me relative fixups if there are
> things to fix or change since v1.
>
> You must always generate patches against the current 'net' tree.
>
Sorry about that.  Thank you for applying it on short notice.  I agree that
using the TCP checksum field is not ideal, but it was a compromise with the
VIOS team.  Hopefully, there will be a better way in the future.

Thanks again,

Tom 

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2016-12-13 15:49 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-26  0:09 [PATCH net-next] ibmveth: v1 calculate correct gso_size and set gso_type Jon Maxwell
2016-10-27 14:44 ` Thomas Falcon
2016-10-27 14:44   ` Thomas Falcon
2016-10-27 15:26 ` Eric Dumazet
2016-10-27 15:26   ` Eric Dumazet
2016-10-27 15:26   ` Eric Dumazet
2016-10-27 17:54   ` Thomas Falcon
2016-10-27 18:08     ` Eric Dumazet
2016-10-27 18:08       ` Eric Dumazet
2016-10-30  0:21       ` Jonathan Maxwell
2016-11-02 21:40   ` Brian King
2016-11-06 21:22     ` Jonathan Maxwell
2016-11-09 16:02       ` Brian King
2016-11-09 16:02         ` Brian King
2016-11-11 18:16   ` Brian King
2016-11-11 18:16     ` Brian King
2016-12-08 22:40 ` [PATCH] ibmveth: set correct gso_size and gso_type Thomas Falcon
2016-12-10  1:31   ` [PATCH net v2] " Thomas Falcon
2016-12-10  3:28     ` Eric Dumazet
2016-12-10 18:39     ` [PATCH net v3] " Thomas Falcon
2016-12-10 20:56       ` David Miller
2016-12-13 15:49         ` Thomas Falcon
2016-12-10  3:48   ` [PATCH] " David Miller
