netdev.vger.kernel.org archive mirror
* [PATCH v2 net 0/2] tg3: 2 bugfixes - TSO data corruption and phy power down
@ 2013-05-13 21:04 Nithin Nayak Sujir
  2013-05-13 21:04 ` [PATCH v2 net 1/2] tg3: Skip powering down function 0 on certain serdes devices Nithin Nayak Sujir
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Nithin Nayak Sujir @ 2013-05-13 21:04 UTC (permalink / raw)
  To: davem; +Cc: netdev, Nithin Nayak Sujir

v2:
 - Modify tg3_phy_power_bug() function to use a switch instead of a
   complicated if statement. Suggested by Joe Perches.

Michael Chan (1):
  tg3: Fix data corruption on 5725 with TSO

Nithin Nayak Sujir (1):
  tg3: Skip powering down function 0 on certain serdes devices

 drivers/net/ethernet/broadcom/tg3.c | 49 ++++++++++++++++++++++++++++++++-----
 1 file changed, 43 insertions(+), 6 deletions(-)

-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 net 1/2] tg3: Skip powering down function 0 on certain serdes devices
  2013-05-13 21:04 [PATCH v2 net 0/2] tg3: 2 bugfixes - TSO data corruption and phy power down Nithin Nayak Sujir
@ 2013-05-13 21:04 ` Nithin Nayak Sujir
  2013-05-14 18:08   ` Joe Perches
  2013-05-13 21:04 ` [PATCH v2 net 2/2] tg3: Fix data corruption on 5725 with TSO Nithin Nayak Sujir
  2013-05-14 18:32 ` [PATCH v2 net 0/2] tg3: 2 bugfixes - TSO data corruption and phy power down David Miller
  2 siblings, 1 reply; 20+ messages in thread
From: Nithin Nayak Sujir @ 2013-05-13 21:04 UTC (permalink / raw)
  To: davem; +Cc: netdev, Nithin Nayak Sujir, stable, Michael Chan

On the 5718, 5719 and 5720 serdes devices, powering down function 0
results in all the other ports being powered down. Add code to skip
function 0 power down.

v2:
 - Modify tg3_phy_power_bug() function to use a switch instead of a
   complicated if statement. Suggested by Joe Perches.

Cc: <stable@vger.kernel.org>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
---
 drivers/net/ethernet/broadcom/tg3.c | 32 ++++++++++++++++++++++++++------
 1 file changed, 26 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 728d42a..781be76 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -2957,6 +2957,31 @@ static int tg3_5700_link_polarity(struct tg3 *tp, u32 speed)
 	return 0;
 }
 
+static bool tg3_phy_power_bug(struct tg3 *tp)
+{
+	switch (tg3_asic_rev(tp)) {
+	case ASIC_REV_5700:
+	case ASIC_REV_5704:
+		return true;
+	case ASIC_REV_5780:
+		if (tp->phy_flags & TG3_PHYFLG_MII_SERDES)
+			return true;
+		return false;
+	case ASIC_REV_5717:
+		if (!tp->pci_fn)
+			return true;
+		return false;
+	case ASIC_REV_5719:
+	case ASIC_REV_5720:
+		if ((tp->phy_flags & TG3_PHYFLG_PHY_SERDES) &&
+		    !tp->pci_fn)
+			return true;
+		return false;
+	}
+
+	return false;
+}
+
 static void tg3_power_down_phy(struct tg3 *tp, bool do_low_power)
 {
 	u32 val;
@@ -3016,12 +3041,7 @@ static void tg3_power_down_phy(struct tg3 *tp, bool do_low_power)
 	/* The PHY should not be powered down on some chips because
 	 * of bugs.
 	 */
-	if (tg3_asic_rev(tp) == ASIC_REV_5700 ||
-	    tg3_asic_rev(tp) == ASIC_REV_5704 ||
-	    (tg3_asic_rev(tp) == ASIC_REV_5780 &&
-	     (tp->phy_flags & TG3_PHYFLG_MII_SERDES)) ||
-	    (tg3_asic_rev(tp) == ASIC_REV_5717 &&
-	     !tp->pci_fn))
+	if (tg3_phy_power_bug(tp))
 		return;
 
 	if (tg3_chip_rev(tp) == CHIPREV_5784_AX ||
-- 
1.8.1.4
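
For readers following the v2 change, the switch in tg3_phy_power_bug() can be exercised outside the driver. The sketch below is a standalone model, not tg3 code: struct fake_tg3, the PHYFLG_* bits and the enum values are hypothetical stand-ins for the driver's definitions, chosen only so the cases compile and stay distinct.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins: the real ASIC_REV_* and TG3_PHYFLG_* values
 * live in the tg3 driver headers; these numbers are illustrative only. */
enum asic_rev {
	ASIC_REV_5700 = 0x07,
	ASIC_REV_5704 = 0x02,
	ASIC_REV_5780 = 0x08,
	ASIC_REV_5717 = 0x5717,
	ASIC_REV_5719 = 0x5719,
	ASIC_REV_5720 = 0x5720,
	ASIC_REV_5762 = 0x5762,
};

#define PHYFLG_MII_SERDES 0x1u
#define PHYFLG_PHY_SERDES 0x2u

struct fake_tg3 {
	enum asic_rev asic_rev;
	uint32_t phy_flags;
	unsigned int pci_fn;	/* PCI function number, 0 = function 0 */
};

/* Same decision table as tg3_phy_power_bug(): true means "do not
 * power down the PHY on this device". */
static bool phy_power_bug(const struct fake_tg3 *tp)
{
	switch (tp->asic_rev) {
	case ASIC_REV_5700:
	case ASIC_REV_5704:
		return true;
	case ASIC_REV_5780:
		return (tp->phy_flags & PHYFLG_MII_SERDES) != 0;
	case ASIC_REV_5717:
		return tp->pci_fn == 0;
	case ASIC_REV_5719:
	case ASIC_REV_5720:
		return (tp->phy_flags & PHYFLG_PHY_SERDES) != 0 &&
		       tp->pci_fn == 0;
	default:
		return false;
	}
}
```

In this model only function 0 of a SerDes 5719/5720 is skipped; copper ports and the other PCI functions still power down.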


* [PATCH v2 net 2/2] tg3: Fix data corruption on 5725 with TSO
  2013-05-13 21:04 [PATCH v2 net 0/2] tg3: 2 bugfixes - TSO data corruption and phy power down Nithin Nayak Sujir
  2013-05-13 21:04 ` [PATCH v2 net 1/2] tg3: Skip powering down function 0 on certain serdes devices Nithin Nayak Sujir
@ 2013-05-13 21:04 ` Nithin Nayak Sujir
  2013-05-13 21:14   ` Eric Dumazet
  2013-05-14 18:32 ` [PATCH v2 net 0/2] tg3: 2 bugfixes - TSO data corruption and phy power down David Miller
  2 siblings, 1 reply; 20+ messages in thread
From: Nithin Nayak Sujir @ 2013-05-13 21:04 UTC (permalink / raw)
  To: davem; +Cc: netdev, Michael Chan, stable, Nithin Nayak Sujir

From: Michael Chan <mchan@broadcom.com>

The 5725 family of devices (ASIC rev 5762) corrupts TSO packets when
the buffer is within MSS bytes of a 4G boundary (4G, 8G, etc.). Detect
this condition and trigger the workaround path.

Cc: <stable@vger.kernel.org>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
---
 drivers/net/ethernet/broadcom/tg3.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 781be76..e285d76 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -7448,6 +7448,20 @@ static inline int tg3_4g_overflow_test(dma_addr_t mapping, int len)
 	return (base > 0xffffdcc0) && (base + len + 8 < base);
 }
 
+/* Test for TSO DMA buffers that cross into regions which are within MSS bytes
+ * of any 4GB boundaries: 4G, 8G, etc
+ */
+static inline int tg3_4g_tso_overflow_test(struct tg3 *tp, dma_addr_t mapping,
+					   u32 len, u32 mss)
+{
+	if (tg3_asic_rev(tp) == ASIC_REV_5762 && mss) {
+		u32 base = (u32) mapping & 0xffffffff;
+
+		return ((base + len + (mss & 0x3fff)) < base);
+	}
+	return 0;
+}
+
 /* Test for DMA addresses > 40-bit */
 static inline int tg3_40bit_overflow_test(struct tg3 *tp, dma_addr_t mapping,
 					  int len)
@@ -7484,6 +7498,9 @@ static bool tg3_tx_frag_set(struct tg3_napi *tnapi, u32 *entry, u32 *budget,
 	if (tg3_4g_overflow_test(map, len))
 		hwbug = true;
 
+	if (tg3_4g_tso_overflow_test(tp, map, len, mss))
+		hwbug = true;
+
 	if (tg3_40bit_overflow_test(tp, map, len))
 		hwbug = true;
 
-- 
1.8.1.4
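
To see why the existing tg3_4g_overflow_test() was not enough, both checks can be modelled on plain integers. This is a sketch rather than driver code: the is_5762 flag stands in for tg3_asic_rev(tp) == ASIC_REV_5762, and the addresses used to exercise it are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of the pre-existing tg3_4g_overflow_test(): only the low 32 bits
 * of the bus address matter, and the buffer (plus 8 bytes of slack)
 * must actually wrap past a 4G boundary. */
static bool model_4g_overflow(uint64_t mapping, uint32_t len)
{
	uint32_t base = (uint32_t)mapping;

	return base > 0xffffdcc0u && (uint32_t)(base + len + 8) < base;
}

/* Model of the new tg3_4g_tso_overflow_test(): for a TSO send (mss != 0)
 * on a 5762, flag the buffer if its end plus mss bytes wraps, i.e. it
 * ends within mss bytes of a 4G boundary. */
static bool model_4g_tso_overflow(bool is_5762, uint64_t mapping,
				  uint32_t len, uint32_t mss)
{
	uint32_t base = (uint32_t)mapping;

	if (!is_5762 || mss == 0)
		return false;
	return (uint32_t)(base + len + (mss & 0x3fff)) < base;
}
```

A 1000-byte buffer starting 4 KiB below the 8G boundary (0x1fffff000) is ignored by the old check, because nothing wraps, but is flagged by the new one for a 9000-byte MSS.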


* Re: [PATCH v2 net 2/2] tg3: Fix data corruption on 5725 with TSO
  2013-05-13 21:04 ` [PATCH v2 net 2/2] tg3: Fix data corruption on 5725 with TSO Nithin Nayak Sujir
@ 2013-05-13 21:14   ` Eric Dumazet
  2013-05-13 21:34     ` Nithin Nayak Sujir
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2013-05-13 21:14 UTC (permalink / raw)
  To: Nithin Nayak Sujir; +Cc: davem, netdev, Michael Chan, stable

On Mon, 2013-05-13 at 14:04 -0700, Nithin Nayak Sujir wrote:
> From: Michael Chan <mchan@broadcom.com>
> 
> The 5725 family of devices (ASIC rev 5762) corrupts TSO packets when
> the buffer is within MSS bytes of a 4G boundary (4G, 8G, etc.). Detect
> this condition and trigger the workaround path.
> 
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Michael Chan <mchan@broadcom.com>
> Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
> ---
>  drivers/net/ethernet/broadcom/tg3.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
> index 781be76..e285d76 100644
> --- a/drivers/net/ethernet/broadcom/tg3.c
> +++ b/drivers/net/ethernet/broadcom/tg3.c
> @@ -7448,6 +7448,20 @@ static inline int tg3_4g_overflow_test(dma_addr_t mapping, int len)
>  	return (base > 0xffffdcc0) && (base + len + 8 < base);
>  }
>  
> +/* Test for TSO DMA buffers that cross into regions which are within MSS bytes
> + * of any 4GB boundaries: 4G, 8G, etc
> + */
> +static inline int tg3_4g_tso_overflow_test(struct tg3 *tp, dma_addr_t mapping,
> +					   u32 len, u32 mss)
> +{
> +	if (tg3_asic_rev(tp) == ASIC_REV_5762 && mss) {
> +		u32 base = (u32) mapping & 0xffffffff;
> +
> +		return ((base + len + (mss & 0x3fff)) < base);
> +	}
> +	return 0;
> +}
> +

I am curious: does this condition even trigger?


* Re: [PATCH v2 net 2/2] tg3: Fix data corruption on 5725 with TSO
  2013-05-13 21:14   ` Eric Dumazet
@ 2013-05-13 21:34     ` Nithin Nayak Sujir
  2013-05-13 21:40       ` Eric Dumazet
  0 siblings, 1 reply; 20+ messages in thread
From: Nithin Nayak Sujir @ 2013-05-13 21:34 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, netdev, Michael Chan, stable



On 05/13/2013 02:14 PM, Eric Dumazet wrote:
> On Mon, 2013-05-13 at 14:04 -0700, Nithin Nayak Sujir wrote:
>> From: Michael Chan <mchan@broadcom.com>
>>
>> The 5725 family of devices (ASIC rev 5762) corrupts TSO packets when
>> the buffer is within MSS bytes of a 4G boundary (4G, 8G, etc.). Detect
>> this condition and trigger the workaround path.
>>
>> Cc: <stable@vger.kernel.org>
>> Signed-off-by: Michael Chan <mchan@broadcom.com>
>> Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
>> ---
>>   drivers/net/ethernet/broadcom/tg3.c | 17 +++++++++++++++++
>>   1 file changed, 17 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
>> index 781be76..e285d76 100644
>> --- a/drivers/net/ethernet/broadcom/tg3.c
>> +++ b/drivers/net/ethernet/broadcom/tg3.c
>> @@ -7448,6 +7448,20 @@ static inline int tg3_4g_overflow_test(dma_addr_t mapping, int len)
>>   	return (base > 0xffffdcc0) && (base + len + 8 < base);
>>   }
>>
>> +/* Test for TSO DMA buffers that cross into regions which are within MSS bytes
>> + * of any 4GB boundaries: 4G, 8G, etc
>> + */
>> +static inline int tg3_4g_tso_overflow_test(struct tg3 *tp, dma_addr_t mapping,
>> +					   u32 len, u32 mss)
>> +{
>> +	if (tg3_asic_rev(tp) == ASIC_REV_5762 && mss) {
>> +		u32 base = (u32) mapping & 0xffffffff;
>> +
>> +		return ((base + len + (mss & 0x3fff)) < base);
>> +	}
>> +	return 0;
>> +}
>> +
>
> I am curious: does this condition even trigger?
>

Yes. It is a rare problem, but it was reported in our lab. After we
implemented this fix, the problem did not happen again.


>
>
>


* Re: [PATCH v2 net 2/2] tg3: Fix data corruption on 5725 with TSO
  2013-05-13 21:34     ` Nithin Nayak Sujir
@ 2013-05-13 21:40       ` Eric Dumazet
  2013-05-13 21:47         ` Nithin Nayak Sujir
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2013-05-13 21:40 UTC (permalink / raw)
  To: Nithin Nayak Sujir; +Cc: davem, netdev, Michael Chan, stable

On Mon, 2013-05-13 at 14:34 -0700, Nithin Nayak Sujir wrote:
> 
> On 05/13/2013 02:14 PM, Eric Dumazet wrote:

> >> +/* Test for TSO DMA buffers that cross into regions which are within MSS bytes
> >> + * of any 4GB boundaries: 4G, 8G, etc
> >> + */
> >> +static inline int tg3_4g_tso_overflow_test(struct tg3 *tp, dma_addr_t mapping,
> >> +					   u32 len, u32 mss)
> >> +{
> >> +	if (tg3_asic_rev(tp) == ASIC_REV_5762 && mss) {
> >> +		u32 base = (u32) mapping & 0xffffffff;
> >> +
> >> +		return ((base + len + (mss & 0x3fff)) < base);
> >> +	}
> >> +	return 0;
> >> +}
> >> +
> >
> > I am curious: does this condition even trigger?
> >
> 
> Yes. It is a rare problem, but it was reported in our lab. After we
> implemented this fix, the problem did not happen again.
> 

I just can't figure out which part of the kernel could allocate a
fragment spanning a 4G region.


* Re: [PATCH v2 net 2/2] tg3: Fix data corruption on 5725 with TSO
  2013-05-13 21:40       ` Eric Dumazet
@ 2013-05-13 21:47         ` Nithin Nayak Sujir
  2013-05-13 22:10           ` Eric Dumazet
  2013-05-14  8:40           ` David Laight
  0 siblings, 2 replies; 20+ messages in thread
From: Nithin Nayak Sujir @ 2013-05-13 21:47 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, netdev, Michael Chan, stable



On 05/13/2013 02:40 PM, Eric Dumazet wrote:
> On Mon, 2013-05-13 at 14:34 -0700, Nithin Nayak Sujir wrote:
>>
>> On 05/13/2013 02:14 PM, Eric Dumazet wrote:
>
>>>> +/* Test for TSO DMA buffers that cross into regions which are within MSS bytes
>>>> + * of any 4GB boundaries: 4G, 8G, etc
>>>> + */
>>>> +static inline int tg3_4g_tso_overflow_test(struct tg3 *tp, dma_addr_t mapping,
>>>> +					   u32 len, u32 mss)
>>>> +{
>>>> +	if (tg3_asic_rev(tp) == ASIC_REV_5762 && mss) {
>>>> +		u32 base = (u32) mapping & 0xffffffff;
>>>> +
>>>> +		return ((base + len + (mss & 0x3fff)) < base);
>>>> +	}
>>>> +	return 0;
>>>> +}
>>>> +
>>>
>>> I am curious: does this condition even trigger?
>>>
>>
>> Yes. It is a rare problem, but it was reported in our lab. After we
>> implemented this fix, the problem did not happen again.
>>
>
> I just can't figure out which part of the kernel could allocate a
> fragment spanning a 4G region.
>

For the bug to occur, the fragment does not have to span a 4G boundary. If it is 
within MSS bytes (9.6k) of a 4G boundary, it triggers the failure.
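
This distinction can be made concrete with two predicates on the low 32 bits of the address: one asks whether the buffer crosses a 4G boundary, the other whether it ends within mss bytes of one (the wraparound form used by the patch). A minimal sketch; spans_4g() and ends_near_4g() are hypothetical helpers, and spans_4g() assumes len >= 1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Does [base, base + len) cross a 4G boundary? Assumes len >= 1. */
static bool spans_4g(uint32_t base, uint32_t len)
{
	return (uint32_t)(base + len - 1) < base;
}

/* Does the buffer end within mss bytes of the next 4G boundary?
 * Same unsigned-wraparound trick as tg3_4g_tso_overflow_test(). */
static bool ends_near_4g(uint32_t base, uint32_t len, uint32_t mss)
{
	return (uint32_t)(base + len + mss) < base;
}
```

A 512-byte fragment starting 1 KiB below 4G (0xfffffc00) never crosses the boundary, yet it ends only 512 bytes short of it, well inside a 1460-byte MSS, so only the second predicate fires.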


>
>
>


* Re: [PATCH v2 net 2/2] tg3: Fix data corruption on 5725 with TSO
  2013-05-13 21:47         ` Nithin Nayak Sujir
@ 2013-05-13 22:10           ` Eric Dumazet
  2013-05-14  8:40           ` David Laight
  1 sibling, 0 replies; 20+ messages in thread
From: Eric Dumazet @ 2013-05-13 22:10 UTC (permalink / raw)
  To: Nithin Nayak Sujir; +Cc: davem, netdev, Michael Chan, stable

On Mon, 2013-05-13 at 14:47 -0700, Nithin Nayak Sujir wrote:

> For the bug to occur, the fragment does not have to span a 4G boundary. If it is 
> within MSS bytes (9.6k) of a 4G boundary, it triggers the failure.
> 

Ah, that indeed can happen.

Thanks


* RE: [PATCH v2 net 2/2] tg3: Fix data corruption on 5725 with TSO
  2013-05-13 21:47         ` Nithin Nayak Sujir
  2013-05-13 22:10           ` Eric Dumazet
@ 2013-05-14  8:40           ` David Laight
  2013-05-14 15:04             ` Michael Chan
  1 sibling, 1 reply; 20+ messages in thread
From: David Laight @ 2013-05-14  8:40 UTC (permalink / raw)
  To: Nithin Nayak Sujir, Eric Dumazet; +Cc: davem, netdev, Michael Chan, stable

> >>>> +	if (tg3_asic_rev(tp) == ASIC_REV_5762 && mss) {
> >>>> +		u32 base = (u32) mapping & 0xffffffff;
> >>>> +
> >>>> +		return ((base + len + (mss & 0x3fff)) < base);
...
> For the bug to occur, the fragment does not have to span a 4G boundary. If it is
> within MSS bytes (9.6k) of a 4G boundary, it triggers the failure.

Would it be worth simplifying the test to assume that 'len'
is 64k and 'mss' is 9.6k?
(I am commenting on the actual condition.)
The number of false positives would be small, but the test
would be a lot quicker.
The '(u32)mapping + (0x10000 + 9600) < (u32)mapping' test might
even be faster than the 'tg3_asic_rev(tp) == ASIC_REV_5762' one.

	David



* RE: [PATCH v2 net 2/2] tg3: Fix data corruption on 5725 with TSO
  2013-05-14  8:40           ` David Laight
@ 2013-05-14 15:04             ` Michael Chan
  2013-05-14 15:20               ` David Laight
  0 siblings, 1 reply; 20+ messages in thread
From: Michael Chan @ 2013-05-14 15:04 UTC (permalink / raw)
  To: David Laight; +Cc: Nithin Nayak Sujir, Eric Dumazet, davem, netdev, stable

On Tue, 2013-05-14 at 09:40 +0100, David Laight wrote:
> > >>>> +        if (tg3_asic_rev(tp) == ASIC_REV_5762 && mss) {
> > >>>> +                u32 base = (u32) mapping & 0xffffffff;
> > >>>> +
> > >>>> +                return ((base + len + (mss & 0x3fff)) < base);
> ... 
> > For the bug to occur, the fragment does not have to span a 4G boundary. If it is
> > within MSS bytes (9.6k) of a 4G boundary, it triggers the failure.
> 
> Would it be worth simplifying the test to assume that 'len'
> is 64k and 'mss' is 9.6k?
> (I am commenting on the actual condition.)
> The number of false positives would be small, but the test
> would be a lot quicker.
> The '(u32)mapping + (0x10000 + 9600) < (u32)mapping' test might
> even be faster than the 'tg3_asic_rev(tp) == ASIC_REV_5762' one.

I think that if we do this and detect a false positive, it may be very
far from the 4G boundary.  The new skb that we allocate to work around
the condition may be even closer to 4G and may hit the real bug
condition.

The mss and len values are accessed many times in this immediate code
path just before setting the TX BD, so gcc should be able to optimize
this quite nicely.


* RE: [PATCH v2 net 2/2] tg3: Fix data corruption on 5725 with TSO
  2013-05-14 15:04             ` Michael Chan
@ 2013-05-14 15:20               ` David Laight
  2013-05-14 16:19                 ` Michael Chan
  0 siblings, 1 reply; 20+ messages in thread
From: David Laight @ 2013-05-14 15:20 UTC (permalink / raw)
  To: Michael Chan; +Cc: Nithin Nayak Sujir, Eric Dumazet, davem, netdev, stable

> On Tue, 2013-05-14 at 09:40 +0100, David Laight wrote:
> > > >>>> +        if (tg3_asic_rev(tp) == ASIC_REV_5762 && mss) {
> > > >>>> +                u32 base = (u32) mapping & 0xffffffff;
> > > >>>> +
> > > >>>> +                return ((base + len + (mss & 0x3fff)) < base);
> > ...
> > > For the bug to occur, the fragment does not have to span a 4G boundary. If it is
> > > within MSS bytes (9.6k) of a 4G boundary, it triggers the failure.
> >
> > Would it be worth simplifying the test to assume that 'len'
> > is 64k and 'mss' is 9.6k?
> > (I am commenting on the actual condition.)
> > The number of false positives would be small, but the test
> > would be a lot quicker.
> > The '(u32)mapping + (0x10000 + 9600) < (u32)mapping' test might
> > even be faster than the 'tg3_asic_rev(tp) == ASIC_REV_5762' one.
> 
> I think that if we do this and detect a false positive, it may be very
> far from the 4G boundary.

It can't be very far away: approximately 1 in 65k checks would fail.
You could do the finer test afterwards.

> The new skb that we allocate to work around the condition may be
> even closer to 4G and may hit the real bug condition.

If the 'fix' is to relocate the skb, you are doomed to lose regardless
of the check, unless you are willing to reallocate a lot of times,
and without freeing the old skb.
I'd assumed the 'fix' was to disable the relevant offload.

> The mss and len values are accessed many times in this immediate code
> path just before setting the TX BD, so gcc should be able to optimize
> this quite nicely.

I was looking at the number of branches in the hot path, not whether
the values were already in registers.

	David
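
The suggestion amounts to a cheap filter followed by the exact test. A sketch under the bounds stated in the thread (len at most 64K, mss at most 9600); under those bounds any buffer the exact test flags is also flagged by the coarse one, so the filter never hides a real hit. The helper names are hypothetical, not from tg3.

```c
#include <stdbool.h>
#include <stdint.h>

/* Worst case assumed in the suggestion: 64K of data plus a 9600-byte MSS. */
#define COARSE_WINDOW (0x10000u + 9600u)

/* Cheap filter: is base within COARSE_WINDOW bytes below a 4G boundary? */
static bool coarse_near_4g(uint32_t base)
{
	return (uint32_t)(base + COARSE_WINDOW) < base;
}

/* Exact test in the form used by the patch. */
static bool exact_near_4g(uint32_t base, uint32_t len, uint32_t mss)
{
	return (uint32_t)(base + len + mss) < base;
}

/* If len <= 0x10000 and mss <= 9600, exact_near_4g() implies
 * coarse_near_4g(), so almost every buffer is rejected by a single
 * compare and the exact test runs only for the rest. */
static bool two_stage_near_4g(uint32_t base, uint32_t len, uint32_t mss)
{
	if (!coarse_near_4g(base))
		return false;
	return exact_near_4g(base, len, mss);
}
```

With uniformly distributed addresses the filter passes (0x10000 + 9600) / 2^32 of buffers, about 1 in 57,000, the same order of magnitude as the 1 in 65k estimate above.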


* RE: [PATCH v2 net 2/2] tg3: Fix data corruption on 5725 with TSO
  2013-05-14 15:20               ` David Laight
@ 2013-05-14 16:19                 ` Michael Chan
  2013-05-14 16:46                   ` Eric Dumazet
  2013-05-15  8:56                   ` David Laight
  0 siblings, 2 replies; 20+ messages in thread
From: Michael Chan @ 2013-05-14 16:19 UTC (permalink / raw)
  To: David Laight; +Cc: Nithin Nayak Sujir, Eric Dumazet, davem, netdev, stable

On Tue, 2013-05-14 at 16:20 +0100, David Laight wrote: 
> > On Tue, 2013-05-14 at 09:40 +0100, David Laight wrote:
> > > > >>>> +        if (tg3_asic_rev(tp) == ASIC_REV_5762 && mss) {
> > > > >>>> +                u32 base = (u32) mapping & 0xffffffff;
> > > > >>>> +
> > > > >>>> +                return ((base + len + (mss & 0x3fff)) < base);
> > > ...
> > > > For the bug to occur, the fragment does not have to span a 4G boundary. If it is
> > > > within MSS bytes (9.6k) of a 4G boundary, it triggers the failure.
> > >
> > > Would it be worth simplifying the test to assume that 'len'
> > > is 64k and 'mss' is 9.6k?
> > > (I am commenting on the actual condition.)
> > > The number of false positives would be small, but the test
> > > would be a lot quicker.
> > > The '(u32)mapping + (0x10000 + 9600) < (u32)mapping' test might
> > > even be faster than the 'tg3_asic_rev(tp) == ASIC_REV_5762' one.
> > 
> > I think that if we do this and detect a false positive, it may be very
> > far from the 4G boundary.
> 
> It can't be very far away: approximately 1 in 65k checks would fail.
> You could do the finer test afterwards.

If we do a 2nd level test, it will be ok.  But I'm not sure if it is
worth the complexity.

> 
> > The new skb that we allocate to work around the condition may be
> > even closer to 4G and may hit the real bug condition.
> 
> If the 'fix' is to relocate the skb, you are doomed to lose regardless
> of the check, unless you are willing to reallocate a lot of times,
> and without freeing the old skb.
> I'd assumed the 'fix' was to disable the relevant offload.

We relocate once and then drop the packet if we encounter additional
errors, including OOM, DMA mapping error, 4G boundary, etc.  The new
linear skb should not hit the 4G boundary again.  The room between the
end of this current buffer and 4G isn't big enough for the new linear
skb.
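
The relocate-once policy described here can be modelled as below. This is a hypothetical sketch, not the tg3 workaround code: copy_addr stands for wherever the new linear copy happens to be mapped, and 0 is a made-up sentinel for an allocation or DMA-mapping failure.

```c
#include <stdbool.h>
#include <stdint.h>

enum tx_outcome { TX_SENT, TX_SENT_AFTER_COPY, TX_DROPPED };

/* Ends within mss bytes of a 4G boundary (same test as the patch). */
static bool near_4g(uint64_t addr, uint32_t len, uint32_t mss)
{
	uint32_t base = (uint32_t)addr;

	return (uint32_t)(base + len + mss) < base;
}

/* Try the original mapping; on a hit, relocate exactly once into a
 * linear copy, and drop the packet if the copy is unusable too. */
static enum tx_outcome tx_relocate_once(uint64_t orig_addr, uint32_t len,
					uint64_t copy_addr, uint32_t copy_len,
					uint32_t mss)
{
	if (!near_4g(orig_addr, len, mss))
		return TX_SENT;
	if (copy_addr == 0)			/* OOM or DMA mapping error */
		return TX_DROPPED;
	if (near_4g(copy_addr, copy_len, mss))	/* copy hit the bug as well */
		return TX_DROPPED;
	return TX_SENT_AFTER_COPY;
}
```

A copy mapped just under 8G (0x1ffff0000) after an original just under 4G is exactly the case David raises later in the thread; under this policy that packet is dropped rather than retried.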

> 
> > The mss and len values are accessed many times in this immediate code
> > path just before setting the TX BD, so gcc should be able to optimize
> > this quite nicely.
> 
> I was looking at the number of branches in the hot path, not whether
> the values were already in registers.
> 

Isn't the number of branches the same whether we use actual values in
registers or fixed values?


* RE: [PATCH v2 net 2/2] tg3: Fix data corruption on 5725 with TSO
  2013-05-14 16:19                 ` Michael Chan
@ 2013-05-14 16:46                   ` Eric Dumazet
  2013-05-15  8:56                   ` David Laight
  1 sibling, 0 replies; 20+ messages in thread
From: Eric Dumazet @ 2013-05-14 16:46 UTC (permalink / raw)
  To: Michael Chan; +Cc: David Laight, Nithin Nayak Sujir, davem, netdev, stable

On Tue, 2013-05-14 at 09:19 -0700, Michael Chan wrote:

> We relocate once and then drop the packet if we encounter additional
> errors, including OOM, DMA mapping error, 4G boundary, etc.  The new
> linear skb should not hit the 4G boundary again.  The room between the
> end of this current buffer and 4G isn't big enough for the new linear
> skb.

This reminds me of an issue on bnx2x:

bnx2x FW has a limitation on GSO packets:

A single MSS cannot span more than 10 fragments.

After the "net: use a per task frag allocator" patch, it's possible
for an application interleaving small write() calls on several sockets
to build pathological skbs using 16 fragments (aka MAX_SKB_FRAGS)
but only a small amount of payload.
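
The 10-fragment limit can be checked conservatively before handing a GSO skb to the hardware. The sketch below is hypothetical and is not the actual bnx2x logic: it treats the payload as a flat list of fragment lengths, ignores the header descriptor, and relies on the observation that a segment touching more than MAX_FRAGS_PER_MSS fragments must fully contain MAX_FRAGS_PER_MSS - 1 consecutive ones with at least one fragment on each side.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_FRAGS_PER_MSS 10	/* limit quoted for the bnx2x firmware */

/* Conservative test: returns true if some MSS-sized segment could
 * touch more than MAX_FRAGS_PER_MSS fragments. Such a segment fully
 * contains MAX_FRAGS_PER_MSS - 1 consecutive fragments plus at least
 * one byte on each side, so those inner fragments total less than mss. */
static bool gso_needs_linearize(const unsigned int *frag_len, size_t nr_frags,
				unsigned int mss)
{
	const size_t inner = MAX_FRAGS_PER_MSS - 1;

	if (nr_frags <= MAX_FRAGS_PER_MSS)
		return false;

	/* Inner run frag_len[j .. j + inner - 1] needs a fragment before it
	 * (j >= 1) and one after it (j + inner < nr_frags). */
	for (size_t j = 1; j + inner < nr_frags; j++) {
		unsigned long sum = 0;

		for (size_t k = j; k < j + inner; k++)
			sum += frag_len[k];
		if (sum < mss)
			return true;
	}
	return false;
}
```

Sixteen 100-byte fragments, the pathological shape described above, trip the check with a 1448-byte MSS; a fast-path skb with a couple of 32 KiB fragments does not.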
    
The fast path should build skbs with 2 or 3 fragments, as fragments
can be order-3 pages.
    
The bnx2x driver performs an expensive skb_linearize() call, and
this can fail if memory is fragmented: skb->len can be around 64K,
and including the skb_shared_info overhead, we might need order-5
pages.
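
The order-5 figure follows from rounding a physically contiguous buffer up to a power-of-two number of pages. A quick sketch, assuming 4 KiB pages; the 320-byte overhead used to exercise it is an illustrative stand-in for the skb_shared_info size, which varies with kernel version and configuration.

```c
#include <stddef.h>

/* Smallest page order n with (page_size << n) >= size, i.e. the
 * power-of-two block needed for `size` physically contiguous bytes. */
static unsigned int alloc_order(size_t size, size_t page_size)
{
	unsigned int order = 0;

	while ((page_size << order) < size)
		order++;
	return order;
}
```

Exactly 64 KiB fits in an order-4 block, so even a few hundred bytes of overhead on top of a 64K skb->len pushes the request to order 5 (128 KiB), which is what makes the linearize fragile on a fragmented system.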


* Re: [PATCH v2 net 1/2] tg3: Skip powering down function 0 on certain serdes devices
  2013-05-13 21:04 ` [PATCH v2 net 1/2] tg3: Skip powering down function 0 on certain serdes devices Nithin Nayak Sujir
@ 2013-05-14 18:08   ` Joe Perches
  2013-05-14 18:17     ` Nithin Nayak Sujir
  0 siblings, 1 reply; 20+ messages in thread
From: Joe Perches @ 2013-05-14 18:08 UTC (permalink / raw)
  To: Nithin Nayak Sujir; +Cc: davem, netdev, stable, Michael Chan

On Mon, 2013-05-13 at 14:04 -0700, Nithin Nayak Sujir wrote:
> On the 5718, 5719 and 5720 serdes devices, powering down function 0
> results in all the other ports being powered down. Add code to skip
> function 0 power down.

Hi Nithin.  5718?  I'm a bit confused by the commit message.

> diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
[]
> +static bool tg3_phy_power_bug(struct tg3 *tp)
> +{
> +	switch (tg3_asic_rev(tp)) {
> +	case ASIC_REV_5700:
> +	case ASIC_REV_5704:
> +		return true;
> +	case ASIC_REV_5780:
> +		if (tp->phy_flags & TG3_PHYFLG_MII_SERDES)
> +			return true;
> +		return false;
> +	case ASIC_REV_5717:
> +		if (!tp->pci_fn)
> +			return true;
> +		return false;
> +	case ASIC_REV_5719:
> +	case ASIC_REV_5720:
> +		if ((tp->phy_flags & TG3_PHYFLG_PHY_SERDES) &&
> +		    !tp->pci_fn)
> +			return true;
> +		return false;
> +	}
> +

Where is the 5718 in this?
What is the 5718?
There is no #define for it.


* Re: [PATCH v2 net 1/2] tg3: Skip powering down function 0 on certain serdes devices
  2013-05-14 18:08   ` Joe Perches
@ 2013-05-14 18:17     ` Nithin Nayak Sujir
  0 siblings, 0 replies; 20+ messages in thread
From: Nithin Nayak Sujir @ 2013-05-14 18:17 UTC (permalink / raw)
  To: Joe Perches; +Cc: davem, netdev, stable, Michael Chan



On 05/14/2013 11:08 AM, Joe Perches wrote:
> On Mon, 2013-05-13 at 14:04 -0700, Nithin Nayak Sujir wrote:
>> On the 5718, 5719 and 5720 serdes devices, powering down function 0
>> results in all the other ports being powered down. Add code to skip
>> function 0 power down.
>
> Hi Nithin.  5718?  I'm a bit confused by the commit message.
>
>> diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
> []
>> +static bool tg3_phy_power_bug(struct tg3 *tp)
>> +{
>> +	switch (tg3_asic_rev(tp)) {
>> +	case ASIC_REV_5700:
>> +	case ASIC_REV_5704:
>> +		return true;
>> +	case ASIC_REV_5780:
>> +		if (tp->phy_flags & TG3_PHYFLG_MII_SERDES)
>> +			return true;
>> +		return false;
>> +	case ASIC_REV_5717:
>> +		if (!tp->pci_fn)
>> +			return true;
>> +		return false;
>> +	case ASIC_REV_5719:
>> +	case ASIC_REV_5720:
>> +		if ((tp->phy_flags & TG3_PHYFLG_PHY_SERDES) &&
>> +		    !tp->pci_fn)
>> +			return true;
>> +		return false;
>> +	}
>> +
>
> Where is the 5718 in this?
> What is the 5718?
> There is no #define for it.
>

The 5718 is another device in the same family as the 5719 and 5720. There is no
case or define for its ASIC_REV because it has the same ASIC revision as the 5719.

However, it is a separate device and you can find it in the pci table and in the 
code as TG3PCI_DEVICE_TIGON3_5718.


>
>


* Re: [PATCH v2 net 0/2] tg3: 2 bugfixes - TSO data corruption and phy power down
  2013-05-13 21:04 [PATCH v2 net 0/2] tg3: 2 bugfixes - TSO data corruption and phy power down Nithin Nayak Sujir
  2013-05-13 21:04 ` [PATCH v2 net 1/2] tg3: Skip powering down function 0 on certain serdes devices Nithin Nayak Sujir
  2013-05-13 21:04 ` [PATCH v2 net 2/2] tg3: Fix data corruption on 5725 with TSO Nithin Nayak Sujir
@ 2013-05-14 18:32 ` David Miller
  2 siblings, 0 replies; 20+ messages in thread
From: David Miller @ 2013-05-14 18:32 UTC (permalink / raw)
  To: nsujir; +Cc: netdev

From: "Nithin Nayak Sujir" <nsujir@broadcom.com>
Date: Mon, 13 May 2013 14:04:14 -0700

> v2:
>  - Modify tg3_phy_power_bug() function to use a switch instead of a
>    complicated if statement. Suggested by Joe Perches.
> 
> Michael Chan (1):
>   tg3: Fix data corruption on 5725 with TSO
> 
> Nithin Nayak Sujir (1):
>   tg3: Skip powering down function 0 on certain serdes devices

All applied, thanks.


* RE: [PATCH v2 net 2/2] tg3: Fix data corruption on 5725 with TSO
  2013-05-14 16:19                 ` Michael Chan
  2013-05-14 16:46                   ` Eric Dumazet
@ 2013-05-15  8:56                   ` David Laight
  2013-05-15 15:12                     ` Michael Chan
  1 sibling, 1 reply; 20+ messages in thread
From: David Laight @ 2013-05-15  8:56 UTC (permalink / raw)
  To: Michael Chan; +Cc: Nithin Nayak Sujir, Eric Dumazet, davem, netdev, stable

> > If the 'fix' is to relocate the skb, you are doomed to lose regardless
> > of the check, unless you are willing to reallocate a lot of times,
> > and without freeing the old skb.
> > I'd assumed the 'fix' was to disable the relevant offload.
> 
> We relocate once and then drop the packet if we encounter additional
> errors, including OOM, DMA mapping error, 4G boundary, etc.  The new
> linear skb should not hit the 4G boundary again.  The room between the
> end of this current buffer and 4G isn't big enough for the new linear
> skb.

The first skb might be just below the 4G boundary and the
second just below the 8G one.

	David


* RE: [PATCH v2 net 2/2] tg3: Fix data corruption on 5725 with TSO
  2013-05-15  8:56                   ` David Laight
@ 2013-05-15 15:12                     ` Michael Chan
  2013-05-15 15:23                       ` Eric Dumazet
  0 siblings, 1 reply; 20+ messages in thread
From: Michael Chan @ 2013-05-15 15:12 UTC (permalink / raw)
  To: David Laight; +Cc: Nithin Nayak Sujir, Eric Dumazet, davem, netdev, stable

On Wed, 2013-05-15 at 09:56 +0100, David Laight wrote: 
> > > If the 'fix' is to relocate the skb, you are doomed to lose regardless
> > > of the check, unless you are willing to reallocate a lot of times,
> > > and without freeing the old skb.
> > > I'd assumed the 'fix' was to disable the relevant offload.
> > 
> > We relocate once and then drop the packet if we encounter additional
> > errors, including OOM, DMA mapping error, 4G boundary, etc.  The new
> > linear skb should not hit the 4G boundary again.  The room between the
> > end of this current buffer and 4G isn't big enough for the new linear
> > skb.
> 
> The first skb might be just below the 4G boundary and the
> second just below the 8G one.
> 

We will discard the packet if that happens.  I think the probability is
very small.


* RE: [PATCH v2 net 2/2] tg3: Fix data corruption on 5725 with TSO
  2013-05-15 15:12                     ` Michael Chan
@ 2013-05-15 15:23                       ` Eric Dumazet
  2013-05-15 15:51                         ` Michael Chan
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2013-05-15 15:23 UTC (permalink / raw)
  To: Michael Chan; +Cc: David Laight, Nithin Nayak Sujir, davem, netdev, stable

On Wed, 2013-05-15 at 08:12 -0700, Michael Chan wrote:

> 
> We will discard the packet if that happens.  I think the probability is
> very small.

Does data corruption mean the content of the packet was mangled on the
wire, or was it a more serious issue, like a TX queue hang?


* RE: [PATCH v2 net 2/2] tg3: Fix data corruption on 5725 with TSO
  2013-05-15 15:23                       ` Eric Dumazet
@ 2013-05-15 15:51                         ` Michael Chan
  0 siblings, 0 replies; 20+ messages in thread
From: Michael Chan @ 2013-05-15 15:51 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Laight, Nithin Nayak Sujir, davem, netdev, stable

On Wed, 2013-05-15 at 08:23 -0700, Eric Dumazet wrote: 
> On Wed, 2013-05-15 at 08:12 -0700, Michael Chan wrote:
> 
> > 
> > We will discard the packet if that happens.  I think the probability is
> > very small.
> 
> Does data corruption mean the content of the packet was mangled on the
> wire, or was it a more serious issue, like a TX queue hang?
> 
> 
I think the DMA engine is getting the wrong data under this condition,
so we end up with bad data but correct header checksums on the wire.



Thread overview: 20+ messages
2013-05-13 21:04 [PATCH v2 net 0/2] tg3: 2 bugfixes - TSO data corruption and phy power down Nithin Nayak Sujir
2013-05-13 21:04 ` [PATCH v2 net 1/2] tg3: Skip powering down function 0 on certain serdes devices Nithin Nayak Sujir
2013-05-14 18:08   ` Joe Perches
2013-05-14 18:17     ` Nithin Nayak Sujir
2013-05-13 21:04 ` [PATCH v2 net 2/2] tg3: Fix data corruption on 5725 with TSO Nithin Nayak Sujir
2013-05-13 21:14   ` Eric Dumazet
2013-05-13 21:34     ` Nithin Nayak Sujir
2013-05-13 21:40       ` Eric Dumazet
2013-05-13 21:47         ` Nithin Nayak Sujir
2013-05-13 22:10           ` Eric Dumazet
2013-05-14  8:40           ` David Laight
2013-05-14 15:04             ` Michael Chan
2013-05-14 15:20               ` David Laight
2013-05-14 16:19                 ` Michael Chan
2013-05-14 16:46                   ` Eric Dumazet
2013-05-15  8:56                   ` David Laight
2013-05-15 15:12                     ` Michael Chan
2013-05-15 15:23                       ` Eric Dumazet
2013-05-15 15:51                         ` Michael Chan
2013-05-14 18:32 ` [PATCH v2 net 0/2] tg3: 2 bugfixes - TSO data corruption and phy power down David Miller
