All of lore.kernel.org
 help / color / mirror / Atom feed
* bnx2x statistics occasionally wrong by near multiples of 4GB.
@ 2013-03-15 21:03 Maciej Żenczykowski
  2013-03-15 21:56 ` [PATCH] bnx2x: fix occasional statistics off-by-4GB error Maciej Żenczykowski
  0 siblings, 1 reply; 19+ messages in thread
From: Maciej Żenczykowski @ 2013-03-15 21:03 UTC (permalink / raw)
  To: Mintz Yuval; +Cc: Linux NetDev

bnx2x_stats.h:

#define UPDATE_QSTAT(s, t) \
     do { \
              qstats->t##_hi = qstats_old->t##_hi + le32_to_cpu(s.hi); \
              qstats->t##_lo = qstats_old->t##_lo + le32_to_cpu(s.lo); \
      } while (0)

Looks wrong because it doesn't deal with overflow of _lo.

I'm not 100% sure if I tracked it down correctly, but I think this is
the root cause of us occasionally seeing stats (/proc/net/dev tx_bytes
in this case) jump back by roughly N*4GB and then go forward again.

In particular we saw successive reads of /proc/net/dev eth0 Transmit bytes show:

0x8F6 45356E60
0x8F4 4C8D72B9
0x8F6 5420D096

(which since this is a sum of 3 values (ucase/mcast/bcast) from all
queues, means we actually wrongly accounted for overflow twice...)

- Maciej

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH] bnx2x: fix occasional statistics off-by-4GB error
  2013-03-15 21:03 bnx2x statistics occasionally wrong by near multiples of 4GB Maciej Żenczykowski
@ 2013-03-15 21:56 ` Maciej Żenczykowski
  2013-03-17 13:10   ` Dmitry Kravkov
  0 siblings, 1 reply; 19+ messages in thread
From: Maciej Żenczykowski @ 2013-03-15 21:56 UTC (permalink / raw)
  To: Maciej Żenczykowski, David S. Miller; +Cc: netdev, Mintz Yuval

From: Maciej Żenczykowski <maze@google.com>

The UPDATE_QSTAT function introduced on February 15, 2012
in commit 1355b704b9ba "bnx2x: consistent statistics after
internal driver reload" incorrectly fails to handle overflow
during addition of the lower 32-bit field of a stat.

This bug is present since 3.4-rc1 and should thus be considered
a candidate for stable 3.4+ releases.

Google-Bug-Id: 8374428
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Cc: Mintz Yuval <yuvalmin@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h
index 364e37ecbc5c..198f6f1c9ad5 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h
@@ -459,8 +459,9 @@ struct bnx2x_fw_port_stats_old {
 
 #define UPDATE_QSTAT(s, t) \
 	do { \
-		qstats->t##_hi = qstats_old->t##_hi + le32_to_cpu(s.hi); \
 		qstats->t##_lo = qstats_old->t##_lo + le32_to_cpu(s.lo); \
+		qstats->t##_hi = qstats_old->t##_hi + le32_to_cpu(s.hi) \
+			+ ((qstats->t##_lo < qstats_old->t##_lo) ? 1 : 0); \
 	} while (0)
 
 #define UPDATE_QSTAT_OLD(f) \
-- 
1.8.1.3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* RE: [PATCH] bnx2x: fix occasional statistics off-by-4GB error
  2013-03-15 21:56 ` [PATCH] bnx2x: fix occasional statistics off-by-4GB error Maciej Żenczykowski
@ 2013-03-17 13:10   ` Dmitry Kravkov
  2013-03-17 18:24     ` David Miller
  0 siblings, 1 reply; 19+ messages in thread
From: Dmitry Kravkov @ 2013-03-17 13:10 UTC (permalink / raw)
  To: Maciej Żenczykowski, Maciej Żenczykowski, David S. Miller
  Cc: netdev, Yuval Mintz

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf Of Maciej ?enczykowski
> Sent: Friday, March 15, 2013 11:56 PM
> To: Maciej Żenczykowski; David S. Miller
> Cc: netdev@vger.kernel.org; Yuval Mintz
> Subject: [PATCH] bnx2x: fix occasional statistics off-by-4GB error
> 
> From: Maciej Żenczykowski <maze@google.com>
> 
> The UPDATE_QSTAT function introduced on February 15, 2012
> in commit 1355b704b9ba "bnx2x: consistent statistics after
> internal driver reload" incorrectly fails to handle overflow
> during addition of the lower 32-bit field of a stat.
> 
> This bug is present since 3.4-rc1 and should thus be considered
> a candidate for stable 3.4+ releases.
> 
> Google-Bug-Id: 8374428
> Signed-off-by: Maciej Żenczykowski <maze@google.com>
> Cc: Mintz Yuval <yuvalmin@broadcom.com>
> ---
>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h
> index 364e37ecbc5c..198f6f1c9ad5 100644
> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h
> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h
> @@ -459,8 +459,9 @@ struct bnx2x_fw_port_stats_old {
> 
>  #define UPDATE_QSTAT(s, t) \
>  	do { \
> -		qstats->t##_hi = qstats_old->t##_hi + le32_to_cpu(s.hi); \
>  		qstats->t##_lo = qstats_old->t##_lo + le32_to_cpu(s.lo); \
> +		qstats->t##_hi = qstats_old->t##_hi + le32_to_cpu(s.hi) \
> +			+ ((qstats->t##_lo < qstats_old->t##_lo) ? 1 : 0); \
>  	} while (0)
> 
>  #define UPDATE_QSTAT_OLD(f) \
> --
> 1.8.1.3
> 

Probably this commit resolved the issue:

commit bef05406ac0ea6f468e1e25e9934f3011ea9259b
Author: Dmitry Kravkov <dmitry@broadcom.com>
Date:   Tue Sep 11 04:34:08 2012 +0000

    bnx2x: Avoid sending multiple statistics queries

Can you try it pls?

Thanks

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] bnx2x: fix occasional statistics off-by-4GB error
  2013-03-17 13:10   ` Dmitry Kravkov
@ 2013-03-17 18:24     ` David Miller
  2013-03-17 19:37       ` Eilon Greenstein
  0 siblings, 1 reply; 19+ messages in thread
From: David Miller @ 2013-03-17 18:24 UTC (permalink / raw)
  To: dmitry; +Cc: zenczykowski, maze, netdev, yuvalmin

From: "Dmitry Kravkov" <dmitry@broadcom.com>
Date: Sun, 17 Mar 2013 13:10:37 +0000

> Probably this commit resolved the issue:
> 
> commit bef05406ac0ea6f468e1e25e9934f3011ea9259b
> Author: Dmitry Kravkov <dmitry@broadcom.com>
> Date:   Tue Sep 11 04:34:08 2012 +0000
> 
>     bnx2x: Avoid sending multiple statistics queries
> 
> Can you try it pls?

These are completely seperate bugs.

The macro in question does not handle rollover of the lower 32-bits of
the statistic properly at all, and therefore Maciej's patch should be
applied and queud up for -stable.

Please give it your ACK unless you can find a bug in his change.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] bnx2x: fix occasional statistics off-by-4GB error
  2013-03-17 18:24     ` David Miller
@ 2013-03-17 19:37       ` Eilon Greenstein
  2013-03-17 20:25         ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Eilon Greenstein @ 2013-03-17 19:37 UTC (permalink / raw)
  To: David Miller; +Cc: dmitry, zenczykowski, maze, netdev, yuvalmin

On Sun, 2013-03-17 at 14:24 -0400, David Miller wrote:
> From: "Dmitry Kravkov" <dmitry@broadcom.com>
> Date: Sun, 17 Mar 2013 13:10:37 +0000
> 
> > Probably this commit resolved the issue:
> > 
> > commit bef05406ac0ea6f468e1e25e9934f3011ea9259b
> > Author: Dmitry Kravkov <dmitry@broadcom.com>
> > Date:   Tue Sep 11 04:34:08 2012 +0000
> > 
> >     bnx2x: Avoid sending multiple statistics queries
> > 
> > Can you try it pls?
> 
> These are completely seperate bugs.
> 
> The macro in question does not handle rollover of the lower 32-bits of
> the statistic properly at all, and therefore Maciej's patch should be
> applied and queud up for -stable.
> 
> Please give it your ACK unless you can find a bug in his change.

Both the high value and the low value are read from the chip - so this
patch will increment the higher 32 bits twice (well, more than once).
Not taking the HW/FW high counter at all is also not acceptable since
the reading frequency is not high enough without those.

So the original patch is nacked, but we are trying to figure out what is
causing the statistics to misbehave and it might be related to sending
statistics query twice.

Thanks,
Eilon

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] bnx2x: fix occasional statistics off-by-4GB error
  2013-03-17 19:37       ` Eilon Greenstein
@ 2013-03-17 20:25         ` Eric Dumazet
  2013-03-17 20:53           ` Eilon Greenstein
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2013-03-17 20:25 UTC (permalink / raw)
  To: eilong; +Cc: David Miller, dmitry, zenczykowski, maze, netdev, yuvalmin

On Sun, 2013-03-17 at 21:37 +0200, Eilon Greenstein wrote:
> On Sun, 2013-03-17 at 14:24 -0400, David Miller wrote:
> > From: "Dmitry Kravkov" <dmitry@broadcom.com>
> > Date: Sun, 17 Mar 2013 13:10:37 +0000
> > 
> > > Probably this commit resolved the issue:
> > > 
> > > commit bef05406ac0ea6f468e1e25e9934f3011ea9259b
> > > Author: Dmitry Kravkov <dmitry@broadcom.com>
> > > Date:   Tue Sep 11 04:34:08 2012 +0000
> > > 
> > >     bnx2x: Avoid sending multiple statistics queries
> > > 
> > > Can you try it pls?
> > 
> > These are completely seperate bugs.
> > 
> > The macro in question does not handle rollover of the lower 32-bits of
> > the statistic properly at all, and therefore Maciej's patch should be
> > applied and queud up for -stable.
> > 
> > Please give it your ACK unless you can find a bug in his change.
> 
> Both the high value and the low value are read from the chip - so this
> patch will increment the higher 32 bits twice (well, more than once).
> Not taking the HW/FW high counter at all is also not acceptable since
> the reading frequency is not high enough without those.
> 
> So the original patch is nacked, but we are trying to figure out what is
> causing the statistics to misbehave and it might be related to sending
> statistics query twice.

This looks like the typical problem of updating a 64bit value in non
atomic way.

Its guaranteed to happen on 32bit hosts.
We had to introduce include/linux/u64_stats_sync.h to help to solve this
without adding extra cost on 64bit arches.

In bnx2x case, we perform 32bit operations even on 64bit host, so we
probably need to add a seqcount_t, so that a stat consumer can detect it
read a stale value.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] bnx2x: fix occasional statistics off-by-4GB error
  2013-03-17 20:25         ` Eric Dumazet
@ 2013-03-17 20:53           ` Eilon Greenstein
  2013-03-17 23:17             ` Eric Dumazet
  2013-03-18  6:18             ` Maciej Żenczykowski
  0 siblings, 2 replies; 19+ messages in thread
From: Eilon Greenstein @ 2013-03-17 20:53 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, dmitry, zenczykowski, maze, netdev, yuvalmin

On Sun, 2013-03-17 at 13:25 -0700, Eric Dumazet wrote:
> On Sun, 2013-03-17 at 21:37 +0200, Eilon Greenstein wrote:
> > On Sun, 2013-03-17 at 14:24 -0400, David Miller wrote:
> > > From: "Dmitry Kravkov" <dmitry@broadcom.com>
> > > Date: Sun, 17 Mar 2013 13:10:37 +0000
> > > 
> > > > Probably this commit resolved the issue:
> > > > 
> > > > commit bef05406ac0ea6f468e1e25e9934f3011ea9259b
> > > > Author: Dmitry Kravkov <dmitry@broadcom.com>
> > > > Date:   Tue Sep 11 04:34:08 2012 +0000
> > > > 
> > > >     bnx2x: Avoid sending multiple statistics queries
> > > > 
> > > > Can you try it pls?
> > > 
> > > These are completely seperate bugs.
> > > 
> > > The macro in question does not handle rollover of the lower 32-bits of
> > > the statistic properly at all, and therefore Maciej's patch should be
> > > applied and queud up for -stable.
> > > 
> > > Please give it your ACK unless you can find a bug in his change.
> > 
> > Both the high value and the low value are read from the chip - so this
> > patch will increment the higher 32 bits twice (well, more than once).
> > Not taking the HW/FW high counter at all is also not acceptable since
> > the reading frequency is not high enough without those.
> > 
> > So the original patch is nacked, but we are trying to figure out what is
> > causing the statistics to misbehave and it might be related to sending
> > statistics query twice.
> 
> This looks like the typical problem of updating a 64bit value in non
> atomic way.
> 
> Its guaranteed to happen on 32bit hosts.
> We had to introduce include/linux/u64_stats_sync.h to help to solve this
> without adding extra cost on 64bit arches.
> 
> In bnx2x case, we perform 32bit operations even on 64bit host, so we
> probably need to add a seqcount_t, so that a stat consumer can detect it
> read a stale value.

This is not such a trivial issue - the HW/FW is guaranteeing the atomic
read and this is why we can always use 32b variables.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] bnx2x: fix occasional statistics off-by-4GB error
  2013-03-17 20:53           ` Eilon Greenstein
@ 2013-03-17 23:17             ` Eric Dumazet
  2013-03-18  7:27               ` Eilon Greenstein
  2013-03-18  6:18             ` Maciej Żenczykowski
  1 sibling, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2013-03-17 23:17 UTC (permalink / raw)
  To: eilong; +Cc: David Miller, dmitry, zenczykowski, maze, netdev, yuvalmin

On Sun, 2013-03-17 at 22:53 +0200, Eilon Greenstein wrote:

> This is not such a trivial issue - the HW/FW is guaranteeing the atomic
> read and this is why we can always use 32b variables.
> 

I am glad you qualify the following code as 'trivial'

/* difference = minuend - subtrahend */
#define DIFF_64(d_hi, m_hi, s_hi, d_lo, m_lo, s_lo) \
        do { \
                if (m_lo < s_lo) { \
                        /* underflow */ \
                        d_hi = m_hi - s_hi; \
                        if (d_hi > 0) { \
                                /* we can 'loan' 1 */ \
                                d_hi--; \
                                d_lo = m_lo + (UINT_MAX - s_lo) + 1; \
                        } else { \
                                /* m_hi <= s_hi */ \
                                d_hi = 0; \
                                d_lo = 0; \
                        } \
                } else { \
                        /* m_lo >= s_lo */ \
                        if (m_hi < s_hi) { \
                                d_hi = 0; \
                                d_lo = 0; \
                        } else { \
                                /* m_hi >= s_hi */ \
                                d_hi = m_hi - s_hi; \
                                d_lo = m_lo - s_lo; \
                        } \
                } \
        } while (0)


All the macros and _hi/_ho fields found in
drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h
are really badly designed.

It hurts so much the common sense its not even funny.

All you really needed is to have normal host structures containing u64
fields, and you only needed one single helper to convert from hw u32:u32
to host u64.

Because gcc handles u64 operations very well, even on 32bit hosts, and
it knows about the carry flag.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] bnx2x: fix occasional statistics off-by-4GB error
  2013-03-17 20:53           ` Eilon Greenstein
  2013-03-17 23:17             ` Eric Dumazet
@ 2013-03-18  6:18             ` Maciej Żenczykowski
  2013-03-18 10:06               ` Eilon Greenstein
  1 sibling, 1 reply; 19+ messages in thread
From: Maciej Żenczykowski @ 2013-03-18  6:18 UTC (permalink / raw)
  To: eilong
  Cc: Eric Dumazet, David Miller, Dmitry Kravkov, Linux NetDev, Mintz Yuval

This has nothing to do with atomicity or races or anything.

Look at the ADD_64 macro (which implements a += b) and compare it to
this one (which implents a = b + c).

It simply doesn't know how to correctly add in any situation in which
"qstats_old->t..._lo != 0".
By this point all the stats are already in software and both firmware
and hardware is not relevant.

Normally qstats_old starts off at 0 and the bug isn't visible (since
with a value of 0, there can't be overflow and carry propagation is
not needed).

The reason qstats_old was introduced was to allow for a partial nic
reset [or whatever the proper term is] which resets fw/hw provided
stats to not reset sw exported stats.
During the reset old stats are copied into qstats_old, and in the
future stats are returned as "qstats_old + current stats from fw/hw"
except this addition is buggy and only works if qstats_old happens to
have lower 32-bits zero (ie. basically only before first partial nic
reset).

Imagine:
(a) driver initializes, nic gets brought up, qstats_old = 0, stats
from device = 0
(b) 2GB of xfer happens
(c) qstats_old still = 0, stats from device = 2GB
(d) addition happens qstats = qstats_old + stats_from_dev = 0 + 2GB =
2GB   --- we report 2GB xfer
(e) something partially resets nic (say mtu change), qstats_old = 2GB,
stats from device = 0
(f) 2GB-1 of xfer happens
(g) qstats_old = 2GB, stats from device = 2GB-1
(h) addition happens qstats = qstats_old + stats_from_dev = 2GB +
2GB-1 = 4GB-1   --- we report 4GB-1 xfer
(i) 1 byte of xfer happens
(j) qstats_old = 2GB, stats from device = 2GB
(k) addition happens qstats = qstats_old + stats_from_dev = 2GB + 2GB
= 0   --- we report 0 xfer
(l) 2GB of xfer happens
(m) qstats_old = 2GB, stats from device = 4GB
(n) addition happens qstats = qstats_old + stats_from_dev = 2GB + 4GB
= 6GB   --- we report 6GB xfer

And yes the macro in question when asked to do 2GB+2GB will return 0.

(btw. at least on the driver I tested there's still some weirdness
with mtu change not-resetting per-queue stats but resetting
coalesced-from-all-queues stats)

Now, to be fair, I don't yet have confirmation that my patch fixes the
problem our SREs saw in deployment, but I expect to have that soon.
(and even if it doesn't fix the problem, this code is still buggy and
should be fixed)

Maciej

On Sun, Mar 17, 2013 at 1:53 PM, Eilon Greenstein <eilong@broadcom.com> wrote:
> On Sun, 2013-03-17 at 13:25 -0700, Eric Dumazet wrote:
>> On Sun, 2013-03-17 at 21:37 +0200, Eilon Greenstein wrote:
>> > On Sun, 2013-03-17 at 14:24 -0400, David Miller wrote:
>> > > From: "Dmitry Kravkov" <dmitry@broadcom.com>
>> > > Date: Sun, 17 Mar 2013 13:10:37 +0000
>> > >
>> > > > Probably this commit resolved the issue:
>> > > >
>> > > > commit bef05406ac0ea6f468e1e25e9934f3011ea9259b
>> > > > Author: Dmitry Kravkov <dmitry@broadcom.com>
>> > > > Date:   Tue Sep 11 04:34:08 2012 +0000
>> > > >
>> > > >     bnx2x: Avoid sending multiple statistics queries
>> > > >
>> > > > Can you try it pls?
>> > >
>> > > These are completely seperate bugs.
>> > >
>> > > The macro in question does not handle rollover of the lower 32-bits of
>> > > the statistic properly at all, and therefore Maciej's patch should be
>> > > applied and queud up for -stable.
>> > >
>> > > Please give it your ACK unless you can find a bug in his change.
>> >
>> > Both the high value and the low value are read from the chip - so this
>> > patch will increment the higher 32 bits twice (well, more than once).
>> > Not taking the HW/FW high counter at all is also not acceptable since
>> > the reading frequency is not high enough without those.
>> >
>> > So the original patch is nacked, but we are trying to figure out what is
>> > causing the statistics to misbehave and it might be related to sending
>> > statistics query twice.
>>
>> This looks like the typical problem of updating a 64bit value in non
>> atomic way.
>>
>> Its guaranteed to happen on 32bit hosts.
>> We had to introduce include/linux/u64_stats_sync.h to help to solve this
>> without adding extra cost on 64bit arches.
>>
>> In bnx2x case, we perform 32bit operations even on 64bit host, so we
>> probably need to add a seqcount_t, so that a stat consumer can detect it
>> read a stale value.
>
> This is not such a trivial issue - the HW/FW is guaranteeing the atomic
> read and this is why we can always use 32b variables.
>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] bnx2x: fix occasional statistics off-by-4GB error
  2013-03-17 23:17             ` Eric Dumazet
@ 2013-03-18  7:27               ` Eilon Greenstein
  0 siblings, 0 replies; 19+ messages in thread
From: Eilon Greenstein @ 2013-03-18  7:27 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, dmitry, zenczykowski, maze, netdev, yuvalmin

On Sun, 2013-03-17 at 16:17 -0700, Eric Dumazet wrote:
> On Sun, 2013-03-17 at 22:53 +0200, Eilon Greenstein wrote:
> 
> > This is not such a trivial issue - the HW/FW is guaranteeing the atomic
> > read and this is why we can always use 32b variables.
> > 
> 
> I am glad you qualify the following code as 'trivial'
> 

Eric - I did not mean that the solution is trivial, I meant that the
problem is trivial. Like the trivial statement that the code should not
contain any bugs :)


> All you really needed is to have normal host structures containing u64
> fields, and you only needed one single helper to convert from hw u32:u32
> to host u64.
> 
> Because gcc handles u64 operations very well, even on 32bit hosts, and
> it knows about the carry flag.

I was not clear when I said size-extension: most of the HW counters are
46bits and we are trying to keep 64bits counters in the SW, so simply
copying the data from the HW is not enough.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] bnx2x: fix occasional statistics off-by-4GB error
  2013-03-18  6:18             ` Maciej Żenczykowski
@ 2013-03-18 10:06               ` Eilon Greenstein
  2013-03-18 17:13                 ` David Miller
  0 siblings, 1 reply; 19+ messages in thread
From: Eilon Greenstein @ 2013-03-18 10:06 UTC (permalink / raw)
  To: Maciej Żenczykowski
  Cc: Eric Dumazet, David Miller, Dmitry Kravkov, Linux NetDev, Mintz Yuval

On Sun, 2013-03-17 at 23:18 -0700, Maciej Żenczykowski wrote:
> This has nothing to do with atomicity or races or anything.
> 
> Look at the ADD_64 macro (which implements a += b) and compare it to
> this one (which implents a = b + c).
> 
> It simply doesn't know how to correctly add in any situation in which
> "qstats_old->t..._lo != 0".
> By this point all the stats are already in software and both firmware
> and hardware is not relevant.
> 
> Normally qstats_old starts off at 0 and the bug isn't visible (since
> with a value of 0, there can't be overflow and carry propagation is
> not needed).
> 
> The reason qstats_old was introduced was to allow for a partial nic
> reset [or whatever the proper term is] which resets fw/hw provided
> stats to not reset sw exported stats.
> During the reset old stats are copied into qstats_old, and in the
> future stats are returned as "qstats_old + current stats from fw/hw"
> except this addition is buggy and only works if qstats_old happens to
> have lower 32-bits zero (ie. basically only before first partial nic
> reset).
> 
> Imagine:
> (a) driver initializes, nic gets brought up, qstats_old = 0, stats
> from device = 0
> (b) 2GB of xfer happens
> (c) qstats_old still = 0, stats from device = 2GB
> (d) addition happens qstats = qstats_old + stats_from_dev = 0 + 2GB =
> 2GB   --- we report 2GB xfer
> (e) something partially resets nic (say mtu change), qstats_old = 2GB,
> stats from device = 0
> (f) 2GB-1 of xfer happens
> (g) qstats_old = 2GB, stats from device = 2GB-1
> (h) addition happens qstats = qstats_old + stats_from_dev = 2GB +
> 2GB-1 = 4GB-1   --- we report 4GB-1 xfer
> (i) 1 byte of xfer happens
> (j) qstats_old = 2GB, stats from device = 2GB
> (k) addition happens qstats = qstats_old + stats_from_dev = 2GB + 2GB
> = 0   --- we report 0 xfer
> (l) 2GB of xfer happens
> (m) qstats_old = 2GB, stats from device = 4GB
> (n) addition happens qstats = qstats_old + stats_from_dev = 2GB + 4GB
> = 6GB   --- we report 6GB xfer
> 
> And yes the macro in question when asked to do 2GB+2GB will return 0.
> 
> (btw. at least on the driver I tested there's still some weirdness
> with mtu change not-resetting per-queue stats but resetting
> coalesced-from-all-queues stats)
> 
> Now, to be fair, I don't yet have confirmation that my patch fixes the
> problem our SREs saw in deployment, but I expect to have that soon.
> (and even if it doesn't fix the problem, this code is still buggy and
> should be fixed)
> 
> Maciej

Maciej - thanks for the detailed information. You are right - it has
nothing to do with the HW/FW and it is simply a bug that needs to be
fixed. I withdraw my objections and add my ACK.

Acked-by: Eilon Greenstein <eilong@broadcom.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] bnx2x: fix occasional statistics off-by-4GB error
  2013-03-18 10:06               ` Eilon Greenstein
@ 2013-03-18 17:13                 ` David Miller
  2013-03-18 19:54                   ` Maciej Żenczykowski
  0 siblings, 1 reply; 19+ messages in thread
From: David Miller @ 2013-03-18 17:13 UTC (permalink / raw)
  To: eilong; +Cc: maze, eric.dumazet, dmitry, netdev, yuvalmin

From: "Eilon Greenstein" <eilong@broadcom.com>
Date: Mon, 18 Mar 2013 12:06:22 +0200

> Maciej - thanks for the detailed information. You are right - it has
> nothing to do with the HW/FW and it is simply a bug that needs to be
> fixed. I withdraw my objections and add my ACK.
> 
> Acked-by: Eilon Greenstein <eilong@broadcom.com>

Applied and queued up for -stable.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] bnx2x: fix occasional statistics off-by-4GB error
  2013-03-18 17:13                 ` David Miller
@ 2013-03-18 19:54                   ` Maciej Żenczykowski
  2013-03-18 20:05                     ` David Miller
  2013-03-18 20:06                     ` Maciej Żenczykowski
  0 siblings, 2 replies; 19+ messages in thread
From: Maciej Żenczykowski @ 2013-03-18 19:54 UTC (permalink / raw)
  To: David Miller; +Cc: eilong, eric.dumazet, dmitry, netdev, yuvalmin

I've reproduced this in the lab.

I can confirm this patch fixes per-queue statistics (ie. ethtool -S
"[0]: tx_bytes") - which previously were not monotonically increasing.
However there is still a bug wrt. global device stats (ie. ethtool -S
"tx_bytes" and /proc/net/dev transmit bytes).
I'm guessing there's another similar bug later on during aggregation.

I expect to find the offending code and send out a patch soon.

On Mon, Mar 18, 2013 at 10:13 AM, David Miller <davem@davemloft.net> wrote:
> From: "Eilon Greenstein" <eilong@broadcom.com>
> Date: Mon, 18 Mar 2013 12:06:22 +0200
>
>> Maciej - thanks for the detailed information. You are right - it has
>> nothing to do with the HW/FW and it is simply a bug that needs to be
>> fixed. I withdraw my objections and add my ACK.
>>
>> Acked-by: Eilon Greenstein <eilong@broadcom.com>
>
> Applied and queued up for -stable.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] bnx2x: fix occasional statistics off-by-4GB error
  2013-03-18 19:54                   ` Maciej Żenczykowski
@ 2013-03-18 20:05                     ` David Miller
  2013-03-18 20:06                     ` Maciej Żenczykowski
  1 sibling, 0 replies; 19+ messages in thread
From: David Miller @ 2013-03-18 20:05 UTC (permalink / raw)
  To: zenczykowski; +Cc: eilong, eric.dumazet, dmitry, netdev, yuvalmin

From: Maciej Żenczykowski <zenczykowski@gmail.com>
Date: Mon, 18 Mar 2013 12:54:49 -0700

> I've reproduced this in the lab.
> 
> I can confirm this patch fixes per-queue statistics (ie. ethtool -S
> "[0]: tx_bytes") - which previously were not monotonically increasing.
> However there is still a bug wrt. global device stats (ie. ethtool -S
> "tx_bytes" and /proc/net/dev transmit bytes).
> I'm guessing there's another similar bug later on during aggregation.
> 
> I expect to find the offending code and send out a patch soon.

Thanks Maciej.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] bnx2x: fix occasional statistics off-by-4GB error
  2013-03-18 19:54                   ` Maciej Żenczykowski
  2013-03-18 20:05                     ` David Miller
@ 2013-03-18 20:06                     ` Maciej Żenczykowski
  2013-03-18 20:35                       ` Eric Dumazet
  2013-03-18 20:36                       ` Maciej Żenczykowski
  1 sibling, 2 replies; 19+ messages in thread
From: Maciej Żenczykowski @ 2013-03-18 20:06 UTC (permalink / raw)
  To: David Miller; +Cc: eilong, eric.dumazet, dmitry, netdev, yuvalmin

Current best guess is UPDATE_ESTAT_QSTAT_64 which uses SUB_64
implemented via DIFF_64 which is crazy.

On Mon, Mar 18, 2013 at 12:54 PM, Maciej Żenczykowski
<zenczykowski@gmail.com> wrote:
> I've reproduced this in the lab.
>
> I can confirm this patch fixes per-queue statistics (ie. ethtool -S
> "[0]: tx_bytes") - which previously were not monotonically increasing.
> However there is still a bug wrt. global device stats (ie. ethtool -S
> "tx_bytes" and /proc/net/dev transmit bytes).
> I'm guessing there's another similar bug later on during aggregation.
>
> I expect to find the offending code and send out a patch soon.
>
> On Mon, Mar 18, 2013 at 10:13 AM, David Miller <davem@davemloft.net> wrote:
>> From: "Eilon Greenstein" <eilong@broadcom.com>
>> Date: Mon, 18 Mar 2013 12:06:22 +0200
>>
>>> Maciej - thanks for the detailed information. You are right - it has
>>> nothing to do with the HW/FW and it is simply a bug that needs to be
>>> fixed. I withdraw my objections and add my ACK.
>>>
>>> Acked-by: Eilon Greenstein <eilong@broadcom.com>
>>
>> Applied and queued up for -stable.
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] bnx2x: fix occasional statistics off-by-4GB error
  2013-03-18 20:06                     ` Maciej Żenczykowski
@ 2013-03-18 20:35                       ` Eric Dumazet
  2013-03-18 20:36                       ` Maciej Żenczykowski
  1 sibling, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2013-03-18 20:35 UTC (permalink / raw)
  To: Maciej Żenczykowski; +Cc: David Miller, eilong, dmitry, netdev, yuvalmin

On Mon, 2013-03-18 at 13:06 -0700, Maciej Żenczykowski wrote:
> Current best guess is UPDATE_ESTAT_QSTAT_64 which uses SUB_64
> implemented via DIFF_64 which is crazy.

All these macros are really ugly, I wish someone could cleanup this
code.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] bnx2x: fix occasional statistics off-by-4GB error
  2013-03-18 20:06                     ` Maciej Żenczykowski
  2013-03-18 20:35                       ` Eric Dumazet
@ 2013-03-18 20:36                       ` Maciej Żenczykowski
  2013-03-21 21:23                         ` Maciej Żenczykowski
  1 sibling, 1 reply; 19+ messages in thread
From: Maciej Żenczykowski @ 2013-03-18 20:36 UTC (permalink / raw)
  To: David Miller; +Cc: eilong, eric.dumazet, dmitry, netdev, yuvalmin

Not quite the right call sequence above.
  UPDATE_FSTAT_QSTAT(total_bytes_received); --> SUB_64 --> DIFF_64
is probably more relevant.

Regardless:

/* difference = minuend - subtrahend */
#define DIFF_64(d_hi, m_hi, s_hi, d_lo, m_lo, s_lo) \
<------>do { \
<------><------>if (m_lo < s_lo) { \
<------><------><------>/* underflow */ \
<------><------><------>d_hi = m_hi - s_hi; \
<------><------><------>if (d_hi > 0) { \
<------><------><------><------>/* we can 'loan' 1 */ \
<------><------><------><------>d_hi--; \
<------><------><------><------>d_lo = m_lo + (UINT_MAX - s_lo) + 1; \
<------><------><------>} else { \
<------><------><------><------>/* m_hi <= s_hi */ \
<------><------><------><------>d_hi = 0; \
<------><------><------><------>d_lo = 0; \
<------><------><------>} \
<------><------>} else { \
...
I believe this fails.  All parameters are most likely going to be u32,
since that's used for stats pretty much everywhere.
As such after d_hi = m_hi - s_hi;  d_hi will be >= 0 since it's u32.

As such if "m_hi == s_hi" && "m_lo < s_lo" we will return (0,0)
however if "m_hi < s_hi" && "m_lo < s_lo" we will not return (0,0)

I'm not sure which behaviour is desired, but either way this is obviously wrong.

0 - 1 returns 0
0 - (4GB+1) returns -4GB-1

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] bnx2x: fix occasional statistics off-by-4GB error
  2013-03-18 20:36                       ` Maciej Żenczykowski
@ 2013-03-21 21:23                         ` Maciej Żenczykowski
  2013-03-21 21:25                           ` David Miller
  0 siblings, 1 reply; 19+ messages in thread
From: Maciej Żenczykowski @ 2013-03-21 21:23 UTC (permalink / raw)
  To: David Miller; +Cc: eilong, eric.dumazet, dmitry, netdev, yuvalmin

Just as an update on the state here.

Extensive testing both in lab and in deployment confirms the already
applied patch fixes the problem.

My concerns about aggregate counters being wrong (after taking another
long look at the data from the 'bad' run)
was simply a matter of me getting mixed up in what was the 'correct'
value I should be seeing.

ie. I did not actually see any bad behaviour with this patch.

That said.  I spent a fair bit of time debugging this before I
realized I was stupid, and found other problems
while doing that.  I cannot actually trigger the potential problems,
but I'll follow up to this with a couple patches
to fix potential problems (and export more stats).

(a) The DIFF_64 macro is buggy in the case of underflow due to unsigned-ness
(b) Any code which does SUB/ADD should instead do ADD/SUB, because
SUB(A,B) is implemented via DIFF(A,B) and thus doesn't actually do
subtraction, but instead does A := max(A-B, 0)
(c) there's a bunch of extra statistics it is trivial to export via
ethtool which make debugging stuff like this much easier

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] bnx2x: fix occasional statistics off-by-4GB error
  2013-03-21 21:23                         ` Maciej Żenczykowski
@ 2013-03-21 21:25                           ` David Miller
  0 siblings, 0 replies; 19+ messages in thread
From: David Miller @ 2013-03-21 21:25 UTC (permalink / raw)
  To: zenczykowski; +Cc: eilong, eric.dumazet, dmitry, netdev, yuvalmin

From: Maciej Żenczykowski <zenczykowski@gmail.com>
Date: Thu, 21 Mar 2013 14:23:47 -0700

> (a) The DIFF_64 macro is buggy in the case of underflow due to unsigned-ness
> (b) Any code which does SUB/ADD should instead do ADD/SUB, because
> SUB(A,B) is implemented via DIFF(A,B) and thus doesn't actually do
> subtraction, but instead does A := max(A-B, 0)
> (c) there's a bunch of extra statistics it is trivial to export via
> ethtool which make debugging stuff like this much easier

Thanks for looking into this and the status update.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2013-03-21 21:25 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-03-15 21:03 bnx2x statistics occasionally wrong by near multiples of 4GB Maciej Żenczykowski
2013-03-15 21:56 ` [PATCH] bnx2x: fix occasional statistics off-by-4GB error Maciej Żenczykowski
2013-03-17 13:10   ` Dmitry Kravkov
2013-03-17 18:24     ` David Miller
2013-03-17 19:37       ` Eilon Greenstein
2013-03-17 20:25         ` Eric Dumazet
2013-03-17 20:53           ` Eilon Greenstein
2013-03-17 23:17             ` Eric Dumazet
2013-03-18  7:27               ` Eilon Greenstein
2013-03-18  6:18             ` Maciej Żenczykowski
2013-03-18 10:06               ` Eilon Greenstein
2013-03-18 17:13                 ` David Miller
2013-03-18 19:54                   ` Maciej Żenczykowski
2013-03-18 20:05                     ` David Miller
2013-03-18 20:06                     ` Maciej Żenczykowski
2013-03-18 20:35                       ` Eric Dumazet
2013-03-18 20:36                       ` Maciej Żenczykowski
2013-03-21 21:23                         ` Maciej Żenczykowski
2013-03-21 21:25                           ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.