netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Vladimir Oltean <vladimir.oltean@nxp.com>
To: netdev@vger.kernel.org
Cc: Andrew Lunn <andrew@lunn.ch>,
	Florian Fainelli <f.fainelli@gmail.com>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Claudiu Manoil <claudiu.manoil@nxp.com>,
	Alexandre Belloni <alexandre.belloni@bootlin.com>,
	UNGLinuxDriver@microchip.com,
	Colin Foster <colin.foster@in-advantage.com>,
	linux-kernel@vger.kernel.org
Subject: [PATCH net 0/3] Fix trainwreck with Ocelot switch statistics counters
Date: Tue, 21 Mar 2023 03:03:22 +0200	[thread overview]
Message-ID: <20230321010325.897817-1-vladimir.oltean@nxp.com> (raw)

While testing the patch set for preemptible traffic classes with some
controlled traffic and measuring counter deltas:
https://lore.kernel.org/netdev/20230220122343.1156614-1-vladimir.oltean@nxp.com/

I noticed that in the output of "ethtool -S swp0 --groups eth-mac
eth-phy eth-ctrl rmon -- --src emac | grep -v ': 0'", the TX counters
were off. Quickly I realized that their values were permutated by 1
compared to their names, and that for example
tx-rmon-etherStatsPkts64to64Octets was incrementing when
tx-rmon-etherStatsPkts65to127Octets should have.

Initially I suspected something having to do with the bulk reading
logic, and indeed I found a bug there (fixed as 1/3), but that was not
the source of the problems. Instead it revealed other problems.

While dumping the regions created by the driver on my switch, I figured
out that it sees a discontinuity which shouldn't have existed between
reg 0x278 and reg 0x280.

Discontinuity between last reg 0x0 and new reg 0x0, creating new region
Discontinuity between last reg 0x108 and new reg 0x200, creating new region
Discontinuity between last reg 0x278 and new reg 0x280, creating new region
Discontinuity between last reg 0x2b0 and new reg 0x400, creating new region
region of 67 contiguous counters starting with SYS:STAT:CNT[0x000]
region of 31 contiguous counters starting with SYS:STAT:CNT[0x080]
region of 13 contiguous counters starting with SYS:STAT:CNT[0x0a0]
region of 18 contiguous counters starting with SYS:STAT:CNT[0x100]

That is where TX_MM_HOLD should have been, and that was the bug, since
it was missing. After adding it, the regions look like this and the
off-by-one issue is resolved:

Discontinuity between last reg 0x000000 and new reg 0x000000, creating new region
Discontinuity between last reg 0x000108 and new reg 0x000200, creating new region
Discontinuity between last reg 0x0002b0 and new reg 0x000400, creating new region
region of 67 contiguous counters starting with SYS:STAT:CNT[0x000]
region of 45 contiguous counters starting with SYS:STAT:CNT[0x080]
region of 18 contiguous counters starting with SYS:STAT:CNT[0x100]

However, as I am thinking out loud, it should have not reported the
other counters as off by one even when skipping TX_MM_HOLD... after all,
on Ocelot/Seville, there are more counters which need to be skipped.

Which is when I investigated and noticed the bug solved in 2/3.
I've validated that both on native VSC9959 (which uses
ocelot_mm_stats_layout) as well as by faking the other switches by
making VSC9959 use the plain ocelot_stats_layout.

To summarize: on all Ocelot switches, the TX counters and drop counters
are completely broken. The RX counters are mostly fine.

With this occasion, I have collected more cleanup patches in this area,
which I'm going to submit after the net -> net-next merge.

Vladimir Oltean (3):
  net: mscc: ocelot: fix stats region batching
  net: mscc: ocelot: fix transfer from region->buf to ocelot->stats
  net: mscc: ocelot: add TX_MM_HOLD to ocelot_mm_stats_layout

 drivers/net/ethernet/mscc/ocelot_stats.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

-- 
2.34.1


             reply	other threads:[~2023-03-21  1:17 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-21  1:03 Vladimir Oltean [this message]
2023-03-21  1:03 ` [PATCH net 1/3] net: mscc: ocelot: fix stats region batching Vladimir Oltean
2023-03-21  2:14   ` Colin Foster
2023-03-21  1:03 ` [PATCH net 2/3] net: mscc: ocelot: fix transfer from region->buf to ocelot->stats Vladimir Oltean
2023-03-21  1:03 ` [PATCH net 3/3] net: mscc: ocelot: add TX_MM_HOLD to ocelot_mm_stats_layout Vladimir Oltean
2023-03-22  4:40 ` [PATCH net 0/3] Fix trainwreck with Ocelot switch statistics counters patchwork-bot+netdevbpf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230321010325.897817-1-vladimir.oltean@nxp.com \
    --to=vladimir.oltean@nxp.com \
    --cc=UNGLinuxDriver@microchip.com \
    --cc=alexandre.belloni@bootlin.com \
    --cc=andrew@lunn.ch \
    --cc=claudiu.manoil@nxp.com \
    --cc=colin.foster@in-advantage.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=f.fainelli@gmail.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).