From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Russell King <rmk+kernel@armlinux.org.uk>,
	Matteo Croce <mcroce@redhat.com>,
	"David S . Miller" <davem@davemloft.net>,
	Sasha Levin <sashal@kernel.org>,
	netdev@vger.kernel.org
Subject: [PATCH AUTOSEL 5.6 29/47] net: mvpp2: fix RX hashing for non-10G ports
Date: Thu, 28 May 2020 07:55:42 -0400
Message-ID: <20200528115600.1405808-29-sashal@kernel.org>
In-Reply-To: <20200528115600.1405808-1-sashal@kernel.org>

From: Russell King <rmk+kernel@armlinux.org.uk>

[ Upstream commit 3138a07ce219acde4c0d7ea0b6d54ba64153328b ]

When rxhash is enabled on any ethernet port except the first in each CP
block, traffic flow is prevented.  The analysis is below:

I've been investigating this afternoon, and what I've found, comparing
a kernel without 895586d5dc32 and with 895586d5dc32 applied is:

- The table programmed into the hardware via mvpp22_rss_fill_table()
  appears to be identical with or without the commit.

- When rxhash is enabled on eth2, mvpp2_rss_port_c2_enable() reports
  that c2.attr[0] and c2.attr[2] are written back containing:

   - with 895586d5dc32, failing:    00200000 40000000
   - without 895586d5dc32, working: 04000000 40000000

- When disabling rxhash, c2.attr[0] and c2.attr[2] are written back as:

   04000000 00000000

The second value represents the MVPP22_CLS_C2_ATTR2_RSS_EN bit; the
first value is the queue number, which comprises two fields. The high
5 bits are 24:28 and the low three are 21:23 inclusive. This comes
from:

       c2.attr[0] = MVPP22_CLS_C2_ATTR0_QHIGH(qh) |
                     MVPP22_CLS_C2_ATTR0_QLOW(ql);
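
As a rough cross-check of the two c2.attr[0] values reported above, the
fields can be pulled apart as follows. This is only a sketch: the shift
and mask names are made up for illustration, and the layout (a 3-bit
queue-low field starting at bit 21, a 5-bit queue-high field starting
at bit 24) is assumed from the description above rather than taken
from the driver headers.

```c
#include <stdint.h>

/* Assumed layout, names are illustrative and not the driver's macros. */
#define QLOW_SHIFT   21
#define QLOW_MASK    0x7u
#define QHIGH_SHIFT  24
#define QHIGH_MASK   0x1fu

/* Extract the 3-bit queue-low field from c2.attr[0]. */
static inline uint32_t attr0_qlow(uint32_t attr0)
{
        return (attr0 >> QLOW_SHIFT) & QLOW_MASK;
}

/* Extract the 5-bit queue-high field from c2.attr[0]. */
static inline uint32_t attr0_qhigh(uint32_t attr0)
{
        return (attr0 >> QHIGH_SHIFT) & QHIGH_MASK;
}

/* Combine both fields into a flat RX queue number (high.low). */
static inline uint32_t attr0_rxq(uint32_t attr0)
{
        return (attr0_qhigh(attr0) << 3) | attr0_qlow(attr0);
}
```

Under these assumptions, 04000000 decodes to queue-high 4, queue-low 0,
and 00200000 decodes to queue-high 0, queue-low 1.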

So, the working case gives eth2 a queue id of 4.0, or 32 as per
port->first_rxq, and the non-working case a queue id of 0.1, or 1.
The allocation of queue IDs seems to be in mvpp2_port_probe():

        if (priv->hw_version == MVPP21)
                port->first_rxq = port->id * port->nrxqs;
        else
                port->first_rxq = port->id * priv->max_port_rxqs;

Where:

        if (priv->hw_version == MVPP21)
                priv->max_port_rxqs = 8;
        else
                priv->max_port_rxqs = 32;

This gives port 0 (eth0 / eth1) port->first_rxq = 0, and port 1 (eth2)
port->first_rxq = 32. It seems the idea is that the first 32 queues
belong to port 0, the second 32 queues belong to port 1, etc.
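
The allocation can be modelled in a few lines. This is a standalone
sketch of the two snippets quoted above, not driver code: the enum and
the function name are stand-ins, and the split into queue-high and
queue-low assumes three low bits as described earlier.

```c
/* Stand-in for the driver's hardware version values. */
enum hw_version { MVPP21, MVPP22 };

/* First RX queue of a port, following the logic quoted from
 * mvpp2_port_probe(): MVPP21 packs ports by their own queue count,
 * later versions reserve a fixed block of 32 queues per port. */
static inline unsigned int model_first_rxq(enum hw_version ver,
                                           unsigned int port_id,
                                           unsigned int nrxqs)
{
        if (ver == MVPP21)
                return port_id * nrxqs;
        return port_id * 32;
}

/* Queue-high part of a queue number, assuming three queue-low bits. */
static inline unsigned int model_qhigh(unsigned int rxq)
{
        return rxq >> 3;
}
```

With this model, port 1 on a non-MVPP21 device starts at queue 32,
which is queue-high 4, queue-low 0.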

mvpp2_rss_port_c2_enable() gets the queue number from its parameter,
'ctx', which comes from mvpp22_rss_ctx(port, 0). This returns
port->rss_ctx[0].

mvpp22_rss_context_create() is responsible for allocating that, which
it does by looking for an unallocated priv->rss_tables[] pointer. This
table is shared amongst all ports on the CP silicon.

When we write the tables in mvpp22_rss_fill_table(), the RSS table
entry is defined by:

                u32 sel = MVPP22_RSS_INDEX_TABLE(rss_ctx) |
                          MVPP22_RSS_INDEX_TABLE_ENTRY(i);

where rss_ctx is the context ID (queue number) and i is the index in
the table.

If we look at what is written:

- The first table to be written has "sel" values of 00000000..0000001f,
  containing values 0..3. This appears to be for eth1. This is table 0,
  RX queue number 0.
- The second table has "sel" values of 00000100..0000011f, and appears
  to be for eth2.  These contain values 0x20..0x23. This is table 1,
  RX queue number 0.
- The third table has "sel" values of 00000200..0000021f, and appears
  to be for eth3.  These contain values 0x40..0x43. This is table 2,
  RX queue number 0.
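
The "sel" ranges listed above are consistent with the table number
sitting above an entry index of 32 entries per table. A small sketch of
that encoding follows; the shift of 8 and the helper name are inferred
from the observed values (table 1 starting at 00000100), not read from
the driver's macro definitions.

```c
#include <stdint.h>

/* Inferred from the observed sel values, not from the driver headers. */
#define MODEL_RSS_TABLE_SHIFT   8
#define MODEL_RSS_TABLE_ENTRIES 32

/* Build the selector for entry i of RSS table rss_ctx. */
static inline uint32_t model_rss_sel(uint32_t rss_ctx, uint32_t i)
{
        return (rss_ctx << MODEL_RSS_TABLE_SHIFT) |
               (i & (MODEL_RSS_TABLE_ENTRIES - 1));
}
```

Under that assumption, table 1 spans 00000100..0000011f and table 2
spans 00000200..0000021f, matching what was observed.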

How do queue numbers translate to the RSS table?  There is another
table - the RXQ2RSS table, indexed by the MVPP22_RSS_INDEX_QUEUE field
of MVPP22_RSS_INDEX and accessed through the MVPP22_RXQ2RSS_TABLE
register. Before 895586d5dc32, it was:

       mvpp2_write(priv, MVPP22_RSS_INDEX,
                   MVPP22_RSS_INDEX_QUEUE(port->first_rxq));
       mvpp2_write(priv, MVPP22_RXQ2RSS_TABLE,
                   MVPP22_RSS_TABLE_POINTER(port->id));

and after:

       mvpp2_write(priv, MVPP22_RSS_INDEX, MVPP22_RSS_INDEX_QUEUE(ctx));
       mvpp2_write(priv, MVPP22_RXQ2RSS_TABLE, MVPP22_RSS_TABLE_POINTER(ctx));

Before the commit, for eth2, that would've contained '32' for the
index and '1' for the table pointer - mapping queue 32 to table 1.
Remember that this is queue-high.queue-low of 4.0.

After the commit, we appear to map queue 1 to table 1. That again
looks fine on the face of it.

Section 9.3.1 of the A8040 manual seems to indicate the reason that the
queue number is separated. queue-low seems to always come from the
classifier, whereas queue-high can be from the ingress physical port
number or the classifier depending on the MVPP2_CLS_SWFWD_PCTRL_REG.

We set the port bit in MVPP2_CLS_SWFWD_PCTRL_REG, meaning that queue-high
comes from the MVPP2_CLS_SWFWD_P2HQ_REG() register... and this seems to
be where our bug comes from.

mvpp2_cls_oversize_rxq_set() sets this up as:

        mvpp2_write(port->priv, MVPP2_CLS_SWFWD_P2HQ_REG(port->id),
                    (port->first_rxq >> MVPP2_CLS_OVERSIZE_RXQ_LOW_BITS));

        val = mvpp2_read(port->priv, MVPP2_CLS_SWFWD_PCTRL_REG);
        val |= MVPP2_CLS_SWFWD_PCTRL_MASK(port->id);
        mvpp2_write(port->priv, MVPP2_CLS_SWFWD_PCTRL_REG, val);

Setting the MVPP2_CLS_SWFWD_PCTRL_MASK bit means that the queue-high
for eth2 is _always_ 4, so only queues 32 through 39 inclusive are
available to eth2. Yet we're telling the classifier to set queue-high
to zero, so the queue-high field (MVPP22_CLS_C2_ATTR0_QHIGH()) from
the classifier is ignored.

This means we end up directing traffic from eth2 not to queue 1, but
to queue 33, and then we tell it to look up queue 33 in the RSS table.
However, the RSS table has not been programmed for queue 33, and so it ends
up (presumably) dropping the packets.
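
To make the mismatch concrete, here is a toy model of the queue
selection as I understand it from the description above. It is not
driver code: the struct, the helper names, the one-bit-per-port layout
of the control register and the table sizes are all assumptions made
for illustration.

```c
#include <stdint.h>

#define MODEL_RXQ_LOW_BITS 3
#define MODEL_NUM_PORTS    4
#define MODEL_NUM_RXQS     256
#define MODEL_NO_TABLE     (-1)

struct cp_model {
        uint32_t pctrl;                 /* bit per port: queue-high from port */
        uint32_t p2hq[MODEL_NUM_PORTS]; /* per-port queue-high value */
        int rxq2rss[MODEL_NUM_RXQS];    /* RSS table per queue, or NO_TABLE */
};

/* Start with no port overrides and no queue mapped to an RSS table. */
static inline void cp_model_init(struct cp_model *cp)
{
        cp->pctrl = 0;
        for (int p = 0; p < MODEL_NUM_PORTS; p++)
                cp->p2hq[p] = 0;
        for (int q = 0; q < MODEL_NUM_RXQS; q++)
                cp->rxq2rss[q] = MODEL_NO_TABLE;
}

/* Queue a packet lands on: queue-low always comes from the classifier,
 * queue-high comes from the port register when the port's bit is set. */
static inline uint32_t cp_model_rxq(const struct cp_model *cp,
                                    unsigned int port,
                                    uint32_t cls_qhigh, uint32_t cls_qlow)
{
        uint32_t qhigh = (cp->pctrl & (1u << port)) ? cp->p2hq[port]
                                                    : cls_qhigh;

        return (qhigh << MODEL_RXQ_LOW_BITS) | (cls_qlow & 0x7u);
}
```

With p2hq[1] = 4, the port 1 bit set and only queue 1 mapped to table 1,
a classifier result of 0.1 resolves to queue 33, which has no RSS table.
With the bit cleared, the same classifier result resolves to queue 1.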

It seems that mvpp22_rss_context_create() doesn't take account of the
fact that the upper 5 bits of the queue ID can't actually be changed
due to the settings in mvpp2_cls_oversize_rxq_set(), _or_ it seems that
mvpp2_cls_oversize_rxq_set() has been missed in this commit. Either
way, these two functions disagree about which queue number should be
used.

Looking deeper into what mvpp2_cls_oversize_rxq_set() and the MTU
validation are doing, it seems that MVPP2_CLS_SWFWD_P2HQ_REG() is used
for over-sized packets attempting to egress through this port. With
the classifier having had RSS enabled and directing eth2 traffic to
queue 1, we may still have packets appearing on queue 32 for this port.

However, the only way we may end up with over-sized packets attempting
to egress through eth2 is if the A8040 forwards frames between its
ports. From what I can see, we don't support that feature, and the
kernel restricts the egress packet size to the MTU. In any case, if we
were to attempt to transmit an oversized packet, we have no support in
the kernel to deal with that appearing in the port's receive queue.

So, this patch attempts to solve the issue by clearing the
MVPP2_CLS_SWFWD_PCTRL_MASK() bit, allowing MVPP22_CLS_C2_ATTR0_QHIGH()
from the classifier to define the queue-high field of the queue number.

My testing seems to confirm my findings above - clearing this bit
means that if I enable rxhash on eth2, the interface can then pass
traffic, as we are now directing traffic to RX queue 1 rather than
queue 33. Traffic still seems to work with rxhash off as well.

Reported-by: Matteo Croce <mcroce@redhat.com>
Tested-by: Matteo Croce <mcroce@redhat.com>
Fixes: 895586d5dc32 ("net: mvpp2: cls: Use RSS contexts to handle RSS tables")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/ethernet/marvell/mvpp2/mvpp2_cls.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_cls.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_cls.c
index 4344a59c823f..6122057d60c0 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_cls.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_cls.c
@@ -1070,7 +1070,7 @@ void mvpp2_cls_oversize_rxq_set(struct mvpp2_port *port)
 		    (port->first_rxq >> MVPP2_CLS_OVERSIZE_RXQ_LOW_BITS));
 
 	val = mvpp2_read(port->priv, MVPP2_CLS_SWFWD_PCTRL_REG);
-	val |= MVPP2_CLS_SWFWD_PCTRL_MASK(port->id);
+	val &= ~MVPP2_CLS_SWFWD_PCTRL_MASK(port->id);
 	mvpp2_write(port->priv, MVPP2_CLS_SWFWD_PCTRL_REG, val);
 }
 
-- 
2.25.1

