From: "Alvin Šipraga" <ALSI@bang-olufsen.dk>
To: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Cc: "netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"Linus Walleij" <linus.walleij@linaro.org>,
"Arınç ÜNAL" <arinc.unal@arinc9.com>
Subject: Re: net: dsa: realtek: silent indirect reg read failures on SMP
Date: Wed, 9 Feb 2022 10:32:17 +0000 [thread overview]
Message-ID: <878rukib4f.fsf@bang-olufsen.dk> (raw)
In-Reply-To: <CAJq09z5FCgG-+jVT7uxh1a-0CiiFsoKoHYsAWJtiKwv7LXKofQ@mail.gmail.com> (Luiz Angelo Daros de Luca's message of "Sun, 6 Feb 2022 17:26:58 -0300")
Hi Luiz,
Thanks very much for the bug report!
Luiz Angelo Daros de Luca <luizluca@gmail.com> writes:
> Hello,
>
> Arinç reported an issue with interfaces going down and up in a fixed
> interval (a little less than 40s). It happened with both SMI or MDIO
> interfaces for two different rtl8365mb supported models.
Is this using the Generic PHY driver or the Realtek RTL8365MB-VC PHY
driver? If it's the latter then I would be interested to know whether
the interrupt counters are going up too. Maybe the switch really has
something to say?
I think the Generic PHY driver is based on polling of PHY registers,
which seems more likely to trigger the issue you describe.
In any case, I think I can reproduce your issue. Using an SMI setup with
just one cable attached to port 2, I did the following:
1. Enable ftrace event logging for regmap and mdio
2. Poll BSMR PHY register for my connected port; it should always read
the same (0x79ed).
3. Wait for 2 to read out something else.
So I used this command:
while true; do phytool read swp2/2/0x01; sleep 0.1; done
After a couple of seconds I already read out 0x0000. And after checking
the ftrace log I can see that this is because of some interleaved
register access:
kworker/3:4-70 [003] ....... 1927.139849: regmap_reg_write: ethernet-switch reg=1004 val=bd
phytool-16816 [002] ....... 1927.139979: regmap_reg_read: ethernet-switch reg=1f01 val=0
kworker/3:4-70 [003] ....... 1927.140381: regmap_reg_read: ethernet-switch reg=1005 val=0
phytool-16816 [002] ....... 1927.140468: regmap_reg_read: ethernet-switch reg=1d15 val=a69
kworker/3:4-70 [003] ....... 1927.140864: regmap_reg_read: ethernet-switch reg=1003 val=0
phytool-16816 [002] ....... 1927.140955: regmap_reg_write: ethernet-switch reg=1f02 val=2041
kworker/3:4-70 [003] ....... 1927.141390: regmap_reg_read: ethernet-switch reg=1002 val=0
phytool-16816 [002] ....... 1927.141479: regmap_reg_write: ethernet-switch reg=1f00 val=1
kworker/3:4-70 [003] ....... 1927.142311: regmap_reg_write: ethernet-switch reg=1004 val=be
phytool-16816 [002] ....... 1927.142410: regmap_reg_read: ethernet-switch reg=1f01 val=0
kworker/3:4-70 [003] ....... 1927.142534: regmap_reg_read: ethernet-switch reg=1005 val=0
phytool-16816 [002] ....... 1927.142618: regmap_reg_read: ethernet-switch reg=1f04 val=0
phytool-16816 [002] ....... 1927.142641: mdio_access: SMI-0 read phy:0x02 reg:0x01 val:0x0000
kworker/3:4-70 [003] ....... 1927.143037: regmap_reg_read: ethernet-switch reg=1001 val=0
kworker/3:4-70 [003] ....... 1927.143133: regmap_reg_read: ethernet-switch reg=1000 val=2d89
kworker/3:4-70 [003] ....... 1927.143213: regmap_reg_write: ethernet-switch reg=1004 val=be
kworker/3:4-70 [003] ....... 1927.143291: regmap_reg_read: ethernet-switch reg=1005 val=0
kworker/3:4-70 [003] ....... 1927.143368: regmap_reg_read: ethernet-switch reg=1003 val=0
kworker/3:4-70 [003] ....... 1927.143443: regmap_reg_read: ethernet-switch reg=1002 val=6
The kworker here is polling MIB counters for stats, as evidenced by the
register 0x1004 that we are writing to (RTL8365MB_MIB_ADDRESS_REG).
Sometimes I can even read out completely junk values if I tighten my
phytool loop:
# while true; do phytool read swp2/2/0x01; done | uniq
0x79ed
0000
0x79ed
0000
0x79ed
0000
0x79ed
0x0014 <-- ?!
This is just a preliminary investigation - I only checked your mail
today - but I will follow up on this next week and try and make a
fix. In the mean time could you and Arinç perhaps do some similar
tracing and confirm that in your case it is a similar issue? You needn't
use phytool if the issue reproduces itself every 40 seconds - in that
case just enable ftrace events and inspect the register access sequence
when a PHY read gives an unexpected value.
Let's please restrict the investigation to registers which we expect to
give well-defined values though - unused or otherwise inactive registers
may very well give us junk, but it is harder to argue that this is a
bug. Anyway, this seems pretty easy to reproduce, so I don't think you
need to go down that path :-)
Overall we probably need some stricter locking but I want to spend some
time on this before just sending a big-fat-lock solution to the
list. I didn't try your patch yet due to other items in my inbox today.
Kind regards,
Alvin
>
> The indirect read procedure is something like this:
>
> rtl8365mb_phy_read
> rtl8365mb_phy_ocp_read
> rtl8365mb_phy_poll_busy
> regmap_read_poll_timeout(RTL8365MB_INDIRECT_ACCESS_STATUS_REG)
> rtl8365mb_phy_ocp_prepare
> regmap_update_bits(RTL8365MB_GPHY_OCP_MSB_0_REG)
> regmap_write(RTL8365MB_INDIRECT_ACCESS_ADDRESS_REG)
> regmap_write(RTL8365MB_INDIRECT_ACCESS_CTRL_REG)
> rtl8365mb_phy_poll_busy
> regmap_read_poll_timeout(RTL8365MB_INDIRECT_ACCESS_STATUS_REG)
> regmap_read(RTL8365MB_INDIRECT_ACCESS_READ_DATA_REG)
>
> And the write is similar. The regmap read/write does have its own
> mutex. realtek-mdio does use a sequence of read/writes to mdio for
> each read/write ops but beyond regmap internal locks, I also lock
> bus->mdio_lock.
>
> In a non-SMP system or with extra cores disabled, everything works as
> expected. However, with an SMP system, there is some kind of
> concurrent access to registers and
> regmap_read(RTL8365MB_INDIRECT_ACCESS_READ_DATA_REG) will eventually
> read 0x0000. It is like the chip didn't have time to update that
> register or got lost in the way. It only happens when the system is
> idle. If you stress the CPU, the simultaneous register access gets
> unsynchronized or serialized.
>
> I can "simulate" a similar issue in a non-SMP system by dumping regmap
> from non-used registers like 0x20a0 to 0x20ff (0x20a0 is related to
> port 5 state, but I have 0,1,2,3,4 + 6,7). There are other regions
> that cause the same behavior but [0x20a0,0x20ff] is the first one I
> found. It never failed while reading good memory regions. Maybe it is
> just caused by another issue like the chip getting confused while
> reading undefined memory regions.
>
> for a in $(seq 1000); do dd
> if=/sys/kernel/debug/regmap/mdio-bus:1d/registers bs=11
> skip=$((0x20a0)) count=$((0x1)) 2>/dev/null; done
>
> (That range might not be precise as I don't know if dd is enough to
> select only a specific range)
>
> While the script is running, I can see both the driver and the debugfs
> dump flicking between good data, garbage and zero. When DSA reads
> 0x0000, it brings the port down and up.
>
> I tried to add a simple mutex over rtl8365mb_phy_ocp_read call but it
> still failed to prevent the event in a SMP system (it would not
> protect the debugfs dump). Maybe I'm missing something obvious.
>
> I also wonder if all those functions that use a sequence of
> regmap_{read,write} might need a mutex to protect from parallel
> execution. I didn't find a way to create a "regmap transaction" by
> locking it externally as its lock control is private as well as the
> non-locked _regmap_read(). I could disable its lock and do it inside
> the driver, but it would also disable the debugfs dump and add a bunch
> of new code. And I don't even know if it would fix the issue.
> Is there a better way to deal with this issue? Maybe force dsa polling
> to use a single processor?
>
> This is the failed patch.
>
> diff --git a/drivers/net/dsa/realtek/rtl8365mb.c
> b/drivers/net/dsa/realtek/rtl8365mb.c
> index 1a0562811c1a..5572271a2514 100644
> --- a/drivers/net/dsa/realtek/rtl8365mb.c
> +++ b/drivers/net/dsa/realtek/rtl8365mb.c
> @@ -817,6 +817,7 @@ struct rtl8365mb_port {
> * @port_mask: mask of all ports
> * @learn_limit_max: maximum number of L2 addresses the chip can learn
> * @mib_lock: prevent concurrent reads of MIB counters
> + * @indirect_reg_lock: prevent concurrent access of indirect registers
> * @ports: per-port data
> * @jam_table: chip-specific initialization jam table
> * @jam_size: size of the chip's jam table
> @@ -832,6 +833,7 @@ struct rtl8365mb {
> u32 port_mask;
> u32 learn_limit_max;
> struct mutex mib_lock;
> + struct mutex indirect_reg_lock;
> struct rtl8365mb_port ports[RTL8365MB_MAX_NUM_PORTS];
> const struct rtl8365mb_jam_tbl_entry *jam_table;
> size_t jam_size;
> @@ -989,6 +991,7 @@ static int rtl8365mb_phy_ocp_write(struct
> realtek_priv *priv, int phy,
>
> static int rtl8365mb_phy_read(struct realtek_priv *priv, int phy, int regnum)
> {
> + struct rtl8365mb *mb = priv->chip_data;
> u32 ocp_addr;
> u16 val;
> int ret;
> @@ -1001,16 +1004,24 @@ static int rtl8365mb_phy_read(struct
> realtek_priv *priv, int phy, int regnum)
>
> ocp_addr = RTL8365MB_PHY_OCP_ADDR_PHYREG_BASE + regnum * 2;
>
> - ret = rtl8365mb_phy_ocp_read(priv, phy, ocp_addr, &val);
> + ret = mutex_lock_interruptible(&mb->indirect_reg_lock);
> if (ret) {
> dev_err(priv->dev,
> - "failed to read PHY%d reg %02x @ %04x, ret %d\n", phy,
> + "read PHY%d reg %02x @ %04x, ret %d interrupted\n", phy,
> regnum, ocp_addr, ret);
> return ret;
> }
>
> - dev_dbg(priv->dev, "read PHY%d register 0x%02x @ %04x, val <- %04x\n",
> - phy, regnum, ocp_addr, val);
> + ret = rtl8365mb_phy_ocp_read(priv, phy, ocp_addr, &val);
> + if (ret)
> + dev_err(priv->dev,
> + "failed to read PHY%d reg %02x @ %04x, ret %d\n", phy,
> + regnum, ocp_addr, ret);
> + else
> + dev_dbg(priv->dev, "read PHY%d register 0x%02x @ %04x,
> val <- %04x\n",
> + phy, regnum, ocp_addr, val);
> +
> + mutex_unlock(&mb->indirect_reg_lock);
>
> return val;
> }
> @@ -1018,6 +1029,7 @@ static int rtl8365mb_phy_read(struct
> realtek_priv *priv, int phy, int regnum)
> static int rtl8365mb_phy_write(struct realtek_priv *priv, int phy, int regnum,
> u16 val)
> {
> + struct rtl8365mb *mb = priv->chip_data;
> u32 ocp_addr;
> int ret;
>
> @@ -1029,18 +1041,26 @@ static int rtl8365mb_phy_write(struct
> realtek_priv *priv, int phy, int regnum,
>
> ocp_addr = RTL8365MB_PHY_OCP_ADDR_PHYREG_BASE + regnum * 2;
>
> - ret = rtl8365mb_phy_ocp_write(priv, phy, ocp_addr, val);
> + ret = mutex_lock_interruptible(&mb->indirect_reg_lock);
> if (ret) {
> dev_err(priv->dev,
> - "failed to write PHY%d reg %02x @ %04x, ret %d\n", phy,
> + "write PHY%d reg %02x @ %04x, ret %d
> interrupted\n", phy,
> regnum, ocp_addr, ret);
> return ret;
> }
>
> - dev_dbg(priv->dev, "write PHY%d register 0x%02x @ %04x, val -> %04x\n",
> - phy, regnum, ocp_addr, val);
> + ret = rtl8365mb_phy_ocp_write(priv, phy, ocp_addr, val);
> + if (ret)
> + dev_err(priv->dev,
> + "failed to write PHY%d reg %02x @ %04x, ret %d\n", phy,
> + regnum, ocp_addr, ret);
> + else
> + dev_dbg(priv->dev, "write PHY%d register 0x%02x @
> %04x, val -> %04x\n",
> + phy, regnum, ocp_addr, val);
>
> - return 0;
> + mutex_unlock(&mb->indirect_reg_lock);
> +
> + return ret;
> }
>
> static int rtl8365mb_dsa_phy_read(struct dsa_switch *ds, int phy, int regnum)
> @@ -2215,6 +2235,8 @@ static int rtl8365mb_setup(struct dsa_switch *ds)
>
> mb = priv->chip_data;
>
> + mutex_init(&mb->indirect_reg_lock);
> +
> ret = rtl8365mb_reset_chip(priv);
> if (ret) {
> dev_err(priv->dev, "failed to reset chip: %d\n", ret);
> --
> 2.34.1
>
>
> ---
> Luiz
next prev parent reply other threads:[~2022-02-09 10:51 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-02-06 20:26 net: dsa: realtek: silent indirect reg read failures on SMP Luiz Angelo Daros de Luca
2022-02-09 10:32 ` Alvin Šipraga [this message]
2022-02-09 23:28 ` Luiz Angelo Daros de Luca
2022-02-10 3:10 ` Luiz Angelo Daros de Luca
2022-02-10 8:13 ` Alvin Šipraga
2022-02-10 16:01 ` Alvin Šipraga
2022-02-11 5:40 ` Luiz Angelo Daros de Luca
2022-02-11 8:41 ` Alvin Šipraga
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=878rukib4f.fsf@bang-olufsen.dk \
--to=alsi@bang-olufsen.dk \
--cc=arinc.unal@arinc9.com \
--cc=linus.walleij@linaro.org \
--cc=luizluca@gmail.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).