netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Alvin Šipraga" <ALSI@bang-olufsen.dk>
To: Vladimir Oltean <olteanv@gmail.com>
Cc: "Alvin Šipraga" <alvin@pqrs.dk>,
	"Linus Walleij" <linus.walleij@linaro.org>,
	"Andrew Lunn" <andrew@lunn.ch>,
	"Vivien Didelot" <vivien.didelot@gmail.com>,
	"Florian Fainelli" <f.fainelli@gmail.com>,
	"David S. Miller" <davem@davemloft.net>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Luiz Angelo Daros de Luca" <luizluca@gmail.com>,
	"Arınç ÜNAL" <arinc.unal@arinc9.com>,
	"Michael Rasmussen" <MIR@bang-olufsen.dk>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH net-next 2/2] net: dsa: realtek: rtl8365mb: serialize indirect PHY register access
Date: Thu, 17 Feb 2022 12:51:37 +0000	[thread overview]
Message-ID: <87bkz5r6zq.fsf@bang-olufsen.dk> (raw)
In-Reply-To: <20220217111709.x5g6alnhz3njo4t2@skbuf> (Vladimir Oltean's message of "Thu, 17 Feb 2022 13:17:09 +0200")

Vladimir Oltean <olteanv@gmail.com> writes:

> On Thu, Feb 17, 2022 at 07:41:32AM +0000, Alvin Šipraga wrote:
>> Vladimir Oltean <olteanv@gmail.com> writes:
>> 
>> > On Wed, Feb 16, 2022 at 05:05:00PM +0100, Alvin Šipraga wrote:
>> >> From: Alvin Šipraga <alsi@bang-olufsen.dk>
>> >> 
>> >> Realtek switches in the rtl8365mb family can access the PHY registers of
>> >> the internal PHYs via the switch registers. This method is called
>> >> indirect access. At a high level, the indirect PHY register access
>> >> method involves reading and writing some special switch registers in a
>> >> particular sequence. This works for both SMI and MDIO connected
>> >> switches.
>> >> 
>> >> Currently the rtl8365mb driver does not take any care to serialize the
>> >> aforementioned access to the switch registers. In particular, it is
>> >> permitted for other driver code to access other switch registers while
>> >> the indirect PHY register access is ongoing. Locking is only done at the
>> >> regmap level. This, however, is a bug: concurrent register access, even
>> >> to unrelated switch registers, risks corrupting the PHY register value
>> >> read back via the indirect access method described above.
>> >> 
>> >> Arınç reported that the switch sometimes returns nonsense data when
>> >> reading the PHY registers. In particular, a value of 0 causes the
>> >> kernel's PHY subsystem to think that the link is down, but since most
>> >> reads return correct data, the link then flip-flops between up and down
>> >> over a period of time.
>> >> 
>> >> The aforementioned bug can be readily observed by:
>> >> 
>> >>  1. Enabling ftrace events for regmap and mdio
>> >>  2. Polling BSMR PHY register for a connected port;
>> >>     it should always read the same (e.g. 0x79ed)
>> >>  3. Wait for step 2 to give a different value
>> >> 
>> >> Example command for step 2:
>> >> 
>> >>     while true; do phytool read swp2/2/0x01; done
>> >> 
>> >> On my i.MX8MM, the above steps will yield a bogus value for the BSMR PHY
>> >> register within a matter of seconds. The interleaved register access it
>> >> then evident in the trace log:
>> >> 
>> >>  kworker/3:4-70      [003] .......  1927.139849: regmap_reg_write: ethernet-switch reg=1004 val=bd
>> >>      phytool-16816   [002] .......  1927.139979: regmap_reg_read: ethernet-switch reg=1f01 val=0
>> >>  kworker/3:4-70      [003] .......  1927.140381: regmap_reg_read: ethernet-switch reg=1005 val=0
>> >>      phytool-16816   [002] .......  1927.140468: regmap_reg_read: ethernet-switch reg=1d15 val=a69
>> >>  kworker/3:4-70      [003] .......  1927.140864: regmap_reg_read: ethernet-switch reg=1003 val=0
>> >>      phytool-16816   [002] .......  1927.140955: regmap_reg_write: ethernet-switch reg=1f02 val=2041
>> >>  kworker/3:4-70      [003] .......  1927.141390: regmap_reg_read: ethernet-switch reg=1002 val=0
>> >>      phytool-16816   [002] .......  1927.141479: regmap_reg_write: ethernet-switch reg=1f00 val=1
>> >>  kworker/3:4-70      [003] .......  1927.142311: regmap_reg_write: ethernet-switch reg=1004 val=be
>> >>      phytool-16816   [002] .......  1927.142410: regmap_reg_read: ethernet-switch reg=1f01 val=0
>> >>  kworker/3:4-70      [003] .......  1927.142534: regmap_reg_read: ethernet-switch reg=1005 val=0
>> >>      phytool-16816   [002] .......  1927.142618: regmap_reg_read: ethernet-switch reg=1f04 val=0
>> >>      phytool-16816   [002] .......  1927.142641: mdio_access: SMI-0 read  phy:0x02 reg:0x01 val:0x0000 <- ?!
>> >>  kworker/3:4-70      [003] .......  1927.143037: regmap_reg_read: ethernet-switch reg=1001 val=0
>> >>  kworker/3:4-70      [003] .......  1927.143133: regmap_reg_read: ethernet-switch reg=1000 val=2d89
>> >>  kworker/3:4-70      [003] .......  1927.143213: regmap_reg_write: ethernet-switch reg=1004 val=be
>> >>  kworker/3:4-70      [003] .......  1927.143291: regmap_reg_read: ethernet-switch reg=1005 val=0
>> >>  kworker/3:4-70      [003] .......  1927.143368: regmap_reg_read: ethernet-switch reg=1003 val=0
>> >>  kworker/3:4-70      [003] .......  1927.143443: regmap_reg_read: ethernet-switch reg=1002 val=6
>> >> 
>> >> The kworker here is polling MIB counters for stats, as evidenced by the
>> >> register 0x1004 that we are writing to (RTL8365MB_MIB_ADDRESS_REG). This
>> >> polling is performed every 3 seconds, but is just one example of such
>> >> unsynchronized access.
>> >> 
>> >> Further investigation reveals the underlying problem: if we read from an
>> >> arbitrary register A and this read coincides with the indirect access
>> >> method in rtl8365mb_phy_ocp_read, then the final read from
>> >> RTL8365MB_INDIRECT_ACCESS_READ_DATA_REG will always return the value in
>> >> register A. The value read back can be readily poisoned by repeatedly
>> >> reading back the value of another register A via debugfs in a busy loop
>> >> via the dd utility or similar.
>> >> 
>> >> This issue appears to be unique to the indirect PHY register access
>> >> pattern. In particular, it does not seem to impact similar sequential
>> >> register operations such MIB counter access.
>> >> 
>> >> To fix this problem, one must guard against exactly the scenario seen in
>> >> the above trace. In particular, other parts of the driver using the
>> >> regmap API must not be permitted to access the switch registers until
>> >> the PHY register access is complete. Fix this by using the newly
>> >> introduced "nolock" regmap in all PHY-related functions, and by aquiring
>> >> the regmap mutex at the top level of the PHY register access callbacks.
>> >> Although no issue has been observed with PHY register _writes_, this
>> >> change also serializes the indirect access method there. This is done
>> >> purely as a matter of convenience.
>> >> 
>> >> Fixes: 4af2950c50c8 ("net: dsa: realtek-smi: add rtl8365mb subdriver for RTL8365MB-VC")
>> >> Link: https://lore.kernel.org/netdev/CAJq09z5FCgG-+jVT7uxh1a-0CiiFsoKoHYsAWJtiKwv7LXKofQ@mail.gmail.com/
>> >> Reported-by: Arınç ÜNAL <arinc.unal@arinc9.com>
>> >> Reported-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
>> >> Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
>> >> ---
>> >
>> > This implementation where the indirect PHY access blocks out every other
>> > register read and write is only justified if you can prove that you can
>> > stuff just about any unrelated register read or write before
>> > RTL8365MB_INDIRECT_ACCESS_READ_DATA_REG, and this, in and of itself,
>> > will poison what gets read back from RTL8365MB_INDIRECT_ACCESS_READ_DATA_REG.
>> 
>> I (at least treied to) state that clearly here:
>> 
>> >> Further investigation reveals the underlying problem: if we read from an
>> >> arbitrary register A and this read coincides with the indirect access
>> >> method in rtl8365mb_phy_ocp_read, then the final read from
>> >> RTL8365MB_INDIRECT_ACCESS_READ_DATA_REG will always return the value in
>> >> register A. The value read back can be readily poisoned by repeatedly
>> >> reading back the value of another register A via debugfs in a busy loop
>> >> via the dd utility or similar.
>> 
>> That is, I used regmap debugfs to spam reads of switch registers like,
>> for example, this one:
>> 
>> #define RTL8365MB_CFG0_MAX_LEN_REG	0x088C
>> 
>> ... which controls the MTU of the switch. This is something we set up
>> just once to be 0x600 and then it is never touched again. Now in the
>> above example, let A = 0x088C. Spamming the read of A phytool command
>> described above, I would expect to read a value 0x79c9 out of my BSMR
>> PHY register with phytool. But in cases where the read of switch
>> register A coincides with the indirect access procedure, I end up
>> reading back 0x600 from the PHY register. This is specifically because
>> the read of A (=0x600) then poisons the value in
>> RTL8365MB_INDIRECT_ACCESS_READ_DATA_REG (should be 0x79c9, but is
>> 0x600).
>
> Yes, well, that was a bit handwavy, you didn't mention any other
> specific register, you just stated a rule which appeared to be inferred
> from little evidence.
>
>> > rtl8365mb_mib_counter_read() doesn't seem like a particularly good
>> > example to prove this, since it appears to be an indirect access
>> > procedure as well. Single register reads or writes would be ideal, like
>> > RTL8365MB_CPU_CTRL_REG, artificially inserted into strategic places.
>> > Ideally you wouldn't even have a DSA or MDIO or PHY driver running.
>> 
>> I hope it is clear from my above explanation that I did show this, if
>> you agree that RTL8365MB_CFG0_MAX_LEN_REG is just as arbitrary as
>> RTL8365MB_CPU_CTRL_REG.
>> 
>> What I meant to say here:
>> 
>> >> This issue appears to be unique to the indirect PHY register access
>> >> pattern. In particular, it does not seem to impact similar sequential
>> >> register operations such MIB counter access.
>> 
>> ... about MIB counter access (which is also indirect as you point out),
>> is that it does _not_ suffer from the above problem. The way I checked
>> this was with ethtool -S, while again spamming regmap_read of an
>> unrelated switch register like CPU_CTRL or CFG0_MAX_LEN. In this case
>> the counter values always seem sane, and I can't detect the poisoned
>> value getting read back (like 0x600 in the above example).
>> 
>> > Just a simple kernel module with access to the regmap, and try to read
>> > something known, like the PHY ID of one of the internal PHYs, via an
>> > open-coded function. Then add extra regmap accesses and see what
>> > corrupts the indirect PHY access procedure.
>> 
>> The switch is generally idle and I did my testing with the periodic MIB
>> counter disabled, so I think what you describe is not far off from what
>> I did. The only difference is that the switch was already configured and
>> switching packets. I used ftrace events to verify the phenomenon.
>> 
>> If you are still not persuaded, just write me back here, and I will go
>> ahead and implement such a test module. But it seems like you
>> misunderstood my initial commit message, so perhaps I just need to
>> rephrase it?
>
> If the problem you've identified is correct, then this simple test
> module would yield the exact same result, yet would eliminate beyond any
> doubt the timing and other circumstantial factors, and you could also
> do better testing of the PHY write sequence, and MIB counter reads.
> And if simply inserting a stray register access in the middle of the PHY
> read procedure doesn't produce the same result, this would be new
> information. It shouldn't even be too hard to do.
>
<snip>
>
> I have little to no problem with the workaround you've implemented, it's
> just that extraordinary claims require extraordinary proof. Having a
> standalone kernel module that can deterministically and not statistically
> reproduce the bug would go a long way.

Thanks Vladimir, I very much appreciate your scrutiny here. I'll make
the test module to verify the claims I have made. In the mean time I
asked Realtek if they have any comment.

Kind regards,
Alvin

  reply	other threads:[~2022-02-17 12:51 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-16 16:04 [PATCH net-next 0/2] net: dsa: realtek: fix PHY register read corruption Alvin Šipraga
2022-02-16 16:04 ` [PATCH net-next 1/2] net: dsa: realtek: allow subdrivers to externally lock regmap Alvin Šipraga
2022-02-16 16:05 ` [PATCH net-next 2/2] net: dsa: realtek: rtl8365mb: serialize indirect PHY register access Alvin Šipraga
2022-02-16 23:39   ` Vladimir Oltean
2022-02-17  3:01     ` Luiz Angelo Daros de Luca
2022-02-17  8:16       ` Alvin Šipraga
2022-02-22  0:18         ` Luiz Angelo Daros de Luca
2022-02-17  7:41     ` Alvin Šipraga
2022-02-17 11:17       ` Vladimir Oltean
2022-02-17 12:51         ` Alvin Šipraga [this message]
2022-02-21 14:50           ` Alvin Šipraga
2022-02-21 17:15             ` Vladimir Oltean
2022-02-21 18:10               ` Alvin Šipraga
2022-02-16 17:57 ` [PATCH net-next 0/2] net: dsa: realtek: fix PHY register read corruption Luiz Angelo Daros de Luca
2022-02-16 18:23   ` Alvin Šipraga
2022-02-16 19:11     ` Andrew Lunn
2022-02-16 19:26       ` Alvin Šipraga
2022-02-17 12:12         ` Andrew Lunn
2022-02-17 13:09           ` Alvin Šipraga
2022-02-17 13:32             ` Andrew Lunn
2022-02-17  4:28     ` Luiz Angelo Daros de Luca
2022-02-17  7:53       ` Alvin Šipraga

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87bkz5r6zq.fsf@bang-olufsen.dk \
    --to=alsi@bang-olufsen.dk \
    --cc=MIR@bang-olufsen.dk \
    --cc=alvin@pqrs.dk \
    --cc=andrew@lunn.ch \
    --cc=arinc.unal@arinc9.com \
    --cc=davem@davemloft.net \
    --cc=f.fainelli@gmail.com \
    --cc=kuba@kernel.org \
    --cc=linus.walleij@linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luizluca@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=olteanv@gmail.com \
    --cc=vivien.didelot@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).