linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Christian Eggers <ceggers@arri.de>
To: Vladimir Oltean <olteanv@gmail.com>,
	Woojung Huh <woojung.huh@microchip.com>,
	Microchip Linux Driver Support <UNGLinuxDriver@microchip.com>
Cc: Andrew Lunn <andrew@lunn.ch>,
	Vivien Didelot <vivien.didelot@gmail.com>,
	Florian Fainelli <f.fainelli@gmail.com>,
	"David S . Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>, <netdev@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>,
	Christian Eggers <ceggers@arri.de>
Subject: [net v2] net: dsa: microchip: fix race condition
Date: Tue, 6 Oct 2020 17:56:51 +0200	[thread overview]
Message-ID: <20201006155651.21473-1-ceggers@arri.de> (raw)

Between queuing the delayed work and finishing the setup of the dsa
ports, the process may sleep in request_module() (via
phy_device_create()) and the queued work may be executed prior to the
switch net devices being registered. In ksz_mib_read_work(), a NULL
dereference will happen within netof_carrier_ok(dp->slave).

Not queuing the delayed work in ksz_init_mib_timer() makes things even
worse because the work will now be queued for immediate execution
(instead of 2000 ms) in ksz_mac_link_down() via
dsa_port_link_register_of().

Call tree:
ksz9477_i2c_probe()
\--ksz9477_switch_register()
   \--ksz_switch_register()
      +--dsa_register_switch()
      |  \--dsa_switch_probe()
      |     \--dsa_tree_setup()
      |        \--dsa_tree_setup_switches()
      |           +--dsa_switch_setup()
      |           |  +--ksz9477_setup()
      |           |  |  \--ksz_init_mib_timer()
      |           |  |     |--/* Start the timer 2 seconds later. */
      |           |  |     \--schedule_delayed_work(&dev->mib_read, msecs_to_jiffies(2000));
      |           |  \--__mdiobus_register()
      |           |     \--mdiobus_scan()
      |           |        \--get_phy_device()
      |           |           +--get_phy_id()
      |           |           \--phy_device_create()
      |           |              |--/* sleeping, ksz_mib_read_work() can be called meanwhile */
      |           |              \--request_module()
      |           |
      |           \--dsa_port_setup()
      |              +--/* Called for non-CPU ports */
      |              +--dsa_slave_create()
      |              |  +--/* Too late, ksz_mib_read_work() may be called beforehand */
      |              |  \--port->slave = ...
      |             ...
      |              +--Called for CPU port */
      |              \--dsa_port_link_register_of()
      |                 \--ksz_mac_link_down()
      |                    +--/* mib_read must be initialized here */
      |                    +--/* work is already scheduled, so it will be executed after 2000 ms */
      |                    \--schedule_delayed_work(&dev->mib_read, 0);
      \-- /* here port->slave is setup properly, scheduling the delayed work should be safe */

Solution:
1. Do not queue (only initialize) delayed work in ksz_init_mib_timer().
2. Only queue delayed work in ksz_mac_link_down() if init is completed.
3. Queue work once in ksz_switch_register(), after dsa_register_switch()
has completed.

Fixes: 7c6ff470aa ("net: dsa: microchip: add MIB counter reading support")
Signed-off-by: Christian Eggers <ceggers@arri.de>
---
v2:
---------
- no changes in the patch itself
- use correct subject-prefix
- changed wording of commit description
- added call tree to commit description
- added "Fixes:" tag


On Monday, 5 October 2020, 22:55:44 CEST, Vladimir Oltean wrote:
> On Mon, Oct 05, 2020 at 06:08:29PM +0200, Christian Eggers wrote:
> [...]
> Read this for more details
> https://www.kernel.org/doc/html/latest/networking/netdev-FAQ.html
Thank for pointing this out.

> FYI, you haven't even addressed the root cause of the problem, which is
> ksz_mib_read_work sticking its nose where it's not supposed to:
> 
> 		/* Only read MIB counters when the port is told to do.
> 		 * If not, read only dropped counters when link is not up.
> 		 */
> 		if (!p->read) {
> 			const struct dsa_port *dp = dsa_to_port(dev->ds, i);
> 
> 			if (!netif_carrier_ok(dp->slave))
> 				mib->cnt_ptr = dev->reg_mib_cnt;
> 		}
> 
> This is simply Not Ok.
> Not only the dp->slave is on purpose registered outside of the driver's
> control (as you came to find out yourself), but not even all ports are
> user ports. For example, the CPU port doesn't have a valid struct
> net_device *slave pointer. You are just lucky that it's defined like
> this:
> 
> struct dsa_port {
> 	/* A CPU port is physically connected to a master device.
> 	 * A user port exposed to userspace has a slave device.
> 	 */
> 	union {
> 		struct net_device *master;
> 		struct net_device *slave;
> 	};
> 
> so the code is in fact checking the status of the master interface's link.
> But DSA doesn't assume that the *master and *slave pointers are under a
> union. That can change any day, and when it changes, the KSZ driver will
> break.
Calling dsa_is_user_port() before calling netif_carrier_ok() could improve the
situation a little bit...

		/* Only read MIB counters when the port is told to do.
		 * If not, read only dropped counters when link is not up.
		 */
port_r_cnt() is called independently of p->read and netif_carrier_ok()... What
is correct here (comment or code)?

> My personal feeling is that this driver hides a landmine beneath every
> line of code, and it isn't getting better.
https://github.com/Microchip-Ethernet/EVB-KSZ9477 ...

> Sure, you should absolutely add the call stack to the commit message,
> but how many people are going to git blame so they can see it. The code
> needs to be obviously correct.
> 
> Things like needing to check dev->mib_read_interval as an indication
> whether the race between ksz_mac_link_down and ksz_switch_register is
> over are exactly the type of things that make it not fun to follow.
I needed some amount of time to understand the segfault and to draw the
call stack...

> If reading MIB counters for ports that are down is such a "waste of time"
> as per commit 7c6ff470aa867f53b8522a3a5c84c36ac7a20090, then how about
> scheduling the delayed work from .phylink_mac_link_up, and canceling it
> from .phylink_mac_link_down? Either that, or set a boolean variable to
> struct ksz_port p->link_up, to true or false respectively from the
> phylink callbacks, and using that as an indication whether to read the
> MIB counters or not, instead of accessing the potentially invalid
> dp->slave pointer? Would that work?
I am definitely not an expert for this driver. For starting/stopping the
delayed work on demand, a separate work struct for each port could be useful.
In this case, struct ksz_port would need a pointer to the ksz_device struct,
as the ports are allocated seperately and container_of() cannot be used.

Using a bool variable has the property, that reading the MIB will not be
performed "immediately" after phylink_mac_down(). But if I am correct, this
is also not the case today as the work is typically already queued when
ksz_mac_link_down() is executed.

- First call of ksz_mac_link_down:
Work is already queued (prior this patch) or will not be queued (after this
patch).

- Further calls:
Work is already queued (it requeues itself).

Result (please verify):
- Not scheduling the work in ksz_mac_link_down() won't change anything.
- Checking for mib_read_interval in ksz_switch_remove() can be obmitted,
  as the condition is always true when ksz_switch_remove() is called.

> Sorry for rambling. I realize that there aren't probably a lot of things
> you can do better to fix this problem for stable, but maybe you could
> take some time and clean it up a little bit?
My original intention is to upstream the PTP functionality I have developed. I
hope that I can get this into without cleaning up the whole driver...

If you can confirm that checking mib_read_interval is not necessary in
ksz_switch_remove() and that scheduling the work in ksz_mac_link_down() makes
no sense, I can either extend this patch or create a follup up patch.

Best regards
Christian

 drivers/net/dsa/microchip/ksz_common.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/net/dsa/microchip/ksz_common.c b/drivers/net/dsa/microchip/ksz_common.c
index 8e755b50c9c1..a94d2278b95c 100644
--- a/drivers/net/dsa/microchip/ksz_common.c
+++ b/drivers/net/dsa/microchip/ksz_common.c
@@ -103,14 +103,8 @@ void ksz_init_mib_timer(struct ksz_device *dev)
 
 	INIT_DELAYED_WORK(&dev->mib_read, ksz_mib_read_work);
 
-	/* Read MIB counters every 30 seconds to avoid overflow. */
-	dev->mib_read_interval = msecs_to_jiffies(30000);
-
 	for (i = 0; i < dev->mib_port_cnt; i++)
 		dev->dev_ops->port_init_cnt(dev, i);
-
-	/* Start the timer 2 seconds later. */
-	schedule_delayed_work(&dev->mib_read, msecs_to_jiffies(2000));
 }
 EXPORT_SYMBOL_GPL(ksz_init_mib_timer);
 
@@ -143,7 +137,9 @@ void ksz_mac_link_down(struct dsa_switch *ds, int port, unsigned int mode,
 
 	/* Read all MIB counters when the link is going down. */
 	p->read = true;
-	schedule_delayed_work(&dev->mib_read, 0);
+	/* timer started */
+	if (dev->mib_read_interval)
+		schedule_delayed_work(&dev->mib_read, 0);
 }
 EXPORT_SYMBOL_GPL(ksz_mac_link_down);
 
@@ -446,6 +442,12 @@ int ksz_switch_register(struct ksz_device *dev,
 		return ret;
 	}
 
+	/* Read MIB counters every 30 seconds to avoid overflow. */
+	dev->mib_read_interval = msecs_to_jiffies(30000);
+
+	/* Start the MIB timer. */
+	schedule_delayed_work(&dev->mib_read, 0);
+
 	return 0;
 }
 EXPORT_SYMBOL(ksz_switch_register);
-- 
Christian Eggers
Embedded software developer

Arnold & Richter Cine Technik GmbH & Co. Betriebs KG
Sitz: Muenchen - Registergericht: Amtsgericht Muenchen - Handelsregisternummer: HRA 57918
Persoenlich haftender Gesellschafter: Arnold & Richter Cine Technik GmbH
Sitz: Muenchen - Registergericht: Amtsgericht Muenchen - Handelsregisternummer: HRB 54477
Geschaeftsfuehrer: Dr. Michael Neuhaeuser; Stephan Schenk; Walter Trauninger; Markus Zeiler


             reply	other threads:[~2020-10-06 15:57 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-06 15:56 Christian Eggers [this message]
2020-10-06 16:21 ` [net v2] net: dsa: microchip: fix race condition Vladimir Oltean
2020-10-06 16:24   ` Vladimir Oltean
2020-10-06 16:30   ` Christian Eggers
2020-10-06 16:57     ` Vladimir Oltean
2020-10-06 20:36 ` Jakub Kicinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201006155651.21473-1-ceggers@arri.de \
    --to=ceggers@arri.de \
    --cc=UNGLinuxDriver@microchip.com \
    --cc=andrew@lunn.ch \
    --cc=davem@davemloft.net \
    --cc=f.fainelli@gmail.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=olteanv@gmail.com \
    --cc=vivien.didelot@gmail.com \
    --cc=woojung.huh@microchip.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).