netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Rob Herring <robh+dt@kernel.org>
To: Frank Rowand <frowand.list@gmail.com>,
	Vladimir Oltean <olteanv@gmail.com>
Cc: Sascha Hauer <s.hauer@pengutronix.de>,
	devicetree@vger.kernel.org,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	netdev <netdev@vger.kernel.org>
Subject: Re: of_node_put() usage is buggy all over drivers/of/base.c?!
Date: Mon, 16 Aug 2021 14:20:08 -0500	[thread overview]
Message-ID: <CAL_JsqKYd288Th2cfOp0_HD1C8xtgKjyJfUW4JLpyn0NkGdU5w@mail.gmail.com> (raw)
In-Reply-To: <4cad28e0-d6b4-800d-787b-936ffaca7be3@gmail.com>

On Mon, Aug 16, 2021 at 10:14 AM Frank Rowand <frowand.list@gmail.com> wrote:
>
> On 8/16/21 9:46 AM, Vladimir Oltean wrote:
> > Hi Frank,
> >
> > On Mon, Aug 16, 2021 at 09:33:03AM -0500, Frank Rowand wrote:
> >> Hi Vladimir,
> >>
> >> On 8/13/21 8:01 PM, Vladimir Oltean wrote:
> >>> Hi,
> >>>
> >>> I was debugging an RCU stall which happened during the probing of a
> >>> driver. Activating lock debugging, I see:
> >>
> >> I took a quick look at sja1105_mdiobus_register() in v5.14-rc1 and v5.14-rc6.
> >>
> >> Looking at the following stack trace, I did not see any calls to
> >> of_find_compatible_node() in sja1105_mdiobus_register().  I am
> >> guessing that maybe there is an inlined function that calls
> >> of_find_compatible_node().  This would likely be either
> >> sja1105_mdiobus_base_tx_register() or sja1105_mdioux_base_t1_register().
> >
> > Yes, it is sja1105_mdiobus_base_t1_register which is inlined.
> >
> >>>
> >>> [  101.710694] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:938
> >>> [  101.719119] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 1534, name: sh
> >>> [  101.726763] INFO: lockdep is turned off.
> >>> [  101.730674] irq event stamp: 0
> >>> [  101.733716] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
> >>> [  101.739973] hardirqs last disabled at (0): [<ffffd3ebecb10120>] copy_process+0xa78/0x1a98
> >>> [  101.748146] softirqs last  enabled at (0): [<ffffd3ebecb10120>] copy_process+0xa78/0x1a98
> >>> [  101.756313] softirqs last disabled at (0): [<0000000000000000>] 0x0
> >>> [  101.762569] CPU: 4 PID: 1534 Comm: sh Not tainted 5.14.0-rc5+ #272
> >>> [  101.774558] Call trace:
> >>> [  101.794734]  __might_sleep+0x50/0x88
> >>> [  101.798297]  __mutex_lock+0x60/0x938
> >>> [  101.801863]  mutex_lock_nested+0x38/0x50
> >>> [  101.805775]  kernfs_remove+0x2c/0x50             <---- this takes mutex_lock(&kernfs_mutex);
> >>> [  101.809341]  sysfs_remove_dir+0x54/0x70
> >>
> >> The __kobject_del() occurs only if the refcount on the node
> >> becomes zero.  This should never be true when of_find_compatible_node()
> >> calls of_node_put() unless a "from" node is passed to of_find_compatible_node().
> >
> > I figured that was the assumption, that the of_node_put would never
> > trigger a sysfs file / kobject deletion from there.
> >
> >> In both sja1105_mdiobus_base_tx_register() and sja1105_mdioux_base_t1_register()
> >> a from node ("mdio") is passed to of_find_compatible_node() without first doing an
> >> of_node_get(mdio).  If you add the of_node_get() calls the problem should be fixed.
> >
> > The answer seems simple enough, but stupid question, but why does
> > of_find_compatible_node call of_node_put on "from" in the first place?
>
> Actually a good question.
>
> I do not know why of_find_compatible_node() calls of_node_put() instead of making
> the caller of of_find_compatible_node() responsible.  That pattern was created
> long before I was involved in devicetree and I have not gone back to read the
> review comments of when that code was created.

Because it is an iterator function and they all drop the ref from the
prior iteration.

I would say any open coded call where from is not NULL is an error.
It's not reliable because the DT search order is not defined and could
change. Someone want to write a coccinelle script to check that?

The above code should be using of_get_compatible_child() instead.

Rob

  reply	other threads:[~2021-08-16 19:20 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-14  1:01 of_node_put() usage is buggy all over drivers/of/base.c?! Vladimir Oltean
2021-08-16 14:33 ` Frank Rowand
2021-08-16 14:46   ` Vladimir Oltean
2021-08-16 15:14     ` Frank Rowand
2021-08-16 19:20       ` Rob Herring [this message]
2021-08-16 19:56         ` Frank Rowand
2021-08-16 20:00           ` Frank Rowand
2021-08-16 20:25           ` Rob Herring

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAL_JsqKYd288Th2cfOp0_HD1C8xtgKjyJfUW4JLpyn0NkGdU5w@mail.gmail.com \
    --to=robh+dt@kernel.org \
    --cc=devicetree@vger.kernel.org \
    --cc=frowand.list@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=olteanv@gmail.com \
    --cc=s.hauer@pengutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).