All of lore.kernel.org
 help / color / mirror / Atom feed
From: Herve Codina <herve.codina@bootlin.com>
To: "Nuno Sá" <noname.nuno@gmail.com>,
	"Saravana Kannan" <saravanak@google.com>,
	"Rob Herring" <robh@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Frank Rowand <frowand.list@gmail.com>,
	Lizhi Hou <lizhi.hou@amd.com>, Max Zhen <max.zhen@amd.com>,
	Sonal Santan <sonal.santan@amd.com>,
	Stefano Stabellini <stefano.stabellini@xilinx.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	linux-kernel@vger.kernel.org, devicetree@vger.kernel.org,
	Allan Nielsen <allan.nielsen@microchip.com>,
	Horatiu Vultur <horatiu.vultur@microchip.com>,
	Steen Hegelund <steen.hegelund@microchip.com>,
	Luca Ceresoli <luca.ceresoli@bootlin.com>,
	Nuno Sa <nuno.sa@analog.com>,
	Thomas Petazzoni <thomas.petazzoni@bootlin.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH v3 2/2] of: overlay: Synchronize of_overlay_remove() with the devlink removals
Date: Tue, 5 Mar 2024 11:27:08 +0100	[thread overview]
Message-ID: <20240305112708.56869e4c@bootlin.com> (raw)
In-Reply-To: <2f497783da939f13d8c8faeab931cac0ef9c98eb.camel@gmail.com>

Hi Nuno, Saravana, Rob,

On Tue, 05 Mar 2024 08:36:45 +0100
Nuno Sá <noname.nuno@gmail.com> wrote:

> On Mon, 2024-03-04 at 22:47 -0800, Saravana Kannan wrote:
> > On Mon, Mar 4, 2024 at 8:49 AM Herve Codina <herve.codina@bootlin.com> wrote:  
> > > 
> > > Hi Rob,
> > > 
> > > On Mon, 4 Mar 2024 09:22:02 -0600
> > > Rob Herring <robh@kernel.org> wrote:
> > > 
> > > ...
> > >   
> > > > > > @@ -853,6 +854,14 @@ static void free_overlay_changeset(struct
> > > > > > overlay_changeset *ovcs)
> > > > > >  {
> > > > > >   int i;
> > > > > > 
> > > > > > + /*
> > > > > > +  * Wait for any ongoing device link removals before removing some of
> > > > > > +  * nodes. Drop the global lock while waiting
> > > > > > +  */
> > > > > > + mutex_unlock(&of_mutex);
> > > > > > + device_link_wait_removal();
> > > > > > + mutex_lock(&of_mutex);  
> > > > > 
> > > > > I'm still not convinced we need to drop the lock. What happens if
> > > > > someone else
> > > > > grabs the lock while we are in device_link_wait_removal()? Can we
> > > > > guarantee that
> > > > > we can't screw things badly?  
> > > > 
> > > > It is also just ugly because it's the callers of
> > > > free_overlay_changeset() that hold the lock and now we're releasing it
> > > > behind their back.
> > > > 
> > > > As device_link_wait_removal() is called before we touch anything, can't
> > > > it be called before we take the lock? And do we need to call it if
> > > > applying the overlay fails?  
> > 
> > Rob,
> > 
> > This[1] scenario Luca reported seems like a reason for the
> > device_link_wait_removal() to be where Herve put it. That example
> > seems reasonable.
> > 
> > [1] - https://lore.kernel.org/all/20231220181627.341e8789@booty/
> >   
> 
> I'm still not totally convinced about that. Why not putting the check right
> before checking the kref in __of_changeset_entry_destroy(). I'll contradict
> myself a bit because this is just theory but if we look at pci_stop_dev(), which
> AFAIU, could be reached from a sysfs write(), we have:
> 
> device_release_driver(&dev->dev);
> ...
> of_pci_remove_node(dev);
> 	of_changeset_revert(np->data);
> 	of_changeset_destroy(np->data);
> 
> So looking at the above we would hit the same issue if we flush the queue in
> free_overlay_changeset() - as the queue won't be flushed at all and we could
> have devlink removal due to device_release_driver(). Right?
> 
> Again, completely theoretical but seems like a reasonable one plus I'm not
> understanding the push against having the flush in
> __of_changeset_entry_destroy(). Conceptually, it looks the best place to me but
> I may be missing some issue in doing it there?

Instead of having the wait called in __of_changeset_entry_destroy() and so
called in a loop. I could move this call in the __of_changeset_entry_destroy()
caller (without any of_mutex lock drop).

So this will look like this:
--- 8< ---
void of_changeset_destroy(struct of_changeset *ocs)
{
	struct of_changeset_entry *ce, *cen;

	device_link_wait_removal();

	list_for_each_entry_safe_reverse(ce, cen, &ocs->entries, node)
		__of_changeset_entry_destroy(ce);
}
--- 8< ---

I already tested on my system and it works correctly with
device_link_wait_removal() only called from of_changeset_destroy()
as proposed.

Saravana, Nuno, Rob does it seems ok for you ?

...

> > 
> > In general I hate these kinds of sequences that release a lock and
> > then grab it again quickly. It's not always a bug, but my personal
> > take on that is 90% of these introduce a bug.
> > 
> > Drop the unlock/lock and we'll deal a deadlock if we actually hit one.
> > I'm also fairly certain that device_link_wait_removal() can't trigger
> > something else that can cause an OF overlay change while we are in the
> > middle of one. And like Rob said, I'm not sure this unlock/lock is a
> > good solution for that anyway.  
> 
> Totally agree. Unless we really see a deadlock this is a very bad idea (IMHO).
> Even on the PCI code, it seems to me that we're never destroying a changeset
> from a device/kobj_type release callback. That would be super weird right?

Convinced too.
I will drop the unlock/re-lock sequence in the next iteration of this series.

Best regards,
Hervé

-- 
Hervé Codina, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

  reply	other threads:[~2024-03-05 10:27 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-29 10:52 [PATCH v3 0/2] Synchronize DT overlay removal with devlink removals Herve Codina
2024-02-29 10:52 ` [PATCH v3 1/2] driver core: Introduce device_link_wait_removal() Herve Codina
2024-02-29 11:16   ` Nuno Sá
2024-02-29 12:57     ` Rafael J. Wysocki
2024-02-29 13:01     ` Rafael J. Wysocki
2024-02-29 13:06       ` Nuno Sá
2024-02-29 13:10         ` Rafael J. Wysocki
2024-02-29 14:00           ` Herve Codina
2024-02-29 10:52 ` [PATCH v3 2/2] of: overlay: Synchronize of_overlay_remove() with the devlink removals Herve Codina
2024-02-29 11:18   ` Nuno Sá
2024-03-04 15:22     ` Rob Herring
2024-03-04 15:36       ` Nuno Sá
2024-03-04 16:49       ` Herve Codina
2024-03-05  6:47         ` Saravana Kannan
2024-03-05  7:36           ` Nuno Sá
2024-03-05 10:27             ` Herve Codina [this message]
2024-03-05 10:47               ` Nuno Sá
2024-03-06  2:59                 ` Saravana Kannan
2024-03-04 15:02 ` [PATCH v3 0/2] Synchronize DT overlay removal with " Rob Herring
2024-03-05  6:17   ` Saravana Kannan
2024-03-05  7:17     ` Nuno Sá

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240305112708.56869e4c@bootlin.com \
    --to=herve.codina@bootlin.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=allan.nielsen@microchip.com \
    --cc=devicetree@vger.kernel.org \
    --cc=frowand.list@gmail.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=horatiu.vultur@microchip.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lizhi.hou@amd.com \
    --cc=luca.ceresoli@bootlin.com \
    --cc=max.zhen@amd.com \
    --cc=noname.nuno@gmail.com \
    --cc=nuno.sa@analog.com \
    --cc=rafael@kernel.org \
    --cc=robh@kernel.org \
    --cc=saravanak@google.com \
    --cc=sonal.santan@amd.com \
    --cc=stable@vger.kernel.org \
    --cc=steen.hegelund@microchip.com \
    --cc=stefano.stabellini@xilinx.com \
    --cc=thomas.petazzoni@bootlin.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.