* [PATCH] software node: balance refcount for managed sw nodes
@ 2021-07-16 10:16 laurentiu.tudor
  2021-07-16 10:34 ` Laurentiu Tudor
                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: laurentiu.tudor @ 2021-07-16 10:16 UTC (permalink / raw)
  To: andriy.shevchenko, heikki.krogerus, gregkh, rafael, linux-acpi,
	linux-kernel
  Cc: jon, Laurentiu Tudor

From: Laurentiu Tudor <laurentiu.tudor@nxp.com>

software_node_notify(), on KOBJ_REMOVE drops the refcount twice on managed
software nodes, thus leading to underflow errors. Balance the refcount by
bumping it in the device_create_managed_software_node() function.

The error [1] was encountered after adding a .shutdown() op to our
fsl-mc-bus driver.

[1]
pc : refcount_warn_saturate+0xf8/0x150
lr : refcount_warn_saturate+0xf8/0x150
sp : ffff80001009b920
x29: ffff80001009b920 x28: ffff1a2420318000 x27: 0000000000000000
x26: ffffccac15e7a038 x25: 0000000000000008 x24: ffffccac168e0030
x23: ffff1a2428a82000 x22: 0000000000080000 x21: ffff1a24287b5000
x20: 0000000000000001 x19: ffff1a24261f4400 x18: ffffffffffffffff
x17: 6f72645f726f7272 x16: 0000000000000000 x15: ffff80009009b607
x14: 0000000000000000 x13: ffffccac16602670 x12: 0000000000000a17
x11: 000000000000035d x10: ffffccac16602670 x9 : ffffccac16602670
x8 : 00000000ffffefff x7 : ffffccac1665a670 x6 : ffffccac1665a670
x5 : 0000000000000000 x4 : 0000000000000000 x3 : 00000000ffffffff
x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff1a2420318000
Call trace:
 refcount_warn_saturate+0xf8/0x150
 kobject_put+0x10c/0x120
 software_node_notify+0xd8/0x140
 device_platform_notify+0x4c/0xb4
 device_del+0x188/0x424
 fsl_mc_device_remove+0x2c/0x4c
 __fsl_mc_device_remove+0x14/0x2c
 device_for_each_child+0x5c/0xac
 dprc_remove+0x9c/0xc0
 fsl_mc_driver_remove+0x28/0x64
 __device_release_driver+0x188/0x22c
 device_release_driver+0x30/0x50
 bus_remove_device+0x128/0x134
 device_del+0x16c/0x424
 fsl_mc_bus_remove+0x8c/0x114
 fsl_mc_bus_shutdown+0x14/0x20
 platform_shutdown+0x28/0x40
 device_shutdown+0x15c/0x330
 __do_sys_reboot+0x218/0x2a0
 __arm64_sys_reboot+0x28/0x34
 invoke_syscall+0x48/0x114
 el0_svc_common+0x40/0xdc
 do_el0_svc+0x2c/0x94
 el0_svc+0x2c/0x54
 el0t_64_sync_handler+0xa8/0x12c
 el0t_64_sync+0x198/0x19c
---[ end trace 32eb1c71c7d86821 ]---

Reported-by: Jon Nettleton <jon@solid-run.com>
Suggested-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
---
Changes since RFC:
 - use software_node_notify(KOBJ_ADD) instead of directly bumping
   refcount (Heikki)

 drivers/base/swnode.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/base/swnode.c b/drivers/base/swnode.c
index d1f1a8240120..bdb50a06c82a 100644
--- a/drivers/base/swnode.c
+++ b/drivers/base/swnode.c
@@ -1113,6 +1113,9 @@ int device_create_managed_software_node(struct device *dev,
 	to_swnode(fwnode)->managed = true;
 	set_secondary_fwnode(dev, fwnode);
 
+	if (device_is_registered(dev))
+		software_node_notify(dev, KOBJ_ADD);
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(device_create_managed_software_node);
-- 
2.17.1



* Re: [PATCH] software node: balance refcount for managed sw nodes
  2021-07-16 10:16 [PATCH] software node: balance refcount for managed sw nodes laurentiu.tudor
@ 2021-07-16 10:34 ` Laurentiu Tudor
  2021-07-16 12:17 ` Andy Shevchenko
  2021-09-14 14:00 ` Heikki Krogerus
  2 siblings, 0 replies; 23+ messages in thread
From: Laurentiu Tudor @ 2021-07-16 10:34 UTC (permalink / raw)
  To: andriy.shevchenko, heikki.krogerus, gregkh, rafael, linux-acpi,
	linux-kernel
  Cc: jon

Didn't notice in the first place, sorry about that. I now get this:

sysfs: cannot create duplicate filename
'/devices/platform/808622B7:00/xhci-hcd.1.auto/software_node'
CPU: 3 PID: 1 Comm: swapper/0 Not tainted
5.13.0-next-20210701-g5859a372a858-dirty #62
Hardware name: NXP NXP LX2160ARDB Platform, BIOS EDK II Apr 16 2021
Call trace:
 dump_backtrace+0x0/0x1c0
 show_stack+0x18/0x28
 dump_stack_lvl+0x68/0x84
 dump_stack+0x18/0x34
 sysfs_warn_dup+0x60/0x80
 sysfs_do_create_link_sd.isra.2+0x104/0x108
 sysfs_create_link+0x24/0x48
 software_node_notify+0xf0/0x148
 device_create_managed_software_node+0x90/0xc8
 iort_named_component_init+0x90/0xd0
 iort_iommu_configure_id+0x94/0x190
 acpi_dma_configure_id+0xc8/0x140
 platform_dma_configure+0x94/0xb0
 really_probe+0x70/0x2f8
 __driver_probe_device+0x7c/0xe8
 driver_probe_device+0x8c/0x130
 __driver_attach+0x98/0xf8
 bus_for_each_dev+0x7c/0xd8
 driver_attach+0x24/0x30
 bus_add_driver+0x154/0x200
 driver_register+0x64/0x120
 __platform_driver_register+0x28/0x38
 xhci_plat_init+0x30/0x3c
 do_one_initcall+0x60/0x1d8
 kernel_init_freeable+0x238/0x2ac
 kernel_init+0x24/0x128
 ret_from_fork+0x10/0x18

---
Best Regards, Laurentiu

On 7/16/2021 1:16 PM, laurentiu.tudor@nxp.com wrote:
> From: Laurentiu Tudor <laurentiu.tudor@nxp.com>
> 
> software_node_notify(), on KOBJ_REMOVE drops the refcount twice on managed
> software nodes, thus leading to underflow errors. Balance the refcount by
> bumping it in the device_create_managed_software_node() function.
> 
> The error [1] was encountered after adding a .shutdown() op to our
> fsl-mc-bus driver.
> 
> [1]
> pc : refcount_warn_saturate+0xf8/0x150
> lr : refcount_warn_saturate+0xf8/0x150
> sp : ffff80001009b920
> x29: ffff80001009b920 x28: ffff1a2420318000 x27: 0000000000000000
> x26: ffffccac15e7a038 x25: 0000000000000008 x24: ffffccac168e0030
> x23: ffff1a2428a82000 x22: 0000000000080000 x21: ffff1a24287b5000
> x20: 0000000000000001 x19: ffff1a24261f4400 x18: ffffffffffffffff
> x17: 6f72645f726f7272 x16: 0000000000000000 x15: ffff80009009b607
> x14: 0000000000000000 x13: ffffccac16602670 x12: 0000000000000a17
> x11: 000000000000035d x10: ffffccac16602670 x9 : ffffccac16602670
> x8 : 00000000ffffefff x7 : ffffccac1665a670 x6 : ffffccac1665a670
> x5 : 0000000000000000 x4 : 0000000000000000 x3 : 00000000ffffffff
> x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff1a2420318000
> Call trace:
>  refcount_warn_saturate+0xf8/0x150
>  kobject_put+0x10c/0x120
>  software_node_notify+0xd8/0x140
>  device_platform_notify+0x4c/0xb4
>  device_del+0x188/0x424
>  fsl_mc_device_remove+0x2c/0x4c
>  __fsl_mc_device_remove+0x14/0x2c
>  device_for_each_child+0x5c/0xac
>  dprc_remove+0x9c/0xc0
>  fsl_mc_driver_remove+0x28/0x64
>  __device_release_driver+0x188/0x22c
>  device_release_driver+0x30/0x50
>  bus_remove_device+0x128/0x134
>  device_del+0x16c/0x424
>  fsl_mc_bus_remove+0x8c/0x114
>  fsl_mc_bus_shutdown+0x14/0x20
>  platform_shutdown+0x28/0x40
>  device_shutdown+0x15c/0x330
>  __do_sys_reboot+0x218/0x2a0
>  __arm64_sys_reboot+0x28/0x34
>  invoke_syscall+0x48/0x114
>  el0_svc_common+0x40/0xdc
>  do_el0_svc+0x2c/0x94
>  el0_svc+0x2c/0x54
>  el0t_64_sync_handler+0xa8/0x12c
>  el0t_64_sync+0x198/0x19c
> ---[ end trace 32eb1c71c7d86821 ]---
> 
> Reported-by: Jon Nettleton <jon@solid-run.com>
> Suggested-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
> Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
> ---
> Changes since RFC:
>  - use software_node_notify(KOBJ_ADD) instead of directly bumping
>    refcount (Heikki)
> 
>  drivers/base/swnode.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/base/swnode.c b/drivers/base/swnode.c
> index d1f1a8240120..bdb50a06c82a 100644
> --- a/drivers/base/swnode.c
> +++ b/drivers/base/swnode.c
> @@ -1113,6 +1113,9 @@ int device_create_managed_software_node(struct device *dev,
>  	to_swnode(fwnode)->managed = true;
>  	set_secondary_fwnode(dev, fwnode);
>  
> +	if (device_is_registered(dev))
> +		software_node_notify(dev, KOBJ_ADD);
> +
>  	return 0;
>  }
>  EXPORT_SYMBOL_GPL(device_create_managed_software_node);
> 


* Re: [PATCH] software node: balance refcount for managed sw nodes
  2021-07-16 10:16 [PATCH] software node: balance refcount for managed sw nodes laurentiu.tudor
  2021-07-16 10:34 ` Laurentiu Tudor
@ 2021-07-16 12:17 ` Andy Shevchenko
  2021-07-16 17:21   ` Jon Nettleton
  2021-09-14 14:00 ` Heikki Krogerus
  2 siblings, 1 reply; 23+ messages in thread
From: Andy Shevchenko @ 2021-07-16 12:17 UTC (permalink / raw)
  To: laurentiu.tudor
  Cc: heikki.krogerus, gregkh, rafael, linux-acpi, linux-kernel, jon

On Fri, Jul 16, 2021 at 01:16:02PM +0300, laurentiu.tudor@nxp.com wrote:
> From: Laurentiu Tudor <laurentiu.tudor@nxp.com>
> 
> software_node_notify(), on KOBJ_REMOVE drops the refcount twice on managed
> software nodes, thus leading to underflow errors. Balance the refcount by
> bumping it in the device_create_managed_software_node() function.
> 
> The error [1] was encountered after adding a .shutdown() op to our
> fsl-mc-bus driver.

Looking into the history of adding ->shutdown() to dwc3 driver (it got reverted
later on), I can tell that probably something is wrong in the ->shutdown()
method itself.

-- 
With Best Regards,
Andy Shevchenko




* Re: [PATCH] software node: balance refcount for managed sw nodes
  2021-07-16 12:17 ` Andy Shevchenko
@ 2021-07-16 17:21   ` Jon Nettleton
  2021-07-19 12:00     ` Laurentiu Tudor
  0 siblings, 1 reply; 23+ messages in thread
From: Jon Nettleton @ 2021-07-16 17:21 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Laurentiu Tudor, Heikki Krogerus, Greg Kroah-Hartman,
	Rafael J . Wysocki, ACPI Devel Maling List,
	Linux Kernel Mailing List

On Fri, Jul 16, 2021 at 2:17 PM Andy Shevchenko
<andriy.shevchenko@linux.intel.com> wrote:
>
> On Fri, Jul 16, 2021 at 01:16:02PM +0300, laurentiu.tudor@nxp.com wrote:
> > From: Laurentiu Tudor <laurentiu.tudor@nxp.com>
> >
> > software_node_notify(), on KOBJ_REMOVE drops the refcount twice on managed
> > software nodes, thus leading to underflow errors. Balance the refcount by
> > bumping it in the device_create_managed_software_node() function.
> >
> > The error [1] was encountered after adding a .shutdown() op to our
> > fsl-mc-bus driver.
>
> Looking into the history of adding ->shutdown() to dwc3 driver (it got reverted
> later on), I can tell that probably something is wrong in the ->shutdown()
> method itself.
>
> --
> With Best Regards,
> Andy Shevchenko
>
>

Isn't the other alternative to just remove the second kobject_put from
KOBJ_REMOVE ?

@@ -1149,7 +1147,6 @@ int software_node_notify(struct device *dev, unsigned long action)

                if (swnode->managed) {
                        set_secondary_fwnode(dev, NULL);
-                       kobject_put(&swnode->kobj);
                }
                break;
        default:

If we aren't being incremented in device_create_managed_software_node() then
should we be decremented here?
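
The two alternatives discussed so far can be compared with a small
user-space toy model. This is only a sketch, not kernel code: the
toy_* names are made up for illustration, and it assumes the node
starts with a single reference and that a KOBJ_ADD notification, when
it runs, takes one more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a software node; not the kernel's struct swnode. */
struct toy_swnode {
	int refcount;    /* models the kobject reference count */
	int underflows;  /* puts attempted on an already-zero count */
};

/* Drop one reference, recording an underflow instead of going negative. */
void toy_put(struct toy_swnode *n)
{
	assert(n != NULL);
	if (n->refcount <= 0) {
		n->underflows++;   /* the kernel would warn at this point */
		return;
	}
	n->refcount--;
}

/*
 * One create/remove cycle for a managed node.
 *   get_on_add: a KOBJ_ADD-style notification took a reference
 *   second_put: removal drops a second reference for managed nodes
 * Returns the final count, or -1 if a put hit an already-zero count.
 */
int toy_cycle(bool get_on_add, bool second_put)
{
	struct toy_swnode n = { .refcount = 1, .underflows = 0 };

	if (get_on_add)
		n.refcount++;
	toy_put(&n);
	if (second_put)
		toy_put(&n);
	return n.underflows ? -1 : n.refcount;
}
```

In this model toy_cycle(false, true) underflows, while
toy_cycle(true, true) and toy_cycle(false, false) both end at zero.
toy_cycle(true, false) ends at one, so dropping the second put would
leave a reference behind in any flow where the add notification does
run.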

-Jon


* Re: [PATCH] software node: balance refcount for managed sw nodes
  2021-07-16 17:21   ` Jon Nettleton
@ 2021-07-19 12:00     ` Laurentiu Tudor
  2021-07-19 12:22       ` Andy Shevchenko
  0 siblings, 1 reply; 23+ messages in thread
From: Laurentiu Tudor @ 2021-07-19 12:00 UTC (permalink / raw)
  To: Jon Nettleton, Andy Shevchenko
  Cc: Heikki Krogerus, Greg Kroah-Hartman, Rafael J . Wysocki,
	ACPI Devel Maling List, Linux Kernel Mailing List



On 7/16/2021 8:21 PM, Jon Nettleton wrote:
> On Fri, Jul 16, 2021 at 2:17 PM Andy Shevchenko
> <andriy.shevchenko@linux.intel.com> wrote:
>>
>> On Fri, Jul 16, 2021 at 01:16:02PM +0300, laurentiu.tudor@nxp.com wrote:
>>> From: Laurentiu Tudor <laurentiu.tudor@nxp.com>
>>>
>>> software_node_notify(), on KOBJ_REMOVE drops the refcount twice on managed
>>> software nodes, thus leading to underflow errors. Balance the refcount by
>>> bumping it in the device_create_managed_software_node() function.
>>>
>>> The error [1] was encountered after adding a .shutdown() op to our
>>> fsl-mc-bus driver.
>>
>> Looking into the history of adding ->shutdown() to dwc3 driver (it got reverted
>> later on), I can tell that probably something is wrong in the ->shutdown()
>> method itself.
>>
>> --
>> With Best Regards,
>> Andy Shevchenko
>>
>>
> 
> Isn't the other alternative to just remove the second kobject_put from
> KOBJ_REMOVE ?
> 

Or maybe on top of Heikki's suggestion, replace the calls to
sysfs_create_link() from KOBJ_ADD with sysfs_create_link_nowarn()?

---
Best Regards, Laurentiu


* Re: [PATCH] software node: balance refcount for managed sw nodes
  2021-07-19 12:00     ` Laurentiu Tudor
@ 2021-07-19 12:22       ` Andy Shevchenko
  2021-07-20  9:20         ` Laurentiu Tudor
  0 siblings, 1 reply; 23+ messages in thread
From: Andy Shevchenko @ 2021-07-19 12:22 UTC (permalink / raw)
  To: Laurentiu Tudor
  Cc: Jon Nettleton, Heikki Krogerus, Greg Kroah-Hartman,
	Rafael J . Wysocki, ACPI Devel Maling List,
	Linux Kernel Mailing List

On Mon, Jul 19, 2021 at 03:00:17PM +0300, Laurentiu Tudor wrote:
> On 7/16/2021 8:21 PM, Jon Nettleton wrote:
> > On Fri, Jul 16, 2021 at 2:17 PM Andy Shevchenko
> > <andriy.shevchenko@linux.intel.com> wrote:
> >>
> >> On Fri, Jul 16, 2021 at 01:16:02PM +0300, laurentiu.tudor@nxp.com wrote:
> >>> From: Laurentiu Tudor <laurentiu.tudor@nxp.com>
> >>>
> >>> software_node_notify(), on KOBJ_REMOVE drops the refcount twice on managed
> >>> software nodes, thus leading to underflow errors. Balance the refcount by
> >>> bumping it in the device_create_managed_software_node() function.
> >>>
> >>> The error [1] was encountered after adding a .shutdown() op to our
> >>> fsl-mc-bus driver.
> >>
> >> Looking into the history of adding ->shutdown() to dwc3 driver (it got reverted
> >> later on), I can tell that probably something is wrong in the ->shutdown()
> >> method itself.
> > 
> > Isn't the other alternative to just remove the second kobject_put from
> > KOBJ_REMOVE ?
> > 
> 
> Or maybe on top of Heikki's suggestion, replace the calls to
> sysfs_create_link() from KOBJ_ADD with sysfs_create_link_nowarn()?

_nowarn will hide the problem. It was there, it was removed from there.
Perhaps we have to understand the root cause better (some specific flow?).

Any insight from you on the flow when the issue appears? I.o.w. what happened
on the big picture that we got into the warning you see?

-- 
With Best Regards,
Andy Shevchenko




* Re: [PATCH] software node: balance refcount for managed sw nodes
  2021-07-19 12:22       ` Andy Shevchenko
@ 2021-07-20  9:20         ` Laurentiu Tudor
  2021-07-20 10:27           ` Andy Shevchenko
  0 siblings, 1 reply; 23+ messages in thread
From: Laurentiu Tudor @ 2021-07-20  9:20 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Jon Nettleton, Heikki Krogerus, Greg Kroah-Hartman,
	Rafael J . Wysocki, ACPI Devel Maling List,
	Linux Kernel Mailing List



On 7/19/2021 3:22 PM, Andy Shevchenko wrote:
> On Mon, Jul 19, 2021 at 03:00:17PM +0300, Laurentiu Tudor wrote:
>> On 7/16/2021 8:21 PM, Jon Nettleton wrote:
>>> On Fri, Jul 16, 2021 at 2:17 PM Andy Shevchenko
>>> <andriy.shevchenko@linux.intel.com> wrote:
>>>>
>>>> On Fri, Jul 16, 2021 at 01:16:02PM +0300, laurentiu.tudor@nxp.com wrote:
>>>>> From: Laurentiu Tudor <laurentiu.tudor@nxp.com>
>>>>>
>>>>> software_node_notify(), on KOBJ_REMOVE drops the refcount twice on managed
>>>>> software nodes, thus leading to underflow errors. Balance the refcount by
>>>>> bumping it in the device_create_managed_software_node() function.
>>>>>
>>>>> The error [1] was encountered after adding a .shutdown() op to our
>>>>> fsl-mc-bus driver.
>>>>
>>>> Looking into the history of adding ->shutdown() to dwc3 driver (it got reverted
>>>> later on), I can tell that probably something is wrong in the ->shutdown()
>>>> method itself.
>>>
>>> Isn't the other alternative to just remove the second kobject_put from
>>> KOBJ_REMOVE ?
>>>
>>
>> Or maybe on top of Heikki's suggestion, replace the calls to
>> sysfs_create_link() from KOBJ_ADD with sysfs_create_link_nowarn()?
> 
> _nowarn will hide the problem. It was there, it was removed from there.
> Perhaps we have to understand the root cause better (some specific flow?).
> 
> Any insight from you on the flow when the issue appears? I.o.w. what happened
> on the big picture that we got into the warning you see?
> 

I encountered the initial issue when trying to shut down a system booted
with ACPI but only after adding a .shutdown() callback to our bus driver
so that the devices are properly taken down. The problem was that
software_node_notify(), on KOBJ_REMOVE was dropping the reference count
twice leading to an underflow error. My initial proposal was to just
bump the refcount in device_create_managed_software_node(). The device
properties that triggered the problem are created here [1].

Heikki suggested using software_node_notify(KOBJ_ADD) instead of manually
incrementing the refcount. This triggered the second issue, a duplicated
sysfs entry warning originating in the usb subsystem:
device_create_managed_software_node() ends up being called twice, once
here [2] and a second time at the place I previously mentioned [1].

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/acpi/arm64/iort.c#n952
[2]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/dwc3/host.c#n111
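
The second issue can be illustrated with a user-space toy model of a
directory that holds named links. This is a sketch under assumptions,
not the sysfs implementation: the toy_* names are hypothetical, and the
"warn" flag only stands in for the difference between a warning and a
non-warning link creation call. Both variants still fail with -EEXIST
on the second attempt.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_LINKS 4

/* Toy stand-in for a device directory; not a real sysfs structure. */
struct toy_dir {
	const char *links[TOY_MAX_LINKS];
	int nr_links;
	int warnings;   /* how many duplicate-name warnings were emitted */
};

/* Create a named link; a duplicate name fails with -EEXIST. */
int toy_create_link(struct toy_dir *dir, const char *name, bool warn)
{
	int i;

	assert(dir != NULL && name != NULL);
	for (i = 0; i < dir->nr_links; i++) {
		if (strcmp(dir->links[i], name) == 0) {
			if (warn)
				dir->warnings++;
			return -EEXIST;
		}
	}
	if (dir->nr_links == TOY_MAX_LINKS)
		return -ENOSPC;
	dir->links[dir->nr_links++] = name;
	return 0;
}

/* Two creation paths adding the same link to the same device. */
int toy_attach_twice(struct toy_dir *dir, bool warn)
{
	int first = toy_create_link(dir, "software_node", warn);
	int second = toy_create_link(dir, "software_node", warn);

	return first ? first : second;
}
```

With warn set, the second attempt is counted as a warning; without it
the failure is silent, but the duplicate attempt still happened.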

---
Best Regards, Laurentiu


* Re: [PATCH] software node: balance refcount for managed sw nodes
  2021-07-20  9:20         ` Laurentiu Tudor
@ 2021-07-20 10:27           ` Andy Shevchenko
  2021-07-26  7:59             ` Laurentiu Tudor
  0 siblings, 1 reply; 23+ messages in thread
From: Andy Shevchenko @ 2021-07-20 10:27 UTC (permalink / raw)
  To: Laurentiu Tudor
  Cc: Andy Shevchenko, Jon Nettleton, Heikki Krogerus,
	Greg Kroah-Hartman, Rafael J . Wysocki, ACPI Devel Maling List,
	Linux Kernel Mailing List

On Tue, Jul 20, 2021 at 12:22 PM Laurentiu Tudor
<laurentiu.tudor@nxp.com> wrote:
> On 7/19/2021 3:22 PM, Andy Shevchenko wrote:
> > On Mon, Jul 19, 2021 at 03:00:17PM +0300, Laurentiu Tudor wrote:
> >> On 7/16/2021 8:21 PM, Jon Nettleton wrote:
> >>> On Fri, Jul 16, 2021 at 2:17 PM Andy Shevchenko
> >>> <andriy.shevchenko@linux.intel.com> wrote:
> >>>>
> >>>> On Fri, Jul 16, 2021 at 01:16:02PM +0300, laurentiu.tudor@nxp.com wrote:
> >>>>> From: Laurentiu Tudor <laurentiu.tudor@nxp.com>
> >>>>>
> >>>>> software_node_notify(), on KOBJ_REMOVE drops the refcount twice on managed
> >>>>> software nodes, thus leading to underflow errors. Balance the refcount by
> >>>>> bumping it in the device_create_managed_software_node() function.
> >>>>>
> >>>>> The error [1] was encountered after adding a .shutdown() op to our
> >>>>> fsl-mc-bus driver.
> >>>>
> >>>> Looking into the history of adding ->shutdown() to dwc3 driver (it got reverted
> >>>> later on), I can tell that probably something is wrong in the ->shutdown()
> >>>> method itself.
> >>>
> >>> Isn't the other alternative to just remove the second kobject_put from
> >>> KOBJ_REMOVE ?
> >>
> >> Or maybe on top of Heikki's suggestion, replace the calls to
> >> sysfs_create_link() from KOBJ_ADD with sysfs_create_link_nowarn()?
> >
> > _nowarn will hide the problem. It was there, it was removed from there.
> > Perhaps we have to understand the root cause better (some specific flow?).
> >
> > Any insight from you on the flow when the issue appears? I.o.w. what happened
> > on the big picture that we got into the warning you see?
>
> I encountered the initial issue when trying to shut down a system booted
> with ACPI but only after adding a .shutdown() callback to our bus driver
> so that the devices are properly taken down. The problem was that
> software_node_notify(), on KOBJ_REMOVE was dropping the reference count
> twice leading to an underflow error. My initial proposal was to just
> bump the refcount in device_create_managed_software_node(). The device
> properties that triggered the problem are created here [1].
>
> Heikki suggested using software_node_notify(KOBJ_ADD) instead of manually
> incrementing the refcount. This triggered the second issue, a duplicated
> sysfs entry warning originating in the usb subsystem:
> device_create_managed_software_node() ends up being called twice, once
> here [2] and a second time at the place I previously mentioned [1].

This [3] is what I reported against DWC3 when ->shutdown() was
added there. And here [4] is another thread about the issue with
that callback. The ->release() callback is called at put_device() [5]
and ->shutdown() is called before that [6]. That said, can you inspect
your ->shutdown() implementation one more time and perhaps see if
there is anything that can be amended?

> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/acpi/arm64/iort.c#n952
> [2]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/dwc3/host.c#n111

[3]: https://lore.kernel.org/linux-usb/CAHp75Vd-5U5zgtDfM5C3Jsx51HVYB+rNcHYC2XP=G7dOd=cdTg@mail.gmail.com/
[4]:  https://lore.kernel.org/linux-usb/c3c75895-313a-5be7-6421-b32bac741a88@arm.com/T/#u
[5]: https://elixir.bootlin.com/linux/latest/source/drivers/base/core.c#L2216
[6]: https://elixir.bootlin.com/linux/latest/source/drivers/base/core.c#L4447

-- 
With Best Regards,
Andy Shevchenko


* Re: [PATCH] software node: balance refcount for managed sw nodes
  2021-07-20 10:27           ` Andy Shevchenko
@ 2021-07-26  7:59             ` Laurentiu Tudor
  2021-09-07 15:59               ` Laurentiu Tudor
  0 siblings, 1 reply; 23+ messages in thread
From: Laurentiu Tudor @ 2021-07-26  7:59 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Andy Shevchenko, Jon Nettleton, Heikki Krogerus,
	Greg Kroah-Hartman, Rafael J . Wysocki, ACPI Devel Maling List,
	Linux Kernel Mailing List



On 7/20/2021 1:27 PM, Andy Shevchenko wrote:
> On Tue, Jul 20, 2021 at 12:22 PM Laurentiu Tudor
> <laurentiu.tudor@nxp.com> wrote:
>> On 7/19/2021 3:22 PM, Andy Shevchenko wrote:
>>> On Mon, Jul 19, 2021 at 03:00:17PM +0300, Laurentiu Tudor wrote:
>>>> On 7/16/2021 8:21 PM, Jon Nettleton wrote:
>>>>> On Fri, Jul 16, 2021 at 2:17 PM Andy Shevchenko
>>>>> <andriy.shevchenko@linux.intel.com> wrote:
>>>>>>
>>>>>> On Fri, Jul 16, 2021 at 01:16:02PM +0300, laurentiu.tudor@nxp.com wrote:
>>>>>>> From: Laurentiu Tudor <laurentiu.tudor@nxp.com>
>>>>>>>
>>>>>>> software_node_notify(), on KOBJ_REMOVE drops the refcount twice on managed
>>>>>>> software nodes, thus leading to underflow errors. Balance the refcount by
>>>>>>> bumping it in the device_create_managed_software_node() function.
>>>>>>>
>>>>>>> The error [1] was encountered after adding a .shutdown() op to our
>>>>>>> fsl-mc-bus driver.
>>>>>>
>>>>>> Looking into the history of adding ->shutdown() to dwc3 driver (it got reverted
>>>>>> later on), I can tell that probably something is wrong in the ->shutdown()
>>>>>> method itself.
>>>>>
>>>>> Isn't the other alternative to just remove the second kobject_put from
>>>>> KOBJ_REMOVE ?
>>>>
>>>> Or maybe on top of Heikki's suggestion, replace the calls to
>>>> sysfs_create_link() from KOBJ_ADD with sysfs_create_link_nowarn()?
>>>
>>> _nowarn will hide the problem. It was there, it was removed from there.
>>> Perhaps we have to understand the root cause better (some specific flow?).
>>>
>>> Any insight from you on the flow when the issue appears? I.o.w. what happened
>>> on the big picture that we got into the warning you see?
>>
>> I encountered the initial issue when trying to shut down a system booted
>> with ACPI but only after adding a .shutdown() callback to our bus driver
>> so that the devices are properly taken down. The problem was that
>> software_node_notify(), on KOBJ_REMOVE was dropping the reference count
>> twice leading to an underflow error. My initial proposal was to just
>> bump the refcount in device_create_managed_software_node(). The device
>> properties that triggered the problem are created here [1].
>>
>> Heikki suggested using software_node_notify(KOBJ_ADD) instead of manually
>> incrementing the refcount. This triggered the second issue, a duplicated
>> sysfs entry warning originating in the usb subsystem:
>> device_create_managed_software_node() ends up being called twice, once
>> here [2] and a second time at the place I previously mentioned [1].
> 
> This [3] is what I reported against DWC3 when ->shutdown() was
> added there. And here [4] is another thread about the issue with
> that callback. The ->release() callback is called at put_device() [5]
> and ->shutdown() is called before that [6]. That said, can you inspect
> your ->shutdown() implementation one more time and perhaps see if
> there is anything that can be amended?
> 

Will do, thanks for the pointers. It could be that we mess something up
in how we use the driver model.

---
Best Regards, Laurentiu


* Re: [PATCH] software node: balance refcount for managed sw nodes
  2021-07-26  7:59             ` Laurentiu Tudor
@ 2021-09-07 15:59               ` Laurentiu Tudor
  2021-09-09 12:13                 ` Heikki Krogerus
  0 siblings, 1 reply; 23+ messages in thread
From: Laurentiu Tudor @ 2021-09-07 15:59 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Andy Shevchenko, Jon Nettleton, Heikki Krogerus,
	Greg Kroah-Hartman, Rafael J . Wysocki, ACPI Devel Maling List,
	Linux Kernel Mailing List



On 7/26/2021 10:59 AM, Laurentiu Tudor wrote:
> 
> 
> On 7/20/2021 1:27 PM, Andy Shevchenko wrote:
>> On Tue, Jul 20, 2021 at 12:22 PM Laurentiu Tudor
>> <laurentiu.tudor@nxp.com> wrote:
>>> On 7/19/2021 3:22 PM, Andy Shevchenko wrote:
>>>> On Mon, Jul 19, 2021 at 03:00:17PM +0300, Laurentiu Tudor wrote:
>>>>> On 7/16/2021 8:21 PM, Jon Nettleton wrote:
>>>>>> On Fri, Jul 16, 2021 at 2:17 PM Andy Shevchenko
>>>>>> <andriy.shevchenko@linux.intel.com> wrote:
>>>>>>>
>>>>>>> On Fri, Jul 16, 2021 at 01:16:02PM +0300, laurentiu.tudor@nxp.com wrote:
>>>>>>>> From: Laurentiu Tudor <laurentiu.tudor@nxp.com>
>>>>>>>>
>>>>>>>> software_node_notify(), on KOBJ_REMOVE drops the refcount twice on managed
>>>>>>>> software nodes, thus leading to underflow errors. Balance the refcount by
>>>>>>>> bumping it in the device_create_managed_software_node() function.
>>>>>>>>
>>>>>>>> The error [1] was encountered after adding a .shutdown() op to our
>>>>>>>> fsl-mc-bus driver.
>>>>>>>
>>>>>>> Looking into the history of adding ->shutdown() to dwc3 driver (it got reverted
>>>>>>> later on), I can tell that probably something is wrong in the ->shutdown()
>>>>>>> method itself.
>>>>>>
>>>>>> Isn't the other alternative to just remove the second kobject_put from
>>>>>> KOBJ_REMOVE ?
>>>>>
>>>>> Or maybe on top of Heikki's suggestion, replace the calls to
>>>>> sysfs_create_link() from KOBJ_ADD with sysfs_create_link_nowarn()?
>>>>
>>>> _nowarn will hide the problem. It was there, it was removed from there.
>>>> Perhaps we have to understand the root cause better (some specific flow?).
>>>>
>>>> Any insight from you on the flow when the issue appears? I.o.w. what happened
>>>> on the big picture that we got into the warning you see?
>>>
>>> I encountered the initial issue when trying to shut down a system booted
>>> with ACPI but only after adding a .shutdown() callback to our bus driver
>>> so that the devices are properly taken down. The problem was that
>>> software_node_notify(), on KOBJ_REMOVE was dropping the reference count
>>> twice leading to an underflow error. My initial proposal was to just
>>> bump the refcount in device_create_managed_software_node(). The device
>>> properties that triggered the problem are created here [1].
>>>
>>> Heikki suggested using software_node_notify(KOBJ_ADD) instead of manually
>>> incrementing the refcount. This triggered the second issue, a duplicated
>>> sysfs entry warning originating in the usb subsystem:
>>> device_create_managed_software_node() ends up being called twice, once
>>> here [2] and a second time at the place I previously mentioned [1].
>>
>> This [3] is what I reported against DWC3 when ->shutdown() was
>> added there. And here [4] is another thread about the issue with
>> that callback. The ->release() callback is called at put_device() [5]
>> and ->shutdown() is called before that [6]. That said, can you inspect
>> your ->shutdown() implementation one more time and perhaps see if
>> there is anything that can be amended?
>>
> 
> Will do, thanks for the pointers. It could be that we mess something up
> in how we use the driver model.
> 

Quick (and late, sorry) update from my side. I've spent time
debugging our bus and did find some issues but, at least for now, none are
related to sw node.
In the meantime, I noticed in the swnode code that
device_add_software_node() calls software_node_notify(KOBJ_ADD) while
device_create_managed_software_node() doesn't. Updating [1] the latter
with the call to software_node_notify(KOBJ_ADD) does seem to fix the
issue I'm seeing.

Could this be a problem? Any comments appreciated.

One more thing perhaps worth mentioning is that, at least for now, there
are only a few users of this device_create_managed_software_node() API.
A couple of them:
 - arm64 iort code - this seems to be triggering the issue I'm getting
 - dwc3 usb - Andy reported similar issues here, maybe the issue is common?

[1]
@@ -1113,6 +1125,15 @@ int device_create_managed_software_node(struct device *dev,
        to_swnode(fwnode)->managed = true;
        set_secondary_fwnode(dev, fwnode);

+       /*
+        * If the device has been fully registered by the time this function is
+        * called, software_node_notify() must be called separately so that the
+        * symlinks get created and the reference count of the node is kept in
+        * balance.
+        */
+       if (device_is_registered(dev))
+               software_node_notify(dev, KOBJ_ADD);
+
        return 0;
 }
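
The intent of the device_is_registered() guard in the hunk above can
be sketched with a user-space toy model. It is an illustration only:
the toy_* names are hypothetical, and it assumes that registering a
device sends the add notification for a node that is already attached,
which is what the comment in the hunk implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; not the kernel's struct device or struct swnode. */
struct toy_node {
	int refcount;
};

struct toy_dev {
	bool registered;
	struct toy_node *node;
};

/* Stand-in for the KOBJ_ADD notification: takes one reference. */
void toy_notify_add(struct toy_dev *dev)
{
	assert(dev != NULL);
	if (dev->node)
		dev->node->refcount++;
}

/* Stand-in for registration, which notifies for an attached node. */
void toy_register(struct toy_dev *dev)
{
	dev->registered = true;
	toy_notify_add(dev);
}

/* Stand-in for the managed creation path with the proposed guard. */
void toy_attach_managed(struct toy_dev *dev, struct toy_node *node)
{
	assert(dev != NULL && node != NULL);
	node->refcount = 1;        /* initial reference from creation */
	dev->node = node;
	if (dev->registered)       /* registration already happened */
		toy_notify_add(dev);
}

/* Node attached first, device registered afterwards. */
int toy_refs_attach_then_register(void)
{
	struct toy_node node = { .refcount = 0 };
	struct toy_dev dev = { .registered = false, .node = NULL };

	toy_attach_managed(&dev, &node);
	toy_register(&dev);
	return node.refcount;
}

/* Device registered first, node attached afterwards. */
int toy_refs_register_then_attach(void)
{
	struct toy_node node = { .refcount = 0 };
	struct toy_dev dev = { .registered = false, .node = NULL };

	toy_register(&dev);
	toy_attach_managed(&dev, &node);
	return node.refcount;
}
```

In this model both orderings end with two references, which is the
count a removal path that drops two references expects.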


---
Thanks & Best Regards, Laurentiu


* Re: [PATCH] software node: balance refcount for managed sw nodes
  2021-09-07 15:59               ` Laurentiu Tudor
@ 2021-09-09 12:13                 ` Heikki Krogerus
  2021-09-09 12:16                   ` Heikki Krogerus
  0 siblings, 1 reply; 23+ messages in thread
From: Heikki Krogerus @ 2021-09-09 12:13 UTC (permalink / raw)
  To: Laurentiu Tudor
  Cc: Andy Shevchenko, Andy Shevchenko, Jon Nettleton,
	Greg Kroah-Hartman, Rafael J . Wysocki, ACPI Devel Maling List,
	Linux Kernel Mailing List

On Tue, Sep 07, 2021 at 06:59:18PM +0300, Laurentiu Tudor wrote:
> 
> 
> On 7/26/2021 10:59 AM, Laurentiu Tudor wrote:
> > 
> > 
> > On 7/20/2021 1:27 PM, Andy Shevchenko wrote:
> >> On Tue, Jul 20, 2021 at 12:22 PM Laurentiu Tudor
> >> <laurentiu.tudor@nxp.com> wrote:
> >>> On 7/19/2021 3:22 PM, Andy Shevchenko wrote:
> >>>> On Mon, Jul 19, 2021 at 03:00:17PM +0300, Laurentiu Tudor wrote:
> >>>>> On 7/16/2021 8:21 PM, Jon Nettleton wrote:
> >>>>>> On Fri, Jul 16, 2021 at 2:17 PM Andy Shevchenko
> >>>>>> <andriy.shevchenko@linux.intel.com> wrote:
> >>>>>>>
> >>>>>>> On Fri, Jul 16, 2021 at 01:16:02PM +0300, laurentiu.tudor@nxp.com wrote:
> >>>>>>>> From: Laurentiu Tudor <laurentiu.tudor@nxp.com>
> >>>>>>>>
> >>>>>>>> software_node_notify(), on KOBJ_REMOVE drops the refcount twice on managed
> >>>>>>>> software nodes, thus leading to underflow errors. Balance the refcount by
> >>>>>>>> bumping it in the device_create_managed_software_node() function.
> >>>>>>>>
> >>>>>>>> The error [1] was encountered after adding a .shutdown() op to our
> >>>>>>>> fsl-mc-bus driver.
> >>>>>>>
> >>>>>>> Looking into the history of adding ->shutdown() to dwc3 driver (it got reverted
> >>>>>>> later on), I can tell that probably something is wrong in the ->shutdown()
> >>>>>>> method itself.
> >>>>>>
> >>>>>> Isn't the other alternative to just remove the second kobject_put from
> >>>>>> KOBJ_REMOVE ?
> >>>>>
> >>>>> Or maybe on top of Heikki's suggestion, replace the calls to
> >>>>> sysfs_create_link() from KOBJ_ADD with sysfs_create_link_nowarn()?
> >>>>
> >>>> _noearn will hide the problem. It was there, it was removed from there.
> >>>> Perhaps we have to understand the root cause better (some specific flow?).
> >>>>
> >>>> Any insight from you on the flow when the issue appears? I.o.w. what happened
> >>>> on the big picture that we got into the warning you see?
> >>>
> >>> I encountered the initial issue when trying to shut down a system booted
> >>> with ACPI but only after adding a .shutdown() callback to our bus driver
> >>> so that the devices are properly taken down. The problem was that
> >>> software_node_notify(), on KOBJ_REMOVE was dropping the reference count
> >>> twice leading to an underflow error. My initial proposal was to just
> >>> bump the refcount in device_create_managed_software_node(). The device
> >>> properties that triggered the problem are created here [1].
> >>>
> >>> Heikko suggested that instead of manually incrementing the refcount to
> >>> use software_node_notify(KOBJ_ADD). This triggered the second issue, a
> >>> duplicated sysfs entry warning originating in the usb subsystem:
> >>> device_create_managed_software_node() ends up being called twice, once
> >>> here [2] and secondly, the place I previous mentioned [1].
> >>
> >> This [3] is what I have reported against DWC3 when ->shutdown() has
> >> been added there. And here [4] is another thread about the issue with
> >> that callback. The ->release() callback is called at put_device() [5]
> >> and ->shutdown() is called before that [6]. That said, can you inspect
> >> your ->shutdown() implementation once more time and perhaps see if
> >> there is anything that can be amended?
> >>
> > 
> > Will do, thanks for the pointers. It could be that we mess something out
> > in how we use the driver model.
> > 
> 
> Quick (and late, sorry) update from my side. I've spent time on
> debugging our bus, did found some issues but, at least for now, none are
> related to sw node.
> In the mean time, I noticed in the swnode code that
> device_add_software_node() calls software_node_notify(KOBJ_ADD) while
> device_create_managed_software_node() doesn't. Updating [1] the later
> with the call to software_node_notify(KOBJ_ADD) does seem to fix the
> issue I'm seeing.
> 
> Could this be a problem? Any comments appreciated.
> 
> One more thing perhaps worth mentioning is that, at least for now, there
> are few uses for this device_create_managed_software_node() api,
> mentioning here a couple of them:
>  - arm64 iort code - this seems to be triggering the issue i'm getting
>  - dwc3 usb - Andy reported similar issues here, maybe the issue is common?
> 
> [1]
> @@ -1113,6 +1125,15 @@ int device_create_managed_software_node(struct device *dev,
>         to_swnode(fwnode)->managed = true;
>         set_secondary_fwnode(dev, fwnode);
> 
> +       /*
> +        * If the device has been fully registered by the time this function is
> +        * called, software_node_notify() must be called separately so that the
> +        * symlinks get created and the reference count of the node is kept in
> +        * balance.
> +        */
> +       if (device_is_registered(dev))
> +               software_node_notify(dev, KOBJ_ADD);
> +
>         return 0;
>  }

That should be fixed indeed. Please send that after -rc1 is out.

thanks,

-- 
heikki

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] software node: balance refcount for managed sw nodes
  2021-09-09 12:13                 ` Heikki Krogerus
@ 2021-09-09 12:16                   ` Heikki Krogerus
  2021-09-09 14:01                     ` Laurentiu Tudor
  0 siblings, 1 reply; 23+ messages in thread
From: Heikki Krogerus @ 2021-09-09 12:16 UTC (permalink / raw)
  To: Laurentiu Tudor
  Cc: Andy Shevchenko, Andy Shevchenko, Jon Nettleton,
	Greg Kroah-Hartman, Rafael J . Wysocki, ACPI Devel Maling List,
	Linux Kernel Mailing List

On Thu, Sep 09, 2021 at 03:13:47PM +0300, Heikki Krogerus wrote:
> That should be fixed indeed. Please send that after -rc1 is out.

I mean, resend :-)


-- 
heikki

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] software node: balance refcount for managed sw nodes
  2021-09-09 12:16                   ` Heikki Krogerus
@ 2021-09-09 14:01                     ` Laurentiu Tudor
  2021-09-10 12:05                       ` Laurentiu Tudor
  0 siblings, 1 reply; 23+ messages in thread
From: Laurentiu Tudor @ 2021-09-09 14:01 UTC (permalink / raw)
  To: Heikki Krogerus
  Cc: Andy Shevchenko, Andy Shevchenko, Jon Nettleton,
	Greg Kroah-Hartman, Rafael J . Wysocki, ACPI Devel Maling List,
	Linux Kernel Mailing List, Lorenzo Pieralisi



On 9/9/2021 3:16 PM, Heikki Krogerus wrote:
> On Thu, Sep 09, 2021 at 03:13:47PM +0300, Heikki Krogerus wrote:
>> That should be fixed indeed. Please send that after -rc1 is out.
> 
> I mean, resend :-)
> 

Right, I just noticed that this is the fix you suggested the last time
we discussed this. :-) I forgot about it, sorry.
There's still the WARN_ON() [1] triggered by the USB subsystem, apparently
happening only (!) in the ACPI boot scenario, so +Lorenzo.
I'll hold off on resending for a bit while I try to understand what's going on.

[1]
[   11.760346] sysfs: cannot create duplicate filename '/devices/platform/808622B7:01/xhci-hcd.2.auto/software_node'
[   11.770612] CPU: 9 PID: 1 Comm: swapper/0 Tainted: G        W         5.14.0-rc1-00214-gbf7f1083ebd3-dirty #62
[   11.780611] Hardware name: NXP NXP LX2160ARDB Platform, BIOS EDK II Apr 16 2021
[   11.787913] Call trace:
[   11.790351]  dump_backtrace+0x0/0x2a4
[   11.794017]  show_stack+0x1c/0x30
[   11.797331]  dump_stack_lvl+0x68/0x84
[   11.800991]  dump_stack+0x20/0x3c
[   11.804302]  sysfs_warn_dup+0x88/0xac
[   11.807965]  sysfs_do_create_link_sd+0xf8/0x100
[   11.812492]  sysfs_create_link+0x48/0x80
[   11.816411]  software_node_notify+0x1a8/0x35c
[   11.820769]  device_create_managed_software_node+0x158/0x1b0
[   11.826428]  iort_named_component_init+0xe0/0x140
[   11.831131]  iort_iommu_configure_id+0xf4/0x270
[   11.835660]  acpi_dma_configure_id+0x160/0x200
[   11.840101]  platform_dma_configure+0xa0/0xa4
[   11.844457]  really_probe.part.0+0x84/0x480
[   11.848639]  __driver_probe_device+0xd4/0x180
[   11.852994]  driver_probe_device+0xf8/0x1e0
[   11.857174]  __driver_attach+0x108/0x220
[   11.861095]  bus_for_each_dev+0xe4/0x154
[   11.865014]  driver_attach+0x38/0x50
[   11.868587]  bus_add_driver+0x1bc/0x2c4
[   11.872419]  driver_register+0xf0/0x210
[   11.876253]  __platform_driver_register+0x48/0x60
[   11.880956]  xhci_plat_init+0x34/0x40
[   11.884616]  do_one_initcall+0xa8/0x270
[   11.888449]  kernel_init_freeable+0x2c0/0x348
[   11.892806]  kernel_init+0x28/0x140
[   11.896295]  ret_from_fork+0x10/0x18
[   11.900062] xhci-hcd xhci-hcd.2.auto: Adding to iommu group 6
[   11.906044] xhci-hcd xhci-hcd.2.auto: xHCI Host Controller
[   11.911566] xhci-hcd xhci-hcd.2.auto: new USB bus registered, assigned bus number 3
[   11.919702] xhci-hcd xhci-hcd.2.auto: hcc params 0x0220f66d hci version 0x100 quirks 0x0000000000010010
[   11.929187] xhci-hcd xhci-hcd.2.auto: irq 106, io mem 0x03110000

---
Best Regards, Laurentiu

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] software node: balance refcount for managed sw nodes
  2021-09-09 14:01                     ` Laurentiu Tudor
@ 2021-09-10 12:05                       ` Laurentiu Tudor
  2021-09-10 12:38                         ` Heikki Krogerus
  0 siblings, 1 reply; 23+ messages in thread
From: Laurentiu Tudor @ 2021-09-10 12:05 UTC (permalink / raw)
  To: Heikki Krogerus
  Cc: Andy Shevchenko, Andy Shevchenko, Jon Nettleton,
	Greg Kroah-Hartman, Rafael J . Wysocki, ACPI Devel Maling List,
	Linux Kernel Mailing List, Lorenzo Pieralisi



On 9/9/2021 5:01 PM, Laurentiu Tudor wrote:
> 
> 
> 
> Right, actually I just noticed that this is the fix you suggested last
> time we discussed. :-) I forgot about it, sorry.
> There's still the WARN_ON() [1] triggered by the usb subsys, apparently
> happening only (!) in ACPI boot scenario, so +Lorenzo.
> I'll delay the sending a bit to try to understand what's going on.

I've spent some time looking into this and it turns out that, in the
ACPI case, device_create_managed_software_node() ends up being called
twice: first here [1] and then in the IORT code here [2]. With
the proposed patch, this causes software_node_notify(KOBJ_ADD) to be
called twice, thus triggering the duplicate sysfs entry warning.
Any comments / ideas are welcome.

[1]
https://elixir.bootlin.com/linux/latest/source/drivers/usb/dwc3/host.c#L111
[2]
https://elixir.bootlin.com/linux/latest/source/drivers/acpi/arm64/iort.c#L952
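
To make the sequence above easier to follow, here is a rough user-space
model of it. This is only a sketch, not kernel code: the toy_* names and
structures are made up for illustration, and the only behaviour assumed is
what is described in this thread (KOBJ_ADD creates the symlink and takes a
reference, KOBJ_REMOVE drops one reference plus a second one for managed
nodes).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; these are NOT the kernel's structures. */
struct toy_swnode {
	int refs;	/* models the node's kobject reference count */
	bool managed;	/* true for nodes created by the "managed" helper */
};

struct toy_dev {
	struct toy_swnode *node;	/* currently attached software node */
	bool has_link;			/* models the "software_node" symlink */
};

/* Models the managed-create path: a new node with one reference is attached. */
static void toy_attach_managed(struct toy_dev *dev, struct toy_swnode *node)
{
	assert(dev != NULL && node != NULL);
	node->refs = 1;
	node->managed = true;
	dev->node = node;
}

/*
 * Models KOBJ_ADD: create the symlink and take a reference.
 * Returns -1 when the symlink already exists (the duplicate filename case).
 */
static int toy_notify_add(struct toy_dev *dev)
{
	assert(dev != NULL && dev->node != NULL);
	if (dev->has_link)
		return -1;
	dev->has_link = true;
	dev->node->refs++;
	return 0;
}

/* Models KOBJ_REMOVE: drop one reference, and a second one if managed. */
static void toy_notify_remove(struct toy_dev *dev)
{
	assert(dev != NULL && dev->node != NULL);
	dev->has_link = false;
	dev->node->refs--;
	if (dev->node->managed)
		dev->node->refs--;
}
```

In this model, attach followed directly by remove leaves refs at -1 (the
underflow), attach + add + remove leaves it at 0, and attaching a second
managed node to the same device makes the second toy_notify_add() fail,
which corresponds to the duplicate sysfs entry warning.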

---
Best Regards, Laurentiu

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] software node: balance refcount for managed sw nodes
  2021-09-10 12:05                       ` Laurentiu Tudor
@ 2021-09-10 12:38                         ` Heikki Krogerus
  2021-09-10 13:00                           ` Laurentiu Tudor
  0 siblings, 1 reply; 23+ messages in thread
From: Heikki Krogerus @ 2021-09-10 12:38 UTC (permalink / raw)
  To: Laurentiu Tudor
  Cc: Andy Shevchenko, Andy Shevchenko, Jon Nettleton,
	Greg Kroah-Hartman, Rafael J . Wysocki, ACPI Devel Maling List,
	Linux Kernel Mailing List, Lorenzo Pieralisi

On Fri, Sep 10, 2021 at 03:05:16PM +0300, Laurentiu Tudor wrote:
> 
> 
> On 9/9/2021 5:01 PM, Laurentiu Tudor wrote:
> > 
> > Right, actually I just noticed that this is the fix you suggested last
> > time we discussed. :-) I forgot about it, sorry.
> > There's still the WARN_ON() [1] triggered by the usb subsys, apparently
> > happening only (!) in ACPI boot scenario, so +Lorenzo.
> > I'll delay the sending a bit to try to understand what's going on.
> 
> I've spent some time looking into this and it  turns out that in the
> ACPI case, device_create_managed_software_node() ends up being called
> twice, first here [1] and after that, in the IORT code here [2]. With
> the proposed patch this causes software_node_notify(KOBJ_ADD) being
> called twice thus triggering the dup sysfs entry warning.
> Any comments / ideas welcomed.
> 
> [1] https://elixir.bootlin.com/linux/latest/source/drivers/usb/dwc3/host.c#L111

I think the problem here is that the secondary fwnode gets replaced
because the primary fwnode is shared. Can you test it with this, just
to see if the problem goes away:

diff --git a/drivers/usb/dwc3/host.c b/drivers/usb/dwc3/host.c
index f29a264635aa1..e4b40f8b8f242 100644
--- a/drivers/usb/dwc3/host.c
+++ b/drivers/usb/dwc3/host.c
@@ -76,7 +76,6 @@ int dwc3_host_init(struct dwc3 *dwc)
        }
 
        xhci->dev.parent        = dwc->dev;
-       ACPI_COMPANION_SET(&xhci->dev, ACPI_COMPANION(dwc->dev));
 
        dwc->xhci = xhci;

> [2] https://elixir.bootlin.com/linux/latest/source/drivers/acpi/arm64/iort.c#L952

I didn't yet look at that one.
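
The suspected effect of a shared primary fwnode can be modelled in user
space roughly like this. It is only a sketch under the assumption that the
secondary node is stored in the primary fwnode rather than in the device;
the toy_* names are hypothetical and are not the kernel's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy firmware node: a primary node carries a pointer to its secondary. */
struct toy_fwnode {
	bool is_primary;
	struct toy_fwnode *secondary;
};

/* Toy device: holds only a pointer to its (possibly shared) fwnode. */
struct toy_fw_dev {
	struct toy_fwnode *fwnode;
};

/*
 * Attach a secondary node. If the device has a primary fwnode, the
 * secondary is stored in that primary, so every device sharing the
 * primary sees the change. Otherwise the node becomes the device's fwnode.
 */
static void toy_set_secondary(struct toy_fw_dev *dev, struct toy_fwnode *sec)
{
	assert(dev != NULL);
	if (dev->fwnode != NULL && dev->fwnode->is_primary)
		dev->fwnode->secondary = sec;
	else
		dev->fwnode = sec;
}

/* Look up the secondary node as seen through this device. */
static struct toy_fwnode *toy_get_secondary(const struct toy_fw_dev *dev)
{
	assert(dev != NULL);
	if (dev->fwnode != NULL && dev->fwnode->is_primary)
		return dev->fwnode->secondary;
	return dev->fwnode;
}
```

With two toy devices pointing at the same primary node, a second
toy_set_secondary() call on either device replaces what the other one sees.
A device without the shared primary keeps its own node, which is the
situation the test diff above tries to create for the xhci device.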

thanks,

-- 
heikki

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] software node: balance refcount for managed sw nodes
  2021-09-10 12:38                         ` Heikki Krogerus
@ 2021-09-10 13:00                           ` Laurentiu Tudor
  2021-09-14 14:13                             ` Heikki Krogerus
  0 siblings, 1 reply; 23+ messages in thread
From: Laurentiu Tudor @ 2021-09-10 13:00 UTC (permalink / raw)
  To: Heikki Krogerus
  Cc: Andy Shevchenko, Andy Shevchenko, Jon Nettleton,
	Greg Kroah-Hartman, Rafael J . Wysocki, ACPI Devel Maling List,
	Linux Kernel Mailing List, Lorenzo Pieralisi

Hi Heikki,

On 9/10/2021 3:38 PM, Heikki Krogerus wrote:
> On Fri, Sep 10, 2021 at 03:05:16PM +0300, Laurentiu Tudor wrote:
>>
>>
>> On 9/9/2021 5:01 PM, Laurentiu Tudor wrote:
>>>
>>>
>>> On 9/9/2021 3:16 PM, Heikki Krogerus wrote:
>>>> On Thu, Sep 09, 2021 at 03:13:47PM +0300, Heikki Krogerus wrote:
>>>>> On Tue, Sep 07, 2021 at 06:59:18PM +0300, Laurentiu Tudor wrote:
>>>>>>
>>>>>>
>>>>>> On 7/26/2021 10:59 AM, Laurentiu Tudor wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 7/20/2021 1:27 PM, Andy Shevchenko wrote:
>>>>>>>> On Tue, Jul 20, 2021 at 12:22 PM Laurentiu Tudor
>>>>>>>> <laurentiu.tudor@nxp.com> wrote:
>>>>>>>>> On 7/19/2021 3:22 PM, Andy Shevchenko wrote:
>>>>>>>>>> On Mon, Jul 19, 2021 at 03:00:17PM +0300, Laurentiu Tudor wrote:
>>>>>>>>>>> On 7/16/2021 8:21 PM, Jon Nettleton wrote:
>>>>>>>>>>>> On Fri, Jul 16, 2021 at 2:17 PM Andy Shevchenko
>>>>>>>>>>>> <andriy.shevchenko@linux.intel.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Jul 16, 2021 at 01:16:02PM +0300, laurentiu.tudor@nxp.com wrote:
>>>>>>>>>>>>>> From: Laurentiu Tudor <laurentiu.tudor@nxp.com>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> software_node_notify(), on KOBJ_REMOVE drops the refcount twice on managed
>>>>>>>>>>>>>> software nodes, thus leading to underflow errors. Balance the refcount by
>>>>>>>>>>>>>> bumping it in the device_create_managed_software_node() function.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The error [1] was encountered after adding a .shutdown() op to our
>>>>>>>>>>>>>> fsl-mc-bus driver.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Looking into the history of adding ->shutdown() to dwc3 driver (it got reverted
>>>>>>>>>>>>> later on), I can tell that probably something is wrong in the ->shutdown()
>>>>>>>>>>>>> method itself.
>>>>>>>>>>>>
>>>>>>>>>>>> Isn't the other alternative to just remove the second kobject_put from
>>>>>>>>>>>> KOBJ_REMOVE ?
>>>>>>>>>>>
>>>>>>>>>>> Or maybe on top of Heikki's suggestion, replace the calls to
>>>>>>>>>>> sysfs_create_link() from KOBJ_ADD with sysfs_create_link_nowarn()?
>>>>>>>>>>
>>>>>>>>>> _nowarn will hide the problem. It was there, it was removed from there.
>>>>>>>>>> Perhaps we have to understand the root cause better (some specific flow?).
>>>>>>>>>>
>>>>>>>>>> Any insight from you on the flow when the issue appears? I.o.w. what happened
>>>>>>>>>> on the big picture that we got into the warning you see?
>>>>>>>>>
>>>>>>>>> I encountered the initial issue when trying to shut down a system booted
>>>>>>>>> with ACPI but only after adding a .shutdown() callback to our bus driver
>>>>>>>>> so that the devices are properly taken down. The problem was that
>>>>>>>>> software_node_notify(), on KOBJ_REMOVE was dropping the reference count
>>>>>>>>> twice leading to an underflow error. My initial proposal was to just
>>>>>>>>> bump the refcount in device_create_managed_software_node(). The device
>>>>>>>>> properties that triggered the problem are created here [1].
>>>>>>>>>
>>>>>>>>> Heikki suggested using software_node_notify(KOBJ_ADD) instead of
>>>>>>>>> manually incrementing the refcount. This triggered the second issue, a
>>>>>>>>> duplicated sysfs entry warning originating in the usb subsystem:
>>>>>>>>> device_create_managed_software_node() ends up being called twice, once
>>>>>>>>> here [2] and secondly, the place I previously mentioned [1].
>>>>>>>>
>>>>>>>> This [3] is what I have reported against DWC3 when ->shutdown() has
>>>>>>>> been added there. And here [4] is another thread about the issue with
>>>>>>>> that callback. The ->release() callback is called at put_device() [5]
>>>>>>>> and ->shutdown() is called before that [6]. That said, can you inspect
>>>>>>>> your ->shutdown() implementation one more time and perhaps see if
>>>>>>>> there is anything that can be amended?
>>>>>>>>
>>>>>>>
>>>>>>> Will do, thanks for the pointers. It could be that we mess something up
>>>>>>> in how we use the driver model.
>>>>>>>
>>>>>>
>>>>>> Quick (and late, sorry) update from my side. I've spent time on
>>>>>> debugging our bus and did find some issues but, at least for now, none
>>>>>> are related to sw node.
>>>>>> In the meantime, I noticed in the swnode code that
>>>>>> device_add_software_node() calls software_node_notify(KOBJ_ADD) while
>>>>>> device_create_managed_software_node() doesn't. Updating [1] the latter
>>>>>> with the call to software_node_notify(KOBJ_ADD) does seem to fix the
>>>>>> issue I'm seeing.
>>>>>>
>>>>>> Could this be a problem? Any comments appreciated.
>>>>>>
>>>>>> One more thing perhaps worth mentioning is that, at least for now, there
>>>>>> are few uses for this device_create_managed_software_node() api,
>>>>>> mentioning here a couple of them:
>>>>>>  - arm64 iort code - this seems to be triggering the issue I'm getting
>>>>>>  - dwc3 usb - Andy reported similar issues here, maybe the issue is common?
>>>>>>
>>>>>> [1]
>>>>>> @@ -1113,6 +1125,15 @@ int device_create_managed_software_node(struct device *dev,
>>>>>>         to_swnode(fwnode)->managed = true;
>>>>>>         set_secondary_fwnode(dev, fwnode);
>>>>>>
>>>>>> +       /*
>>>>>> +        * If the device has been fully registered by the time this function is
>>>>>> +        * called, software_node_notify() must be called separately so that the
>>>>>> +        * symlinks get created and the reference count of the node is kept in
>>>>>> +        * balance.
>>>>>> +        */
>>>>>> +       if (device_is_registered(dev))
>>>>>> +               software_node_notify(dev, KOBJ_ADD);
>>>>>> +
>>>>>>         return 0;
>>>>>>  }
>>>>>
>>>>> That should be fixed indeed. Please send that after -rc1 is out.
>>>>
>>>> I mean, resend :-)
>>>>
>>>
>>> Right, actually I just noticed that this is the fix you suggested last
>>> time we discussed. :-) I forgot about it, sorry.
>>> There's still the WARN_ON() [1] triggered by the usb subsys, apparently
>>> happening only (!) in ACPI boot scenario, so +Lorenzo.
>>> I'll delay the sending a bit to try to understand what's going on.
>>
>> I've spent some time looking into this and it turns out that in the
>> ACPI case, device_create_managed_software_node() ends up being called
>> twice, first here [1] and after that, in the IORT code here [2]. With
>> the proposed patch this causes software_node_notify(KOBJ_ADD) to be
>> called twice, thus triggering the duplicate sysfs entry warning.
>> Any comments / ideas welcome.
>>
>> [1] https://elixir.bootlin.com/linux/latest/source/drivers/usb/dwc3/host.c#L111
> 
> I think the problem here is that the secondary fwnode gets replaced
> because the primary fwnode is shared. Can you test it with this, just
> to see if the problem goes away:
> 
> diff --git a/drivers/usb/dwc3/host.c b/drivers/usb/dwc3/host.c
> index f29a264635aa1..e4b40f8b8f242 100644
> --- a/drivers/usb/dwc3/host.c
> +++ b/drivers/usb/dwc3/host.c
> @@ -76,7 +76,6 @@ int dwc3_host_init(struct dwc3 *dwc)
>         }
>  
>         xhci->dev.parent        = dwc->dev;
> -       ACPI_COMPANION_SET(&xhci->dev, ACPI_COMPANION(dwc->dev));
>  
>         dwc->xhci = xhci;


Thanks for looking into this! Yes, this does make the issue go away.

---
Best Regards, Laurentiu

* Re: [PATCH] software node: balance refcount for managed sw nodes
  2021-07-16 10:16 [PATCH] software node: balance refcount for managed sw nodes laurentiu.tudor
  2021-07-16 10:34 ` Laurentiu Tudor
  2021-07-16 12:17 ` Andy Shevchenko
@ 2021-09-14 14:00 ` Heikki Krogerus
  2 siblings, 0 replies; 23+ messages in thread
From: Heikki Krogerus @ 2021-09-14 14:00 UTC (permalink / raw)
  To: laurentiu.tudor
  Cc: andriy.shevchenko, gregkh, rafael, linux-acpi, linux-kernel, jon

On Fri, Jul 16, 2021 at 01:16:02PM +0300, laurentiu.tudor@nxp.com wrote:
> From: Laurentiu Tudor <laurentiu.tudor@nxp.com>
> 
> software_node_notify(), on KOBJ_REMOVE drops the refcount twice on managed
> software nodes, thus leading to underflow errors. Balance the refcount by
> bumping it in the device_create_managed_software_node() function.
> 
> The error [1] was encountered after adding a .shutdown() op to our
> fsl-mc-bus driver.
> 
> [1]
> pc : refcount_warn_saturate+0xf8/0x150
> lr : refcount_warn_saturate+0xf8/0x150
> sp : ffff80001009b920
> x29: ffff80001009b920 x28: ffff1a2420318000 x27: 0000000000000000
> x26: ffffccac15e7a038 x25: 0000000000000008 x24: ffffccac168e0030
> x23: ffff1a2428a82000 x22: 0000000000080000 x21: ffff1a24287b5000
> x20: 0000000000000001 x19: ffff1a24261f4400 x18: ffffffffffffffff
> x17: 6f72645f726f7272 x16: 0000000000000000 x15: ffff80009009b607
> x14: 0000000000000000 x13: ffffccac16602670 x12: 0000000000000a17
> x11: 000000000000035d x10: ffffccac16602670 x9 : ffffccac16602670
> x8 : 00000000ffffefff x7 : ffffccac1665a670 x6 : ffffccac1665a670
> x5 : 0000000000000000 x4 : 0000000000000000 x3 : 00000000ffffffff
> x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff1a2420318000
> Call trace:
>  refcount_warn_saturate+0xf8/0x150
>  kobject_put+0x10c/0x120
>  software_node_notify+0xd8/0x140
>  device_platform_notify+0x4c/0xb4
>  device_del+0x188/0x424
>  fsl_mc_device_remove+0x2c/0x4c
>  __fsl_mc_device_remove+0x14/0x2c
>  device_for_each_child+0x5c/0xac
>  dprc_remove+0x9c/0xc0
>  fsl_mc_driver_remove+0x28/0x64
>  __device_release_driver+0x188/0x22c
>  device_release_driver+0x30/0x50
>  bus_remove_device+0x128/0x134
>  device_del+0x16c/0x424
>  fsl_mc_bus_remove+0x8c/0x114
>  fsl_mc_bus_shutdown+0x14/0x20
>  platform_shutdown+0x28/0x40
>  device_shutdown+0x15c/0x330
>  __do_sys_reboot+0x218/0x2a0
>  __arm64_sys_reboot+0x28/0x34
>  invoke_syscall+0x48/0x114
>  el0_svc_common+0x40/0xdc
>  do_el0_svc+0x2c/0x94
>  el0_svc+0x2c/0x54
>  el0t_64_sync_handler+0xa8/0x12c
>  el0t_64_sync+0x198/0x19c
> ---[ end trace 32eb1c71c7d86821 ]---
> 
> Reported-by: Jon Nettleton <jon@solid-run.com>
> Suggested-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
> Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>

FWIW:

Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>

> ---
> Changes since RFC:
>  - use software_node_notify(KOBJ_ADD) instead of directly bumping
>    refcount (Heikki)
> 
>  drivers/base/swnode.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/base/swnode.c b/drivers/base/swnode.c
> index d1f1a8240120..bdb50a06c82a 100644
> --- a/drivers/base/swnode.c
> +++ b/drivers/base/swnode.c
> @@ -1113,6 +1113,9 @@ int device_create_managed_software_node(struct device *dev,
>  	to_swnode(fwnode)->managed = true;
>  	set_secondary_fwnode(dev, fwnode);
>  
> +	if (device_is_registered(dev))
> +		software_node_notify(dev, KOBJ_ADD);
> +
>  	return 0;
>  }
>  EXPORT_SYMBOL_GPL(device_create_managed_software_node);
> -- 
> 2.17.1

thanks,

-- 
heikki

* Re: [PATCH] software node: balance refcount for managed sw nodes
  2021-09-10 13:00                           ` Laurentiu Tudor
@ 2021-09-14 14:13                             ` Heikki Krogerus
  0 siblings, 0 replies; 23+ messages in thread
From: Heikki Krogerus @ 2021-09-14 14:13 UTC (permalink / raw)
  To: Laurentiu Tudor
  Cc: Andy Shevchenko, Andy Shevchenko, Jon Nettleton,
	Greg Kroah-Hartman, Rafael J . Wysocki, ACPI Devel Maling List,
	Linux Kernel Mailing List, Lorenzo Pieralisi

On Fri, Sep 10, 2021 at 04:00:49PM +0300, Laurentiu Tudor wrote:
> >> I've spent some time looking into this and it turns out that in the
> >> ACPI case, device_create_managed_software_node() ends up being called
> >> twice, first here [1] and after that, in the IORT code here [2]. With
> >> the proposed patch this causes software_node_notify(KOBJ_ADD) to be
> >> called twice, thus triggering the duplicate sysfs entry warning.
> >> Any comments / ideas welcome.
> >>
> >> [1] https://elixir.bootlin.com/linux/latest/source/drivers/usb/dwc3/host.c#L111
> > 
> > I think the problem here is that the secondary fwnode gets replaced
> > because the primary fwnode is shared. Can you test it with this, just
> > to see if the problem goes away:
> > 
> > diff --git a/drivers/usb/dwc3/host.c b/drivers/usb/dwc3/host.c
> > index f29a264635aa1..e4b40f8b8f242 100644
> > --- a/drivers/usb/dwc3/host.c
> > +++ b/drivers/usb/dwc3/host.c
> > @@ -76,7 +76,6 @@ int dwc3_host_init(struct dwc3 *dwc)
> >         }
> >  
> >         xhci->dev.parent        = dwc->dev;
> > -       ACPI_COMPANION_SET(&xhci->dev, ACPI_COMPANION(dwc->dev));
> >  
> >         dwc->xhci = xhci;
> 
> 
> Thanks for looking into this! Yes, this does make the issue go away.

We need to think about how to solve this one. The problem is that we
have to share the ACPI node between the parent dwc3 device and child
xHCI, but at the same time xHCI needs to have its own software node.

The fwnode->secondary pointer does not quite bend to this. If the
primary fwnode is shared, the secondary fwnode has to be shared as
well.
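
To illustrate the constraint described above, here is a minimal userspace
sketch. It is not kernel code: the toy_* types and helpers are hypothetical
stand-ins, and toy_set_secondary() only assumes the simplified behaviour that
a secondary node is hung off the device's primary node when one exists. Under
that assumption, two devices that share a primary node cannot keep separate
secondary nodes, because the second attach overwrites the first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a firmware node with a secondary pointer. */
struct toy_fwnode {
	const char *name;
	struct toy_fwnode *secondary;
};

/* Hypothetical stand-in for a device that points at its primary node. */
struct toy_dev {
	struct toy_fwnode *fwnode;
};

/*
 * Simplified model (an assumption, not the kernel implementation): if the
 * device has a primary node, the secondary is stored behind it; otherwise
 * the secondary becomes the device's only node.
 */
void toy_set_secondary(struct toy_dev *dev, struct toy_fwnode *sec)
{
	assert(dev != NULL);
	if (dev->fwnode)
		dev->fwnode->secondary = sec;
	else
		dev->fwnode = sec;
}

/* Look up whatever currently sits behind the device's primary node. */
struct toy_fwnode *toy_secondary_of(const struct toy_dev *dev)
{
	return dev->fwnode ? dev->fwnode->secondary : NULL;
}

/* Returns true when the first device has lost its own secondary node. */
bool toy_shared_primary_demo(void)
{
	struct toy_fwnode companion = { "shared-primary", NULL };
	struct toy_fwnode first_sw = { "first-swnode", NULL };
	struct toy_fwnode second_sw = { "second-swnode", NULL };
	struct toy_dev first = { &companion };
	struct toy_dev second = { &companion };	/* shares the primary node */

	toy_set_secondary(&first, &first_sw);
	toy_set_secondary(&second, &second_sw);	/* replaces first_sw */

	return toy_secondary_of(&first) == &second_sw;
}
```

In this model toy_shared_primary_demo() reports that the first device now
sees the second device's node, which is the "secondary gets replaced"
situation discussed in this thread.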

thanks,

-- 
heikki

* Re: [PATCH] software node: balance refcount for managed sw nodes
  2021-09-27 12:42   ` Laurentiu Tudor
  2021-09-27 13:34     ` Greg KH
@ 2021-09-27 13:36     ` Heikki Krogerus
  1 sibling, 0 replies; 23+ messages in thread
From: Heikki Krogerus @ 2021-09-27 13:36 UTC (permalink / raw)
  To: Laurentiu Tudor; +Cc: Greg KH, jon, rafael.j.wysocki, stable

On Mon, Sep 27, 2021 at 03:42:42PM +0300, Laurentiu Tudor wrote:
> 
> On 9/27/2021 3:17 PM, Greg KH wrote:
> > On Mon, Sep 27, 2021 at 01:22:49PM +0300, Laurentiu Tudor wrote:
> >> software_node_notify(), on KOBJ_REMOVE drops the refcount twice on managed
> >> software nodes, thus leading to underflow errors. Balance the refcount by
> >> bumping it in the device_create_managed_software_node() function.
> >>
> >> The error [1] was encountered after adding a .shutdown() op to our
> >> fsl-mc-bus driver.
> >>
> >> [Backported to stable from mainline commit
> >> 5aeb05b27f81 ("software node: balance refcount for managed software nodes")]
> 
> But ...
> 
> >> [1]
> >> pc : refcount_warn_saturate+0xf8/0x150
> >> lr : refcount_warn_saturate+0xf8/0x150
> >> sp : ffff80001009b920
> >> x29: ffff80001009b920 x28: ffff1a2420318000 x27: 0000000000000000
> >> x26: ffffccac15e7a038 x25: 0000000000000008 x24: ffffccac168e0030
> >> x23: ffff1a2428a82000 x22: 0000000000080000 x21: ffff1a24287b5000
> >> x20: 0000000000000001 x19: ffff1a24261f4400 x18: ffffffffffffffff
> >> x17: 6f72645f726f7272 x16: 0000000000000000 x15: ffff80009009b607
> >> x14: 0000000000000000 x13: ffffccac16602670 x12: 0000000000000a17
> >> x11: 000000000000035d x10: ffffccac16602670 x9 : ffffccac16602670
> >> x8 : 00000000ffffefff x7 : ffffccac1665a670 x6 : ffffccac1665a670
> >> x5 : 0000000000000000 x4 : 0000000000000000 x3 : 00000000ffffffff
> >> x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff1a2420318000
> >> Call trace:
> >>  refcount_warn_saturate+0xf8/0x150
> >>  kobject_put+0x10c/0x120
> >>  software_node_notify+0xd8/0x140
> >>  device_platform_notify+0x4c/0xb4
> >>  device_del+0x188/0x424
> >>  fsl_mc_device_remove+0x2c/0x4c
> >>  __fsl_mc_device_remove+0x14/0x2c
> >>  device_for_each_child+0x5c/0xac
> >>  dprc_remove+0x9c/0xc0
> >>  fsl_mc_driver_remove+0x28/0x64
> >>  __device_release_driver+0x188/0x22c
> >>  device_release_driver+0x30/0x50
> >>  bus_remove_device+0x128/0x134
> >>  device_del+0x16c/0x424
> >>  fsl_mc_bus_remove+0x8c/0x114
> >>  fsl_mc_bus_shutdown+0x14/0x20
> >>  platform_shutdown+0x28/0x40
> >>  device_shutdown+0x15c/0x330
> >>  __do_sys_reboot+0x218/0x2a0
> >>  __arm64_sys_reboot+0x28/0x34
> >>  invoke_syscall+0x48/0x114
> >>  el0_svc_common+0x40/0xdc
> >>  do_el0_svc+0x2c/0x94
> >>  el0_svc+0x2c/0x54
> >>  el0t_64_sync_handler+0xa8/0x12c
> >>  el0t_64_sync+0x198/0x19c
> >> ---[ end trace 32eb1c71c7d86821 ]---
> >>
> >> Fixes: 151f6ff78cdf ("software node: Provide replacement for device_add_properties()")
> >> Reported-by: Jon Nettleton <jon@solid-run.com>
> >> Suggested-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
> >> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
> >> Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
> >> Cc: <stable@vger.kernel.org> # 5.12+
> >> ---
> >>  drivers/base/swnode.c | 3 +++
> >>  1 file changed, 3 insertions(+)
> > 
> > Next time, please include the git commit id of this patch that is
> > already in Linus's tree so that I don't have to go and manually look it
> > up...
> > 
> 
> ... I did mention the mainline commit in the description. Maybe there's
> a mix-up with the previous submission, or I should have used a special
> syntax that I'm not aware of...

The syntax is defined in the documentation:
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html

> Anyway, thanks for picking this up.

thanks,

-- 
heikki

* Re: [PATCH] software node: balance refcount for managed sw nodes
  2021-09-27 12:42   ` Laurentiu Tudor
@ 2021-09-27 13:34     ` Greg KH
  2021-09-27 13:36     ` Heikki Krogerus
  1 sibling, 0 replies; 23+ messages in thread
From: Greg KH @ 2021-09-27 13:34 UTC (permalink / raw)
  To: Laurentiu Tudor; +Cc: heikki.krogerus, jon, rafael.j.wysocki, stable

On Mon, Sep 27, 2021 at 03:42:42PM +0300, Laurentiu Tudor wrote:
> 
> On 9/27/2021 3:17 PM, Greg KH wrote:
> > On Mon, Sep 27, 2021 at 01:22:49PM +0300, Laurentiu Tudor wrote:
> >> software_node_notify(), on KOBJ_REMOVE drops the refcount twice on managed
> >> software nodes, thus leading to underflow errors. Balance the refcount by
> >> bumping it in the device_create_managed_software_node() function.
> >>
> >> The error [1] was encountered after adding a .shutdown() op to our
> >> fsl-mc-bus driver.
> >>
> >> [Backported to stable from mainline commit
> >> 5aeb05b27f81 ("software node: balance refcount for managed software nodes")]
> 
> But ...

Ah, missed that up here in the middle of the changelog text, sorry.

greg k-h

* Re: [PATCH] software node: balance refcount for managed sw nodes
  2021-09-27 12:17 ` Greg KH
@ 2021-09-27 12:42   ` Laurentiu Tudor
  2021-09-27 13:34     ` Greg KH
  2021-09-27 13:36     ` Heikki Krogerus
  0 siblings, 2 replies; 23+ messages in thread
From: Laurentiu Tudor @ 2021-09-27 12:42 UTC (permalink / raw)
  To: Greg KH; +Cc: heikki.krogerus, jon, rafael.j.wysocki, stable


On 9/27/2021 3:17 PM, Greg KH wrote:
> On Mon, Sep 27, 2021 at 01:22:49PM +0300, Laurentiu Tudor wrote:
>> software_node_notify(), on KOBJ_REMOVE drops the refcount twice on managed
>> software nodes, thus leading to underflow errors. Balance the refcount by
>> bumping it in the device_create_managed_software_node() function.
>>
>> The error [1] was encountered after adding a .shutdown() op to our
>> fsl-mc-bus driver.
>>
>> [Backported to stable from mainline commit
>> 5aeb05b27f81 ("software node: balance refcount for managed software nodes")]

But ...

>> [1]
>> pc : refcount_warn_saturate+0xf8/0x150
>> lr : refcount_warn_saturate+0xf8/0x150
>> sp : ffff80001009b920
>> x29: ffff80001009b920 x28: ffff1a2420318000 x27: 0000000000000000
>> x26: ffffccac15e7a038 x25: 0000000000000008 x24: ffffccac168e0030
>> x23: ffff1a2428a82000 x22: 0000000000080000 x21: ffff1a24287b5000
>> x20: 0000000000000001 x19: ffff1a24261f4400 x18: ffffffffffffffff
>> x17: 6f72645f726f7272 x16: 0000000000000000 x15: ffff80009009b607
>> x14: 0000000000000000 x13: ffffccac16602670 x12: 0000000000000a17
>> x11: 000000000000035d x10: ffffccac16602670 x9 : ffffccac16602670
>> x8 : 00000000ffffefff x7 : ffffccac1665a670 x6 : ffffccac1665a670
>> x5 : 0000000000000000 x4 : 0000000000000000 x3 : 00000000ffffffff
>> x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff1a2420318000
>> Call trace:
>>  refcount_warn_saturate+0xf8/0x150
>>  kobject_put+0x10c/0x120
>>  software_node_notify+0xd8/0x140
>>  device_platform_notify+0x4c/0xb4
>>  device_del+0x188/0x424
>>  fsl_mc_device_remove+0x2c/0x4c
>>  __fsl_mc_device_remove+0x14/0x2c
>>  device_for_each_child+0x5c/0xac
>>  dprc_remove+0x9c/0xc0
>>  fsl_mc_driver_remove+0x28/0x64
>>  __device_release_driver+0x188/0x22c
>>  device_release_driver+0x30/0x50
>>  bus_remove_device+0x128/0x134
>>  device_del+0x16c/0x424
>>  fsl_mc_bus_remove+0x8c/0x114
>>  fsl_mc_bus_shutdown+0x14/0x20
>>  platform_shutdown+0x28/0x40
>>  device_shutdown+0x15c/0x330
>>  __do_sys_reboot+0x218/0x2a0
>>  __arm64_sys_reboot+0x28/0x34
>>  invoke_syscall+0x48/0x114
>>  el0_svc_common+0x40/0xdc
>>  do_el0_svc+0x2c/0x94
>>  el0_svc+0x2c/0x54
>>  el0t_64_sync_handler+0xa8/0x12c
>>  el0t_64_sync+0x198/0x19c
>> ---[ end trace 32eb1c71c7d86821 ]---
>>
>> Fixes: 151f6ff78cdf ("software node: Provide replacement for device_add_properties()")
>> Reported-by: Jon Nettleton <jon@solid-run.com>
>> Suggested-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
>> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
>> Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
>> Cc: <stable@vger.kernel.org> # 5.12+
>> ---
>>  drivers/base/swnode.c | 3 +++
>>  1 file changed, 3 insertions(+)
> 
> Next time, please include the git commit id of this patch that is
> already in Linus's tree so that I don't have to go and manually look it
> up...
> 

... I did mention the mainline commit in the description. Maybe there's
a mix-up with the previous submission, or I should have used a special
syntax that I'm not aware of...
Anyway, thanks for picking this up.

---
Best Regards, Laurentiu

* Re: [PATCH] software node: balance refcount for managed sw nodes
  2021-09-27 10:22 Laurentiu Tudor
@ 2021-09-27 12:17 ` Greg KH
  2021-09-27 12:42   ` Laurentiu Tudor
  0 siblings, 1 reply; 23+ messages in thread
From: Greg KH @ 2021-09-27 12:17 UTC (permalink / raw)
  To: Laurentiu Tudor; +Cc: heikki.krogerus, jon, rafael.j.wysocki, stable

On Mon, Sep 27, 2021 at 01:22:49PM +0300, Laurentiu Tudor wrote:
> software_node_notify(), on KOBJ_REMOVE drops the refcount twice on managed
> software nodes, thus leading to underflow errors. Balance the refcount by
> bumping it in the device_create_managed_software_node() function.
> 
> The error [1] was encountered after adding a .shutdown() op to our
> fsl-mc-bus driver.
> 
> [Backported to stable from mainline commit
> 5aeb05b27f81 ("software node: balance refcount for managed software nodes")]
> 
> [1]
> pc : refcount_warn_saturate+0xf8/0x150
> lr : refcount_warn_saturate+0xf8/0x150
> sp : ffff80001009b920
> x29: ffff80001009b920 x28: ffff1a2420318000 x27: 0000000000000000
> x26: ffffccac15e7a038 x25: 0000000000000008 x24: ffffccac168e0030
> x23: ffff1a2428a82000 x22: 0000000000080000 x21: ffff1a24287b5000
> x20: 0000000000000001 x19: ffff1a24261f4400 x18: ffffffffffffffff
> x17: 6f72645f726f7272 x16: 0000000000000000 x15: ffff80009009b607
> x14: 0000000000000000 x13: ffffccac16602670 x12: 0000000000000a17
> x11: 000000000000035d x10: ffffccac16602670 x9 : ffffccac16602670
> x8 : 00000000ffffefff x7 : ffffccac1665a670 x6 : ffffccac1665a670
> x5 : 0000000000000000 x4 : 0000000000000000 x3 : 00000000ffffffff
> x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff1a2420318000
> Call trace:
>  refcount_warn_saturate+0xf8/0x150
>  kobject_put+0x10c/0x120
>  software_node_notify+0xd8/0x140
>  device_platform_notify+0x4c/0xb4
>  device_del+0x188/0x424
>  fsl_mc_device_remove+0x2c/0x4c
>  __fsl_mc_device_remove+0x14/0x2c
>  device_for_each_child+0x5c/0xac
>  dprc_remove+0x9c/0xc0
>  fsl_mc_driver_remove+0x28/0x64
>  __device_release_driver+0x188/0x22c
>  device_release_driver+0x30/0x50
>  bus_remove_device+0x128/0x134
>  device_del+0x16c/0x424
>  fsl_mc_bus_remove+0x8c/0x114
>  fsl_mc_bus_shutdown+0x14/0x20
>  platform_shutdown+0x28/0x40
>  device_shutdown+0x15c/0x330
>  __do_sys_reboot+0x218/0x2a0
>  __arm64_sys_reboot+0x28/0x34
>  invoke_syscall+0x48/0x114
>  el0_svc_common+0x40/0xdc
>  do_el0_svc+0x2c/0x94
>  el0_svc+0x2c/0x54
>  el0t_64_sync_handler+0xa8/0x12c
>  el0t_64_sync+0x198/0x19c
> ---[ end trace 32eb1c71c7d86821 ]---
> 
> Fixes: 151f6ff78cdf ("software node: Provide replacement for device_add_properties()")
> Reported-by: Jon Nettleton <jon@solid-run.com>
> Suggested-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
> Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
> Cc: <stable@vger.kernel.org> # 5.12+
> ---
>  drivers/base/swnode.c | 3 +++
>  1 file changed, 3 insertions(+)

Next time, please include the git commit id of this patch that is
already in Linus's tree so that I don't have to go and manually look it
up...

thanks,

greg k-h

* [PATCH] software node: balance refcount for managed sw nodes
@ 2021-09-27 10:22 Laurentiu Tudor
  2021-09-27 12:17 ` Greg KH
  0 siblings, 1 reply; 23+ messages in thread
From: Laurentiu Tudor @ 2021-09-27 10:22 UTC (permalink / raw)
  To: gregkh; +Cc: heikki.krogerus, jon, rafael.j.wysocki, Laurentiu Tudor, stable

software_node_notify(), on KOBJ_REMOVE drops the refcount twice on managed
software nodes, thus leading to underflow errors. Balance the refcount by
bumping it in the device_create_managed_software_node() function.

The error [1] was encountered after adding a .shutdown() op to our
fsl-mc-bus driver.

[Backported to stable from mainline commit
5aeb05b27f81 ("software node: balance refcount for managed software nodes")]

[1]
pc : refcount_warn_saturate+0xf8/0x150
lr : refcount_warn_saturate+0xf8/0x150
sp : ffff80001009b920
x29: ffff80001009b920 x28: ffff1a2420318000 x27: 0000000000000000
x26: ffffccac15e7a038 x25: 0000000000000008 x24: ffffccac168e0030
x23: ffff1a2428a82000 x22: 0000000000080000 x21: ffff1a24287b5000
x20: 0000000000000001 x19: ffff1a24261f4400 x18: ffffffffffffffff
x17: 6f72645f726f7272 x16: 0000000000000000 x15: ffff80009009b607
x14: 0000000000000000 x13: ffffccac16602670 x12: 0000000000000a17
x11: 000000000000035d x10: ffffccac16602670 x9 : ffffccac16602670
x8 : 00000000ffffefff x7 : ffffccac1665a670 x6 : ffffccac1665a670
x5 : 0000000000000000 x4 : 0000000000000000 x3 : 00000000ffffffff
x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff1a2420318000
Call trace:
 refcount_warn_saturate+0xf8/0x150
 kobject_put+0x10c/0x120
 software_node_notify+0xd8/0x140
 device_platform_notify+0x4c/0xb4
 device_del+0x188/0x424
 fsl_mc_device_remove+0x2c/0x4c
 __fsl_mc_device_remove+0x14/0x2c
 device_for_each_child+0x5c/0xac
 dprc_remove+0x9c/0xc0
 fsl_mc_driver_remove+0x28/0x64
 __device_release_driver+0x188/0x22c
 device_release_driver+0x30/0x50
 bus_remove_device+0x128/0x134
 device_del+0x16c/0x424
 fsl_mc_bus_remove+0x8c/0x114
 fsl_mc_bus_shutdown+0x14/0x20
 platform_shutdown+0x28/0x40
 device_shutdown+0x15c/0x330
 __do_sys_reboot+0x218/0x2a0
 __arm64_sys_reboot+0x28/0x34
 invoke_syscall+0x48/0x114
 el0_svc_common+0x40/0xdc
 do_el0_svc+0x2c/0x94
 el0_svc+0x2c/0x54
 el0t_64_sync_handler+0xa8/0x12c
 el0t_64_sync+0x198/0x19c
---[ end trace 32eb1c71c7d86821 ]---

Fixes: 151f6ff78cdf ("software node: Provide replacement for device_add_properties()")
Reported-by: Jon Nettleton <jon@solid-run.com>
Suggested-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Cc: <stable@vger.kernel.org> # 5.12+
---
 drivers/base/swnode.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/base/swnode.c b/drivers/base/swnode.c
index d1f1a8240120..bdb50a06c82a 100644
--- a/drivers/base/swnode.c
+++ b/drivers/base/swnode.c
@@ -1113,6 +1113,9 @@ int device_create_managed_software_node(struct device *dev,
 	to_swnode(fwnode)->managed = true;
 	set_secondary_fwnode(dev, fwnode);
 
+	if (device_is_registered(dev))
+		software_node_notify(dev, KOBJ_ADD);
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(device_create_managed_software_node);
-- 
2.17.1
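
The imbalance this patch addresses can be sketched with a small userspace
model. This is not the kernel implementation: every toy_* name below is
hypothetical, and the model assumes only what the changelog states, namely
that the ADD notification takes a reference, that the REMOVE notification
drops one, and that REMOVE drops a second one for managed nodes. The with_fix
flag mirrors the three lines added by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a software node and its reference count. */
struct toy_swnode {
	int refcount;	/* models the node's kobject reference count */
	int underflows;	/* number of puts attempted at refcount zero */
	bool managed;
};

/* Hypothetical stand-in for a device with an attached software node. */
struct toy_device {
	bool registered;		/* models device_is_registered() */
	struct toy_swnode *swnode;
};

enum toy_action { TOY_KOBJ_ADD, TOY_KOBJ_REMOVE };

/* Drop one reference, recording an underflow instead of going negative. */
void toy_put(struct toy_swnode *node)
{
	if (node->refcount > 0)
		node->refcount--;
	else
		node->underflows++;
}

/* Toy model of the notifier: ADD takes a reference, REMOVE drops it. */
void toy_notify(struct toy_device *dev, enum toy_action action)
{
	struct toy_swnode *node = dev->swnode;

	if (!node)
		return;

	switch (action) {
	case TOY_KOBJ_ADD:
		node->refcount++;
		break;
	case TOY_KOBJ_REMOVE:
		toy_put(node);		/* pairs with the ADD reference */
		if (node->managed) {
			dev->swnode = NULL;
			toy_put(node);	/* drops the creation reference */
		}
		break;
	}
}

/* Toy model of the managed creation path, with and without the fix. */
void toy_create_managed(struct toy_device *dev, struct toy_swnode *node,
			bool with_fix)
{
	assert(dev != NULL && node != NULL);
	node->refcount = 1;	/* creation reference (assumed) */
	node->underflows = 0;
	node->managed = true;
	dev->swnode = node;

	if (with_fix && dev->registered)
		toy_notify(dev, TOY_KOBJ_ADD);
}

/* Attach a managed node to a registered device, then remove the device. */
int toy_run(bool with_fix)
{
	struct toy_swnode node;
	struct toy_device dev = { .registered = true, .swnode = NULL };

	toy_create_managed(&dev, &node, with_fix);
	toy_notify(&dev, TOY_KOBJ_REMOVE);
	return node.underflows;
}
```

In this model, toy_run(false) records one underflow because REMOVE drops two
references while only the creation reference exists, and toy_run(true)
records none because the ADD notification supplies the matching reference.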


Thread overview: 23+ messages
2021-07-16 10:16 [PATCH] software node: balance refcount for managed sw nodes laurentiu.tudor
2021-07-16 10:34 ` Laurentiu Tudor
2021-07-16 12:17 ` Andy Shevchenko
2021-07-16 17:21   ` Jon Nettleton
2021-07-19 12:00     ` Laurentiu Tudor
2021-07-19 12:22       ` Andy Shevchenko
2021-07-20  9:20         ` Laurentiu Tudor
2021-07-20 10:27           ` Andy Shevchenko
2021-07-26  7:59             ` Laurentiu Tudor
2021-09-07 15:59               ` Laurentiu Tudor
2021-09-09 12:13                 ` Heikki Krogerus
2021-09-09 12:16                   ` Heikki Krogerus
2021-09-09 14:01                     ` Laurentiu Tudor
2021-09-10 12:05                       ` Laurentiu Tudor
2021-09-10 12:38                         ` Heikki Krogerus
2021-09-10 13:00                           ` Laurentiu Tudor
2021-09-14 14:13                             ` Heikki Krogerus
2021-09-14 14:00 ` Heikki Krogerus
2021-09-27 10:22 Laurentiu Tudor
2021-09-27 12:17 ` Greg KH
2021-09-27 12:42   ` Laurentiu Tudor
2021-09-27 13:34     ` Greg KH
2021-09-27 13:36     ` Heikki Krogerus
