devicetree.vger.kernel.org archive mirror
* [RFC PATCH 0/3] Fix errors on DT overlay removal with devlinks
@ 2020-10-14 19:36 Michael Auchter
From: Michael Auchter @ 2020-10-14 19:36 UTC
  To: devicetree, linux-kernel
  Cc: saravanak, robh+dt, frowand.list, gregkh, rafael, Michael Auchter

After updating to v5.9, I've started seeing errors in the kernel log
when using device tree overlays. Specifically, the problem seems to
happen when removing a device tree overlay that contains two devices
with some dependency between them (e.g., a device that provides a clock
and a device that consumes that clock). Removing such an overlay results
in:

  OF: ERROR: memory leak, expected refcount 1 instead of 2, of_node_get()/of_node_put() unbalanced - destroy
  OF: ERROR: memory leak, expected refcount 1 instead of 2, of_node_get()/of_node_put() unbalanced - destroy

followed by several REFCOUNT_WARNs from refcount.c.

In the first patch, I've included a unittest that can be used to
reproduce this when built with CONFIG_OF_UNITTEST [1].

I believe the issue is caused by the cleanup performed when releasing
the devlink device that's created to represent the dependency between
devices. The devlink device has references to the consumer and supplier
devices, which it drops in device_link_free; the devlink device's
release callback calls device_link_free via call_srcu.

When the overlay is being removed, all devices are removed, and
eventually the release callback for the devlink device runs and
schedules cleanup using call_srcu. Before device_link_free can run and
call put_device on the consumer/supplier, the rest of the overlay
removal process runs, resulting in the error traces above.

Patches 2 and 3 are an attempt at fixing this: call srcu_barrier to wait
for any pending device_link_free calls to execute before continuing with
the removal process.
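
The ordering problem and the effect of a barrier can be modelled in
userspace. What follows is only a rough sketch under stated assumptions:
every name in it (node_get, defer_call, deferred_barrier, and so on) is
hypothetical and merely stands in for the kernel counterpart named in the
comments. It is not the code from patches 2 and 3.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: none of these names are kernel APIs. */

struct node {
	int refcount;		/* stands in for the node/device refcount */
};

/* One deferred callback, standing in for work queued via call_srcu(). */
struct deferred {
	void (*fn)(struct node *);
	struct node *arg;
};

#define MAX_DEFERRED 8
static struct deferred pending[MAX_DEFERRED];
static size_t n_pending;

static void node_get(struct node *n) { n->refcount++; }
static void node_put(struct node *n) { n->refcount--; }

/* Models device_link_free(): drops the reference the link held. */
static void link_free(struct node *n)
{
	node_put(n);
}

/* Models call_srcu(): queue the callback, do not run it yet. */
static void defer_call(void (*fn)(struct node *), struct node *arg)
{
	assert(n_pending < MAX_DEFERRED);
	pending[n_pending].fn = fn;
	pending[n_pending].arg = arg;
	n_pending++;
}

/* Models srcu_barrier(): run every callback queued so far. */
static void deferred_barrier(void)
{
	for (size_t i = 0; i < n_pending; i++)
		pending[i].fn(pending[i].arg);
	n_pending = 0;
}

/*
 * Returns the refcount that the removal path would observe at the
 * point where it expects exactly 1.
 */
static int refcount_seen_at_removal(int use_barrier)
{
	struct node supplier = { .refcount = 1 };	/* base reference */
	int seen;

	n_pending = 0;
	node_get(&supplier);			/* link takes a reference: 2 */
	defer_call(link_free, &supplier);	/* link released, put deferred */

	if (use_barrier)
		deferred_barrier();		/* wait for pending cleanup */

	seen = supplier.refcount;		/* removal checks the count here */
	deferred_barrier();			/* deferred cleanup eventually runs */
	return seen;
}
```

In this model, refcount_seen_at_removal(0) observes a count of 2 where 1
is expected, which mirrors the "expected refcount 1 instead of 2" error,
while refcount_seen_at_removal(1) observes 1 because the deferred put has
already run.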

These patches resolve the issue, but probably not in the best way. In
particular, it seems strange to need to leak details of devlinks into
the device tree overlay code. So, I'd be curious to get some feedback or
hear any other ideas for how to resolve this issue.

Thanks,
 Michael

[1] Note that this isn't a very good unit test: it will report a "pass"
    even if it fails with the aforementioned errors, as these errors
    aren't propagated.

Michael Auchter (3):
  of: unittest: add test of overlay with devlinks
  driver core: add device_links_barrier
  of: dynamic: add device links barrier before detach

 drivers/base/core.c                     | 10 ++++++++++
 drivers/of/dynamic.c                    |  3 +++
 drivers/of/unittest-data/Makefile       |  1 +
 drivers/of/unittest-data/overlay_16.dts | 26 +++++++++++++++++++++++++
 drivers/of/unittest.c                   | 16 +++++++++++++++
 include/linux/device.h                  |  1 +
 6 files changed, 57 insertions(+)
 create mode 100644 drivers/of/unittest-data/overlay_16.dts

-- 
2.25.4



Thread overview: 11+ messages
2020-10-14 19:36 [RFC PATCH 0/3] Fix errors on DT overlay removal with devlinks Michael Auchter
2020-10-14 19:36 ` [RFC PATCH 1/3] of: unittest: add test of overlay " Michael Auchter
2020-10-14 19:36 ` [RFC PATCH 2/3] driver core: add device_links_barrier Michael Auchter
2020-10-14 19:36 ` [RFC PATCH 3/3] of: dynamic: add device links barrier before detach Michael Auchter
2020-10-15 21:34 ` [RFC PATCH 0/3] Fix errors on DT overlay removal with devlinks Frank Rowand
2020-10-21 21:02 ` Frank Rowand
2020-10-26 19:10   ` Saravana Kannan
2020-10-28 16:25     ` Michael Auchter
2020-10-28 18:03       ` Saravana Kannan
2020-10-29 20:54       ` Frank Rowand
2020-10-29 21:13         ` Michael Auchter
