[PATCH 1/2] drivers: core: Don't try to use a dead glue_dir

From: Benjamin Herrenschmidt @ 2018-06-29  2:21 UTC
  To: Linus Torvalds
  Cc: Greg Kroah-Hartman, Eric W. Biederman, Joel Stanley, linux-kernel

Under some circumstances (such as when using kobject debugging),
a glue dir whose kref is 0 might remain in the class kset for
a long time. The reason is that we don't actively remove glue
dirs when they become empty. Instead, we rely on the implicit
removal done by kobject_release(), which can happen some time
after the last kobject_put().

Using such a dead object is a bad idea and will lead to warnings
and crashes.

Unfortunately, that can happen in get_device_parent() if the
last child of a glue dir is removed and a new one is added
before the glue dir has been fully released.

This patch prevents that by making get_device_parent() only "find"
a glue dir whose refcount is non-zero.

While this fixes the crash, it doesn't fully fix the problem:
the race will now instead result in an error when attempting
to create a duplicate file name in sysfs. A fix for that will
come separately.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

(Adding lkml. I just realized I completely forgot to CC it in
the first place on this whole conversation; blame the 1am
debugging session.)

 drivers/base/core.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index b610816eb887..e9eff2099896 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1517,11 +1517,13 @@ static struct kobject *get_device_parent(struct device *dev,
 
 		/* find our class-directory at the parent and reference it */
 		spin_lock(&dev->class->p->glue_dirs.list_lock);
-		list_for_each_entry(k, &dev->class->p->glue_dirs.list, entry)
+		list_for_each_entry(k, &dev->class->p->glue_dirs.list, entry) {
 			if (k->parent == parent_kobj) {
-				kobj = kobject_get(k);
-				break;
+				kobj = kobject_get_unless_zero(k);
+				if (kobj)
+					break;
 			}
+		}
 		spin_unlock(&dev->class->p->glue_dirs.list_lock);
 		if (kobj) {
 			mutex_unlock(&gdp_mutex);

