linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] drivers: core: Don't try to use a dead glue_dir
       [not found] ` <CA+55aFxR0qg0yY-NWnH0DDruVWw8qRqp8=CRLq13p=TyxosJKw@mail.gmail.com>
@ 2018-06-29  2:21   ` Benjamin Herrenschmidt
  2018-06-30 19:45     ` Linus Torvalds
  2018-07-07 16:51     ` Greg Kroah-Hartman
  2018-06-29  2:21   ` [PATCH 2/2] drivers: core: Remove glue dirs from sysfs earlier Benjamin Herrenschmidt
       [not found]   ` <edc7b03b9550ddcf1291ebf5a6dafd24f4455c23.camel@kernel.crashing.org>
  2 siblings, 2 replies; 42+ messages in thread
From: Benjamin Herrenschmidt @ 2018-06-29  2:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Greg Kroah-Hartman, Eric W. Biederman, Joel Stanley, linux-kernel

Under some circumstances (such as when using kobject debugging)
a gluedir whose kref is 0 might remain in the class kset for
a long time. The reason is that we don't actively remove glue
dirs when they become empty, but instead rely on the implicit
removal done by kobject_release(), which can happen some amount
of time after the last kobject_put().

Using such a dead object is a bad idea and will lead to warnings
and crashes.

Unfortunately that can happen in get_device_parent() if the
last child of a glue dir was removed and a new one added
before the glue dir gets fully released().

This prevents this by making get_device_parent() only "find"
a glue dir whose refcount is non-0.

While this fixes the crash, it doesn't fully fix the problem,
instead the race will now result in an error attempting to
use a duplicate file name in sysfs. A fix for that will come
separately.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

(Adding lkml, I just realized I completely forgot to CC it in
the first place on this whole conversation, blame the 1am debugging
session)

 drivers/base/core.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index b610816eb887..e9eff2099896 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1517,11 +1517,13 @@ static struct kobject *get_device_parent(struct device *dev,
 
 		/* find our class-directory at the parent and reference it */
 		spin_lock(&dev->class->p->glue_dirs.list_lock);
-		list_for_each_entry(k, &dev->class->p->glue_dirs.list, entry)
+		list_for_each_entry(k, &dev->class->p->glue_dirs.list, entry) {
 			if (k->parent == parent_kobj) {
-				kobj = kobject_get(k);
-				break;
+				kobj = kobject_get_unless_zero(k);
+				if (kobj)
+					break;
 			}
+		}
 		spin_unlock(&dev->class->p->glue_dirs.list_lock);
 		if (kobj) {
 			mutex_unlock(&gdp_mutex);


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 2/2] drivers: core: Remove glue dirs from sysfs earlier
       [not found] ` <CA+55aFxR0qg0yY-NWnH0DDruVWw8qRqp8=CRLq13p=TyxosJKw@mail.gmail.com>
  2018-06-29  2:21   ` [PATCH 1/2] drivers: core: Don't try to use a dead glue_dir Benjamin Herrenschmidt
@ 2018-06-29  2:21   ` Benjamin Herrenschmidt
  2018-06-29 13:56     ` Linus Torvalds
       [not found]   ` <edc7b03b9550ddcf1291ebf5a6dafd24f4455c23.camel@kernel.crashing.org>
  2 siblings, 1 reply; 42+ messages in thread
From: Benjamin Herrenschmidt @ 2018-06-29  2:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Greg Kroah-Hartman, Eric W. Biederman, Joel Stanley, linux-kernel

For devices with a class, we create a "glue" directory between
the parent device and the new device with the class name.

This directory is never "explicitely" removed when empty however,
this is left to the implicit sysfs removal done by kobjects when
they are released on the last kobject_put().

This is problematic because as long as it's not been removed from
sysfs, it is still present in the class kset and in sysfs directory
structure.

The presence in the class kset exposes a use after free bug fixed
by the previous patch, but the presence in sysfs means that until
the kobject is released, which can take a while (especially with
kobject debugging), any attempt at re-creating such as binding a
new device for that class/parent pair, will result in a sysfs
duplicate file name error.

This fixes it by instead doing an explicit kobject_del() when
the glue dir is empty, by keeping track of the number of
child devices of the gluedir.

This is made easy by the fact that all glue dir operations are
done with a global mutex, and there's already a function
(cleanup_glue_dir) called in all the right places taking that
mutex that can be enhanced for this.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

(Adding lkml, I just realized I completely forgot to CC it in
the first place on this whole conversation, blame the 1am debugging
session)

 drivers/base/core.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index e9eff2099896..66e15daa9980 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1436,6 +1436,13 @@ struct kobject *virtual_device_parent(struct device *dev)
 struct class_dir {
 	struct kobject kobj;
 	struct class *class;
+
+	/*
+	 * This counts the number of children of the gluedir in
+	 * order to "cleanly" kobject_del() it when it becomes
+	 * empty. It is protected by the gdb_mutex
+	 */
+	unsigned int child_count;
 };
 
 #define to_class_dir(obj) container_of(obj, struct class_dir, kobj)
@@ -1470,6 +1477,7 @@ class_dir_create_and_add(struct class *class, struct kobject *parent_kobj)
 		return NULL;
 
 	dir->class = class;
+	dir->child_count = 1;
 	kobject_init(&dir->kobj, &class_dir_ktype);
 
 	dir->kobj.kset = &class->p->glue_dirs;
@@ -1526,6 +1534,8 @@ static struct kobject *get_device_parent(struct device *dev,
 		}
 		spin_unlock(&dev->class->p->glue_dirs.list_lock);
 		if (kobj) {
+			struct class_dir *class_dir = to_class_dir(kobj);
+			class_dir->child_count++;
 			mutex_unlock(&gdp_mutex);
 			return kobj;
 		}
@@ -1567,11 +1577,18 @@ static inline struct kobject *get_glue_dir(struct device *dev)
  */
 static void cleanup_glue_dir(struct device *dev, struct kobject *glue_dir)
 {
+	struct class_dir *class_dir = to_class_dir(glue_dir);
+
 	/* see if we live in a "glue" directory */
 	if (!live_in_glue_dir(glue_dir, dev))
 		return;
 
 	mutex_lock(&gdp_mutex);
+	if (WARN_ON(class_dir->child_count == 0))
+		goto bail;
+	if (--class_dir->child_count == 0)
+		kobject_del(glue_dir);
+ bail:
 	kobject_put(glue_dir);
 	mutex_unlock(&gdp_mutex);
 }

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] drivers: core: Remove glue dirs from sysfs earlier
  2018-06-29  2:21   ` [PATCH 2/2] drivers: core: Remove glue dirs from sysfs earlier Benjamin Herrenschmidt
@ 2018-06-29 13:56     ` Linus Torvalds
  2018-06-29 13:57       ` Linus Torvalds
  2018-06-30  1:04       ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 42+ messages in thread
From: Linus Torvalds @ 2018-06-29 13:56 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Greg Kroah-Hartman, Eric W. Biederman, Joel Stanley,
	Linux Kernel Mailing List

On Thu, Jun 28, 2018 at 7:22 PM Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
>
> This fixes it by instead doing an explicit kobject_del() when
> the glue dir is empty, by keeping track of the number of
> child devices of the gluedir.

Ugh. I was hoping that you'd just do the "only check duplicate names
if actually in use" instead.

This patch seems to make patch 1/2 pointless. No?

                  Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] drivers: core: Remove glue dirs from sysfs earlier
  2018-06-29 13:56     ` Linus Torvalds
@ 2018-06-29 13:57       ` Linus Torvalds
  2018-06-30  1:04       ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 42+ messages in thread
From: Linus Torvalds @ 2018-06-29 13:57 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Greg Kroah-Hartman, Eric W. Biederman, Joel Stanley,
	Linux Kernel Mailing List

On Fri, Jun 29, 2018 at 6:56 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> This patch seems to make patch 1/2 pointless. No?

.. well, maybe not "pointless", since it seems to be a good idea
regardless, but you get what I mean, I think.

             Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] drivers: core: Remove glue dirs from sysfs earlier
  2018-06-29 13:56     ` Linus Torvalds
  2018-06-29 13:57       ` Linus Torvalds
@ 2018-06-30  1:04       ` Benjamin Herrenschmidt
  2018-06-30  3:51         ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 42+ messages in thread
From: Benjamin Herrenschmidt @ 2018-06-30  1:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Greg Kroah-Hartman, Eric W. Biederman, Joel Stanley,
	Linux Kernel Mailing List

On Fri, 2018-06-29 at 06:56 -0700, Linus Torvalds wrote:
> On Thu, Jun 28, 2018 at 7:22 PM Benjamin Herrenschmidt
> <benh@kernel.crashing.org> wrote:
> > 
> > This fixes it by instead doing an explicit kobject_del() when
> > the glue dir is empty, by keeping track of the number of
> > child devices of the gluedir.
> 
> Ugh. I was hoping that you'd just do the "only check duplicate names
> if actually in use" instead.

I had a look (see my other email). It's non-trivial. We can still look
into it, but from what I gathered of how sysfs works, it's based on
kernfs which doesn't have the kobjects nor access to the reference
count, and is the holder of the names rbtree.

So we'd need to find a way to convey that "in-use" information down to
kernfs (I thought maybe adding an optional pointer to a kref ? but
then, it's still somewhat racy ...)

Additionally, we would need a few more pairs of eyes to make sure that
sticking duplicates in that rbtree isn't going to break some corner
case in there.

It's a code base I'm not at all familiar with. So while I agree with
the overall idea that a kobject whose reference has gone down to 0
should probably be ignored by sysfs, actually doing it doesn't seem
simple.

> This patch seems to make patch 1/2 pointless. No?

Somewhat. It's stil the "right thing" to do in case a hole shows up
elsewhere, and it's an easier stable backport. But yes. I wrote it
before I figured out what to do with 2/2 (I was still trying to do what
you wanted and just skip ref=0 in sysfs name lookups).

But yes, with 2/2, there shouldn't be a kobject with a 0 ref in the
list anymore... hopefully.

I would very much appreciate the opinion of someone more familiar with
sysfs and the device core on all this though.

Cheers,
Ben.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] drivers: core: Remove glue dirs from sysfs earlier
  2018-06-30  1:04       ` Benjamin Herrenschmidt
@ 2018-06-30  3:51         ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 42+ messages in thread
From: Benjamin Herrenschmidt @ 2018-06-30  3:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Greg Kroah-Hartman, Eric W. Biederman, Joel Stanley,
	Linux Kernel Mailing List

On Sat, 2018-06-30 at 11:04 +1000, Benjamin Herrenschmidt wrote:
> I had a look (see my other email). It's non-trivial. We can still look
> into it, but from what I gathered of how sysfs works, it's based on
> kernfs which doesn't have the kobjects nor access to the reference
> count, and is the holder of the names rbtree.
> 
> So we'd need to find a way to convey that "in-use" information down to
> kernfs (I thought maybe adding an optional pointer to a kref ? but
> then, it's still somewhat racy ...)
> 
> Additionally, we would need a few more pairs of eyes to make sure that
> sticking duplicates in that rbtree isn't going to break some corner
> case in there.

Just to clarify, I will look into it, but it will take more time so I'm
keen on having the low hanging fixes in now.

Also in general, I dislike the idea of leaving things in sysfs with a
refcount of 0. That "late" cleanup in kobject_release() looks to me
more like a band-aid than a feature.

I'd be almost tempted to stick a WARN_ON in there and see who gets hit
but I worry it's going send me down one of those rabbit holes and my
time is limited these days ;-)

Cheers,
Ben.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] drivers: core: Don't try to use a dead glue_dir
  2018-06-29  2:21   ` [PATCH 1/2] drivers: core: Don't try to use a dead glue_dir Benjamin Herrenschmidt
@ 2018-06-30 19:45     ` Linus Torvalds
  2018-07-07 16:48       ` Greg Kroah-Hartman
  2018-07-07 16:51     ` Greg Kroah-Hartman
  1 sibling, 1 reply; 42+ messages in thread
From: Linus Torvalds @ 2018-06-30 19:45 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Greg Kroah-Hartman, Eric W. Biederman, Joel Stanley,
	Linux Kernel Mailing List

On Thu, Jun 28, 2018 at 7:21 PM Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
>
> Under some circumstances (such as when using kobject debugging)
> a gluedir whose kref is 0 might remain in the class kset for
> a long time. The reason is that we don't actively remove glue
> dirs when they become empty, but instead rely on the implicit
> removal done by kobject_release(), which can happen some amount
> of time after the last kobject_put().
>
> Using such a dead object is a bad idea and will lead to warnings
> and crashes.

So with the other patch in mind, here's my comments on this one. Pick
one of two scenarios:

 (a) it's obviously correct.

     We obviously can *not* take an object with a zero refcount,
because it is already been scheduled for kobject_cleanup(), and
incrementing the refcount is simply fundamentally wrong, because
incrementing the refcount won't unschedule the deletion of the object.

 (b) the patch is wrong, and our "kobject_get()" should cancel the
kobject_cleanup() instead.

There are problems with both of the above cases.

The "patch is obviously correct" case leads to another issue: why
would kobject_get() _ever_ succeed on an object wioth a zero refcount?
IOW, why do we have kobject_get() vs kobject_get_unless_zero() in the
first place? It is *never* ok to get an kobject with a zero refcount
because of the above "it's already scheduled for deletion" issue.

The (b) case sounds nice, and would actually fix the problem that
patch 2/2 was tryihng to address, and would make
CONFIG_DEBUG_KOBJECT_RELEASE work.

HOWEVER. It's completely untenable in reality - it's a nightmare from
a locking standpoint, because kref_put() literally depends not on
locking, but on the exclusive "went to zero".

So I think (b) is practically not acceptable. Which means that (a) is
the right reaction, and "kobject_get()" on an object with a zero
refcount is _always_ wrong.

But that says that "yes, the patch is obviously correct", but it also
says "the patch should be pointless, because kobject_get() should just
_always_ have the semantics of "kobject_get_unless_zero()", and the
latter shouldn't even exist.

Greg? When would it possibly be valid to do "kobject_get()" on a zero
refcount object? I don't see it. But this is all very much your code.

               Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] drivers: core: Remove glue dirs from sysfs earlier
       [not found]       ` <7eb06b499f2be366cf68c6b6588b16c603e6a567.camel@kernel.crashing.org>
@ 2018-07-01  2:07         ` Linus Torvalds
  2018-07-01  2:18           ` Linus Torvalds
  2018-07-01  3:42           ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 42+ messages in thread
From: Linus Torvalds @ 2018-07-01  2:07 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Linux Kernel Mailing List
  Cc: Greg Kroah-Hartman, Eric W. Biederman, Joel Stanley

[ Ben - feel free to post the missing emails to lkml too ]

On Sat, Jun 30, 2018 at 6:56 PM Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
>
> On Sat, 2018-06-30 at 12:29 -0700, Linus Torvalds wrote:
> >
> > Because normally, the last kobject_put() really does do a synchronous
> > kobject_del(). So normally, this is all completely pointless, and the
> > normal kobject lifetime rules should be sufficient.
>
> Not entirely sadly....
>
> There's a small race between the kref going down to 0 and the last
> kobject_put(). Something might still "catch" the 0-reference object
> during that window.

That's only true if the underlying subsystem doesn't have any
serialization itself.

Which I don't think is normal.

IOW, if you have (say) a PCI hotplug model, the PCI layer will have
the pci_hp_mutex, for example, which protects the PCI hotplug slot
lists and the kobj things.

Those locks won't protect kobj races in _general_ (ie there is no
locking between two totally unrelated buses), but they *should*
serialize the case of a device being added within one class. No?

If not, maybe we need to add some class-based serialization.

> I think it just opens an existing race more widely. The race always
> exist becaues another CPU can observe the object between the reference
> going to 0 and the last kobject_del done by kobject_release.

No it really damn well can't, exactly because we should not allow it -
and allowing it would be fundamentally racy.

We should never allow a kobject_get() with a zero reference count.
It's always a *fundamental* bug - see my other email on your 1/2
patch.

So there is no race, because the reference count itself protects the
object. Either another CPU gets it in time (before it went down to
zero), or it went down to zero and it is dead (because at that point,
deletion will have been scheduled.

Note that this is entirely independent of the auto-clean actions,
since even with absolutely zero cleanup, you still have the
fundamental issue of "reference count goes to zero causes the object
to be free'd".

              Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] drivers: core: Remove glue dirs from sysfs earlier
  2018-07-01  2:07         ` Linus Torvalds
@ 2018-07-01  2:18           ` Linus Torvalds
  2018-07-01  3:49             ` Benjamin Herrenschmidt
  2018-07-01  3:42           ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 42+ messages in thread
From: Linus Torvalds @ 2018-07-01  2:18 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Linux Kernel Mailing List
  Cc: Greg Kroah-Hartman, Eric W. Biederman, Joel Stanley

On Sat, Jun 30, 2018 at 7:07 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Those locks won't protect kobj races in _general_ (ie there is no
> locking between two totally unrelated buses), but they *should*
> serialize the case of a device being added within one class. No?

Side note: there had *better* be some locking whenever there is a way
to find an object, because otherwise you have a fundamental lifetime
problem: one thread finding the object at the same time another thread
frees it for the last time. Even the "unless_zero()" won't fix it,
because the final free will release the underlying object itself, so
the "zero" state is ephemeral.

That locking might be just RCU during lookup, and rcu-delaying the
release, of course.  I think that's all the sysfs code needs, for
example, since that's what lookup uses.

And for any other embedded kobj cases, where you can reach the object
using some random subsystem pointers, there had better be other
locking in place for that pointer lookup vs the last removal.

kobject itself doesn't provide that locking, it only provides the
reference counting. But that's partly why it really has to disallow
any kobject_get() of a zero object, because it means that the
tear-down has been started, but the tear-down itself may not have had
time to get the lock yet (ie kobject_release() may be just about to
call the t->release() function).

But maybe I'm missing something subtle.

               Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] drivers: core: Remove glue dirs from sysfs earlier
  2018-07-01  2:07         ` Linus Torvalds
  2018-07-01  2:18           ` Linus Torvalds
@ 2018-07-01  3:42           ` Benjamin Herrenschmidt
  2018-07-01  3:57             ` Linus Torvalds
  1 sibling, 1 reply; 42+ messages in thread
From: Benjamin Herrenschmidt @ 2018-07-01  3:42 UTC (permalink / raw)
  To: Linus Torvalds, Linux Kernel Mailing List
  Cc: Greg Kroah-Hartman, Eric W. Biederman, Joel Stanley

On Sat, 2018-06-30 at 19:07 -0700, Linus Torvalds wrote:
> [ Ben - feel free to post the missing emails to lkml too ]
> 
> On Sat, Jun 30, 2018 at 6:56 PM Benjamin Herrenschmidt
> <benh@kernel.crashing.org> wrote:
> > 
> > On Sat, 2018-06-30 at 12:29 -0700, Linus Torvalds wrote:
> > > 
> > > Because normally, the last kobject_put() really does do a synchronous
> > > kobject_del(). So normally, this is all completely pointless, and the
> > > normal kobject lifetime rules should be sufficient.
> > 
> > Not entirely sadly....
> > 
> > There's a small race between the kref going down to 0 and the last
> > kobject_put(). Something might still "catch" the 0-reference object
> > during that window.
> 
> That's only true if the underlying subsystem doesn't have any
> serialization itself.
> 
> Which I don't think is normal.
> 
> IOW, if you have (say) a PCI hotplug model, the PCI layer will have
> the pci_hp_mutex, for example, which protects the PCI hotplug slot
> lists and the kobj things.
> 
> Those locks won't protect kobj races in _general_ (ie there is no
> locking between two totally unrelated buses), but they *should*
> serialize the case of a device being added within one class. No?
> 
> If not, maybe we need to add some class-based serialization.

Well, yes and no ... yes I agree there should be upper level
serialization such that you can't hit the "add" path until the "del"
path has completed.

And this will work as long as the kobject_put done at the end of "del"
is indeed the very last one.

But what if something *else* still holds a reference to the kobject ?
It could be anything really... the driver itself, that is now unbound,
might have a client that's holding a stale reference because some
chardev is still open, for example. I don't know if sysfs can also hold
one because a users still has an open directory there, but in the
general case I don't thing one can assume that the kobject_put() done
by the end of the "owner" subsystem delete path (in our example
device_del) is the very last kobject_put.

And since some other random thing might be doing that last put, it may
also do it without any of the upper level locking/serialization.

Which is the reason I so dislike the whole "cleanup sysfs in the last
kobject_put" we do in kobject_release(). This is also why we clearly
differenciate in the device model device_del from the final device_put
etc...

We generally separeate "visibility to the outside world" from the
actual object lifetime, the latter being what the kref controls.

 .. except when we dont, such as this gluedir case. And I think that's
wrong in principle.

This is the reason why I prefer the approach of explicitely doing the
kobject_del() instead of kobject_put() in the tail of device_del (what
this 2/2 patch does), because it restores that separation between the
object being "active/visible" or not from the actual lifetime of the
structure.

> > I think it just opens an existing race more widely. The race always
> > exist becaues another CPU can observe the object between the reference
> > going to 0 and the last kobject_del done by kobject_release.
> 
> No it really damn well can't, exactly because we should not allow it -
> and allowing it would be fundamentally racy.
> 
> We should never allow a kobject_get() with a zero reference count.
> It's always a *fundamental* bug - see my other email on your 1/2
> patch.

It is and there's a WARN_ON about it inside kobject_get(). I don't
think anybody argues against that, you are absolutely right.

That said, the fact that it happens is I think the result of a deeper
problem of conflating the visibility of the object and its refcount,
and the fact that we do the sysfs "cleanup" (visibility) after we've
dropped the last reference.

Yes, the race is generally not going to happen if the put() done by the
owning subystem is indeed the last one, because as you said, the
subsystem should have higher level locking to precent racing between
removal and addition.

However, it *can* happen because by definition kobjects can have
references held elsewhere for any reason, and those can be dropped
outside of any subsystem locking.

> So there is no race, because the reference count itself protects the
> object. Either another CPU gets it in time (before it went down to
> zero), or it went down to zero and it is dead (because at that point,
> deletion will have been scheduled.
> 
> Note that this is entirely independent of the auto-clean actions,
> since even with absolutely zero cleanup, you still have the
> fundamental issue of "reference count goes to zero causes the object
> to be free'd".

Yes.

Cheers,
Ben.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] drivers: core: Remove glue dirs from sysfs earlier
  2018-07-01  2:18           ` Linus Torvalds
@ 2018-07-01  3:49             ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 42+ messages in thread
From: Benjamin Herrenschmidt @ 2018-07-01  3:49 UTC (permalink / raw)
  To: Linus Torvalds, Linux Kernel Mailing List
  Cc: Greg Kroah-Hartman, Eric W. Biederman, Joel Stanley

On Sat, 2018-06-30 at 19:18 -0700, Linus Torvalds wrote:
> On Sat, Jun 30, 2018 at 7:07 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > 
> > Those locks won't protect kobj races in _general_ (ie there is no
> > locking between two totally unrelated buses), but they *should*
> > serialize the case of a device being added within one class. No?
> 
> Side note: there had *better* be some locking whenever there is a way
> to find an object, because otherwise you have a fundamental lifetime
> problem: one thread finding the object at the same time another thread
> frees it for the last time. Even the "unless_zero()" won't fix it,
> because the final free will release the underlying object itself, so
> the "zero" state is ephemeral.

Right. We agree. The problem here relates to that built-in cleanup in
kobject_release() which violates that rule: it removes the object from
sysfs (and from any kset) after the reference has been dropped.

Users such as the gluedir code try to workaround this by using outer
mutexes or locks, but it's fundamentally racy if there's any chance
that another 3rd party holds the last reference, since in that case,
the cleanup happens outside of any added locking.

That's why I tend to think that anything that relies on that "late
removal from sysfs" is broken by design :) Anything that uses kobject
should ensure they are taken out of sysfs and their respective ksets
before the last reference drops.

Now it's probably all fine for most cases, I assume sysfs itself checks
the reference to be non-0 in lookup path (I haven't checked), at least
to prevent use-after-free. This is still open to the duplicate name
issue. The bigger issue of use-after-free only happens in the rare
cases where there's a separate list being maintained and cleaned up
late.

The gluedir is such an example because it "manually" thrawls the kset
list.

So you are right, kobject_get should probably always be _unless_zero,
but my point is we even with that, we probably still need to ensure the
gluedir is taken out of the kset before we drop the last reference, so
this can be done with the appropriate upper layer subsystem locking,
and thus without racing with a new device being bound.
 
> That locking might be just RCU during lookup, and rcu-delaying the
> release, of course.  I think that's all the sysfs code needs, for
> example, since that's what lookup uses.
> 
> And for any other embedded kobj cases, where you can reach the object
> using some random subsystem pointers, there had better be other
> locking in place for that pointer lookup vs the last removal.
> 
> kobject itself doesn't provide that locking, it only provides the
> reference counting. But that's partly why it really has to disallow
> any kobject_get() of a zero object, because it means that the
> tear-down has been started, but the tear-down itself may not have had
> time to get the lock yet (ie kobject_release() may be just about to
> call the t->release() function).
> 
> But maybe I'm missing something subtle.
> 
>                Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] drivers: core: Remove glue dirs from sysfs earlier
       [not found]     ` <CA+55aFxS7OVEN5XrxceC5ibz780mhn-qRa50w1gVFjsz2JjMbw@mail.gmail.com>
       [not found]       ` <7eb06b499f2be366cf68c6b6588b16c603e6a567.camel@kernel.crashing.org>
@ 2018-07-01  3:52       ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 42+ messages in thread
From: Benjamin Herrenschmidt @ 2018-07-01  3:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Greg Kroah-Hartman, Eric W. Biederman, Joel Stanley, linux-kernel

(This is a resend with lkml added for background & archival purposes)

On Sat, 2018-06-30 at 12:29 -0700, Linus Torvalds wrote:
> On Thu, Jun 28, 2018 at 7:19 PM Benjamin Herrenschmidt
> <benh@kernel.crashing.org> wrote:
> > 
> > For devices with a class, we create a "glue" directory between
> > the parent device and the new device with the class name.
> > 
> > This directory is never "explicitely" removed when empty however,
> > this is left to the implicit sysfs removal done by kobjects when
> > they are released on the last kobject_put().
> > 
> > This is problematic because as long as it's not been removed from
> > sysfs, it is still present in the class kset and in sysfs directory
> > structure.
> 
> Ok, so I don't hate the patch per se, but looking around, I think this
> is still wrong in the end.
> 
> Why?
> 
> Because normally, the last kobject_put() really does do a synchronous
> kobject_del(). So normally, this is all completely pointless, and the
> normal kobject lifetime rules should be sufficient.

Not entirely sadly....

There's a small race between the kref going down to 0 and the last
kobject_put(). Something might still "catch" the 0-reference object
during that window. 

In this specific case our bacon is mostly saved by the gdp_mutex which
is taken by cleanup_glue_dir() around what *should* be the lats
kobject_put.... but what if it isn't ?

If anything else happens to hold a reference to the directory object
(open files in sysfs maybe ?), then the last put will come from
elsewhere and will happen without that mutex being held, thus re-
opening the tiny race.

Is this possible ?
 
> So I *think* your problem happens because you have
> CONFIG_DEBUG_KOBJECT_RELEASE enabled, and that intentionally delays
> the cleanup.

I think it just opens an existing race more widely. The race always
exist becaues another CPU can observe the object between the reference
going to 0 and the last kobject_del done by kobject_release. That's one
main reason why I dislike this "auto-clean" mechanism.

One other way to solve it, which I just thought about, could be to,
inside kobject_put() itself, check that the reference is *1* and do
kobject_del() before the last kref_put. That does mean that somebody
can snatch it in that window after it's been removed from sysfs though,
is that ok ? It won't crash I suppose...

> This is actually not really what DEBUG_KOBJECT_RELEASE is really
> documented to do.  It is documented as a "let's debug problems where
> drivers think deletion is immediate", but the sysfs interaction with
> the same-name issue really smells different.
> 
> So what the patch does is basically to just fight
> DEBUG_KOBJECT_RELEASE delaying rules, and that kind of stinks.

Not entirely, it fight an existing race that DEBUG_KOBJECT_RELEASE just
opens more widely.

> To me, it really feels like either we should see the
> DEBUG_KOBJECT_RELEASE rules are "real" (in which case fighting them is
> wrong), or we should admit that DEBUG_KOBJECT_RELEASE causes problems
> (in which case we should probably try to fix the debug aid).
> 
> Ben, can you confirm that your problem just goes away if you don't
> select DEBUG_KOBJECT_RELEASE?

The easily reproducable crash goes away, because the device I've been
observing these is a tiny slow single core ARM embedded thing. But I'm
not sure the therical race is solved.

> Greg - comments? The pattern of "remove last device, add a new device
> of the same class" really seems to be a valid pattern, and
> CONFIG_DEBUG_KOBJECT_RELEASE seems to actively break it.
> 
> Could we perhaps add a synthetic test for exactly this pattern (add a
> silly device with a bogus class, remove it, and add another device
> with the same class)?
> 
>                  Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] drivers: core: Remove glue dirs from sysfs earlier
  2018-07-01  3:42           ` Benjamin Herrenschmidt
@ 2018-07-01  3:57             ` Linus Torvalds
  2018-07-01  7:16               ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 42+ messages in thread
From: Linus Torvalds @ 2018-07-01  3:57 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Linux Kernel Mailing List, Greg Kroah-Hartman, Eric W. Biederman,
	Joel Stanley

On Sat, Jun 30, 2018 at 8:42 PM Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
>
> But what if something *else* still holds a reference to the kobject ?
> It could be anything really... t

But that's fine. Then the object will continue to exist, and the sysfs
file will continue to exist, and you won't get a new glue directory.

In fact, what you describe is a problem with *your* patch, exactly
because you introduce another counter that is *not* the reference
count, and now you have the problem with "old directory kobject is
still live, but I removed the sysfs part, and now I'm creating a new
object with the same name".

Hmm?

> It is and there's a WARN_ON about it inside kobject_get(). I don't
> think anybody argues against that, you are absolutely right.

No.  That the zero kobject_get() will not result in a warning. It just
does a kref_get(), no warnings anywhere.

Yes, there is a kobject_get() warning, but that's about an
_uninitialized_ kobject, not a already released one! You will get no
warning that I can see from the "oh, you just got a stale one".

           Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] drivers: core: Remove glue dirs from sysfs earlier
  2018-07-01  3:57             ` Linus Torvalds
@ 2018-07-01  7:16               ` Benjamin Herrenschmidt
  2018-07-01 17:04                 ` Linus Torvalds
  0 siblings, 1 reply; 42+ messages in thread
From: Benjamin Herrenschmidt @ 2018-07-01  7:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Greg Kroah-Hartman, Eric W. Biederman,
	Joel Stanley

On Sat, 2018-06-30 at 20:57 -0700, Linus Torvalds wrote:
> On Sat, Jun 30, 2018 at 8:42 PM Benjamin Herrenschmidt
> <benh@kernel.crashing.org> wrote:
> > 
> > But what if something *else* still holds a reference to the kobject ?
> > It could be anything really... t
> 
> But that's fine. Then the object will continue to exist, and the sysfs
> file will continue to exist, and you won't get a new glue directory.

I suspect you didn't read it my entire argument or I wasn't clear
enough :-) This is actually the crux of the problem:

Yes the object continues to exist. However, the *last* kobject_put to
it will no longer be done under whatever higher level locking the
subsystem provides (whatever prevents for example concurrent add and
removes).

Thus in that scenario the "last minute" kobject_release() done by the
last kobject_put() will be effectively unprotected from for example the
gdp_mutex (in the case of the gluedirs) or whatever other locking the
subsystem owning the kobject is using to avoid making that "refount 0"
object "discoverable".

So not only the conitnued existence of the object will potentially
trigger the duplicate sysfs name problem, but it *will* also re-open
the window for the object to be temporarily be visible with a 0
refcount.

The kobject debugging doesn't create a new race here. It just
significantly increase the size of an existing race window.

You argued yourself that the reason that window is a non-issue without
kobject debugging is that the expectation is that the release happens
synchronously with the last kobject_put done by device_del, which has
appropriate higher-level locking.

My point is that this specific assumption doesn't hold in the case
where another reference exists. In that case the object will be freed
with the race open for others to observe it while its refcount is 0.

> In fact, what you describe is a problem with *your* patch, exactly
> because you introduce another counter that is *not* the reference
> count, and now you have the problem with "old directory kobject is
> still live, but I removed the sysfs part, and now I'm creating a new
> object with the same name".
> 
> Hmm?

So my patch 1/2 prevents us from finding the old dying object (and thus
from crashing) but replaces this with the duplicate name problem.

My patch 2/2 removes that second problem by ensuring we remove the
object from sysfs synchronously in device_del when it no longer
contains any children, explicitely rather than implicitely by the
virtue of doing the "last" kobject_put.

You'll notice the existing code wraps that kobject_put() with the
gdp_mutex, specifically I suppose to synronize with the creation path
of the gluedir, ie with the expectation that this kobject_put() will
synchronously remove the object from the kset.

I argue this assumption is flawed, that another reference could delay
the removal from the kset to a point where the mutex isn't held, and
thus the race re-opens.

My patch doesn't implement a separate "refcount" per-se, it implements
a "childcount". This is akin to mm_users vs mm_count.

We want the glue dir to be removed when it has no more children, we
currently "conflate" this with the kobject having no reference. This is
what I argue is incorrect. The kobject can have "unrelated" references,
that define the lifetime of the object in memory, but it's presence in
sysfs is dictated by the number of children it contains.

> > It is and there's a WARN_ON about it inside kobject_get(). I don't
> > think anybody argues against that, you are absolutely right.
> 
> No.  That the zero kobject_get() will not result in a warning. It just
> does a kref_get(), no warnings anywhere.

It's there but it's in refcount:

void refcount_inc(refcount_t *r)
{
	WARN_ONCE(!refcount_inc_not_zero(r), "refcount_t: increment on 0; use-after-free.\n");
}
EXPORT_SYMBOL(refcount_inc);

In fact that's how I started digging into that problem in the first place :-)

> Yes, there is a kobject_get() warning, but that's about an
> _uninitialized_ kobject, not a already released one! You will get no
> warning that I can see from the "oh, you just got a stale one".

Cheers,
Ben.

>            Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] drivers: core: Remove glue dirs from sysfs earlier
  2018-07-01  7:16               ` Benjamin Herrenschmidt
@ 2018-07-01 17:04                 ` Linus Torvalds
  2018-07-01 23:36                   ` Benjamin Herrenschmidt
  2018-07-02 10:22                   ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 42+ messages in thread
From: Linus Torvalds @ 2018-07-01 17:04 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Linux Kernel Mailing List, Greg Kroah-Hartman, Eric W. Biederman,
	Joel Stanley

On Sun, Jul 1, 2018 at 12:16 AM Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
>
> I suspect you didn't read it my entire argument or I wasn't clear
> enough :-) This is actually the crux of the problem:
>
> Yes the object continues to exist. However, the *last* kobject_put to
> it will no longer be done under whatever higher level locking the
> subsystem provides (whatever prevents for example concurrent add and
> removes).

Well, yes and no.

Why "no"?

The last dropping is actually not necessarily that interesting.
Especially with the sysfs interface, we basically know that you can
look up the object using RCU (since that's what the filesystem lookup
does), and that basically means that the refcount is always the final
serialization mechanism.There is nothing else that can possibly lock
it.

So this is where we disagree:

> Thus in that scenario the "last minute" kobject_release() done by the
> last kobject_put() will be effectively unprotected from for example the
> gdp_mutex (in the case of the gluedirs) or whatever other locking the
> subsystem owning the kobject is using to avoid making that "refount 0"
> object "discoverable".

No. *Fundamentally*, there is only one thing that protects that
object: the refcount.

And my argument is that anything that has this model (which means
anything that has any sysfs linkage, which pretty much means any
kobject) absolutely *must* use "kobject_get_unless_zero()" unless it
had an existing stable pointer and just wants to increase the refcount
(ie "already got a reference throuigh one of my data structures that
use the lock")

But if you have that model, that means that the "last drop" is
actually almost totally irrelevant.  Because it doesn't matter if it's
done inside the lock or not - you know that once it has been done,
that object is entirely gone. It's not discoverable any more -
regardless of locking. The discoverability is basically controlled
entirely by the refcount.

So what happens then?

The locking isn't important for the last release, but it *is*
important for new object *creation*.

Why?

The refcount means that once an object is gone, it's gone for
*everyone*. It's a one-way thing, and it's thread-safe. So the code
that does *creation* can do this:

 - get subsystem lock
 - look up object (using the "unless_zero()" model)
 - if you got the object, re-use it, and you're done: drop lock and return
 - otherwise, you know that nobody else can get it either
 - create new object and instantiate it
 - drop lock

and this means that you create a new object IFF the old object had its
refcount drop to zero. So you always have exactly one copy (or no copy
at all, in between the last drop and the creation of the new one).

See? The lack of locking at drop time didn't matter.  The refcount
itself serialized things.

So the above is what I *think* the "glue_dir" logic should be. No need
for any other count. Just re-use the old glue dir if you can find it,
and create a new one if you can't.

The one important thing is that the object lookup above needs to use
the lookup needs to find the object all the way until the refcount has
become zero. And that actually means that the object MUST NOT be
removed from the object lists until *after* the refcount has been
decremented to zero. Which is actually why that "automatic cleanup"
that you hate is actually an integral and important part of the
process: removing the object *before* the refcount went to zero is
broken, because that means that the "look up object" phase can now
miss an object that still has a non-zero refcount.

> So my patch 1/2 prevents us from finding the old dying object (and thus
> from crashing) but replaces this with the duplicate name problem.

So I absolutely agree with your patch 1/2. My argument against it is
actually that I think the "unless_zero" thing needs to be more
universal.

> My patch 2/2 removes that second problem by ensuring we remove the
> object from sysfs synchronously in device_del when it no longer
> contains any children, explicitely rather than implicitely by the
> virtue of doing the "last" kobject_put.

No. See above. The reason I think your patch 2/2 is wrong is that is
actually *breaks* the above model, exactly because of that thing that
you hatre.

The explicit removal is actively wrong for the "I want to reuse the
object" model, exactly because it happens before the refcount has gone
to zero.

> > No.  That the zero kobject_get() will not result in a warning. It just
> > does a kref_get(), no warnings anywhere.
>
> It's there but it's in refcount:
>
> void refcount_inc(refcount_t *r)
> {
>         WARN_ONCE(!refcount_inc_not_zero(r), "refcount_t: increment on 0; use-after-free.\n");
> }
> EXPORT_SYMBOL(refcount_inc);
>
> In fact that's how I started digging into that problem in the first place :-)

Hey, you are again hitting this because of extra config options.

Because the refcount_inc() that I found looks like this:

  static inline void refcount_inc(refcount_t *r)
  {
          atomic_inc(&r->refs);
  }

and has no warning.

I wonder how many people actually run with REFCOUNT_FULL that warns -
because it's way too expensive. It's not set in the default Fedora
config, for example, and that's despite how Fedora tends to set all
the other debug options.

So no, it really doesn't warn normally.

                Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] drivers: core: Remove glue dirs from sysfs earlier
  2018-07-01 17:04                 ` Linus Torvalds
@ 2018-07-01 23:36                   ` Benjamin Herrenschmidt
  2018-07-02 10:23                     ` Benjamin Herrenschmidt
  2018-07-02 10:22                   ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 42+ messages in thread
From: Benjamin Herrenschmidt @ 2018-07-01 23:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Greg Kroah-Hartman, Eric W. Biederman,
	Joel Stanley

On Sun, 2018-07-01 at 10:04 -0700, Linus Torvalds wrote:
> 
> So this is where we disagree:
> 
> > Thus in that scenario the "last minute" kobject_release() done by the
> > last kobject_put() will be effectively unprotected from for example the
> > gdp_mutex (in the case of the gluedirs) or whatever other locking the
> > subsystem owning the kobject is using to avoid making that "refount 0"
> > object "discoverable".
> 
> No. *Fundamentally*, there is only one thing that protects that
> object: the refcount.
> 
> And my argument is that anything that has this model (which means
> anything that has any sysfs linkage, which pretty much means any
> kobject) absolutely *must* use "kobject_get_unless_zero()" unless it
> had an existing stable pointer and just wants to increase the refcount
> (ie "already got a reference throuigh one of my data structures that
> use the lock")
> 
> But if you have that model, that means that the "last drop" is
> actually almost totally irrelevant.  Because it doesn't matter if it's
> done inside the lock or not - you know that once it has been done,
> that object is entirely gone. It's not discoverable any more -
> regardless of locking. The discoverability is basically controlled
> entirely by the refcount.

This is almost true... the issue there is that gap of time between the
refcount going to 0 and kobject_release() actually doing the
kobject_del. Thus the object will be present in the kset for example,
and thus discoverable, while its refcount is 0.

Yes, you are absolutely right, anything that can find such a dead
object must be using kobject_get_unless_zero(). This is what patch 1/2
does and it fixes the basic crashing race.

But we are then exposed to the duplicate name issue on fast bind/unbind
or add/remove.

> So what happens then?
> 
> The locking isn't important for the last release, but it *is*
> important for new object *creation*.
> 
> Why?
> 
> The refcount means that once an object is gone, it's gone for
> *everyone*. It's a one-way thing, and it's thread-safe. So the code
> that does *creation* can do this:
> 
>  - get subsystem lock
>  - look up object (using the "unless_zero()" model)
>  - if you got the object, re-use it, and you're done: drop lock and return
>  - otherwise, you know that nobody else can get it either
>  - create new object and instantiate it
>  - drop lock
> 
> and this means that you create a new object IFF the old object had its
> refcount drop to zero. So you always have exactly one copy (or no copy
> at all, in between the last drop and the creation of the new one).

Except that create will occasionally fail due to duplicate names. So we
need to fix that one way or another (see below)

> See? The lack of locking at drop time didn't matter.  The refcount
> itself serialized things.

Oh I absolutely agree with all that when it comes to object lifetime, I
don't completely agree when it comes to visibility to the outside
world, see below.

> So the above is what I *think* the "glue_dir" logic should be. No need
> for any other count. Just re-use the old glue dir if you can find it,
> and create a new one if you can't.
> 
> The one important thing is that the object lookup above needs to use
> the lookup needs to find the object all the way until the refcount has
> become zero. And that actually means that the object MUST NOT be
> removed from the object lists until *after* the refcount has been
> decremented to zero. Which is actually why that "automatic cleanup"
> that you hate is actually an integral and important part of the
> process: removing the object *before* the refcount went to zero is
> broken, because that means that the "look up object" phase can now
> miss an object that still has a non-zero refcount.

In the case of gluedir that is perfectly fine though. It's a bit like
having an open unlinked file. You want to "unlink" the gluedir
synchronously with the last of its children going away, in order to be
able to create a new one as soon as a new children comes in. However
you don't want to "free" the underlying kobject until after all users
have dropped their reference.

What I don't like is that idea of conflating discoverability and
lifetime. We may just have to agree to disagree on that one, but unless
we can make kernfs&sysfs prevent "finding" of a 0-ref object and thus
allow creating of a duplicate in that case (which isn't that simple a
problem to solve), we still have a problem.  

> > So my patch 1/2 prevents us from finding the old dying object (and thus
> > from crashing) but replaces this with the duplicate name problem.
> 
> So I absolutely agree with your patch 1/2. My argument against it is
> actually that I think the "unless_zero" thing needs to be more
> universal.

I agree with that, absolutely. kobject_get() should probably just be an
unless_zero variant always rather than a special case.

> > My patch 2/2 removes that second problem by ensuring we remove the
> > object from sysfs synchronously in device_del when it no longer
> > contains any children, explicitely rather than implicitely by the
> > virtue of doing the "last" kobject_put.
> 
> No. See above. The reason I think your patch 2/2 is wrong is that is
> actually *breaks* the above model, exactly because of that thing that
> you hatre.
> 
> The explicit removal is actively wrong for the "I want to reuse the
> object" model, exactly because it happens before the refcount has gone
> to zero.

That's indeed where we disagree.

In fact that's the entire point of the gdb_mutex today, ie the code was
clearly written with the *intention* that kobject_put() in device_del
is the last thing happening to the glue dir so that it gets
synchronously removed from the kset and sysfs along with the last
child.

This is the only reason why it makes any sense to be taking that mutex
around kobject_put.

It does so in order to avoid the race alltogether.

And that assumption in the code today is wrong because there could be
another reference being held, delaying the removal from the kset and
sysfs to outside of that mutex.

So kobject debugging's delayed freeing doesn't create a new race here,
it just exposes more widely an existing one.

So to get back to "how do we fix things", I think we agree with patch 1
and we agree that we should probably make kobject_get unconditionally
be a "unless zero" version accross the board, but let's Greg chime in
here before we do anything drastic.

This fixes the use-after-free and crash. However it does open us to a
duplicate name window where we'll fail to re-bind or re-add a device
due to a late un-reference. Kobject debugging makes this very likely to
happen by delaying the freeing by seconds, but without it, there' still
a small window.

There's two ways we've proposed to fix this. One is my patch 2 which
you don't like, which basically is equivalent to saying let's unlink
the gluedir synchronously with the last children going away, and keep
the kobeject refcount strictly as a mean to deal with the liftime of
the structure (freeing it). It uses the existing gdp_mutex and just
makes the existing assumptions in the code work.

The other way is what you prefer, which I agree is a good idea, but it
will take me at least some time to figure out a way to actually
implement it, and it is to essentially change sysfs so that objects
that have a refount of 0 aren't discoverable anymore, so that we don't
name-clash on trying to create a new one.

It's tricky because all the name business is handled by kernfs which
has no idea about kobjects of krefs. My initial idea would be to add an
optional pointer to a kref to kernfs entries, so it can check the
refcount on lookups and name collision checks. I'm not entirely certain
how to work out the locking/ordering for this though, we might need to
add memory barriers in kobject_put to match.

> > > No.  That the zero kobject_get() will not result in a warning. It just
> > > does a kref_get(), no warnings anywhere.
> > 
> > It's there but it's in refcount:
> > 
> > void refcount_inc(refcount_t *r)
> > {
> >         WARN_ONCE(!refcount_inc_not_zero(r), "refcount_t: increment on 0; use-after-free.\n");
> > }
> > EXPORT_SYMBOL(refcount_inc);
> > 
> > In fact that's how I started digging into that problem in the first place :-)
> 
> Hey, you are again hitting this because of extra config options.

Ah possibly. I enabled every debug option I could find :-)
> 
> Because the refcount_inc() that I found looks like this:
> 
>   static inline void refcount_inc(refcount_t *r)
>   {
>           atomic_inc(&r->refs);
>   }
> 
> and has no warning.
> 
> I wonder how many people actually run with REFCOUNT_FULL that warns -
> because it's way too expensive. It's not set in the default Fedora
> config, for example, and that's despite how Fedora tends to set all
> the other debug options.
> 
> So no, it really doesn't warn normally.

Yup, you're right.

Cheers,
Ben.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] drivers: core: Remove glue dirs from sysfs earlier
  2018-07-01 17:04                 ` Linus Torvalds
  2018-07-01 23:36                   ` Benjamin Herrenschmidt
@ 2018-07-02 10:22                   ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 42+ messages in thread
From: Benjamin Herrenschmidt @ 2018-07-02 10:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Greg Kroah-Hartman, Eric W. Biederman,
	Joel Stanley

On Sun, 2018-07-01 at 10:04 -0700, Linus Torvalds wrote:
> On Sun, Jul 1, 2018 at 12:16 AM Benjamin Herrenschmidt
> <benh@kernel.crashing.org> wrote:
> > 
> > I suspect you didn't read it my entire argument or I wasn't clear
> > enough :-) This is actually the crux of the problem:
> > 
> > Yes the object continues to exist. However, the *last* kobject_put to
> > it will no longer be done under whatever higher level locking the
> > subsystem provides (whatever prevents for example concurrent add and
> > removes).
> 
> Well, yes and no.
> 
> Why "no"?
> 
> The last dropping is actually not necessarily that interesting.
> Especially with the sysfs interface, we basically know that you can
> look up the object using RCU (since that's what the filesystem lookup
> does), and that basically means that the refcount is always the final
> serialization mechanism.There is nothing else that can possibly lock
> it.
> 
> So this is where we disagree:
> 
> > Thus in that scenario the "last minute" kobject_release() done by the
> > last kobject_put() will be effectively unprotected from for example the
> > gdp_mutex (in the case of the gluedirs) or whatever other locking the
> > subsystem owning the kobject is using to avoid making that "refount 0"
> > object "discoverable".
> 
> No. *Fundamentally*, there is only one thing that protects that
> object: the refcount.
> 
> And my argument is that anything that has this model (which means
> anything that has any sysfs linkage, which pretty much means any
> kobject) absolutely *must* use "kobject_get_unless_zero()" unless it
> had an existing stable pointer and just wants to increase the refcount
> (ie "already got a reference throuigh one of my data structures that
> use the lock")
> 
> But if you have that model, that means that the "last drop" is
> actually almost totally irrelevant.  Because it doesn't matter if it's
> done inside the lock or not - you know that once it has been done,
> that object is entirely gone. It's not discoverable any more -
> regardless of locking. The discoverability is basically controlled
> entirely by the refcount.
> 
> So what happens then?
> 
> The locking isn't important for the last release, but it *is*
> important for new object *creation*.
> 
> Why?
> 
> The refcount means that once an object is gone, it's gone for
> *everyone*. It's a one-way thing, and it's thread-safe. So the code
> that does *creation* can do this:
> 
>  - get subsystem lock
>  - look up object (using the "unless_zero()" model)
>  - if you got the object, re-use it, and you're done: drop lock and return
>  - otherwise, you know that nobody else can get it either
>  - create new object and instantiate it
>  - drop lock
> 
> and this means that you create a new object IFF the old object had its
> refcount drop to zero. So you always have exactly one copy (or no copy
> at all, in between the last drop and the creation of the new one).
> 
> See? The lack of locking at drop time didn't matter.  The refcount
> itself serialized things.
> 
> So the above is what I *think* the "glue_dir" logic should be. No need
> for any other count. Just re-use the old glue dir if you can find it,
> and create a new one if you can't.
> 
> The one important thing is that the object lookup above needs to use
> the lookup needs to find the object all the way until the refcount has
> become zero. And that actually means that the object MUST NOT be
> removed from the object lists until *after* the refcount has been
> decremented to zero. Which is actually why that "automatic cleanup"
> that you hate is actually an integral and important part of the
> process: removing the object *before* the refcount went to zero is
> broken, because that means that the "look up object" phase can now
> miss an object that still has a non-zero refcount.
> 
> > So my patch 1/2 prevents us from finding the old dying object (and thus
> > from crashing) but replaces this with the duplicate name problem.
> 
> So I absolutely agree with your patch 1/2. My argument against it is
> actually that I think the "unless_zero" thing needs to be more
> universal.
> 
> > My patch 2/2 removes that second problem by ensuring we remove the
> > object from sysfs synchronously in device_del when it no longer
> > contains any children, explicitely rather than implicitely by the
> > virtue of doing the "last" kobject_put.
> 
> No. See above. The reason I think your patch 2/2 is wrong is that is
> actually *breaks* the above model, exactly because of that thing that
> you hatre.
> 
> The explicit removal is actively wrong for the "I want to reuse the
> object" model, exactly because it happens before the refcount has gone
> to zero.
> 
> > > No.  That the zero kobject_get() will not result in a warning. It just
> > > does a kref_get(), no warnings anywhere.
> > 
> > It's there but it's in refcount:
> > 
> > void refcount_inc(refcount_t *r)
> > {
> >         WARN_ONCE(!refcount_inc_not_zero(r), "refcount_t: increment on 0; use-after-free.\n");
> > }
> > EXPORT_SYMBOL(refcount_inc);
> > 
> > In fact that's how I started digging into that problem in the first place :-)
> 
> Hey, you are again hitting this because of extra config options.
> 
> Because the refcount_inc() that I found looks like this:
> 
>   static inline void refcount_inc(refcount_t *r)
>   {
>           atomic_inc(&r->refs);
>   }
> 
> and has no warning.
> 
> I wonder how many people actually run with REFCOUNT_FULL that warns -
> because it's way too expensive. It's not set in the default Fedora
> config, for example, and that's despite how Fedora tends to set all
> the other debug options.
> 
> So no, it really doesn't warn normally.
> 
>                 Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] drivers: core: Remove glue dirs from sysfs earlier
  2018-07-01 23:36                   ` Benjamin Herrenschmidt
@ 2018-07-02 10:23                     ` Benjamin Herrenschmidt
  2018-07-02 19:24                       ` Linus Torvalds
  0 siblings, 1 reply; 42+ messages in thread
From: Benjamin Herrenschmidt @ 2018-07-02 10:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Greg Kroah-Hartman, Eric W. Biederman,
	Joel Stanley

On Mon, 2018-07-02 at 09:36 +1000, Benjamin Herrenschmidt wrote:
> > No. See above. The reason I think your patch 2/2 is wrong is that is
> > actually *breaks* the above model, exactly because of that thing that
> > you hatre.
> > 
> > The explicit removal is actively wrong for the "I want to reuse the
> > object" model, exactly because it happens before the refcount has gone
> > to zero.
> 
> That's indeed where we disagree.

Sooo...

I had some quality time with my good friend Oban 14 last night,
thinking through this and during the resulting hangover today. I've
tried very hard to "get" your perspective, but after another dive
through the code, I'm afraid I still think patch 2/2 is the right thing
to do.

(Let's ignore patch 1/1 for now and assume we get that part right)

Let me try one last ditch attempt to convince you using maybe a
different perspective: this is how sysfs is intended to work and how
the device model already does everywhere else except the gluedirs.

Sooo... let's start pulling the string bit by bit...

I think you agree since (I'm not going to try to quote you here) you
did mention adding/removing need to generally be protected by higher
level locks in whatever subsystem owns the kobjects. I don't think
there's an argument here.

This is hinted by some comments (among others) in sysfs/dir.c about the
kobject owner being responsible to ensure that removal doesn't race
with any operation, which is hard to do when such removal doesn't
happen under subsystem internal locks.

Now let's look at how things are handled for a normal struct device
kobject.

The device core provides device_add and device_del functions. Those are
completely orthogonal to the object lifetime controlled by
device_initialize (creation) and device_put. And for good reasons (yes,
there are helpers, device_register/unregister that combine them, but
those are just that... helpers).

Why ? For the exact same reason mentioned above: Along with all the
driver binding/unbinding business etc... those functions ensure that
the kobject of the device is added/removed from sysfs under all the
necessary locks in the device model, so that they cannot race with each
other.

This guarantees among others that it is possible to do a new device_add
for the same device immediately after device_del, without clashing with
a duplicate sysfs name, *even* if something still holds a reference to
the kobject for any reason (and there are plenty of these, see the cdev
example below).

The actual lifetime of the struct device is handled *separately* by
device_get/put which are just wrappers on kobject_get/put.

This is the same with a cdev. cdev_add and cdev_del or their friends
cdev_device_add/del do the same thing. The chardev case is interesting
specifically because it is much more common for these to persist
(lifetime) beyond cdev_del. You just need userspace to have an open
file descriptor to the chardev. Drivers have to deal with it of course,
they need to handle things like read/write/ioctl being called on
already open files after remove/del/unbind, and hopefully do so.

*But* it's also possible to create a new identical cdev with the same
sysfs name immediately after cdev_device_del even if the old one still
has open instanced precisely because we separate the sysfs presence,
handled through the device model internal locking, from the object
lifetime.

Now, going back to our gluedirs, in essence those are just more objects
owned by the device core. More specifically, they need to be created
synchronously with the first child device of a class added under a
given parent, and removed synchronously with the last child device of
that class under that same parent.

Delaying that removal from sysfs simply means that we now are unable to
re-add the same device after removing it for some arbitrary amount of
time, because something might still hold a kobject reference to the
gluedir for any reason, it doesn't matter.

It also means that the deletion from sysfs is no longer synchronized
with addition since it no longer happens from within the context of
decice_del which is the "owner" of the glue dir, under whatever locks
it uses to ensure there is no races between multiple devices addition
removal etc...

In fact, if you look at the code and the way the gdp_mutex is used, I
think this is *exactly* the intention of the code. That the
kobject_put() done inside device_del() is the last one, done under that
mutex, thus removing the gluedir synchronously and we have no race.

However, that code just fails to take into account the delayed removal
by kobject debugging and the possibility of another reference existing.

Now it's possible there there can be no other references of a gluedir
today, to be frank I haven't audited the entire code base to figure out
if anything can possibly hold one that isn't a child of that gluedir,
so if indeed there can't be any and the only references to the kref
possible *ever* on a gluedir are its children, then it's possibe that
the assumption holds, except with kobject debugging.

However, even if that is the case, I call it fragile. I think we should
ensure, which is what patch 2/2 does, that regardless of "when" the
object is actually freed, and regardless of what other references might
exist, the gluedir behaves just like other struct device or cdev, and
gets removed from sysfs synchronously, along with the last child, with
all the necessary device core mutexes held, as to never race with an
addition and as to never trigger the duplicate name problem.

Now, I know why you dislike patch 2/2, because it adds another counter,
and there's something "itchy" about having a refcount and that
childcount.

We don't absolutely need that childcount. It could be that we have ways
to ask sysfs whether there are any children left in that directory,
which would work as well, I just haven't really figured out how, and a
child count felt like a trivial way to solve the problem. It's
protected by the gdp_mutex so it's safe.

There are precedents to such dual counters, such as mm_users vs
mm_count. There are precedents to separating file system visibility
from lifetime, every fs does it with unlink() when there are still open
files, so I think overall, what I'm trying to achieve is just pretty
much standard practice: unlink the gluedir exactly when it needs to be
unlinked so a new one can be created when needed, and deal with
lifetime of the actual structure the usual way which can be delayed.

Now if this doesn't convince you, I'm afraid nothing will ;-)

I'm curious to hear from Greg though.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] drivers: core: Remove glue dirs from sysfs earlier
  2018-07-02 10:23                     ` Benjamin Herrenschmidt
@ 2018-07-02 19:24                       ` Linus Torvalds
  2018-07-03  0:57                         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 42+ messages in thread
From: Linus Torvalds @ 2018-07-02 19:24 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Linux Kernel Mailing List, Greg Kroah-Hartman, Eric W. Biederman,
	Joel Stanley

On Mon, Jul 2, 2018 at 3:23 AM Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
>
> Let me try one last ditch attempt to convince you using maybe a
> different perspective: this is how sysfs is intended to work and how
> the device model already does everywhere else except the gluedirs.

So don't get me wrong. I don't think that patch is _wrong_. It may
well be the best thing we can do now.

I just think some of the arguments for the patch are bogus.

I still think that the auto-cleanup is actually a good thing in
general. Not because it simplifies things (which it can do), but
because it fundamentally *allows* you to use less locking.

And that ends up being my real reason to not like your patch all that
much. It depends on

 (a) an extra reference count

 (b) on fairly heavy-handed locking (ie the whole "lock on release
too, not just on allocation").

and I think both of those are non-optimal.

So:

> The actual lifetime of the struct device is handled *separately* by
> device_get/put which are just wrappers on kobject_get/put.

I agree that that is the end result of your patch (and perhaps the
buggy intent of the original code).

I just don't necessarily agree it's a great thing in general. It
basically uses sysfs visibility as an argument for why lifetimes
should differ from the refcounted lifetimes.

And that may be a practical argument, but I don't think it's a very
good argument in general. I think it would arguably be much better to
be less serialized, and depend on refcounting more, and make less of a
deal about the sysfs visibility.

For example, if we *really* add back the exact same device immediately
after removing it, and the device was still in use somehow (ie the
refcount never went down to zero), maybe we really should not have
reused the same name? Or if we do re-use the same name, maybe we
should have re-used the device node itself?

See what I'm saying? I understand where your patch is coming from, but
I am suspicious of the fact that it seems to want to re-use a name
(but not the object) that is by definition still in use.

Maybe that's the right thing in this case. Considering that we have a
real oops and a real problem, and I don't have an alternative patch, I
guess the "patches talk, bullshit walks" rule applies.

                Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] drivers: core: Remove glue dirs from sysfs earlier
  2018-07-02 19:24                       ` Linus Torvalds
@ 2018-07-03  0:57                         ` Benjamin Herrenschmidt
  2018-07-03  2:15                           ` Linus Torvalds
  0 siblings, 1 reply; 42+ messages in thread
From: Benjamin Herrenschmidt @ 2018-07-03  0:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Greg Kroah-Hartman, Eric W. Biederman,
	Joel Stanley

On Mon, 2018-07-02 at 12:24 -0700, Linus Torvalds wrote:
> On Mon, Jul 2, 2018 at 3:23 AM Benjamin Herrenschmidt
> <benh@kernel.crashing.org> wrote:
> > 
> > Let me try one last ditch attempt to convince you using maybe a
> > different perspective: this is how sysfs is intended to work and how
> > the device model already does everywhere else except the gluedirs.
> 
> So don't get me wrong. I don't think that patch is _wrong_. It may
> well be the best thing we can do now.
> 
> I just think some of the arguments for the patch are bogus.
> 
> I still think that the auto-cleanup is actually a good thing in
> general. Not because it simplifies things (which it can do), but
> because it fundamentally *allows* you to use less locking.
> 
> And that ends up being my real reason to not like your patch all that
> much. It depends on
> 
>  (a) an extra reference count

Right, that is not that great, and it might be possible to avoid it if
we have a way to check that the sysfs dir is empty. Maybe kobj->sd-
>subdirs ? This will find subdirs, not files, but for gluedirs that's
sufficient. I can play with that today see if it works.

BTW. Talking of extra reference counts, we already have two everywhere
afaik... kernfs has its own that's separate from the kobject one. I got
lost in that maze a couple of times already.

>  (b) on fairly heavy-handed locking (ie the whole "lock on release
> too, not just on allocation").

This is already there, I'm not adding it ;-) It's just ineffective
without my patch due to the possibility of the "auto-cleanup" to leak
outside the lock.

> and I think both of those are non-optimal.
> 
> So:
> 
> > The actual lifetime of the struct device is handled *separately* by
> > device_get/put which are just wrappers on kobject_get/put.
> 
> I agree that that is the end result of your patch (and perhaps the
> buggy intent of the original code).

This is the existing device_add/del without my patch. I was describing
how the core device stuff works for the normal device directories
today. My patch makes the gludirs do the same thing basically, which I
think was the intend of the code already but was broken.

> I just don't necessarily agree it's a great thing in general. It
> basically uses sysfs visibility as an argument for why lifetimes
> should differ from the refcounted lifetimes.

Yes, it does that so that we can remove and re-add without name
clashes.

> And that may be a practical argument, but I don't think it's a very
> good argument in general. I think it would arguably be much better to
> be less serialized, and depend on refcounting more, and make less of a
> deal about the sysfs visibility.
> 
> For example, if we *really* add back the exact same device immediately
> after removing it, and the device was still in use somehow (ie the
> refcount never went down to zero), maybe we really should not have
> reused the same name? Or if we do re-use the same name, maybe we
> should have re-used the device node itself?

Well, maybe but that's what we've been doing forever, and I think
drivers deal with it already. It applies to devices and to chardevs for
example.

> See what I'm saying? I understand where your patch is coming from, but
> I am suspicious of the fact that it seems to want to re-use a name
> (but not the object) that is by definition still in use.

Yes, I see what you mean.

I don't really have the same qualms as you do, it's how we've done
things forever (it's just that the gluedir code was broken), and it
generally works well.

It's not different in principle to what we do with files. You can
unlink an open file. It's still "in use" but the directory entry is
gone and one can create a new one.

It a slightly more roundabout way, we do that with mm_structs, the last
process can be gone but we keep them around due to being referenced (or
lazily attached to a kthread). Here we have 2 counters too.

> Maybe that's the right thing in this case. Considering that we have a
> real oops and a real problem, and I don't have an alternative patch, I
> guess the "patches talk, bullshit walks" rule applies.

Haha, as you prefer. Patch 1 fixes the oops, patch 2 fixes the name
collision, so in theory you could get away with just patch 1. But then
device would start failing to be re-added or re-bound.

(In my case it was a sub-device created by the driver of a parent
device, so it happens on bind/unbind of that parent driver).

I'll have a look see if I can find a way to check that the sysfs dir is
empty without adding that childcount, that would at least alleviate
some of your objection, and would probably make the patch
smaller/simpler.

Cheers,
Ben.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] drivers: core: Remove glue dirs from sysfs earlier
  2018-07-03  0:57                         ` Benjamin Herrenschmidt
@ 2018-07-03  2:15                           ` Linus Torvalds
  2018-07-03  2:26                             ` Linus Torvalds
  2018-07-03  2:37                             ` [PATCH " Benjamin Herrenschmidt
  0 siblings, 2 replies; 42+ messages in thread
From: Linus Torvalds @ 2018-07-03  2:15 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Tejun Heo
  Cc: Linux Kernel Mailing List, Greg Kroah-Hartman, Eric W. Biederman,
	Joel Stanley

On Mon, Jul 2, 2018 at 5:57 PM Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
>
> I'll have a look see if I can find a way to check that the sysfs dir is
> empty without adding that childcount, that would at least alleviate
> some of your objection, and would probably make the patch
> smaller/simpler.

That would definitely make me happier.

Right now we already remove the actual device node sysfs associations
synchronously with "kobject_del()" (even if it still then stays around
as a kobject), and that does the remove for the object itself - just
not the glue directory.

If we then just did a "if glue dir is empty, delete the glue dir too"
in cleanup_glue_dir(), at least ther patch would be prettier.

I *think* that should be easy. Maybe. Something almost, but not quite,
entirely unlike the below UNTESTED hack?

It's whitespace-damaged on purpose. It's probably broken shit. DO NOT
USE UNDER ANY CIRCUMSTANCES.  Think of it more as a "something like
this might work, but probably doesn't". Maybe it gives you an idea,
although that idea might be "Linus has finally lost it".

I suspect it's broken because kernfs has this notion of inactive
nodes, so just testing for the rbtree being empty is probably wrong. I
think maybe it needs to test for there not being any active nodes, but
then you'd need the kernfs_mutex etc, so that would make things much
worse.

I really haven't touched the kernfs code much, thus the "this may be
complete and utter garbage" warning.

Adding Tejun to the participants, he should know. Or Greg, who has
been silent, presumably because he's still getting over his crazy
travels.

                Linus

---

    --- a/drivers/base/core.c
   +++ b/drivers/base/core.c
   @@ -1585,6 +1585,12 @@ static inline struct kobject
*get_glue_dir(struct device *dev)
        return dev->kobj.parent;
    }

   +static inline bool has_children(struct kobject *dir)
   +{
   +    struct kernfs_node *kn = dir->sd;
   +    return kn && kn->dir.children.rb_node;
   +}
   +
    /*
     * make sure cleaning up dir as the last step, we need to make
     * sure .release handler of kobject is run with holding the
   @@ -1597,6 +1603,10 @@ static void cleanup_glue_dir(struct device
*dev, struct kobject *glue_dir)
                return;

        mutex_lock(&gdp_mutex);
   -    kobject_put(glue_dir);
   +    if (has_children(glue_dir))
   +            kobject_put(glue_dir);
   +    else
   +            kobject_del(glue_dir);
        mutex_unlock(&gdp_mutex);
    }

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] drivers: core: Remove glue dirs from sysfs earlier
  2018-07-03  2:15                           ` Linus Torvalds
@ 2018-07-03  2:26                             ` Linus Torvalds
  2018-07-03  2:39                               ` Benjamin Herrenschmidt
  2018-07-03  2:37                             ` [PATCH " Benjamin Herrenschmidt
  1 sibling, 1 reply; 42+ messages in thread
From: Linus Torvalds @ 2018-07-03  2:26 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Tejun Heo
  Cc: Linux Kernel Mailing List, Greg Kroah-Hartman, Eric W. Biederman,
	Joel Stanley

On Mon, Jul 2, 2018 at 7:15 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> It's whitespace-damaged on purpose. It's probably broken shit. DO NOT
> USE UNDER ANY CIRCUMSTANCES.  Think of it more as a "something like
> this might work, but probably doesn't". Maybe it gives you an idea,
> although that idea might be "Linus has finally lost it".

Even if it were to work, it should probably just be done inside kernfs
and exposed as some kind of "kernfs_remove_if_empty()" function.

We happen to know that we hold the lock that creates all the entries
in the glue directory (and we don't allow users to move or create
stuff, afaik, even if we alloc chmod etc), so there should be no
races, but a generic kernfs helper function would probably have to get
proper locks and check it the right way.

                Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] drivers: core: Remove glue dirs from sysfs earlier
  2018-07-03  2:15                           ` Linus Torvalds
  2018-07-03  2:26                             ` Linus Torvalds
@ 2018-07-03  2:37                             ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 42+ messages in thread
From: Benjamin Herrenschmidt @ 2018-07-03  2:37 UTC (permalink / raw)
  To: Linus Torvalds, Tejun Heo
  Cc: Linux Kernel Mailing List, Greg Kroah-Hartman, Eric W. Biederman,
	Joel Stanley

On Mon, 2018-07-02 at 19:15 -0700, Linus Torvalds wrote:
> On Mon, Jul 2, 2018 at 5:57 PM Benjamin Herrenschmidt
> <benh@kernel.crashing.org> wrote:
> > 
> > I'll have a look see if I can find a way to check that the sysfs dir is
> > empty without adding that childcount, that would at least alleviate
> > some of your objection, and would probably make the patch
> > smaller/simpler.
> 
> That would definitely make me happier.
> 
> Right now we already remove the actual device node sysfs associations
> synchronously with "kobject_del()" (even if it still then stays around
> as a kobject), and that does the remove for the object itself - just
> not the glue directory.
> 
> If we then just did a "if glue dir is empty, delete the glue dir too"
> in cleanup_glue_dir(), at least ther patch would be prettier.
> 
> I *think* that should be easy. Maybe. Something almost, but not quite,
> entirely unlike the below UNTESTED hack?

Yes, something like that. I have a bunch of other things I need to
attend to today, so I might not come up with a patch before some time
tomorrow but I'll have a look.
 
> It's whitespace-damaged on purpose. It's probably broken shit. DO NOT
> USE UNDER ANY CIRCUMSTANCES.  Think of it more as a "something like
> this might work, but probably doesn't". Maybe it gives you an idea,
> although that idea might be "Linus has finally lost it".
> 
> I suspect it's broken because kernfs has this notion of inactive
> nodes, so just testing for the rbtree being empty is probably wrong. I
> think maybe it needs to test for there not being any active nodes, but
> then you'd need the kernfs_mutex etc, so that would make things much
> worse.

There's also a "subdirs" in kernfs_node which we could use. That's a
bit ugly because it only account directories but we happen to know that
the gluedirs only contain directories, so it would work fine in
practice for that case.

> I really haven't touched the kernfs code much, thus the "this may be
> complete and utter garbage" warning.
> 
> Adding Tejun to the participants, he should know. Or Greg, who has
> been silent, presumably because he's still getting over his crazy
> travels.

Cheers,
Ben.

>                 Linus
> 
> ---
> 
>     --- a/drivers/base/core.c
>    +++ b/drivers/base/core.c
>    @@ -1585,6 +1585,12 @@ static inline struct kobject
> *get_glue_dir(struct device *dev)
>         return dev->kobj.parent;
>     }
> 
>    +static inline bool has_children(struct kobject *dir)
>    +{
>    +    struct kernfs_node *kn = dir->sd;
>    +    return kn && kn->dir.children.rb_node;
>    +}
>    +
>     /*
>      * make sure cleaning up dir as the last step, we need to make
>      * sure .release handler of kobject is run with holding the
>    @@ -1597,6 +1603,10 @@ static void cleanup_glue_dir(struct device
> *dev, struct kobject *glue_dir)
>                 return;
> 
>         mutex_lock(&gdp_mutex);
>    -    kobject_put(glue_dir);
>    +    if (has_children(glue_dir))
>    +            kobject_put(glue_dir);
>    +    else
>    +            kobject_del(glue_dir);
>         mutex_unlock(&gdp_mutex);
>     }

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] drivers: core: Remove glue dirs from sysfs earlier
  2018-07-03  2:26                             ` Linus Torvalds
@ 2018-07-03  2:39                               ` Benjamin Herrenschmidt
  2018-07-03  5:22                                 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 42+ messages in thread
From: Benjamin Herrenschmidt @ 2018-07-03  2:39 UTC (permalink / raw)
  To: Linus Torvalds, Tejun Heo
  Cc: Linux Kernel Mailing List, Greg Kroah-Hartman, Eric W. Biederman,
	Joel Stanley

On Mon, 2018-07-02 at 19:26 -0700, Linus Torvalds wrote:
> On Mon, Jul 2, 2018 at 7:15 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > 
> > It's whitespace-damaged on purpose. It's probably broken shit. DO NOT
> > USE UNDER ANY CIRCUMSTANCES.  Think of it more as a "something like
> > this might work, but probably doesn't". Maybe it gives you an idea,
> > although that idea might be "Linus has finally lost it".
> 
> Even if it were to work, it should probably just be done inside kernfs
> and exposed as some kind of "kernfs_remove_if_empty()" function.

That would be harder, we'd have to expose it via a
kobject_del_if_empty() then.

Since we control completely when things are added/removed from the
gluedir and have a big fat mutex for it, I don't think locking is much
of an issue. At least in that specific case.

> We happen to know that we hold the lock that creates all the entries
> in the glue directory (and we don't allow users to move or create
> stuff, afaik, even if we alloc chmod etc), so there should be no
> races, but a generic kernfs helper function would probably have to get
> proper locks and check it the right way.

Yes that or document very clearly under what circumstances that
function can be used.

>                 Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] drivers: core: Remove glue dirs from sysfs earlier
  2018-07-03  2:39                               ` Benjamin Herrenschmidt
@ 2018-07-03  5:22                                 ` Benjamin Herrenschmidt
  2018-07-03 15:46                                   ` Tejun Heo
       [not found]                                   ` <CA+55aFzKmzC-2_6+RsRRu9KfK_r=UGgLN2Q0hSNBV=ScGR7=8g@mail.gmail.com>
  0 siblings, 2 replies; 42+ messages in thread
From: Benjamin Herrenschmidt @ 2018-07-03  5:22 UTC (permalink / raw)
  To: Linus Torvalds, Tejun Heo
  Cc: Linux Kernel Mailing List, Greg Kroah-Hartman, Eric W. Biederman,
	Joel Stanley

On Tue, 2018-07-03 at 12:39 +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2018-07-02 at 19:26 -0700, Linus Torvalds wrote:
> > On Mon, Jul 2, 2018 at 7:15 PM Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > > 
> > > It's whitespace-damaged on purpose. It's probably broken shit. DO NOT
> > > USE UNDER ANY CIRCUMSTANCES.  Think of it more as a "something like
> > > this might work, but probably doesn't". Maybe it gives you an idea,
> > > although that idea might be "Linus has finally lost it".
> > 
> > Even if it were to work, it should probably just be done inside kernfs
> > and exposed as some kind of "kernfs_remove_if_empty()" function.
> 
> That would be harder, we'd have to expose it via a
> kobject_del_if_empty() then.
> 
> Since we control completely when things are added/removed from the
> gluedir and have a big fat mutex for it, I don't think locking is much
> of an issue. At least in that specific case.

Ok, kernfs was a tad tougher to crack than I hoped but here's a
prototype lightly tested.

Tejun, Greg, does it look reasonable to you ?

Cheers,
Ben.

diff --git a/drivers/base/core.c b/drivers/base/core.c
index b610816eb887..9166f68276c6 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1570,6 +1570,8 @@ static void cleanup_glue_dir(struct device *dev, struct kobject *glue_dir)
 		return;
 
 	mutex_lock(&gdp_mutex);
+	if (!kobject_has_children(glue_dir))
+		kobject_del(glue_dir);
 	kobject_put(glue_dir);
 	mutex_unlock(&gdp_mutex);
 }
diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 89d1dc19340b..e4bec09bc602 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -1251,6 +1251,29 @@ void kernfs_activate(struct kernfs_node *kn)
 	mutex_unlock(&kernfs_mutex);
 }
 
+bool kernfs_has_children(struct kernfs_node *kn)
+{
+	bool has_children = false;
+	struct kernfs_node *pos;
+
+	/* Lockless shortcut */
+	if (RB_EMPTY_NODE(&kn->rb))
+		return false;
+
+	/* Now check for active children */
+	mutex_lock(&kernfs_mutex);
+	pos = NULL;
+	while ((pos = kernfs_next_descendant_post(pos, kn)) && pos != kn) {
+		if (kernfs_active(pos)) {
+			has_children = true;
+			break;
+		}
+	}
+	mutex_unlock(&kernfs_mutex);
+
+	return has_children;
+}
+
 static void __kernfs_remove(struct kernfs_node *kn)
 {
 	struct kernfs_node *pos;
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index ab25c8b6d9e3..776350ac1cbc 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -300,6 +300,12 @@ static inline bool kernfs_ns_enabled(struct kernfs_node *kn)
 	return kn->flags & KERNFS_NS;
 }
 
+/**
+ * kernfs_has_children - Returns whether a kernfs node has children.
+ * @kn: the node to test
+ */
+bool kernfs_has_children(struct kernfs_node *kn);
+
 int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen);
 int kernfs_path_from_node(struct kernfs_node *root_kn, struct kernfs_node *kn,
 			  char *buf, size_t buflen);
diff --git a/include/linux/kobject.h b/include/linux/kobject.h
index 7f6f93c3df9c..1e3294ab95f0 100644
--- a/include/linux/kobject.h
+++ b/include/linux/kobject.h
@@ -116,6 +116,17 @@ extern void kobject_put(struct kobject *kobj);
 extern const void *kobject_namespace(struct kobject *kobj);
 extern char *kobject_get_path(struct kobject *kobj, gfp_t flag);
 
+/**
+ * kobject_has_children - Returns whether a kobject has children.
+ * @kobj: the object to test
+ */
+static inline bool kobject_has_children(struct kobject *kobj)
+{
+	WARN_ON_ONCE(kref_read(&kobj->kref) == 0);
+
+	return kobj->sd && kernfs_has_children(kobj->sd);
+}
+
 struct kobj_type {
 	void (*release)(struct kobject *kobj);
 	const struct sysfs_ops *sysfs_ops;


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] drivers: core: Remove glue dirs from sysfs earlier
  2018-07-03  5:22                                 ` Benjamin Herrenschmidt
@ 2018-07-03 15:46                                   ` Tejun Heo
  2018-07-04  1:10                                     ` Benjamin Herrenschmidt
       [not found]                                   ` <CA+55aFzKmzC-2_6+RsRRu9KfK_r=UGgLN2Q0hSNBV=ScGR7=8g@mail.gmail.com>
  1 sibling, 1 reply; 42+ messages in thread
From: Tejun Heo @ 2018-07-03 15:46 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Linus Torvalds, Linux Kernel Mailing List, Greg Kroah-Hartman,
	Eric W. Biederman, Joel Stanley

Hello,

On Tue, Jul 03, 2018 at 03:22:47PM +1000, Benjamin Herrenschmidt wrote:
> +bool kernfs_has_children(struct kernfs_node *kn)
> +{
> +	bool has_children = false;
> +	struct kernfs_node *pos;
> +
> +	/* Lockless shortcut */
> +	if (RB_EMPTY_NODE(&kn->rb))
> +		return false;

Hmm... shouldn't it be testing !rb_first(kn->dir.children)?  The above
would test whether @kn itself is unlinked.

> +
> +	/* Now check for active children */
> +	mutex_lock(&kernfs_mutex);
> +	pos = NULL;
> +	while ((pos = kernfs_next_descendant_post(pos, kn)) && pos != kn) {
> +		if (kernfs_active(pos)) {
> +			has_children = true;
> +			break;
> +		}
> +	}
> +	mutex_unlock(&kernfs_mutex);
> +
> +	return has_children;

The active ref is there to synchronize removal against in-flight reads
so that kernfs_remove() can wait for them to drain.  On return from
kernfs_remove(), the node is guaranteed to be off the rbtree and the
above test isn't necessary.  !rb_first() test should be enough
(provided that there's external synchronization, of course).

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] drivers: core: Remove glue dirs from sysfs earlier
  2018-07-03 15:46                                   ` Tejun Heo
@ 2018-07-04  1:10                                     ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 42+ messages in thread
From: Benjamin Herrenschmidt @ 2018-07-04  1:10 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Linus Torvalds, Linux Kernel Mailing List, Greg Kroah-Hartman,
	Eric W. Biederman, Joel Stanley

On Tue, 2018-07-03 at 08:46 -0700, Tejun Heo wrote:
> Hello,
> 
> On Tue, Jul 03, 2018 at 03:22:47PM +1000, Benjamin Herrenschmidt wrote:
> > +bool kernfs_has_children(struct kernfs_node *kn)
> > +{
> > +	bool has_children = false;
> > +	struct kernfs_node *pos;
> > +
> > +	/* Lockless shortcut */
> > +	if (RB_EMPTY_NODE(&kn->rb))
> > +		return false;
> 
> Hmm... shouldn't it be testing !rb_first(kn->dir.children)?  The above
> would test whether @kn itself is unlinked.

Ah possibly, the rb tree usage got me confused a few times in there.

> > +
> > +	/* Now check for active children */
> > +	mutex_lock(&kernfs_mutex);
> > +	pos = NULL;
> > +	while ((pos = kernfs_next_descendant_post(pos, kn)) && pos != kn) {
> > +		if (kernfs_active(pos)) {
> > +			has_children = true;
> > +			break;
> > +		}
> > +	}
> > +	mutex_unlock(&kernfs_mutex);
> > +
> > +	return has_children;
> 
> The active ref is there to synchronize removal against in-flight reads
> so that kernfs_remove() can wait for them to drain.  On return from
> kernfs_remove(), the node is guaranteed to be off the rbtree and the
> above test isn't necessary.  !rb_first() test should be enough
> (provided that there's external synchronization, of course).

Ok. Yes there's external synchronization. So a simple !rb_first test
doesn't need the mutex I suppose ?

Cheers,
Ben.

> 
> Thanks.
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] drivers: core: Don't try to use a dead glue_dir
  2018-06-30 19:45     ` Linus Torvalds
@ 2018-07-07 16:48       ` Greg Kroah-Hartman
  2018-07-09 23:44         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 42+ messages in thread
From: Greg Kroah-Hartman @ 2018-07-07 16:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Benjamin Herrenschmidt, Eric W. Biederman, Joel Stanley,
	Linux Kernel Mailing List

On Sat, Jun 30, 2018 at 12:45:21PM -0700, Linus Torvalds wrote:
> On Thu, Jun 28, 2018 at 7:21 PM Benjamin Herrenschmidt
> <benh@kernel.crashing.org> wrote:
> >
> > Under some circumstances (such as when using kobject debugging)
> > a gluedir whose kref is 0 might remain in the class kset for
> > a long time. The reason is that we don't actively remove glue
> > dirs when they become empty, but instead rely on the implicit
> > removal done by kobject_release(), which can happen some amount
> > of time after the last kobject_put().
> >
> > Using such a dead object is a bad idea and will lead to warnings
> > and crashes.
> 
> So with the other patch in mind, here's my comments on this one. Pick
> one of two scenarios:
> 
>  (a) it's obviously correct.
> 
>      We obviously can *not* take an object with a zero refcount,
> because it is already been scheduled for kobject_cleanup(), and
> incrementing the refcount is simply fundamentally wrong, because
> incrementing the refcount won't unschedule the deletion of the object.
> 
>  (b) the patch is wrong, and our "kobject_get()" should cancel the
> kobject_cleanup() instead.
> 
> There are problems with both of the above cases.
> 
> The "patch is obviously correct" case leads to another issue: why
> would kobject_get() _ever_ succeed on an object wioth a zero refcount?
> IOW, why do we have kobject_get() vs kobject_get_unless_zero() in the
> first place? It is *never* ok to get an kobject with a zero refcount
> because of the above "it's already scheduled for deletion" issue.
> 
> The (b) case sounds nice, and would actually fix the problem that
> patch 2/2 was tryihng to address, and would make
> CONFIG_DEBUG_KOBJECT_RELEASE work.
> 
> HOWEVER. It's completely untenable in reality - it's a nightmare from
> a locking standpoint, because kref_put() literally depends not on
> locking, but on the exclusive "went to zero".
> 
> So I think (b) is practically not acceptable. Which means that (a) is
> the right reaction, and "kobject_get()" on an object with a zero
> refcount is _always_ wrong.
> 
> But that says that "yes, the patch is obviously correct", but it also
> says "the patch should be pointless, because kobject_get() should just
> _always_ have the semantics of "kobject_get_unless_zero()", and the
> latter shouldn't even exist.
> 
> Greg? When would it possibly be valid to do "kobject_get()" on a zero
> refcount object? I don't see it. But this is all very much your code.

No, kobject_get() should never happen on a 0 refcount object.  That
being said, the code does allow it, so if things are messed up, it will
happen.  I think that change happened when the switch to refcount_t
occured, before then we would WARN_ON() if that ever happened.  I should
go fix that up, and restore that old behavior, so that syzbot starts
complaining loudly when stuff like that hits.

So I hate using kobject_get_unless_zero(), and resisted ever adding it
to the tree as it shows a bad locking/tree situation as you point out
here.  But for some reason, the block developers seemed to insist they
needed it, and so it is in the tree for them.  I don't want it to spread
if at all possible, which makes me want to reject this patch as this
should be "a case that can never be hit".

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] drivers: core: Don't try to use a dead glue_dir
  2018-06-29  2:21   ` [PATCH 1/2] drivers: core: Don't try to use a dead glue_dir Benjamin Herrenschmidt
  2018-06-30 19:45     ` Linus Torvalds
@ 2018-07-07 16:51     ` Greg Kroah-Hartman
  2018-07-09 23:50       ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 42+ messages in thread
From: Greg Kroah-Hartman @ 2018-07-07 16:51 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Linus Torvalds, Eric W. Biederman, Joel Stanley, linux-kernel

On Fri, Jun 29, 2018 at 12:21:51PM +1000, Benjamin Herrenschmidt wrote:
> Under some circumstances (such as when using kobject debugging)
> a gluedir whose kref is 0 might remain in the class kset for
> a long time. The reason is that we don't actively remove glue
> dirs when they become empty, but instead rely on the implicit
> removal done by kobject_release(), which can happen some amount
> of time after the last kobject_put().
> 
> Using such a dead object is a bad idea and will lead to warnings
> and crashes.
> 
> Unfortunately that can happen in get_device_parent() if the
> last child of a glue dir was removed and a new one added
> before the glue dir gets fully released().
> 
> This prevents this by making get_device_parent() only "find"
> a glue dir whose refcount is non-0.
> 
> While this fixes the crash, it doesn't fully fix the problem,
> instead the race will now result in an error attempting to
> use a duplicate file name in sysfs. A fix for that will come
> separately.
> 
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
> 
> (Adding lkml, I just realized I completely forgot to CC it in
> the first place on this whole conversation, blame the 1am debugging
> session)
> 
>  drivers/base/core.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index b610816eb887..e9eff2099896 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -1517,11 +1517,13 @@ static struct kobject *get_device_parent(struct device *dev,
>  
>  		/* find our class-directory at the parent and reference it */
>  		spin_lock(&dev->class->p->glue_dirs.list_lock);
> -		list_for_each_entry(k, &dev->class->p->glue_dirs.list, entry)
> +		list_for_each_entry(k, &dev->class->p->glue_dirs.list, entry) {
>  			if (k->parent == parent_kobj) {
> -				kobj = kobject_get(k);
> -				break;
> +				kobj = kobject_get_unless_zero(k);
> +				if (kobj)
> +					break;

A parent directory _should_ not ever be able to be removed before the
object being removed was, as we should have had a reference to it,
right?  So I don't see how this can get hit "in real life".

Yes, enabling kobject debugging does keep objects around for a long time
in order to try to help figure out where people are messing up their
usage of them.  What subsystem is doing this in a way that causes
problems here?  Shouldn't we fix that up instead?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] drivers: core: Don't try to use a dead glue_dir
  2018-07-07 16:48       ` Greg Kroah-Hartman
@ 2018-07-09 23:44         ` Benjamin Herrenschmidt
  2018-07-10 14:55           ` Greg Kroah-Hartman
  2018-07-21  7:53           ` Greg Kroah-Hartman
  0 siblings, 2 replies; 42+ messages in thread
From: Benjamin Herrenschmidt @ 2018-07-09 23:44 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Linus Torvalds
  Cc: Eric W. Biederman, Joel Stanley, Linux Kernel Mailing List

On Sat, 2018-07-07 at 18:48 +0200, Greg Kroah-Hartman wrote:
> No, kobject_get() should never happen on a 0 refcount object.  That
> being said, the code does allow it, so if things are messed up, it will
> happen.  I think that change happened when the switch to refcount_t
> occured, before then we would WARN_ON() if that ever happened.  I should
> go fix that up, and restore that old behavior, so that syzbot starts
> complaining loudly when stuff like that hits.
> 
> So I hate using kobject_get_unless_zero(), and resisted ever adding it
> to the tree as it shows a bad locking/tree situation as you point out
> here.  But for some reason, the block developers seemed to insist they
> needed it, and so it is in the tree for them.  I don't want it to spread
> if at all possible, which makes me want to reject this patch as this
> should be "a case that can never be hit".

Except it can in that situation... at least unless you get my patch 2/2
(or the newer one I'm about to send that avoids adding a child counter
and uses the one in kernfs instead).

Ben.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] drivers: core: Don't try to use a dead glue_dir
  2018-07-07 16:51     ` Greg Kroah-Hartman
@ 2018-07-09 23:50       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 42+ messages in thread
From: Benjamin Herrenschmidt @ 2018-07-09 23:50 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Linus Torvalds, Eric W. Biederman, Joel Stanley, linux-kernel

On Sat, 2018-07-07 at 18:51 +0200, Greg Kroah-Hartman wrote:
> 
> > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > index b610816eb887..e9eff2099896 100644
> > --- a/drivers/base/core.c
> > +++ b/drivers/base/core.c
> > @@ -1517,11 +1517,13 @@ static struct kobject *get_device_parent(struct device *dev,
> >  
> >  		/* find our class-directory at the parent and reference it */
> >  		spin_lock(&dev->class->p->glue_dirs.list_lock);
> > -		list_for_each_entry(k, &dev->class->p->glue_dirs.list, entry)
> > +		list_for_each_entry(k, &dev->class->p->glue_dirs.list, entry) {
> >  			if (k->parent == parent_kobj) {
> > -				kobj = kobject_get(k);
> > -				break;
> > +				kobj = kobject_get_unless_zero(k);
> > +				if (kobj)
> > +					break;
> 
> A parent directory _should_ not ever be able to be removed before the
> object being removed was, as we should have had a reference to it,
> right?  So I don't see how this can get hit "in real life".
> 
> Yes, enabling kobject debugging does keep objects around for a long time
> in order to try to help figure out where people are messing up their
> usage of them.  What subsystem is doing this in a way that causes
> problems here?  Shouldn't we fix that up instead?

The broken subsystem is the driver core itself :-) See the descriptions
here and in patch 2/2.

Note: This is a more generic problem with ksets vs relying on the magic
sysfs cleanup happening in kobject_release().

Any kobject that is a member of a kset and doesn't get explicitely
removed from sysfs with kobject_del() prior to dropping the last
reference with kobject_put() (and thus relies instead on the automatic
cleanup done by kobject_release()) will be exposed to the race:

The last kobject_put() will drop the refcount to 0 while the object is
still in the kset. Only some amount of time later (which can be very
short or very long if you enable kobject debugging), will
kobject_release() take it out of sysfs and out of the kset.

Thus the object will be visible, with a 0 refcount, to anything that
"walks" the kset during that period.

This is exactly what happens with the gluedirs in the device core, but
it could happen elsewhere for all I know.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2 2/2] drivers: core: Remove glue dirs from sysfs earlier
       [not found]                                             ` <CA+55aFx-UX8nxewRFFWdBgYfPqfipnxaqJuJCUni9h4JvhoPFw@mail.gmail.com>
@ 2018-07-10  0:29                                               ` Benjamin Herrenschmidt
  2018-07-10  0:33                                                 ` Linus Torvalds
  2018-07-10 14:55                                                 ` Greg Kroah-Hartman
  0 siblings, 2 replies; 42+ messages in thread
From: Benjamin Herrenschmidt @ 2018-07-10  0:29 UTC (permalink / raw)
  To: Linus Torvalds, Greg Kroah-Hartman; +Cc: Tejun Heo, linux-kernel

For devices with a class, we create a "glue" directory between
the parent device and the new device with the class name.

This directory is never "explicitely" removed when empty however,
this is left to the implicit sysfs removal done by kobject_release()
when the object loses its last reference via kobject_put().

This is problematic because as long as it's not been removed from
sysfs, it is still present in the class kset and in sysfs directory
structure.

The presence in the class kset exposes a use after free bug fixed
by the previous patch, but the presence in sysfs means that until
the kobject is released, which can take a while (especially with
kobject debugging), any attempt at re-creating such as binding a
new device for that class/parent pair, will result in a sysfs
duplicate file name error.

This fixes it by instead doing an explicit kobject_del() when
the glue dir is empty, by keeping track of the number of
child devices of the gluedir.

This is made easy by the fact that all glue dir operations are
done with a global mutex, and there's already a function
(cleanup_glue_dir) called in all the right places taking that
mutex that can be enhanced for this. It appears that this was
in fact the intent of the function, but the implementation was
wrong.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 drivers/base/core.c     |  2 ++
 include/linux/kobject.h | 17 +++++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index e9eff2099896..93c0f8d1a447 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1572,6 +1572,8 @@ static void cleanup_glue_dir(struct device *dev, struct kobject *glue_dir)
 		return;
 
 	mutex_lock(&gdp_mutex);
+	if (!kobject_has_children(glue_dir))
+		kobject_del(glue_dir);
 	kobject_put(glue_dir);
 	mutex_unlock(&gdp_mutex);
 }
diff --git a/include/linux/kobject.h b/include/linux/kobject.h
index 7f6f93c3df9c..270b40515e79 100644
--- a/include/linux/kobject.h
+++ b/include/linux/kobject.h
@@ -116,6 +116,23 @@ extern void kobject_put(struct kobject *kobj);
 extern const void *kobject_namespace(struct kobject *kobj);
 extern char *kobject_get_path(struct kobject *kobj, gfp_t flag);
 
+/**
+ * kobject_has_children - Returns whether a kobject has children.
+ * @kobj: the object to test
+ *
+ * This will return whether a kobject has other kobjects as children.
+ *
+ * It does NOT account for the presence of attribute files, only sub
+ * directories. It also assumes there is no concurrent addition or
+ * removal of such children, and thus relies on external locking.
+ */
+static inline bool kobject_has_children(struct kobject *kobj)
+{
+	WARN_ON_ONCE(kref_read(&kobj->kref) == 0);
+
+	return kobj->sd && kobj->sd->dir.subdirs;
+}
+
 struct kobj_type {
 	void (*release)(struct kobject *kobj);
 	const struct sysfs_ops *sysfs_ops;


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 2/2] drivers: core: Remove glue dirs from sysfs earlier
  2018-07-10  0:29                                               ` [PATCH v2 " Benjamin Herrenschmidt
@ 2018-07-10  0:33                                                 ` Linus Torvalds
  2018-07-10  1:37                                                   ` Benjamin Herrenschmidt
  2018-07-10 14:55                                                 ` Greg Kroah-Hartman
  1 sibling, 1 reply; 42+ messages in thread
From: Linus Torvalds @ 2018-07-10  0:33 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Greg Kroah-Hartman, Tejun Heo, Linux Kernel Mailing List

On Mon, Jul 9, 2018 at 5:29 PM Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
>
> For devices with a class, we create a "glue" directory between
> the parent device and the new device with the class name.
>
> This directory is never "explicitely" removed when empty however,

explicitly

Is the mis-spelling why you had the quotes? I do find that spelling in
the kernel, but not in drivers/base/.

> This fixes it by instead doing an explicit kobject_del() when
> the glue dir is empty, by keeping track of the number of
> child devices of the gluedir.

Ack. This looks good to me.

I didn't see your 1/2 - you should probably re-send that one too so
that Greg doesn't have to fish for it. But I'll Ack that one too in
this same email regardless.

                Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 2/2] drivers: core: Remove glue dirs from sysfs earlier
  2018-07-10  0:33                                                 ` Linus Torvalds
@ 2018-07-10  1:37                                                   ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 42+ messages in thread
From: Benjamin Herrenschmidt @ 2018-07-10  1:37 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Greg Kroah-Hartman, Tejun Heo, Linux Kernel Mailing List

On Mon, 2018-07-09 at 17:33 -0700, Linus Torvalds wrote:
> On Mon, Jul 9, 2018 at 5:29 PM Benjamin Herrenschmidt
> <benh@kernel.crashing.org> wrote:
> > 
> > For devices with a class, we create a "glue" directory between
> > the parent device and the new device with the class name.
> > 
> > This directory is never "explicitely" removed when empty however,
> 
> explicitly
> 
> Is the mis-spelling why you had the quotes? I do find that spelling in
> the kernel, but not in drivers/base/.

No idea :-) Just my poor english, I'm not sure why I put quotes, I
think I meant *explictly* as enphasis, not sure.

> > This fixes it by instead doing an explicit kobject_del() when
> > the glue dir is empty, by keeping track of the number of
> > child devices of the gluedir.
> 
> Ack. This looks good to me.
> 
> I didn't see your 1/2 - you should probably re-send that one too so
> that Greg doesn't have to fish for it. But I'll Ack that one too in
> this same email regardless.

OK, I didn't re-send it. Greg just nak'ed it though :-)

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 2/2] drivers: core: Remove glue dirs from sysfs earlier
  2018-07-10  0:29                                               ` [PATCH v2 " Benjamin Herrenschmidt
  2018-07-10  0:33                                                 ` Linus Torvalds
@ 2018-07-10 14:55                                                 ` Greg Kroah-Hartman
  2018-07-10 23:31                                                   ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 42+ messages in thread
From: Greg Kroah-Hartman @ 2018-07-10 14:55 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Linus Torvalds, Tejun Heo, linux-kernel

On Tue, Jul 10, 2018 at 10:29:10AM +1000, Benjamin Herrenschmidt wrote:
> For devices with a class, we create a "glue" directory between
> the parent device and the new device with the class name.
> 
> This directory is never "explicitely" removed when empty however,
> this is left to the implicit sysfs removal done by kobject_release()
> when the object loses its last reference via kobject_put().
> 
> This is problematic because as long as it's not been removed from
> sysfs, it is still present in the class kset and in sysfs directory
> structure.
> 
> The presence in the class kset exposes a use after free bug fixed
> by the previous patch, but the presence in sysfs means that until
> the kobject is released, which can take a while (especially with
> kobject debugging), any attempt at re-creating such as binding a
> new device for that class/parent pair, will result in a sysfs
> duplicate file name error.
> 
> This fixes it by instead doing an explicit kobject_del() when
> the glue dir is empty, by keeping track of the number of
> child devices of the gluedir.
> 
> This is made easy by the fact that all glue dir operations are
> done with a global mutex, and there's already a function
> (cleanup_glue_dir) called in all the right places taking that
> mutex that can be enhanced for this. It appears that this was
> in fact the intent of the function, but the implementation was
> wrong.
> 
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
>  drivers/base/core.c     |  2 ++
>  include/linux/kobject.h | 17 +++++++++++++++++
>  2 files changed, 19 insertions(+)
> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index e9eff2099896..93c0f8d1a447 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -1572,6 +1572,8 @@ static void cleanup_glue_dir(struct device *dev, struct kobject *glue_dir)
>  		return;
>  
>  	mutex_lock(&gdp_mutex);
> +	if (!kobject_has_children(glue_dir))
> +		kobject_del(glue_dir);
>  	kobject_put(glue_dir);
>  	mutex_unlock(&gdp_mutex);
>  }
> diff --git a/include/linux/kobject.h b/include/linux/kobject.h
> index 7f6f93c3df9c..270b40515e79 100644
> --- a/include/linux/kobject.h
> +++ b/include/linux/kobject.h
> @@ -116,6 +116,23 @@ extern void kobject_put(struct kobject *kobj);
>  extern const void *kobject_namespace(struct kobject *kobj);
>  extern char *kobject_get_path(struct kobject *kobj, gfp_t flag);
>  
> +/**
> + * kobject_has_children - Returns whether a kobject has children.
> + * @kobj: the object to test
> + *
> + * This will return whether a kobject has other kobjects as children.
> + *
> + * It does NOT account for the presence of attribute files, only sub
> + * directories. It also assumes there is no concurrent addition or
> + * removal of such children, and thus relies on external locking.
> + */
> +static inline bool kobject_has_children(struct kobject *kobj)
> +{
> +	WARN_ON_ONCE(kref_read(&kobj->kref) == 0);

Why warn on?  Who is going to hit this and how are you going to fix up
the syzbot reports?  :)

Anyway, this looks good, I can just take this and not the 1/2 patch now,
right?  I really didn't like that patch.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] drivers: core: Don't try to use a dead glue_dir
  2018-07-09 23:44         ` Benjamin Herrenschmidt
@ 2018-07-10 14:55           ` Greg Kroah-Hartman
  2018-07-10 23:32             ` Benjamin Herrenschmidt
  2018-07-21  7:53           ` Greg Kroah-Hartman
  1 sibling, 1 reply; 42+ messages in thread
From: Greg Kroah-Hartman @ 2018-07-10 14:55 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Linus Torvalds, Eric W. Biederman, Joel Stanley,
	Linux Kernel Mailing List

On Tue, Jul 10, 2018 at 09:44:33AM +1000, Benjamin Herrenschmidt wrote:
> On Sat, 2018-07-07 at 18:48 +0200, Greg Kroah-Hartman wrote:
> > No, kobject_get() should never happen on a 0 refcount object.  That
> > being said, the code does allow it, so if things are messed up, it will
> > happen.  I think that change happened when the switch to refcount_t
> > occured, before then we would WARN_ON() if that ever happened.  I should
> > go fix that up, and restore that old behavior, so that syzbot starts
> > complaining loudly when stuff like that hits.
> > 
> > So I hate using kobject_get_unless_zero(), and resisted ever adding it
> > to the tree as it shows a bad locking/tree situation as you point out
> > here.  But for some reason, the block developers seemed to insist they
> > needed it, and so it is in the tree for them.  I don't want it to spread
> > if at all possible, which makes me want to reject this patch as this
> > should be "a case that can never be hit".
> 
> Except it can in that situation... at least unless you get my patch 2/2
> (or the newer one I'm about to send that avoids adding a child counter
> and uses the one in kernfs instead).

I like that fix, which should make this patch obsolete, right?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 2/2] drivers: core: Remove glue dirs from sysfs earlier
  2018-07-10 14:55                                                 ` Greg Kroah-Hartman
@ 2018-07-10 23:31                                                   ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 42+ messages in thread
From: Benjamin Herrenschmidt @ 2018-07-10 23:31 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: Linus Torvalds, Tejun Heo, linux-kernel

On Tue, 2018-07-10 at 16:55 +0200, Greg Kroah-Hartman wrote:
> 
> > +/**
> > + * kobject_has_children - Returns whether a kobject has children.
> > + * @kobj: the object to test
> > + *
> > + * This will return whether a kobject has other kobjects as children.
> > + *
> > + * It does NOT account for the presence of attribute files, only sub
> > + * directories. It also assumes there is no concurrent addition or
> > + * removal of such children, and thus relies on external locking.
> > + */
> > +static inline bool kobject_has_children(struct kobject *kobj)
> > +{
> > +	WARN_ON_ONCE(kref_read(&kobj->kref) == 0);
> 
> Why warn on?  Who is going to hit this and how are you going to fix up
> the syzbot reports?  :)

Well, that's it, the hope is nobody ever hits it ... but if one does it
would be useful to get a backtrace to figure it out. You can shoot the
reports my way I suppose :-)

> Anyway, this looks good, I can just take this and not the 1/2 patch now,
> right?  I really didn't like that patch.

Yes, it will fix the practical problem. As for patch 1, it's rather
funny, you and Linus seem to have a completely opposite idea of how
this stuff should work :-)

Cheers,
Ben.

> thanks,
> 
> greg k-h

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] drivers: core: Don't try to use a dead glue_dir
  2018-07-10 14:55           ` Greg Kroah-Hartman
@ 2018-07-10 23:32             ` Benjamin Herrenschmidt
  2018-07-10 23:55               ` Linus Torvalds
  0 siblings, 1 reply; 42+ messages in thread
From: Benjamin Herrenschmidt @ 2018-07-10 23:32 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Linus Torvalds, Eric W. Biederman, Joel Stanley,
	Linux Kernel Mailing List

On Tue, 2018-07-10 at 16:55 +0200, Greg Kroah-Hartman wrote:
> On Tue, Jul 10, 2018 at 09:44:33AM +1000, Benjamin Herrenschmidt wrote:
> > On Sat, 2018-07-07 at 18:48 +0200, Greg Kroah-Hartman wrote:
> > > No, kobject_get() should never happen on a 0 refcount object.  That
> > > being said, the code does allow it, so if things are messed up, it will
> > > happen.  I think that change happened when the switch to refcount_t
> > > occured, before then we would WARN_ON() if that ever happened.  I should
> > > go fix that up, and restore that old behavior, so that syzbot starts
> > > complaining loudly when stuff like that hits.
> > > 
> > > So I hate using kobject_get_unless_zero(), and resisted ever adding it
> > > to the tree as it shows a bad locking/tree situation as you point out
> > > here.  But for some reason, the block developers seemed to insist they
> > > needed it, and so it is in the tree for them.  I don't want it to spread
> > > if at all possible, which makes me want to reject this patch as this
> > > should be "a case that can never be hit".
> > 
> > Except it can in that situation... at least unless you get my patch 2/2
> > (or the newer one I'm about to send that avoids adding a child counter
> > and uses the one in kernfs instead).
> 
> I like that fix, which should make this patch obsolete, right?

Yes, for that specific issue, but Linus seemed to think patch 1 was the
"right thing to do" regardless...

I suggest you read the backlog of thread if you are interested in the
ins and outs of his position, we had a rather extensive discussion on
this stuff.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] drivers: core: Don't try to use a dead glue_dir
  2018-07-10 23:32             ` Benjamin Herrenschmidt
@ 2018-07-10 23:55               ` Linus Torvalds
  2018-07-11  0:07                 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 42+ messages in thread
From: Linus Torvalds @ 2018-07-10 23:55 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Greg Kroah-Hartman, Eric W. Biederman, Joel Stanley,
	Linux Kernel Mailing List

On Tue, Jul 10, 2018 at 4:32 PM Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
>
> > I like that fix, which should make this patch obsolete, right?
>
> Yes, for that specific issue, but Linus seemed to think patch 1 was the
> "right thing to do" regardless...

I would definitely prefer either a kobject_get_unless_zero() or a
warning if it is ever zero.

The fact that right now it silently can do known bad things, and then
causes odd corruption _later_, is not good.

            Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] drivers: core: Don't try to use a dead glue_dir
  2018-07-10 23:55               ` Linus Torvalds
@ 2018-07-11  0:07                 ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 42+ messages in thread
From: Benjamin Herrenschmidt @ 2018-07-11  0:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Greg Kroah-Hartman, Eric W. Biederman, Joel Stanley,
	Linux Kernel Mailing List

On Tue, 2018-07-10 at 16:55 -0700, Linus Torvalds wrote:
> On Tue, Jul 10, 2018 at 4:32 PM Benjamin Herrenschmidt
> <benh@kernel.crashing.org> wrote:
> > 
> > > I like that fix, which should make this patch obsolete, right?
> > 
> > Yes, for that specific issue, but Linus seemed to think patch 1 was the
> > "right thing to do" regardless...
> 
> I would definitely prefer either a kobject_get_unless_zero() or a
> warning if it is ever zero.
> 
> The fact that right now it silently can do known bad things, and then
> causes odd corruption _later_, is not good.

Maybe we should make the existing warning in refcount unconditional ? 

This is whe warning that allowed me to pull that string with the
gluedirs and fix what ended up very weird crashes caused by the
resulting use-after-free. So it's definitely valuable.

As for whether we should generalize kobject_get_unless_zero() vs avoid
using it alltogether, this is a debate for you to have with Greg ;-)

He seems to think nothing should ever try to get a zeroed object (which
I tend to agree with, it's close to my opinion that visibility and
lifetime should be disconnected).

That being said, there are existing constructs such as the "late
removal from sysfs from kobject_release" that mean that zero-reference
objects *can* still be visible, via either sysfs or ksets, as far as I
can tell.

So it's a bit of a mess... but if we chose to go Greg's way we should
probably put a WARN'ing in kobject_release() for late-removal from
sysfs since this exposes 0-ref objects to outside visibility.

Cheers,
Ben.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] drivers: core: Don't try to use a dead glue_dir
  2018-07-09 23:44         ` Benjamin Herrenschmidt
  2018-07-10 14:55           ` Greg Kroah-Hartman
@ 2018-07-21  7:53           ` Greg Kroah-Hartman
  2018-07-23  0:35             ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 42+ messages in thread
From: Greg Kroah-Hartman @ 2018-07-21  7:53 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Linus Torvalds, Eric W. Biederman, Joel Stanley,
	Linux Kernel Mailing List

On Tue, Jul 10, 2018 at 09:44:33AM +1000, Benjamin Herrenschmidt wrote:
> On Sat, 2018-07-07 at 18:48 +0200, Greg Kroah-Hartman wrote:
> > No, kobject_get() should never happen on a 0 refcount object.  That
> > being said, the code does allow it, so if things are messed up, it will
> > happen.  I think that change happened when the switch to refcount_t
> > occured, before then we would WARN_ON() if that ever happened.  I should
> > go fix that up, and restore that old behavior, so that syzbot starts
> > complaining loudly when stuff like that hits.
> > 
> > So I hate using kobject_get_unless_zero(), and resisted ever adding it
> > to the tree as it shows a bad locking/tree situation as you point out
> > here.  But for some reason, the block developers seemed to insist they
> > needed it, and so it is in the tree for them.  I don't want it to spread
> > if at all possible, which makes me want to reject this patch as this
> > should be "a case that can never be hit".
> 
> Except it can in that situation... at least unless you get my patch 2/2
> (or the newer one I'm about to send that avoids adding a child counter
> and uses the one in kernfs instead).

To follow up on this.  I've applied the 2/2 patch for this series, so
this 1/2 "should" not be needed.  Ben, if you still see this trigger
with that, I guess I can take this, but it still feels wrong to me :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] drivers: core: Don't try to use a dead glue_dir
  2018-07-21  7:53           ` Greg Kroah-Hartman
@ 2018-07-23  0:35             ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 42+ messages in thread
From: Benjamin Herrenschmidt @ 2018-07-23  0:35 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Linus Torvalds, Eric W. Biederman, Joel Stanley,
	Linux Kernel Mailing List

On Sat, 2018-07-21 at 09:53 +0200, Greg Kroah-Hartman wrote:
> > > So I hate using kobject_get_unless_zero(), and resisted ever adding it
> > > to the tree as it shows a bad locking/tree situation as you point out
> > > here.  But for some reason, the block developers seemed to insist they
> > > needed it, and so it is in the tree for them.  I don't want it to spread
> > > if at all possible, which makes me want to reject this patch as this
> > > should be "a case that can never be hit".
> > 
> > Except it can in that situation... at least unless you get my patch 2/2
> > (or the newer one I'm about to send that avoids adding a child counter
> > and uses the one in kernfs instead).
> 
> To follow up on this.  I've applied the 2/2 patch for this series, so
> this 1/2 "should" not be needed.  Ben, if you still see this trigger
> with that, I guess I can take this, but it still feels wrong to me :)

I've tested my repro-case with only patch 2 and it doesn't trigger
anymore, as expected.

That said, you and Linus might want to sync your clocks on how sysfs
"should" work :-)

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2018-07-23  0:38 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <c40fe912fe008b1b531a3867e8784ed79d68023e.camel@kernel.crashing.org>
     [not found] ` <CA+55aFxR0qg0yY-NWnH0DDruVWw8qRqp8=CRLq13p=TyxosJKw@mail.gmail.com>
2018-06-29  2:21   ` [PATCH 1/2] drivers: core: Don't try to use a dead glue_dir Benjamin Herrenschmidt
2018-06-30 19:45     ` Linus Torvalds
2018-07-07 16:48       ` Greg Kroah-Hartman
2018-07-09 23:44         ` Benjamin Herrenschmidt
2018-07-10 14:55           ` Greg Kroah-Hartman
2018-07-10 23:32             ` Benjamin Herrenschmidt
2018-07-10 23:55               ` Linus Torvalds
2018-07-11  0:07                 ` Benjamin Herrenschmidt
2018-07-21  7:53           ` Greg Kroah-Hartman
2018-07-23  0:35             ` Benjamin Herrenschmidt
2018-07-07 16:51     ` Greg Kroah-Hartman
2018-07-09 23:50       ` Benjamin Herrenschmidt
2018-06-29  2:21   ` [PATCH 2/2] drivers: core: Remove glue dirs from sysfs earlier Benjamin Herrenschmidt
2018-06-29 13:56     ` Linus Torvalds
2018-06-29 13:57       ` Linus Torvalds
2018-06-30  1:04       ` Benjamin Herrenschmidt
2018-06-30  3:51         ` Benjamin Herrenschmidt
     [not found]   ` <edc7b03b9550ddcf1291ebf5a6dafd24f4455c23.camel@kernel.crashing.org>
     [not found]     ` <CA+55aFxS7OVEN5XrxceC5ibz780mhn-qRa50w1gVFjsz2JjMbw@mail.gmail.com>
     [not found]       ` <7eb06b499f2be366cf68c6b6588b16c603e6a567.camel@kernel.crashing.org>
2018-07-01  2:07         ` Linus Torvalds
2018-07-01  2:18           ` Linus Torvalds
2018-07-01  3:49             ` Benjamin Herrenschmidt
2018-07-01  3:42           ` Benjamin Herrenschmidt
2018-07-01  3:57             ` Linus Torvalds
2018-07-01  7:16               ` Benjamin Herrenschmidt
2018-07-01 17:04                 ` Linus Torvalds
2018-07-01 23:36                   ` Benjamin Herrenschmidt
2018-07-02 10:23                     ` Benjamin Herrenschmidt
2018-07-02 19:24                       ` Linus Torvalds
2018-07-03  0:57                         ` Benjamin Herrenschmidt
2018-07-03  2:15                           ` Linus Torvalds
2018-07-03  2:26                             ` Linus Torvalds
2018-07-03  2:39                               ` Benjamin Herrenschmidt
2018-07-03  5:22                                 ` Benjamin Herrenschmidt
2018-07-03 15:46                                   ` Tejun Heo
2018-07-04  1:10                                     ` Benjamin Herrenschmidt
     [not found]                                   ` <CA+55aFzKmzC-2_6+RsRRu9KfK_r=UGgLN2Q0hSNBV=ScGR7=8g@mail.gmail.com>
     [not found]                                     ` <6e3ca577f8dd5f3621d1054447d3f928a73dfcf9.camel@kernel.crashing.org>
     [not found]                                       ` <CA+55aFy+ZSu5cPzk887N-ZgXqvTB=Bp1JQYMWT1SZY81MqLH6Q@mail.gmail.com>
     [not found]                                         ` <1bc873980e7f63291fbe19dbc7e1607b8e126241.camel@kernel.crashing.org>
     [not found]                                           ` <20180707164241.GB16279@kroah.com>
     [not found]                                             ` <CA+55aFx-UX8nxewRFFWdBgYfPqfipnxaqJuJCUni9h4JvhoPFw@mail.gmail.com>
2018-07-10  0:29                                               ` [PATCH v2 " Benjamin Herrenschmidt
2018-07-10  0:33                                                 ` Linus Torvalds
2018-07-10  1:37                                                   ` Benjamin Herrenschmidt
2018-07-10 14:55                                                 ` Greg Kroah-Hartman
2018-07-10 23:31                                                   ` Benjamin Herrenschmidt
2018-07-03  2:37                             ` [PATCH " Benjamin Herrenschmidt
2018-07-02 10:22                   ` Benjamin Herrenschmidt
2018-07-01  3:52       ` Benjamin Herrenschmidt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).