linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4] sysfs: fix kobject refcount to address races with kobject removal
@ 2021-06-23 21:50 Luis Chamberlain
  2021-06-23 22:59 ` Kees Cook
  2021-06-24 11:09 ` Greg KH
  0 siblings, 2 replies; 12+ messages in thread
From: Luis Chamberlain @ 2021-06-23 21:50 UTC (permalink / raw)
  To: gregkh, rafael, davem, kuba, ast, andriin, daniel, atenart,
	alobakin, weiwan, ap420073
  Cc: jeyu, ngupta, sergey.senozhatsky.work, minchan, mcgrof, axboe,
	mbenes, jpoimboe, tglx, keescook, jikos, rostedt, peterz,
	linux-block, netdev, linux-kernel

It's possible today to have a device attribute read or store
race against device removal. This is known to happen as follows:

write system call -->
  ksys_write () -->
    vfs_write() -->
      __vfs_write() -->
        kernfs_fop_write_iter() -->
          sysfs_kf_write() -->
            dev_attr_store() -->
              null reference

This happens because the dev_attr->store() callback can be
removed prior to its call, after dev_attr_store() was initiated.
The null dereference is possible because the sysfs ops can be
removed on module removal, for instance, when device_del() is
called, and a sysfs read / store is not doing any kobject reference
bumps either. This allows a read/store call to initiate, a
device_del() to kick off, and then the read/store call can be
gone by the time to execute it.

The sysfs filesystem is not doing any kobject reference bumps during a
read / store ops to prevent this.

To fix this in a simplified way, just bump the kobject reference when
we create a directory and remove it on directory removal.

The big unfortunate eye-sore is addressing the manual kobject reference
assumption on the networking code, which leads me to believe we should
end up replacing that eventually with another sort of check.

Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---

This v4 moves to fixing the race condition on dev_attr_store() and
dev_attr_read() to sysfs by bumping the kobject reference count
on directory creation / deletion as suggested by Greg.

Unfortunately at least the networking core has a manual refcount
assumption, which needs to be adjusted to account for this change.
This should also mean there is runtime for other kobjects which may
not be explored yet which may need fixing as well. We may want to
change the check to something else on the networking front, but its
not clear to me yet what to use.

 fs/sysfs/dir.c | 3 +++
 net/core/dev.c | 4 ++--
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 59dffd5ca517..6c47aa4af6f5 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -56,12 +56,14 @@ int sysfs_create_dir_ns(struct kobject *kobj, const void *ns)
 
 	kobject_get_ownership(kobj, &uid, &gid);
 
+	kobject_get(kobj);
 	kn = kernfs_create_dir_ns(parent, kobject_name(kobj),
 				  S_IRWXU | S_IRUGO | S_IXUGO, uid, gid,
 				  kobj, ns);
 	if (IS_ERR(kn)) {
 		if (PTR_ERR(kn) == -EEXIST)
 			sysfs_warn_dup(parent, kobject_name(kobj));
+		kobject_put(kobj);
 		return PTR_ERR(kn);
 	}
 
@@ -100,6 +102,7 @@ void sysfs_remove_dir(struct kobject *kobj)
 	if (kn) {
 		WARN_ON_ONCE(kernfs_type(kn) != KERNFS_DIR);
 		kernfs_remove(kn);
+		kobject_put(kobj);
 	}
 }
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 222b1d322c96..3a0ffa603d14 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10429,7 +10429,7 @@ static void netdev_wait_allrefs(struct net_device *dev)
 	rebroadcast_time = warning_time = jiffies;
 	refcnt = netdev_refcnt_read(dev);
 
-	while (refcnt != 1) {
+	while (refcnt != 3) {
 		if (time_after(jiffies, rebroadcast_time + 1 * HZ)) {
 			rtnl_lock();
 
@@ -10544,7 +10544,7 @@ void netdev_run_todo(void)
 		netdev_wait_allrefs(dev);
 
 		/* paranoia */
-		BUG_ON(netdev_refcnt_read(dev) != 1);
+		BUG_ON(netdev_refcnt_read(dev) != 3);
 		BUG_ON(!list_empty(&dev->ptype_all));
 		BUG_ON(!list_empty(&dev->ptype_specific));
 		WARN_ON(rcu_access_pointer(dev->ip_ptr));
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v4] sysfs: fix kobject refcount to address races with kobject removal
  2021-06-23 21:50 [PATCH v4] sysfs: fix kobject refcount to address races with kobject removal Luis Chamberlain
@ 2021-06-23 22:59 ` Kees Cook
  2021-06-24  1:09   ` Luis Chamberlain
  2021-06-24 11:06   ` Greg KH
  2021-06-24 11:09 ` Greg KH
  1 sibling, 2 replies; 12+ messages in thread
From: Kees Cook @ 2021-06-23 22:59 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: gregkh, rafael, davem, kuba, ast, andriin, daniel, atenart,
	alobakin, weiwan, ap420073, jeyu, ngupta,
	sergey.senozhatsky.work, minchan, axboe, mbenes, jpoimboe, tglx,
	jikos, rostedt, peterz, linux-block, netdev, linux-kernel

On Wed, Jun 23, 2021 at 02:50:07PM -0700, Luis Chamberlain wrote:
> It's possible today to have a device attribute read or store
> race against device removal. This is known to happen as follows:
> 
> write system call -->
>   ksys_write () -->
>     vfs_write() -->
>       __vfs_write() -->
>         kernfs_fop_write_iter() -->
>           sysfs_kf_write() -->
>             dev_attr_store() -->
>               null reference
> 
> This happens because the dev_attr->store() callback can be
> removed prior to its call, after dev_attr_store() was initiated.
> The null dereference is possible because the sysfs ops can be
> removed on module removal, for instance, when device_del() is
> called, and a sysfs read / store is not doing any kobject reference
> bumps either. This allows a read/store call to initiate, a
> device_del() to kick off, and then the read/store call can be
> gone by the time to execute it.
> 
> The sysfs filesystem is not doing any kobject reference bumps during a
> read / store ops to prevent this.
> 
> To fix this in a simplified way, just bump the kobject reference when
> we create a directory and remove it on directory removal.
> 
> The big unfortunate eye-sore is addressing the manual kobject reference
> assumption on the networking code, which leads me to believe we should
> end up replacing that eventually with another sort of check.
> 
> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
> 
> This v4 moves to fixing the race condition on dev_attr_store() and
> dev_attr_read() to sysfs by bumping the kobject reference count
> on directory creation / deletion as suggested by Greg.
> 
> Unfortunately at least the networking core has a manual refcount
> assumption, which needs to be adjusted to account for this change.
> This should also mean there is runtime for other kobjects which may
> not be explored yet which may need fixing as well. We may want to
> change the check to something else on the networking front, but its
> not clear to me yet what to use.
> 
>  fs/sysfs/dir.c | 3 +++
>  net/core/dev.c | 4 ++--
>  2 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
> index 59dffd5ca517..6c47aa4af6f5 100644
> --- a/fs/sysfs/dir.c
> +++ b/fs/sysfs/dir.c
> @@ -56,12 +56,14 @@ int sysfs_create_dir_ns(struct kobject *kobj, const void *ns)
>  
>  	kobject_get_ownership(kobj, &uid, &gid);
>  
> +	kobject_get(kobj);
>  	kn = kernfs_create_dir_ns(parent, kobject_name(kobj),
>  				  S_IRWXU | S_IRUGO | S_IXUGO, uid, gid,
>  				  kobj, ns);
>  	if (IS_ERR(kn)) {
>  		if (PTR_ERR(kn) == -EEXIST)
>  			sysfs_warn_dup(parent, kobject_name(kobj));
> +		kobject_put(kobj);
>  		return PTR_ERR(kn);
>  	}
>  
> @@ -100,6 +102,7 @@ void sysfs_remove_dir(struct kobject *kobj)
>  	if (kn) {
>  		WARN_ON_ONCE(kernfs_type(kn) != KERNFS_DIR);
>  		kernfs_remove(kn);
> +		kobject_put(kobj);
>  	}
>  }

Shouldn't this be taken on open() not sysfs creation, or is the problem
that the kobject is held the module memory rather than duplicated by
sysfs?

> diff --git a/net/core/dev.c b/net/core/dev.c
> index 222b1d322c96..3a0ffa603d14 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -10429,7 +10429,7 @@ static void netdev_wait_allrefs(struct net_device *dev)
>  	rebroadcast_time = warning_time = jiffies;
>  	refcnt = netdev_refcnt_read(dev);
>  
> -	while (refcnt != 1) {
> +	while (refcnt != 3) {
>  		if (time_after(jiffies, rebroadcast_time + 1 * HZ)) {
>  			rtnl_lock();
>  
> @@ -10544,7 +10544,7 @@ void netdev_run_todo(void)
>  		netdev_wait_allrefs(dev);
>  
>  		/* paranoia */
> -		BUG_ON(netdev_refcnt_read(dev) != 1);
> +		BUG_ON(netdev_refcnt_read(dev) != 3);

And surely there are things besides netdevs that would suffer from this
change?

>  		BUG_ON(!list_empty(&dev->ptype_all));
>  		BUG_ON(!list_empty(&dev->ptype_specific));
>  		WARN_ON(rcu_access_pointer(dev->ip_ptr));
> -- 
> 2.27.0
> 

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4] sysfs: fix kobject refcount to address races with kobject removal
  2021-06-23 22:59 ` Kees Cook
@ 2021-06-24  1:09   ` Luis Chamberlain
  2021-06-24 11:06   ` Greg KH
  1 sibling, 0 replies; 12+ messages in thread
From: Luis Chamberlain @ 2021-06-24  1:09 UTC (permalink / raw)
  To: Kees Cook
  Cc: gregkh, rafael, davem, kuba, ast, andriin, daniel, atenart,
	alobakin, weiwan, ap420073, jeyu, ngupta,
	sergey.senozhatsky.work, minchan, axboe, mbenes, jpoimboe, tglx,
	jikos, rostedt, peterz, linux-block, netdev, linux-kernel

On Wed, Jun 23, 2021 at 03:59:37PM -0700, Kees Cook wrote:
> On Wed, Jun 23, 2021 at 02:50:07PM -0700, Luis Chamberlain wrote:
> > It's possible today to have a device attribute read or store
> > race against device removal. This is known to happen as follows:
> > 
> > write system call -->
> >   ksys_write () -->
> >     vfs_write() -->
> >       __vfs_write() -->
> >         kernfs_fop_write_iter() -->
> >           sysfs_kf_write() -->
> >             dev_attr_store() -->
> >               null reference
> > 
> > This happens because the dev_attr->store() callback can be
> > removed prior to its call, after dev_attr_store() was initiated.
> > The null dereference is possible because the sysfs ops can be
> > removed on module removal, for instance, when device_del() is
> > called, and a sysfs read / store is not doing any kobject reference
> > bumps either. This allows a read/store call to initiate, a
> > device_del() to kick off, and then the read/store call can be
> > gone by the time to execute it.
> > 
> > The sysfs filesystem is not doing any kobject reference bumps during a
> > read / store ops to prevent this.
> > 
> > To fix this in a simplified way, just bump the kobject reference when
> > we create a directory and remove it on directory removal.
> > 
> > The big unfortunate eye-sore is addressing the manual kobject reference
> > assumption on the networking code, which leads me to believe we should
> > end up replacing that eventually with another sort of check.
> > 
> > Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> > ---
> > 
> > This v4 moves to fixing the race condition on dev_attr_store() and
> > dev_attr_read() to sysfs by bumping the kobject reference count
> > on directory creation / deletion as suggested by Greg.
> > 
> > Unfortunately at least the networking core has a manual refcount
> > assumption, which needs to be adjusted to account for this change.
> > This should also mean there is runtime for other kobjects which may
> > not be explored yet which may need fixing as well. We may want to
> > change the check to something else on the networking front, but its
> > not clear to me yet what to use.
> > 
> >  fs/sysfs/dir.c | 3 +++
> >  net/core/dev.c | 4 ++--
> >  2 files changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
> > index 59dffd5ca517..6c47aa4af6f5 100644
> > --- a/fs/sysfs/dir.c
> > +++ b/fs/sysfs/dir.c
> > @@ -56,12 +56,14 @@ int sysfs_create_dir_ns(struct kobject *kobj, const void *ns)
> >  
> >  	kobject_get_ownership(kobj, &uid, &gid);
> >  
> > +	kobject_get(kobj);
> >  	kn = kernfs_create_dir_ns(parent, kobject_name(kobj),
> >  				  S_IRWXU | S_IRUGO | S_IXUGO, uid, gid,
> >  				  kobj, ns);
> >  	if (IS_ERR(kn)) {
> >  		if (PTR_ERR(kn) == -EEXIST)
> >  			sysfs_warn_dup(parent, kobject_name(kobj));
> > +		kobject_put(kobj);
> >  		return PTR_ERR(kn);
> >  	}
> >  
> > @@ -100,6 +102,7 @@ void sysfs_remove_dir(struct kobject *kobj)
> >  	if (kn) {
> >  		WARN_ON_ONCE(kernfs_type(kn) != KERNFS_DIR);
> >  		kernfs_remove(kn);
> > +		kobject_put(kobj);
> >  	}
> >  }
> 
> Shouldn't this be taken on open() not sysfs creation, or is the problem
> that the kobject is held the module memory rather than duplicated by
> sysfs?

If you kobject_get() you also protect from module removal to complete
until the calls ends. The issue is the dereference for the attribute
op will be gone if we don't protect the kobject during the op.

We only have an open() call for the bin sysfs files. So the alternative
is to sprinkly the get on all calls. The goal with adding it to the
directory creation was to simplify this.

> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 222b1d322c96..3a0ffa603d14 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -10429,7 +10429,7 @@ static void netdev_wait_allrefs(struct net_device *dev)
> >  	rebroadcast_time = warning_time = jiffies;
> >  	refcnt = netdev_refcnt_read(dev);
> >  
> > -	while (refcnt != 1) {
> > +	while (refcnt != 3) {
> >  		if (time_after(jiffies, rebroadcast_time + 1 * HZ)) {
> >  			rtnl_lock();
> >  
> > @@ -10544,7 +10544,7 @@ void netdev_run_todo(void)
> >  		netdev_wait_allrefs(dev);
> >  
> >  		/* paranoia */
> > -		BUG_ON(netdev_refcnt_read(dev) != 1);
> > +		BUG_ON(netdev_refcnt_read(dev) != 3);
> 
> And surely there are things besides netdevs that would suffer from this
> change?

Yes, I noted this possibility below the commit log, below the "---".

  Luis

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4] sysfs: fix kobject refcount to address races with kobject removal
  2021-06-23 22:59 ` Kees Cook
  2021-06-24  1:09   ` Luis Chamberlain
@ 2021-06-24 11:06   ` Greg KH
  1 sibling, 0 replies; 12+ messages in thread
From: Greg KH @ 2021-06-24 11:06 UTC (permalink / raw)
  To: Kees Cook
  Cc: Luis Chamberlain, rafael, davem, kuba, ast, andriin, daniel,
	atenart, alobakin, weiwan, ap420073, jeyu, ngupta,
	sergey.senozhatsky.work, minchan, axboe, mbenes, jpoimboe, tglx,
	jikos, rostedt, peterz, linux-block, netdev, linux-kernel

On Wed, Jun 23, 2021 at 03:59:37PM -0700, Kees Cook wrote:
> On Wed, Jun 23, 2021 at 02:50:07PM -0700, Luis Chamberlain wrote:
> > It's possible today to have a device attribute read or store
> > race against device removal. This is known to happen as follows:
> > 
> > write system call -->
> >   ksys_write () -->
> >     vfs_write() -->
> >       __vfs_write() -->
> >         kernfs_fop_write_iter() -->
> >           sysfs_kf_write() -->
> >             dev_attr_store() -->
> >               null reference
> > 
> > This happens because the dev_attr->store() callback can be
> > removed prior to its call, after dev_attr_store() was initiated.
> > The null dereference is possible because the sysfs ops can be
> > removed on module removal, for instance, when device_del() is
> > called, and a sysfs read / store is not doing any kobject reference
> > bumps either. This allows a read/store call to initiate, a
> > device_del() to kick off, and then the read/store call can be
> > gone by the time to execute it.
> > 
> > The sysfs filesystem is not doing any kobject reference bumps during a
> > read / store ops to prevent this.
> > 
> > To fix this in a simplified way, just bump the kobject reference when
> > we create a directory and remove it on directory removal.
> > 
> > The big unfortunate eye-sore is addressing the manual kobject reference
> > assumption on the networking code, which leads me to believe we should
> > end up replacing that eventually with another sort of check.
> > 
> > Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> > ---
> > 
> > This v4 moves to fixing the race condition on dev_attr_store() and
> > dev_attr_read() to sysfs by bumping the kobject reference count
> > on directory creation / deletion as suggested by Greg.
> > 
> > Unfortunately at least the networking core has a manual refcount
> > assumption, which needs to be adjusted to account for this change.
> > This should also mean there is runtime for other kobjects which may
> > not be explored yet which may need fixing as well. We may want to
> > change the check to something else on the networking front, but its
> > not clear to me yet what to use.
> > 
> >  fs/sysfs/dir.c | 3 +++
> >  net/core/dev.c | 4 ++--
> >  2 files changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
> > index 59dffd5ca517..6c47aa4af6f5 100644
> > --- a/fs/sysfs/dir.c
> > +++ b/fs/sysfs/dir.c
> > @@ -56,12 +56,14 @@ int sysfs_create_dir_ns(struct kobject *kobj, const void *ns)
> >  
> >  	kobject_get_ownership(kobj, &uid, &gid);
> >  
> > +	kobject_get(kobj);
> >  	kn = kernfs_create_dir_ns(parent, kobject_name(kobj),
> >  				  S_IRWXU | S_IRUGO | S_IXUGO, uid, gid,
> >  				  kobj, ns);
> >  	if (IS_ERR(kn)) {
> >  		if (PTR_ERR(kn) == -EEXIST)
> >  			sysfs_warn_dup(parent, kobject_name(kobj));
> > +		kobject_put(kobj);
> >  		return PTR_ERR(kn);
> >  	}
> >  
> > @@ -100,6 +102,7 @@ void sysfs_remove_dir(struct kobject *kobj)
> >  	if (kn) {
> >  		WARN_ON_ONCE(kernfs_type(kn) != KERNFS_DIR);
> >  		kernfs_remove(kn);
> > +		kobject_put(kobj);
> >  	}
> >  }
> 
> Shouldn't this be taken on open() not sysfs creation, or is the problem
> that the kobject is held the module memory rather than duplicated by
> sysfs?

No, this is when we are passing a pointer off to kernfs, it needs to
have the reference count incremented then otherwise it could "disappear"
without kernfs knowing about it.

> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 222b1d322c96..3a0ffa603d14 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -10429,7 +10429,7 @@ static void netdev_wait_allrefs(struct net_device *dev)
> >  	rebroadcast_time = warning_time = jiffies;
> >  	refcnt = netdev_refcnt_read(dev);
> >  
> > -	while (refcnt != 1) {
> > +	while (refcnt != 3) {
> >  		if (time_after(jiffies, rebroadcast_time + 1 * HZ)) {
> >  			rtnl_lock();
> >  
> > @@ -10544,7 +10544,7 @@ void netdev_run_todo(void)
> >  		netdev_wait_allrefs(dev);
> >  
> >  		/* paranoia */
> > -		BUG_ON(netdev_refcnt_read(dev) != 1);
> > +		BUG_ON(netdev_refcnt_read(dev) != 3);
> 
> And surely there are things besides netdevs that would suffer from this
> change?

I _REALLY_ hope not.  No one should ever be looking at the count of a
kobject for anything as there are way too many ways that means nothing.

This is a fun hack for networking to be paranoid that they got their
initializion logic correct, so I'll leave it be right now.  No one else
should care, and if they do, we should fix it up...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4] sysfs: fix kobject refcount to address races with kobject removal
  2021-06-23 21:50 [PATCH v4] sysfs: fix kobject refcount to address races with kobject removal Luis Chamberlain
  2021-06-23 22:59 ` Kees Cook
@ 2021-06-24 11:09 ` Greg KH
  2021-06-25 21:55   ` Luis Chamberlain
  1 sibling, 1 reply; 12+ messages in thread
From: Greg KH @ 2021-06-24 11:09 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: rafael, davem, kuba, ast, andriin, daniel, atenart, alobakin,
	weiwan, ap420073, jeyu, ngupta, sergey.senozhatsky.work, minchan,
	axboe, mbenes, jpoimboe, tglx, keescook, jikos, rostedt, peterz,
	linux-block, netdev, linux-kernel

On Wed, Jun 23, 2021 at 02:50:07PM -0700, Luis Chamberlain wrote:
> It's possible today to have a device attribute read or store
> race against device removal. This is known to happen as follows:
> 
> write system call -->
>   ksys_write () -->
>     vfs_write() -->
>       __vfs_write() -->
>         kernfs_fop_write_iter() -->
>           sysfs_kf_write() -->
>             dev_attr_store() -->
>               null reference
> 
> This happens because the dev_attr->store() callback can be
> removed prior to its call, after dev_attr_store() was initiated.
> The null dereference is possible because the sysfs ops can be
> removed on module removal, for instance, when device_del() is
> called, and a sysfs read / store is not doing any kobject reference
> bumps either. This allows a read/store call to initiate, a
> device_del() to kick off, and then the read/store call can be
> gone by the time to execute it.
> 
> The sysfs filesystem is not doing any kobject reference bumps during a
> read / store ops to prevent this.
> 
> To fix this in a simplified way, just bump the kobject reference when
> we create a directory and remove it on directory removal.
> 
> The big unfortunate eye-sore is addressing the manual kobject reference
> assumption on the networking code, which leads me to believe we should
> end up replacing that eventually with another sort of check.
> 
> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
> 
> This v4 moves to fixing the race condition on dev_attr_store() and
> dev_attr_read() to sysfs by bumping the kobject reference count
> on directory creation / deletion as suggested by Greg.

This looks good.

It's late in the development cycle, I'll hold off on adding this to my
tree until 5.14-rc1 is out because of:

> Unfortunately at least the networking core has a manual refcount
> assumption, which needs to be adjusted to account for this change.
> This should also mean there is runtime for other kobjects which may
> not be explored yet which may need fixing as well. We may want to
> change the check to something else on the networking front, but its
> not clear to me yet what to use.

That's crazy what networking is doing here, hopefully no one else is.
If they are, let's shake it out in linux-next to find the problems which
is why a good "soak" there is a good idea.

thanks for making this change and sticking with it!

Oh, and with this change, does your modprobe/rmmod crazy test now work?

greg k-h

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4] sysfs: fix kobject refcount to address races with kobject removal
  2021-06-24 11:09 ` Greg KH
@ 2021-06-25 21:55   ` Luis Chamberlain
  2021-07-01 22:48     ` Luis Chamberlain
  0 siblings, 1 reply; 12+ messages in thread
From: Luis Chamberlain @ 2021-06-25 21:55 UTC (permalink / raw)
  To: Greg KH
  Cc: rafael, davem, kuba, ast, andriin, daniel, atenart, alobakin,
	weiwan, ap420073, jeyu, ngupta, sergey.senozhatsky.work, minchan,
	axboe, mbenes, jpoimboe, tglx, keescook, jikos, rostedt, peterz,
	linux-block, netdev, linux-kernel

On Thu, Jun 24, 2021 at 01:09:03PM +0200, Greg KH wrote:
> thanks for making this change and sticking with it!
> 
> Oh, and with this change, does your modprobe/rmmod crazy test now work?

It does but I wrote a test_syfs driver and I believe I see an issue with
this. I'll debug a bit more and see what it was, and I'll then also use
the driver to demo the issue more clearly, and then verification can be
an easy selftest test.

  Luis

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4] sysfs: fix kobject refcount to address races with kobject removal
  2021-06-25 21:55   ` Luis Chamberlain
@ 2021-07-01 22:48     ` Luis Chamberlain
  2021-07-02  1:04       ` Luis Chamberlain
  2021-07-21 11:30       ` Greg KH
  0 siblings, 2 replies; 12+ messages in thread
From: Luis Chamberlain @ 2021-07-01 22:48 UTC (permalink / raw)
  To: Greg KH, Tejun Heo
  Cc: rafael, davem, kuba, ast, andriin, daniel, atenart, alobakin,
	weiwan, ap420073, jeyu, ngupta, sergey.senozhatsky.work, minchan,
	axboe, mbenes, jpoimboe, tglx, keescook, jikos, rostedt, peterz,
	linux-block, netdev, linux-kernel

On Fri, Jun 25, 2021 at 02:56:03PM -0700, Luis Chamberlain wrote:
> On Thu, Jun 24, 2021 at 01:09:03PM +0200, Greg KH wrote:
> > thanks for making this change and sticking with it!
> > 
> > Oh, and with this change, does your modprobe/rmmod crazy test now work?
> 
> It does but I wrote a test_syfs driver and I believe I see an issue with
> this. I'll debug a bit more and see what it was, and I'll then also use
> the driver to demo the issue more clearly, and then verification can be
> an easy selftest test.

OK my conclusion based on a new selftest driver I wrote is we can drop
this patch safely. The selftest will cover this corner case well now.

In short: the kernfs active reference will ensure the store operation
still exists. The kernfs mutex is not enough, but if the driver removes
the operation prior to getting the active reference, the write will just
fail. The deferencing inside of the sysfs operation is abstract to
kernfs, and while kernfs can't do anything to prevent a driver from
doing something stupid, it at least can ensure an open file ensure the
op is not removed until the operation completes.

  Luis

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4] sysfs: fix kobject refcount to address races with kobject removal
  2021-07-01 22:48     ` Luis Chamberlain
@ 2021-07-02  1:04       ` Luis Chamberlain
  2021-07-21 11:30       ` Greg KH
  1 sibling, 0 replies; 12+ messages in thread
From: Luis Chamberlain @ 2021-07-02  1:04 UTC (permalink / raw)
  To: Greg KH, Tejun Heo
  Cc: rafael, davem, kuba, ast, andriin, daniel, atenart, alobakin,
	weiwan, ap420073, jeyu, ngupta, sergey.senozhatsky.work, minchan,
	axboe, mbenes, jpoimboe, tglx, keescook, jikos, rostedt, peterz,
	linux-block, netdev, linux-kernel

On Thu, Jul 01, 2021 at 03:48:22PM -0700, Luis Chamberlain wrote:
> On Fri, Jun 25, 2021 at 02:56:03PM -0700, Luis Chamberlain wrote:
> > On Thu, Jun 24, 2021 at 01:09:03PM +0200, Greg KH wrote:
> > > thanks for making this change and sticking with it!
> > > 
> > > Oh, and with this change, does your modprobe/rmmod crazy test now work?
> > 
> > It does but I wrote a test_syfs driver and I believe I see an issue with
> > this. I'll debug a bit more and see what it was, and I'll then also use
> > the driver to demo the issue more clearly, and then verification can be
> > an easy selftest test.
> 
> OK my conclusion based on a new selftest driver I wrote is we can drop
> this patch safely. The selftest will cover this corner case well now.
> 
> In short: the kernfs active reference will ensure the store operation
> still exists. The kernfs mutex is not enough, but if the driver removes
> the operation prior to getting the active reference, the write will just
> fail. The deferencing inside of the sysfs operation is abstract to
> kernfs, and while kernfs can't do anything to prevent a driver from
> doing something stupid, it at least can ensure an open file ensure the
> op is not removed until the operation completes.

OK and now its not so clear, as it would seem the refcount can indeed
get reduced after we validated it. In any case we'll have enough tools
to reproduce any possible failure soon.

  Luis

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4] sysfs: fix kobject refcount to address races with kobject removal
  2021-07-01 22:48     ` Luis Chamberlain
  2021-07-02  1:04       ` Luis Chamberlain
@ 2021-07-21 11:30       ` Greg KH
  2021-07-22 21:31         ` Luis Chamberlain
  1 sibling, 1 reply; 12+ messages in thread
From: Greg KH @ 2021-07-21 11:30 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Tejun Heo, rafael, davem, kuba, ast, andriin, daniel, atenart,
	alobakin, weiwan, ap420073, jeyu, ngupta,
	sergey.senozhatsky.work, minchan, axboe, mbenes, jpoimboe, tglx,
	keescook, jikos, rostedt, peterz, linux-block, netdev,
	linux-kernel

On Thu, Jul 01, 2021 at 03:48:16PM -0700, Luis Chamberlain wrote:
> On Fri, Jun 25, 2021 at 02:56:03PM -0700, Luis Chamberlain wrote:
> > On Thu, Jun 24, 2021 at 01:09:03PM +0200, Greg KH wrote:
> > > thanks for making this change and sticking with it!
> > > 
> > > Oh, and with this change, does your modprobe/rmmod crazy test now work?
> > 
> > It does but I wrote a test_syfs driver and I believe I see an issue with
> > this. I'll debug a bit more and see what it was, and I'll then also use
> > the driver to demo the issue more clearly, and then verification can be
> > an easy selftest test.
> 
> OK my conclusion based on a new selftest driver I wrote is we can drop
> this patch safely. The selftest will cover this corner case well now.
> 
> In short: the kernfs active reference will ensure the store operation
> still exists. The kernfs mutex is not enough, but if the driver removes
> the operation prior to getting the active reference, the write will just
> fail. The deferencing inside of the sysfs operation is abstract to
> kernfs, and while kernfs can't do anything to prevent a driver from
> doing something stupid, it at least can ensure an open file ensure the
> op is not removed until the operation completes.

Ok, so all is good?  Then why is your zram test code blowing up so
badly?  Where is the reference counting going wrong?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4] sysfs: fix kobject refcount to address races with kobject removal
  2021-07-21 11:30       ` Greg KH
@ 2021-07-22 21:31         ` Luis Chamberlain
  2021-07-23 11:14           ` Greg KH
  0 siblings, 1 reply; 12+ messages in thread
From: Luis Chamberlain @ 2021-07-22 21:31 UTC (permalink / raw)
  To: Greg KH
  Cc: Tejun Heo, rafael, davem, kuba, ast, andriin, daniel, atenart,
	alobakin, weiwan, ap420073, jeyu, ngupta,
	sergey.senozhatsky.work, minchan, axboe, mbenes, jpoimboe, tglx,
	keescook, jikos, rostedt, peterz, linux-block, netdev,
	linux-kernel

On Wed, Jul 21, 2021 at 01:30:29PM +0200, Greg KH wrote:
> On Thu, Jul 01, 2021 at 03:48:16PM -0700, Luis Chamberlain wrote:
> > On Fri, Jun 25, 2021 at 02:56:03PM -0700, Luis Chamberlain wrote:
> > > On Thu, Jun 24, 2021 at 01:09:03PM +0200, Greg KH wrote:
> > > > thanks for making this change and sticking with it!
> > > > 
> > > > Oh, and with this change, does your modprobe/rmmod crazy test now work?
> > > 
> > > It does but I wrote a test_syfs driver and I believe I see an issue with
> > > this. I'll debug a bit more and see what it was, and I'll then also use
> > > the driver to demo the issue more clearly, and then verification can be
> > > an easy selftest test.
> > 
> > OK my conclusion based on a new selftest driver I wrote is we can drop
> > this patch safely. The selftest will cover this corner case well now.
> > 
> > In short: the kernfs active reference will ensure the store operation
> > still exists. The kernfs mutex is not enough, but if the driver removes
> > the operation prior to getting the active reference, the write will just
> > fail. The deferencing inside of the sysfs operation is abstract to
> > kernfs, and while kernfs can't do anything to prevent a driver from
> > doing something stupid, it at least can ensure an open file ensure the
> > op is not removed until the operation completes.
> 
> Ok, so all is good?

It would seem to be the case.

> Then why is your zram test code blowing up so badly?

I checked the logs for the backtrace where the crash did happen
and we did see clear evidence of the race we feared here. The *first*
bug that happened was the CPU hotplug race:

[132004.787099] Error: Removing state 61 which has instances left.
[132004.787124] WARNING: CPU: 17 PID: 9307 at ../kernel/cpu.c:1879 __cpuhp_remove_state_cpuslocked+0x1c4/0x1d0

After this the crash happen:

[132005.254022] BUG: Unable to handle kernel instruction fetch
[132005.254049] Faulting instruction address: 0xc0080000004a0c24
[132005.254059] Oops: Kernel access of bad area, sig: 11 [#1]

And that's when the backtrace does come up with race. Given the first
race though, I think we can be skeptical of the rest, specially since
I cannot reproduce with a self bombing selftest.

> Where is the reference counting going wrong?

It's not clear, as the misuse with the CPU multistate could lead
to to us leaking per cpu stuct zcomp instances, leaving these
behind as there is no one to remove them. I can't think of the
relationship of this leak and the crash other then memory pressure.

Because of this and the deadlock which is easily triggerable,
I decided to write a selftest to allow is to more cleanly be
able to reproduce any races we can dream up of.

  Luis

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4] sysfs: fix kobject refcount to address races with kobject removal
  2021-07-22 21:31         ` Luis Chamberlain
@ 2021-07-23 11:14           ` Greg KH
  2021-07-23 17:35             ` Luis Chamberlain
  0 siblings, 1 reply; 12+ messages in thread
From: Greg KH @ 2021-07-23 11:14 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Tejun Heo, rafael, davem, kuba, ast, andriin, daniel, atenart,
	alobakin, weiwan, ap420073, jeyu, ngupta,
	sergey.senozhatsky.work, minchan, axboe, mbenes, jpoimboe, tglx,
	keescook, jikos, rostedt, peterz, linux-block, netdev,
	linux-kernel

On Thu, Jul 22, 2021 at 02:31:37PM -0700, Luis Chamberlain wrote:
> On Wed, Jul 21, 2021 at 01:30:29PM +0200, Greg KH wrote:
> > On Thu, Jul 01, 2021 at 03:48:16PM -0700, Luis Chamberlain wrote:
> > > On Fri, Jun 25, 2021 at 02:56:03PM -0700, Luis Chamberlain wrote:
> > > > On Thu, Jun 24, 2021 at 01:09:03PM +0200, Greg KH wrote:
> > > > > thanks for making this change and sticking with it!
> > > > > 
> > > > > Oh, and with this change, does your modprobe/rmmod crazy test now work?
> > > > 
> > > > It does but I wrote a test_syfs driver and I believe I see an issue with
> > > > this. I'll debug a bit more and see what it was, and I'll then also use
> > > > the driver to demo the issue more clearly, and then verification can be
> > > > an easy selftest test.
> > > 
> > > OK my conclusion based on a new selftest driver I wrote is we can drop
> > > this patch safely. The selftest will cover this corner case well now.
> > > 
> > > In short: the kernfs active reference will ensure the store operation
> > > still exists. The kernfs mutex is not enough, but if the driver removes
> > > the operation prior to getting the active reference, the write will just
> > > fail. The deferencing inside of the sysfs operation is abstract to
> > > kernfs, and while kernfs can't do anything to prevent a driver from
> > > doing something stupid, it at least can ensure an open file ensure the
> > > op is not removed until the operation completes.
> > 
> > Ok, so all is good?
> 
> It would seem to be the case.
> 
> > Then why is your zram test code blowing up so badly?
> 
> I checked the logs for the backtrace where the crash did happen
> and we did see clear evidence of the race we feared here. The *first*
> bug that happened was the CPU hotplug race:
> 
> [132004.787099] Error: Removing state 61 which has instances left.
> [132004.787124] WARNING: CPU: 17 PID: 9307 at ../kernel/cpu.c:1879 __cpuhp_remove_state_cpuslocked+0x1c4/0x1d0

I do not understand what this issue is, is it fixed?  Why is a cpu being
hot unplugged at the same time a zram?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4] sysfs: fix kobject refcount to address races with kobject removal
  2021-07-23 11:14           ` Greg KH
@ 2021-07-23 17:35             ` Luis Chamberlain
  0 siblings, 0 replies; 12+ messages in thread
From: Luis Chamberlain @ 2021-07-23 17:35 UTC (permalink / raw)
  To: Greg KH
  Cc: Tejun Heo, rafael, davem, kuba, ast, andriin, daniel, atenart,
	alobakin, weiwan, ap420073, jeyu, ngupta,
	sergey.senozhatsky.work, minchan, axboe, mbenes, jpoimboe, tglx,
	keescook, jikos, rostedt, peterz, linux-block, netdev,
	linux-kernel

On Fri, Jul 23, 2021 at 01:14:10PM +0200, Greg KH wrote:
> On Thu, Jul 22, 2021 at 02:31:37PM -0700, Luis Chamberlain wrote:
> > On Wed, Jul 21, 2021 at 01:30:29PM +0200, Greg KH wrote:
> > > On Thu, Jul 01, 2021 at 03:48:16PM -0700, Luis Chamberlain wrote:
> > > > On Fri, Jun 25, 2021 at 02:56:03PM -0700, Luis Chamberlain wrote:
> > > > > On Thu, Jun 24, 2021 at 01:09:03PM +0200, Greg KH wrote:
> > > > > > thanks for making this change and sticking with it!
> > > > > > 
> > > > > > Oh, and with this change, does your modprobe/rmmod crazy test now work?
> > > > > 
> > > > > It does but I wrote a test_syfs driver and I believe I see an issue with
> > > > > this. I'll debug a bit more and see what it was, and I'll then also use
> > > > > the driver to demo the issue more clearly, and then verification can be
> > > > > an easy selftest test.
> > > > 
> > > > OK my conclusion based on a new selftest driver I wrote is we can drop
> > > > this patch safely. The selftest will cover this corner case well now.
> > > > 
> > > > In short: the kernfs active reference will ensure the store operation
> > > > still exists. The kernfs mutex is not enough, but if the driver removes
> > > > the operation prior to getting the active reference, the write will just
> > > > fail. The deferencing inside of the sysfs operation is abstract to
> > > > kernfs, and while kernfs can't do anything to prevent a driver from
> > > > doing something stupid, it at least can ensure an open file ensure the
> > > > op is not removed until the operation completes.
> > > 
> > > Ok, so all is good?
> > 
> > It would seem to be the case.
> > 
> > > Then why is your zram test code blowing up so badly?
> > 
> > I checked the logs for the backtrace where the crash did happen
> > and we did see clear evidence of the race we feared here. The *first*
> > bug that happened was the CPU hotplug race:
> > 
> > [132004.787099] Error: Removing state 61 which has instances left.
> > [132004.787124] WARNING: CPU: 17 PID: 9307 at ../kernel/cpu.c:1879 __cpuhp_remove_state_cpuslocked+0x1c4/0x1d0
> 
> I do not understand what this issue is, is it fixed?

My first patch for zram fixes the CPU multistate mis-use. And after that
patch is applied triggering the other race does not happen.  It is why I
decided to write a selftest driver, so that we can have a way to do all
sorts of crazy races in a self contained driver example.

> Why is a cpu being hot unplugged at the same time a zram?

That's not what is happening. The description of the issue with zram's
misuse of CPU multistate is described clearly in my commit log for the
fix for that driver. You can refer to that commit log description.

  Luis

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-07-23 17:36 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-23 21:50 [PATCH v4] sysfs: fix kobject refcount to address races with kobject removal Luis Chamberlain
2021-06-23 22:59 ` Kees Cook
2021-06-24  1:09   ` Luis Chamberlain
2021-06-24 11:06   ` Greg KH
2021-06-24 11:09 ` Greg KH
2021-06-25 21:55   ` Luis Chamberlain
2021-07-01 22:48     ` Luis Chamberlain
2021-07-02  1:04       ` Luis Chamberlain
2021-07-21 11:30       ` Greg KH
2021-07-22 21:31         ` Luis Chamberlain
2021-07-23 11:14           ` Greg KH
2021-07-23 17:35             ` Luis Chamberlain

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).