linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg KH <gregkh@linuxfoundation.org>
To: Luis Chamberlain <mcgrof@kernel.org>
Cc: rafael@kernel.org, jeyu@kernel.org, ngupta@vflare.org,
	sergey.senozhatsky.work@gmail.com, minchan@kernel.org,
	axboe@kernel.dk, mbenes@suse.com, jpoimboe@redhat.com,
	tglx@linutronix.de, keescook@chromium.org, jikos@kernel.org,
	rostedt@goodmis.org, peterz@infradead.org,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] drivers/base/core: refcount kobject and bus on device attribute read / store
Date: Wed, 23 Jun 2021 18:51:42 +0200	[thread overview]
Message-ID: <YNNmnrjpOGGVXsP2@kroah.com> (raw)
In-Reply-To: <20210623161434.qraapo4xaprte7bs@garbanzo>

On Wed, Jun 23, 2021 at 09:14:34AM -0700, Luis Chamberlain wrote:
> kernfs / sysfs is in not struct device specific, it has no semantics for
> modules or even devices. The aspect of kernfs which deals with locking
> a struct kernfs_node is kernfs_get_active(). The refcount there of
> importance is the call to atomic_inc_unless_negative(&kn->active).
> 
> struct kernfs_node *kernfs_get_active(struct kernfs_node *kn)                   
> {                                                                               
> 	if (unlikely(!kn))
> 		return NULL;                                                    
> 
> 	if (!atomic_inc_unless_negative(&kn->active))                           
> 		return NULL;                                                    
> 
> 	if (kernfs_lockdep(kn))
> 		rwsem_acquire_read(&kn->dep_map, 0, 1, _RET_IP_);               
> 	return kn;                                                              
> }
> 
> BTW this was also precisely where I had suggested to extend the
> kernfs_node with an optional kn->owner and if set we try_module_get() to
> prevent the deadlock case if the module exit routine also happens to use
> a lock also used by a sysfs attribute store/read.
> 
> The flow would be (from a real live gdb crash backtrace from an original
> bug report from a customer):
> 
> write system call -->
>   ksys_write ()
>     vfs_write() -->
>       __vfs_write() -->
>        kernfs_fop_write() (note now this is kernfs_fop_write_iter()) -->
>          sysfs_kf_write() -->
>            dev_attr_store() -->
> 	     null reference
> 
> The dereference is because the dev_attr_store() call which we are
> modifying
> 
> LINE-001 static ssize_t dev_attr_store(struct kobject *kobj, struct attribute *attr,                                                                                         
> LINE-002 const char *buf, size_t count)                                                                                                                 
> LINE-003 {
> LINE-004	struct device_attribute *dev_attr = to_dev_attr(attr);
> LINE-005	struct device *dev = kobj_to_dev(kobj);
> LINE-006	ssize_t ret = -EIO;
> LINE-007
> LINE-008	if (dev_attr->store)
> LINE-009		ret = dev_attr->store(dev, dev_attr, buf, count);
> 	...
> 	}
> 
> The race happens because a sysfs store / read can be triggered, the CPU
> could be preempted after LINE-008, and the ->store is gone by LINE-009.
> This begs the question if kernfs_fop_write_iter() or sysfs protects this
> somehow? Let's see.
> 
> For kernfs we have:
> 
> static ssize_t kernfs_fop_write_iter(struct kiocb *iocb, struct iov_iter *iter) 
> {
> 	struct kernfs_open_file *of = kernfs_of(iocb->ki_filp);
> 	...
> 	mutex_lock(&of->mutex);
> 	if (!kernfs_get_active(of->kn)) {
> 		mutex_unlock(&of->mutex);
> 		len = -ENODEV;
> 		goto out_free;
> 	}
> 
> 	ops = kernfs_ops(of->kn);
> 	if (ops->write)
> 		len = ops->write(of, buf, len, iocb->ki_pos);
> 	else
> 		len = -EINVAL;
> 
> 	kernfs_put_active(of->kn);
> 	mutex_unlock(&of->mutex);
> 	...
> }
> 
> And the write call here is a syfs calls:
> 
> static ssize_t sysfs_kf_write(struct kernfs_open_file *of, char *buf,           
> 			      size_t count, loff_t pos)                         
> {                                                                               
> 	const struct sysfs_ops *ops = sysfs_file_ops(of->kn);                   
> 	struct kobject *kobj = of->kn->parent->priv;                            
> 
> 	if (!count)                                                             
> 		return 0;                                                       
> 
> 	return ops->store(kobj, of->kn->priv, buf, count);                      
> }           
> 
> As we have observed already, the active reference obtained through
> kernfs_get_active() was for the struct kernfs_node. Sure, the
> ops->write() is valid, in this case it sysfs_kf_write().
> 
> sysfs isn't doing any active reference check for the kobject device
> attribute as it doesn't care for them. So syfs calls
> dev_attr_store(), but the dev_attr_store() is not preventing the device
> attribute ops to go fishing, and we destroy them while we're destroying
> the device on module removal.

Ah, but sysfs _should_ be doing this properly.

I think the issue is that when we store the kobject pointer in kernfs,
it is NOT incremented.  Look at sysfs_create_dir_ns(), if we call
kobject_get(kobj) right before we call kernfs_create_dir_ns(), and then
properly clean things up later on when we remove the sysfs directory
(i.e. the kobject), it _should_ fix this problem.

Then, we know, whenever show/store/whatever is called, when we cast out
of kernfs the private pointer to a kobject, that the kobject really is
still alive, so we can use it properly.

Can you try that, it should be a much "simpler" change here.

thanks,

greg k-h

  parent reply	other threads:[~2021-06-23 16:51 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-23  0:36 [PATCH v3] drivers/base/core: refcount kobject and bus on device attribute read / store Luis Chamberlain
2021-06-23  8:32 ` Greg KH
2021-06-23 16:14   ` Luis Chamberlain
2021-06-23 16:28     ` Greg KH
2021-06-23 16:51     ` Greg KH [this message]
2021-06-23 17:23       ` Luis Chamberlain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YNNmnrjpOGGVXsP2@kroah.com \
    --to=gregkh@linuxfoundation.org \
    --cc=axboe@kernel.dk \
    --cc=jeyu@kernel.org \
    --cc=jikos@kernel.org \
    --cc=jpoimboe@redhat.com \
    --cc=keescook@chromium.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mbenes@suse.com \
    --cc=mcgrof@kernel.org \
    --cc=minchan@kernel.org \
    --cc=ngupta@vflare.org \
    --cc=peterz@infradead.org \
    --cc=rafael@kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=sergey.senozhatsky.work@gmail.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).