From: Li Zhong <zhong@linux.vnet.ibm.com>
To: Tejun Heo <tj@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	gregkh@linuxfoundation.org, rafael.j.wysocki@intel.com,
	toshi.kani@hp.com
Subject: Re: [RFC PATCH v4] Use kernfs_break_active_protection() for device online store callbacks
Date: Fri, 18 Apr 2014 16:33:49 +0800
Message-ID: <1397810029.3142.15.camel@ThinkPad-T5421.cn.ibm.com>
In-Reply-To: <20140417151728.GK15326@htj.dyndns.org>

On Thu, 2014-04-17 at 11:17 -0400, Tejun Heo wrote:
> Hello,
> 
> On Thu, Apr 17, 2014 at 02:50:44PM +0800, Li Zhong wrote:
> > This patch tries to solve the device hot remove locking issues in a
> > different way from commit 5e33bc41, as kernfs already has a mechanism 
> > to break active protection. 
> > 
> > The problem here is the order of s_active, and series of hotplug related
> > lock. 
> 
> It probably deserves a more detailed explanation of the deadlock along
> with how 5e33bc41 ("$SUBJ") tried to solve it.  The active protection
> is there to keep the file alive by blocking deletion while operations
> are on-going in the file.  This blocking creates a dependency loop
> when an operation running off a sysfs knob ends up grabbing a lock
> which may be held while removing the said sysfs knob.

OK, I'll try to add these and something more in the next version.

> 
> > +	kn = kernfs_find_and_get(dev->kobj.sd, attr->attr.name);
> > +	if (WARN_ON_ONCE(!kn))
> > +		return -ENODEV;
> > +
> > +	/*
> > +	 * Break active protection here to avoid deadlocks with device
> > +	 * removing process, which tries to remove sysfs entries including this
> > +	 * "online" attribute while holding some hotplug related locks.
> > +	 *
> > +	 * @dev needs to be protected here, or it could go away any time after
> > +	 * dropping active protection. But it is still unreasonable/unsafe to
> > +	 * online/offline a device after it being removed. Fortunately, there
> 
> I think this is something driver layer proper should provide
> synchronization for.  It shouldn't be difficult to synchronize this
> function against device_del(), right?  And, please note that @dev is
> guaranteed not to have been removed (at least hasn't gone through attr
> removal) up to this point.

OK, I think what we need here is the check below: after taking
device_hotplug_lock, abort this function if the device has already been
removed. We still have to let device_del() remove the device from the
other process, which is why we are breaking the active protection.

> 
> > +	 * are some checks in online/offline knobs. Like cpu, it checks cpu
> > +	 * present/online mask before doing the real work.
> > +	 */
> > +
> > +	get_device(dev);
> > +	kernfs_break_active_protection(kn);
> > +
> > +	lock_device_hotplug();
> > +
> > +	/*
> > +	 * If we assume device_hotplug_lock must be acquired before removing
> > +	 * device, we may try to find a way to check whether the device has
> > +	 * been removed here, so we don't call device_{on|off}line against
> > +	 * removed device.
> > +	 */
> 
> Yeah, let's please fix this.

OK, I guess we can check whether dev->kobj.sd becomes NULL. If so, it
means the device has already been deleted by device_del().
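
Roughly, the check could look like the sketch below. This is only a
userspace mock-up to show the control flow, not the actual patch: the
struct definitions and the lock/online stubs are stand-ins for the
kernel ones, and online_store_locked() is a hypothetical name I made up
for the part of online_store() that runs under device_hotplug_lock.

```c
#include <errno.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; only the fields used here. */
struct kernfs_node { int unused; };
struct kobject { struct kernfs_node *sd; };
struct device { struct kobject kobj; int online; };

/* Mock of device_hotplug_lock: just records whether it is held. */
static int hotplug_lock_held;
static void lock_device_hotplug(void) { hotplug_lock_held = 1; }
static void unlock_device_hotplug(void) { hotplug_lock_held = 0; }

/* Mock online/offline operations that only flip a flag. */
static int device_online(struct device *dev) { dev->online = 1; return 0; }
static int device_offline(struct device *dev) { dev->online = 0; return 0; }

/*
 * Hypothetical helper: the locked section of online_store().
 * Assumes device_del() runs under device_hotplug_lock and clears
 * dev->kobj.sd, so a NULL sd seen here means the device is gone.
 */
static int online_store_locked(struct device *dev, int val)
{
	int ret;

	lock_device_hotplug();

	if (!dev->kobj.sd) {
		/* Device already deleted: do not online/offline it. */
		ret = -ENODEV;
		goto out;
	}

	ret = val ? device_online(dev) : device_offline(dev);
out:
	unlock_device_hotplug();
	return ret;
}
```

The point is only that the NULL test happens after the lock is taken,
so it cannot race with a removal that holds the same lock.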

> 
> >  	ret = val ? device_online(dev) : device_offline(dev);
> >  	unlock_device_hotplug();
> > +
> > +	kernfs_unbreak_active_protection(kn);
> > +	put_device(dev);
> > +
> > +	kernfs_put(kn);
> > +
> >  	return ret < 0 ? ret : count;
> >  }
> >  static DEVICE_ATTR_RW(online);
> > diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> > index bece691..0d2f3a5 100644
> > --- a/drivers/base/memory.c
> > +++ b/drivers/base/memory.c
> > @@ -320,10 +320,17 @@ store_mem_state(struct device *dev,
> >  {
> >  	struct memory_block *mem = to_memory_block(dev);
> >  	int ret, online_type;
> > +	struct kernfs_node *kn;
> >  
> > -	ret = lock_device_hotplug_sysfs();
> > -	if (ret)
> > -		return ret;
> > +	kn = kernfs_find_and_get(dev->kobj.sd, attr->attr.name);
> > +	if (WARN_ON_ONCE(!kn))
> > +		return -ENODEV;
> > +
> > +	/* refer to comments in online_store() for more information */
> > +	get_device(dev);
> > +	kernfs_break_active_protection(kn);
> > +
> > +	lock_device_hotplug();
> >  
> >  	if (!strncmp(buf, "online_kernel", min_t(int, count, 13)))
> >  		online_type = ONLINE_KERNEL;
> > @@ -362,6 +369,11 @@ store_mem_state(struct device *dev,
> >  err:
> >  	unlock_device_hotplug();
> >  
> > +	kernfs_unbreak_active_protection(kn);
> > +	put_device(dev);
> > +
> > +	kernfs_put(kn);
> 
> There are other users of lock_device_hotplug_sysfs().  We probably
> want to audit them and convert them too, preferably with helper
> routines so that they don't end up duplicating the complexity?

I see. I guess I could keep lock_device_hotplug_sysfs() but replace its
body with the implementation here, and provide a new
unlock_device_hotplug_sysfs() that does the unlock, the unbreak, and
the puts.
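
As a rough illustration of that helper pair, here is a userspace
mock-up. The signatures are my assumption (taking dev and attr and
returning the kernfs node, or NULL on failure), and every kernel
function below is replaced by a trivial stub that only counts
references or flips a flag, so this shows the pairing and ordering
only, not the real implementation.

```c
#include <stddef.h>
#include <string.h>

/* Userspace stand-ins so the sketch compiles outside the kernel. */
struct kernfs_node {
	const char *name;
	struct kernfs_node *child;	/* mock: a directory has one child */
	int refs;			/* kernfs_find_and_get()/kernfs_put() */
	int active_broken;		/* 1 while active protection is broken */
};
struct kobject { struct kernfs_node *sd; };
struct device { struct kobject kobj; int refs; };
struct attribute { const char *name; };
struct device_attribute { struct attribute attr; };

static int hotplug_lock_held;

/* Stubs standing in for the real kernel functions. */
static struct kernfs_node *kernfs_find_and_get(struct kernfs_node *parent,
					       const char *name)
{
	if (!parent || !parent->child || strcmp(parent->child->name, name))
		return NULL;
	parent->child->refs++;
	return parent->child;
}
static void kernfs_put(struct kernfs_node *kn) { kn->refs--; }
static void kernfs_break_active_protection(struct kernfs_node *kn)
{
	kn->active_broken = 1;
}
static void kernfs_unbreak_active_protection(struct kernfs_node *kn)
{
	kn->active_broken = 0;
}
static void get_device(struct device *dev) { dev->refs++; }
static void put_device(struct device *dev) { dev->refs--; }
static void lock_device_hotplug(void) { hotplug_lock_held = 1; }
static void unlock_device_hotplug(void) { hotplug_lock_held = 0; }

/* Assumed signature: returns the attribute's node, or NULL if not found. */
static struct kernfs_node *
lock_device_hotplug_sysfs(struct device *dev, struct device_attribute *attr)
{
	struct kernfs_node *kn;

	kn = kernfs_find_and_get(dev->kobj.sd, attr->attr.name);
	if (!kn)
		return NULL;

	/* Pin @dev before dropping the protection that kept it alive. */
	get_device(dev);
	kernfs_break_active_protection(kn);

	lock_device_hotplug();
	return kn;
}

/* Undo everything lock_device_hotplug_sysfs() did, in reverse order. */
static void unlock_device_hotplug_sysfs(struct device *dev,
					struct kernfs_node *kn)
{
	unlock_device_hotplug();
	kernfs_unbreak_active_protection(kn);
	put_device(dev);
	kernfs_put(kn);
}
```

With something like this, each store callback would only need the
lock call at the top and the unlock call on the exit path, instead of
repeating the find/get/break sequence.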

I'll try to get the code ready sometime next week for your review.

Thanks, Zhong

> 
> Thanks.
> 


