* [PATCH v2] cgroup: fix cgroup_rmdir() vs close(eventfd) race
@ 2013-02-18  6:12 Li Zefan
  2013-02-18 10:36 ` Kirill A. Shutemov
  2013-02-18 17:16 ` Tejun Heo
  0 siblings, 2 replies; 5+ messages in thread
From: Li Zefan @ 2013-02-18  6:12 UTC
  To: Tejun Heo; +Cc: Cgroups, LKML, Kirill A. Shutemov

commit 205a872bd6f9a9a09ef035ef1e90185a8245cc58 ("cgroup: fix lockdep
warning for event_control") solved a deadlock by introducing a new
bug.

Moving cgrp->event_list to a temporary list doesn't mean you can traverse
that list locklessly, because cgroup_event_wake() can be called at the
same time and remove the event from the list. The result of this race
is disastrous.

We adopt the approach the kvm irqfd code uses for race-free event removal,
which is now described in the comments in cgroup_event_wake().

Signed-off-by: Li Zefan <lizefan@huawei.com>
---
 kernel/cgroup.c | 50 ++++++++++++++++++++++++++++++++++----------------
 1 file changed, 34 insertions(+), 16 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 26c071c..65c8101 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -217,6 +217,10 @@ struct cgroup_event {
 	 */
 	struct list_head list;
 	/*
+	 * Need to notify userspace when this event is removed?
+	 */
+	bool signal_on_remove;
+	/*
 	 * All fields below needed to unregister event when
 	 * userspace closes eventfd.
 	 */
@@ -3833,8 +3837,17 @@ static void cgroup_event_remove(struct work_struct *work)
 			remove);
 	struct cgroup *cgrp = event->cgrp;
 
+	remove_wait_queue(event->wqh, &event->wait);
+
 	event->cft->unregister_event(cgrp, event->cft, event->eventfd);
 
+	/*
+	 * If this event is to be removed due to cgroup removal,
+	 * we notify userspace.
+	 */
+	if (event->signal_on_remove)
+		eventfd_signal(event->eventfd, 1);
+
 	eventfd_ctx_put(event->eventfd);
 	kfree(event);
 	dput(cgrp->dentry);
@@ -3854,15 +3867,25 @@ static int cgroup_event_wake(wait_queue_t *wait, unsigned mode,
 	unsigned long flags = (unsigned long)key;
 
 	if (flags & POLLHUP) {
-		__remove_wait_queue(event->wqh, &event->wait);
-		spin_lock(&cgrp->event_list_lock);
-		list_del_init(&event->list);
-		spin_unlock(&cgrp->event_list_lock);
 		/*
-		 * We are in atomic context, but cgroup_event_remove() may
-		 * sleep, so we have to call it in workqueue.
+		 * If the event has been detached at cgroup removal, we
+		 * can simply return knowing the other side will clean up
+		 * for us.
+		 *
+		 * We can't race against event freeing since the other
+		 * side must acquire wqh->lock via remove_wait_queue(),
+		 * which we hold.
 		 */
-		schedule_work(&event->remove);
+		spin_lock(&cgrp->event_list_lock);
+		if (!list_empty(&event->list)) {
+			list_del_init(&event->list);
+			/*
+			 * We are in atomic context, but cgroup_event_remove()
+			 * may sleep, so we have to call it in workqueue.
+			 */
+			schedule_work(&event->remove);
+		}
+		spin_unlock(&cgrp->event_list_lock);
 	}
 
 	return 0;
@@ -4428,20 +4451,15 @@ static int cgroup_destroy_locked(struct cgroup *cgrp)
 	/*
 	 * Unregister events and notify userspace.
 	 * Notify userspace about cgroup removing only after rmdir of cgroup
-	 * directory to avoid race between userspace and kernelspace. Use
-	 * a temporary list to avoid a deadlock with cgroup_event_wake(). Since
-	 * cgroup_event_wake() is called with the wait queue head locked,
-	 * remove_wait_queue() cannot be called while holding event_list_lock.
+	 * directory to avoid race between userspace and kernelspace.
 	 */
 	spin_lock(&cgrp->event_list_lock);
-	list_splice_init(&cgrp->event_list, &tmp_list);
-	spin_unlock(&cgrp->event_list_lock);
-	list_for_each_entry_safe(event, tmp, &tmp_list, list) {
+	list_for_each_entry_safe(event, tmp, &cgrp->event_list, list) {
+		event->signal_on_remove = true;
 		list_del_init(&event->list);
-		remove_wait_queue(event->wqh, &event->wait);
-		eventfd_signal(event->eventfd, 1);
 		schedule_work(&event->remove);
 	}
+	spin_unlock(&cgrp->event_list_lock);
 
 	return 0;
 }
-- 
1.8.0.2
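
The ownership rule the patch relies on can be modelled in userspace. The
following is a minimal sketch, not kernel code: struct event, claim_event()
and the on_list flag are hypothetical stand-ins for struct cgroup_event, the
detach logic and !list_empty(&event->list), and a pthread mutex plays the
role of event_list_lock. Whichever path detaches the event while holding the
lock is the only one that schedules teardown; the other path finds the event
already detached and backs off.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct cgroup_event. */
struct event {
	bool on_list;          /* models !list_empty(&event->list) */
	bool signal_on_remove; /* set only by the rmdir path */
	int teardowns;         /* how many times teardown was scheduled */
};

/* Plays the role of cgrp->event_list_lock in this model. */
static pthread_mutex_t event_list_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Try to detach the event. Returns true only for the caller that actually
 * detached it; that caller owns the teardown. A caller that finds the event
 * already detached returns false and must not touch it further.
 */
static bool claim_event(struct event *ev, bool from_rmdir)
{
	bool claimed = false;

	pthread_mutex_lock(&event_list_lock);
	if (ev->on_list) {
		ev->on_list = false;            /* models list_del_init() */
		if (from_rmdir)
			ev->signal_on_remove = true;
		ev->teardowns++;                /* models schedule_work() */
		claimed = true;
	}
	pthread_mutex_unlock(&event_list_lock);

	return claimed;
}
```

Calling claim_event() from the wake path and then from the rmdir path, or
the other way round, schedules exactly one teardown in this model, and
signal_on_remove ends up set only when the rmdir path got there first.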


* Re: [PATCH v2] cgroup: fix cgroup_rmdir() vs close(eventfd) race
  2013-02-18  6:12 [PATCH v2] cgroup: fix cgroup_rmdir() vs close(eventfd) race Li Zefan
@ 2013-02-18 10:36 ` Kirill A. Shutemov
  2013-02-18 10:39   ` Li Zefan
  2013-02-18 17:16 ` Tejun Heo
  1 sibling, 1 reply; 5+ messages in thread
From: Kirill A. Shutemov @ 2013-02-18 10:36 UTC
  To: Li Zefan; +Cc: Tejun Heo, Cgroups, LKML

On Mon, Feb 18, 2013 at 02:12:23PM +0800, Li Zefan wrote:
> commit 205a872bd6f9a9a09ef035ef1e90185a8245cc58 ("cgroup: fix lockdep
> warning for event_control") solved a deadlock by introducing a new
> bug.
> 
> Move cgrp->event_list to a temporary list doesn't mean you can traverse
> this list locklessly, because at the same time cgroup_event_wake() can
> be called and remove the event from the list. The result of this race
> is disastrous.
> 
> We adopt the way how kvm irqfd code implements race-free event removal,
> which is now described in the comments in cgroup_event_wake().
> 
> Signed-off-by: Li Zefan <lizefan@huawei.com>
> ---
>  kernel/cgroup.c | 50 ++++++++++++++++++++++++++++++++++----------------
>  1 file changed, 34 insertions(+), 16 deletions(-)
> 
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 26c071c..65c8101 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -217,6 +217,10 @@ struct cgroup_event {
>  	 */
>  	struct list_head list;
>  	/*
> +	 * Need to notify userspace when this event is removed?
> +	 */
> +	bool signal_on_remove;
> +	/*
>  	 * All fields below needed to unregister event when
>  	 * userspace closes eventfd.
>  	 */
> @@ -3833,8 +3837,17 @@ static void cgroup_event_remove(struct work_struct *work)
>  			remove);
>  	struct cgroup *cgrp = event->cgrp;
>  
> +	remove_wait_queue(event->wqh, &event->wait);
> +
>  	event->cft->unregister_event(cgrp, event->cft, event->eventfd);
>  
> +	/*
> +	 * If this event is to be removed due to cgroup removal,
> +	 * we notify userspace.
> +	 */
> +	if (event->signal_on_remove)
> +		eventfd_signal(event->eventfd, 1);

It's safe to notify anyway, isn't it? Let's just drop signal_on_remove.

Otherwise, looks good.

Acked-by: Kirill A. Shutemov <kirill@shutemov.name>

-- 
 Kirill A. Shutemov


* Re: [PATCH v2] cgroup: fix cgroup_rmdir() vs close(eventfd) race
  2013-02-18 10:36 ` Kirill A. Shutemov
@ 2013-02-18 10:39   ` Li Zefan
  0 siblings, 0 replies; 5+ messages in thread
From: Li Zefan @ 2013-02-18 10:39 UTC
  To: Kirill A. Shutemov; +Cc: Tejun Heo, Cgroups, LKML

On 2013/2/18 18:36, Kirill A. Shutemov wrote:
> On Mon, Feb 18, 2013 at 02:12:23PM +0800, Li Zefan wrote:
>> commit 205a872bd6f9a9a09ef035ef1e90185a8245cc58 ("cgroup: fix lockdep
>> warning for event_control") solved a deadlock by introducing a new
>> bug.
>>
>> Move cgrp->event_list to a temporary list doesn't mean you can traverse
>> this list locklessly, because at the same time cgroup_event_wake() can
>> be called and remove the event from the list. The result of this race
>> is disastrous.
>>
>> We adopt the way how kvm irqfd code implements race-free event removal,
>> which is now described in the comments in cgroup_event_wake().
>>
>> Signed-off-by: Li Zefan <lizefan@huawei.com>
>> ---
>>  kernel/cgroup.c | 50 ++++++++++++++++++++++++++++++++++----------------
>>  1 file changed, 34 insertions(+), 16 deletions(-)
>>
>> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
>> index 26c071c..65c8101 100644
>> --- a/kernel/cgroup.c
>> +++ b/kernel/cgroup.c
>> @@ -217,6 +217,10 @@ struct cgroup_event {
>>  	 */
>>  	struct list_head list;
>>  	/*
>> +	 * Need to notify userspace when this event is removed?
>> +	 */
>> +	bool signal_on_remove;
>> +	/*
>>  	 * All fields below needed to unregister event when
>>  	 * userspace closes eventfd.
>>  	 */
>> @@ -3833,8 +3837,17 @@ static void cgroup_event_remove(struct work_struct *work)
>>  			remove);
>>  	struct cgroup *cgrp = event->cgrp;
>>  
>> +	remove_wait_queue(event->wqh, &event->wait);
>> +
>>  	event->cft->unregister_event(cgrp, event->cft, event->eventfd);
>>  
>> +	/*
>> +	 * If this event is to be removed due to cgroup removal,
>> +	 * we notify userspace.
>> +	 */
>> +	if (event->signal_on_remove)
>> +		eventfd_signal(event->eventfd, 1);
> 
> It's safe to notify anyway, isn't it? Let's just drop signal_on_remove.
> 

It should be. I just tried to be conservative, to make sure I fix the bug
without changing any behavior.

> Otherwise, look good.
> 
> Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
> 
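
For readers less familiar with eventfd, the userspace-visible effect of a
signal can be sketched as follows. This is only an illustration under stated
assumptions, not an argument about kernel-side safety: signal_then_read() is
a hypothetical helper, and eventfd_write(efd, 1) is used here as the
userspace analogue of the in-kernel eventfd_signal(event->eventfd, 1). Each
signal adds to a 64-bit counter, and a single read returns the accumulated
value and resets it, so a signal that no one ever reads simply leaves a
nonzero counter behind.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Signal a fresh eventfd `times` times with value 1, then read the counter
 * once. Returns the value read, or 0 if the eventfd could not be created or
 * there was nothing to read (hypothetical helper for illustration only).
 */
static uint64_t signal_then_read(unsigned int times)
{
	eventfd_t total = 0;
	int efd = eventfd(0, EFD_NONBLOCK);

	if (efd < 0)
		return 0;

	for (unsigned int i = 0; i < times; i++) {
		/* Userspace analogue of eventfd_signal(ctx, 1). */
		if (eventfd_write(efd, 1) < 0)
			break;
	}

	/* One read returns the accumulated count and resets it to zero. */
	if (eventfd_read(efd, &total) < 0)
		total = 0;

	close(efd);
	return total;
}
```

With EFD_NONBLOCK set, reading a counter that was never signalled fails with
EAGAIN instead of blocking, which the helper reports as 0.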



* Re: [PATCH v2] cgroup: fix cgroup_rmdir() vs close(eventfd) race
  2013-02-18  6:12 [PATCH v2] cgroup: fix cgroup_rmdir() vs close(eventfd) race Li Zefan
  2013-02-18 10:36 ` Kirill A. Shutemov
@ 2013-02-18 17:16 ` Tejun Heo
  2013-02-18 17:18   ` Tejun Heo
  1 sibling, 1 reply; 5+ messages in thread
From: Tejun Heo @ 2013-02-18 17:16 UTC
  To: Li Zefan; +Cc: Cgroups, LKML, Kirill A. Shutemov

On Mon, Feb 18, 2013 at 02:12:23PM +0800, Li Zefan wrote:
> commit 205a872bd6f9a9a09ef035ef1e90185a8245cc58 ("cgroup: fix lockdep
> warning for event_control") solved a deadlock by introducing a new
> bug.
> 
> Move cgrp->event_list to a temporary list doesn't mean you can traverse
> this list locklessly, because at the same time cgroup_event_wake() can
> be called and remove the event from the list. The result of this race
> is disastrous.
> 
> We adopt the way how kvm irqfd code implements race-free event removal,
> which is now described in the comments in cgroup_event_wake().
> 
> Signed-off-by: Li Zefan <lizefan@huawei.com>

Applied to cgroup/for-3.9.

Thanks.

-- 
tejun


* Re: [PATCH v2] cgroup: fix cgroup_rmdir() vs close(eventfd) race
  2013-02-18 17:16 ` Tejun Heo
@ 2013-02-18 17:18   ` Tejun Heo
  0 siblings, 0 replies; 5+ messages in thread
From: Tejun Heo @ 2013-02-18 17:18 UTC
  To: Li Zefan; +Cc: Cgroups, LKML, Kirill A. Shutemov

On Mon, Feb 18, 2013 at 09:16:47AM -0800, Tejun Heo wrote:
> On Mon, Feb 18, 2013 at 02:12:23PM +0800, Li Zefan wrote:
> > commit 205a872bd6f9a9a09ef035ef1e90185a8245cc58 ("cgroup: fix lockdep
> > warning for event_control") solved a deadlock by introducing a new
> > bug.
> > 
> > Move cgrp->event_list to a temporary list doesn't mean you can traverse
> > this list locklessly, because at the same time cgroup_event_wake() can
> > be called and remove the event from the list. The result of this race
> > is disastrous.
> > 
> > We adopt the way how kvm irqfd code implements race-free event removal,
> > which is now described in the comments in cgroup_event_wake().
> > 
> > Signed-off-by: Li Zefan <lizefan@huawei.com>
> 
> Applied to cgroup/for-3.9.

Never mind.  Just spotted v3 and applied that one instead.

-- 
tejun

