[PATCH 0/4] cgroup: bug fixes for eventfd

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/4] cgroup: bug fixes for eventfd
@ 2013-02-02  6:50 Li Zefan
  2013-02-02  6:50 ` [PATCH 1/4] eventfd: introduce eventfd_signal_hangup() Li Zefan
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Li Zefan @ 2013-02-02  6:50 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups, Davide Libenzi, Aaron Durbin, Greg Thelen

There're three bugs.

- If thread A is removing a cgroup, while thread B is closing an eventfd, the
two threads might free the same cgroup event and thus crash the kernel.

This is fixed by patch #1 and patch #2.

- If there're multiple threads are blocking in read() on the same eventfd,
and someone removes the cgroup, only one thread will be notified and unblocked,
and others won't be unblocked until those threads are killed.

- If thread A is removing a cgroup, while thread B is registering a cgroup event
and then read the eventfd, it might block until the thread is killed.

These two are fixed by patch #3.

0001-eventfd-Introduce-eventfd_signal_hangup.patch
0002-cgroup-fix-cgroup_rmdir-vs-close-eventfd-race.patch
0003-eventfd-make-operations-on-eventfd-return-EIDRM-if-i.patch
0004-cgroup-adapt-to-the-new-way-of-detecting-cgroup-remo.patch

--
 fs/eventfd.c                         | 30 ++++++++++++++++++++++++++++++
 include/linux/eventfd.h              |  5 +++++
 kernel/cgroup.c                      | 30 ++++++++++++++++++------------
 tools/cgroup/cgroup_event_listener.c | 12 +++++-------
 4 files changed, 58 insertions(+), 19 deletions(-)


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/4] eventfd: introduce eventfd_signal_hangup()
  2013-02-02  6:50 [PATCH 0/4] cgroup: bug fixes for eventfd Li Zefan
@ 2013-02-02  6:50 ` Li Zefan
  2013-02-02 15:58   ` Kirill A. Shutemov
  2013-02-02  6:51 ` [PATCH 2/4] cgroup: fix cgroup_rmdir() vs close(eventfd) race Li Zefan
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Li Zefan @ 2013-02-02  6:50 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups, Davide Libenzi, Aaron Durbin, Greg Thelen

When an eventfd is closed, a wakeup with POLLHUP will be issued,
but cgroup wants to issue wakeup explicitly, so when a cgroup is
removed userspace can be notified.

Signed-off-by: Li Zefan <lizefan@huawei.com>
---
 fs/eventfd.c            | 11 +++++++++++
 include/linux/eventfd.h |  5 +++++
 2 files changed, 16 insertions(+)

diff --git a/fs/eventfd.c b/fs/eventfd.c
index 35470d9..acf15e3 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -67,6 +67,17 @@ __u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n)
 }
 EXPORT_SYMBOL_GPL(eventfd_signal);
 
+/**
+ * eventfd_signal_hangup - Notify that this eventfd is hung up.
+ * @ctx: [in] Pointer to the eventfd context.
+ *
+ * Issue a POLLHUP wakeup.
+ */
+void eventfd_signal_hangup(struct eventfd_ctx *ctx)
+{
+	wake_up_poll(&ctx->wqh, POLLHUP);
+}
+
 static void eventfd_free_ctx(struct eventfd_ctx *ctx)
 {
 	kfree(ctx);
diff --git a/include/linux/eventfd.h b/include/linux/eventfd.h
index 3c3ef19..68af706 100644
--- a/include/linux/eventfd.h
+++ b/include/linux/eventfd.h
@@ -35,6 +35,7 @@ struct file *eventfd_fget(int fd);
 struct eventfd_ctx *eventfd_ctx_fdget(int fd);
 struct eventfd_ctx *eventfd_ctx_fileget(struct file *file);
 __u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n);
+void eventfd_signal_hangup(struct eventfd_ctx *ctx);
 ssize_t eventfd_ctx_read(struct eventfd_ctx *ctx, int no_wait, __u64 *cnt);
 int eventfd_ctx_remove_wait_queue(struct eventfd_ctx *ctx, wait_queue_t *wait,
 				  __u64 *cnt);
@@ -60,6 +61,10 @@ static inline int eventfd_signal(struct eventfd_ctx *ctx, int n)
 	return -ENOSYS;
 }
 
+static inline void eventfd_signal_hangup(struct eventfd_ctx *ctx)
+{
+}
+
 static inline void eventfd_ctx_put(struct eventfd_ctx *ctx)
 {
 
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/4] cgroup: fix cgroup_rmdir() vs close(eventfd) race
  2013-02-02  6:50 [PATCH 0/4] cgroup: bug fixes for eventfd Li Zefan
  2013-02-02  6:50 ` [PATCH 1/4] eventfd: introduce eventfd_signal_hangup() Li Zefan
@ 2013-02-02  6:51 ` Li Zefan
  2013-02-02 15:59   ` Kirill A. Shutemov
  2013-02-02  6:51 ` [PATCH 3/4] eventfd: make operations on eventfd return -EIDRM if it's hung up Li Zefan
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Li Zefan @ 2013-02-02  6:51 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups, Davide Libenzi, Aaron Durbin, Greg Thelen

commit 205a872bd6f9a9a09ef035ef1e90185a8245cc58 ("cgroup: fix lockdep
warning for event_control") sovled a deadlock by introducing a new
bug.

We can't access @event without event_list_lock, otherwise we'll race
with cgroup_event_wake() called when closing the eventfd, and then
both threads will try to free the same @event.

CPU0                                  CPU1
---------------------------           -----------------------------
cgroup_rmdir()                        close(eventfd)
  list_for_each_entry()                 cgroup_event_wake()
                                          list_del(event)
    list_del(event)
    cgroup_event_remove(event)
                                          cgroup_event_remove(event)

To fix this, use the new eventfd_signal_hangup() to notify userspace
cgroup is removed.

Signed-off-by: Li Zefan <lizefan@huawei.com>
---
 kernel/cgroup.c | 31 +++++++++++++++++++------------
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 3d21adf..a3d361b 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4302,9 +4302,9 @@ static int cgroup_destroy_locked(struct cgroup *cgrp)
 	struct dentry *d = cgrp->dentry;
 	struct cgroup *parent = cgrp->parent;
 	DEFINE_WAIT(wait);
-	struct cgroup_event *event, *tmp;
+	struct eventfd_ctx *ctx;
+	struct cgroup_event *event;
 	struct cgroup_subsys *ss;
-	LIST_HEAD(tmp_list);
 
 	lockdep_assert_held(&d->d_inode->i_mutex);
 	lockdep_assert_held(&cgroup_mutex);
@@ -4359,20 +4359,27 @@ static int cgroup_destroy_locked(struct cgroup *cgrp)
 	/*
 	 * Unregister events and notify userspace.
 	 * Notify userspace about cgroup removing only after rmdir of cgroup
-	 * directory to avoid race between userspace and kernelspace. Use
-	 * a temporary list to avoid a deadlock with cgroup_event_wake(). Since
+	 * directory to avoid race between userspace and kernelspace. Since
 	 * cgroup_event_wake() is called with the wait queue head locked,
-	 * remove_wait_queue() cannot be called while holding event_list_lock.
+	 * eventfd_signal() cannot be called while holding event_list_lock.
 	 */
 	spin_lock(&cgrp->event_list_lock);
-	list_splice_init(&cgrp->event_list, &tmp_list);
-	spin_unlock(&cgrp->event_list_lock);
-	list_for_each_entry_safe(event, tmp, &tmp_list, list) {
-		list_del_init(&event->list);
-		remove_wait_queue(event->wqh, &event->wait);
-		eventfd_signal(event->eventfd, 1);
-		schedule_work(&event->remove);
+	while (true) {
+		if (list_empty(&cgrp->event_list))
+			break;
+
+		event = list_first_entry(&cgrp->event_list,
+					 struct cgroup_event, list);
+		ctx = eventfd_ctx_get(event->eventfd);
+		spin_unlock(&cgrp->event_list_lock);
+
+		eventfd_signal(ctx, 1);
+		eventfd_signal_hangup(ctx);
+		eventfd_ctx_put(ctx);
+
+		spin_lock(&cgrp->event_list_lock);
 	}
+	spin_unlock(&cgrp->event_list_lock);
 
 	return 0;
 }
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/4] eventfd: make operations on eventfd return -EIDRM if it's hung up
  2013-02-02  6:50 [PATCH 0/4] cgroup: bug fixes for eventfd Li Zefan
  2013-02-02  6:50 ` [PATCH 1/4] eventfd: introduce eventfd_signal_hangup() Li Zefan
  2013-02-02  6:51 ` [PATCH 2/4] cgroup: fix cgroup_rmdir() vs close(eventfd) race Li Zefan
@ 2013-02-02  6:51 ` Li Zefan
  2013-02-02 16:12   ` Kirill A. Shutemov
  2013-02-02  6:51 ` [PATCH 4/4] cgroup: adapt to the new way of detecting cgroup removal Li Zefan
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Li Zefan @ 2013-02-02  6:51 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups, Davide Libenzi, Aaron Durbin, Greg Thelen

Currently when a cgroup is removed, it calls eventfd_signal() for
each registered cgroup event, so userspace can be notified and blocked
reads can be unblocked.

The problem is, if we have multiple threads blocked on the same eventfd,
only one of them will be unlocked.

This patch makes sure all operations on the same eventfd can be unbocked.

There's another problem, a new cgroup event can be registered while we
are removing the cgroup, and then reading the eventfd will be blocked
until the thread is killed. This patch also fixes this issue.

Signed-off-by: Li Zefan <lizefan@huawei.com>
---
 fs/eventfd.c    | 23 +++++++++++++++++++++--
 kernel/cgroup.c |  1 -
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/fs/eventfd.c b/fs/eventfd.c
index acf15e3..48de15a 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -35,6 +35,7 @@ struct eventfd_ctx {
 	 */
 	__u64 count;
 	unsigned int flags;
+	bool hung_up;
 };
 
 /**
@@ -71,11 +72,19 @@ EXPORT_SYMBOL_GPL(eventfd_signal);
  * eventfd_signal_hangup - Notify that this eventfd is hung up.
  * @ctx: [in] Pointer to the eventfd context.
  *
- * Issue a POLLHUP wakeup.
+ * Issue a POLLHUP wakeup. All current blocked reads, writes and polls on
+ * this eventfd will return with -EIDRM. Future operations on it will also
+ * return with -EDIRM.
  */
 void eventfd_signal_hangup(struct eventfd_ctx *ctx)
 {
-	wake_up_poll(&ctx->wqh, POLLHUP);
+	unsigned long flags;
+
+	spin_lock_irqsave(&ctx->wqh.lock, flags);
+	ctx->hung_up = true;
+	if (waitqueue_active(&ctx->wqh))
+		wake_up_locked_poll(&ctx->wqh, POLLHUP);
+	spin_unlock_irqrestore(&ctx->wqh.lock, flags);
 }
 
 static void eventfd_free_ctx(struct eventfd_ctx *ctx)
@@ -140,6 +149,8 @@ static unsigned int eventfd_poll(struct file *file, poll_table *wait)
 		events |= POLLERR;
 	if (ULLONG_MAX - 1 > ctx->count)
 		events |= POLLOUT;
+	if (ctx->hung_up)
+		events |= POLLHUP;
 	spin_unlock_irqrestore(&ctx->wqh.lock, flags);
 
 	return events;
@@ -208,6 +219,10 @@ ssize_t eventfd_ctx_read(struct eventfd_ctx *ctx, int no_wait, __u64 *cnt)
 		__add_wait_queue(&ctx->wqh, &wait);
 		for (;;) {
 			set_current_state(TASK_INTERRUPTIBLE);
+			if (ctx->hung_up) {
+				res = -EIDRM;
+				break;
+			}
 			if (ctx->count > 0) {
 				res = 0;
 				break;
@@ -272,6 +287,10 @@ static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t c
 		__add_wait_queue(&ctx->wqh, &wait);
 		for (res = 0;;) {
 			set_current_state(TASK_INTERRUPTIBLE);
+			if (ctx->hung_up) {
+				res = -EIDRM;
+				break;
+			}
 			if (ULLONG_MAX - ctx->count > ucnt) {
 				res = sizeof(ucnt);
 				break;
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index a3d361b..fcb1ab6 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4373,7 +4373,6 @@ static int cgroup_destroy_locked(struct cgroup *cgrp)
 		ctx = eventfd_ctx_get(event->eventfd);
 		spin_unlock(&cgrp->event_list_lock);
 
-		eventfd_signal(ctx, 1);
 		eventfd_signal_hangup(ctx);
 		eventfd_ctx_put(ctx);
 
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/4] cgroup: adapt to the new way of detecting cgroup removal
  2013-02-02  6:50 [PATCH 0/4] cgroup: bug fixes for eventfd Li Zefan
                   ` (2 preceding siblings ...)
  2013-02-02  6:51 ` [PATCH 3/4] eventfd: make operations on eventfd return -EIDRM if it's hung up Li Zefan
@ 2013-02-02  6:51 ` Li Zefan
  2013-02-02  6:59 ` [PATCH 0/4] cgroup: bug fixes for eventfd Li Zefan
  2013-02-04 19:27 ` Tejun Heo
  5 siblings, 0 replies; 16+ messages in thread
From: Li Zefan @ 2013-02-02  6:51 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups, Davide Libenzi, Aaron Durbin, Greg Thelen

If read() returns with errno == EIDRM, we know the cgroup has been
removed.

Signed-off-by: Li Zefan <lizefan@huawei.com>
---
 tools/cgroup/cgroup_event_listener.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/tools/cgroup/cgroup_event_listener.c b/tools/cgroup/cgroup_event_listener.c
index 4eb5507..19e3d4a 100644
--- a/tools/cgroup/cgroup_event_listener.c
+++ b/tools/cgroup/cgroup_event_listener.c
@@ -62,18 +62,16 @@ int main(int argc, char **argv)
 		if (ret == -1) {
 			if (errno == EINTR)
 				continue;
-			err(1, "Cannot read from eventfd");
+			if (errno != EIDRM)
+				err(1, "Cannot read from eventfd");
 		}
-		assert(ret == sizeof(result));
 
-		ret = access(event_control_path, W_OK);
-		if ((ret == -1) && (errno == ENOENT)) {
-			puts("The cgroup seems to have removed.");
+		if (ret == -1 && errno == EIDRM) {
+			puts("The cgroup has been removed.");
 			break;
 		}
 
-		if (ret == -1)
-			err(1, "cgroup.event_control is not accessible any more");
+		assert(ret == sizeof(result));
 
 		printf("%s %s: crossed\n", argv[1], argv[2]);
 	}
-- 
1.8.0.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/4] cgroup: bug fixes for eventfd
  2013-02-02  6:50 [PATCH 0/4] cgroup: bug fixes for eventfd Li Zefan
                   ` (3 preceding siblings ...)
  2013-02-02  6:51 ` [PATCH 4/4] cgroup: adapt to the new way of detecting cgroup removal Li Zefan
@ 2013-02-02  6:59 ` Li Zefan
  2013-02-04 19:27 ` Tejun Heo
  5 siblings, 0 replies; 16+ messages in thread
From: Li Zefan @ 2013-02-02  6:59 UTC (permalink / raw)
  To: Tejun Heo
  Cc: LKML, Cgroups, Davide Libenzi, Aaron Durbin, Greg Thelen,
	Kirill A. Shutemov

(forgot to cc Kirill A. Shutemov <kirill.shutemov@linux.intel.com>, added)

On 2013/2/2 14:50, Li Zefan wrote:
> There're three bugs.
> 
> - If thread A is removing a cgroup, while thread B is closing an eventfd, the
> two threads might free the same cgroup event and thus crash the kernel.
> 
> This is fixed by patch #1 and patch #2.
> 
> - If there're multiple threads are blocking in read() on the same eventfd,
> and someone removes the cgroup, only one thread will be notified and unblocked,
> and others won't be unblocked until those threads are killed.
> 
> - If thread A is removing a cgroup, while thread B is registering a cgroup event
> and then read the eventfd, it might block until the thread is killed.
> 
> These two are fixed by patch #3.
> 
> 0001-eventfd-Introduce-eventfd_signal_hangup.patch
> 0002-cgroup-fix-cgroup_rmdir-vs-close-eventfd-race.patch
> 0003-eventfd-make-operations-on-eventfd-return-EIDRM-if-i.patch
> 0004-cgroup-adapt-to-the-new-way-of-detecting-cgroup-remo.patch
> 
> --
>  fs/eventfd.c                         | 30 ++++++++++++++++++++++++++++++
>  include/linux/eventfd.h              |  5 +++++
>  kernel/cgroup.c                      | 30 ++++++++++++++++++------------
>  tools/cgroup/cgroup_event_listener.c | 12 +++++-------
>  4 files changed, 58 insertions(+), 19 deletions(-)
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] eventfd: introduce eventfd_signal_hangup()
  2013-02-02  6:50 ` [PATCH 1/4] eventfd: introduce eventfd_signal_hangup() Li Zefan
@ 2013-02-02 15:58   ` Kirill A. Shutemov
  2013-02-04 10:15     ` Kirill A. Shutemov
  0 siblings, 1 reply; 16+ messages in thread
From: Kirill A. Shutemov @ 2013-02-02 15:58 UTC (permalink / raw)
  To: Li Zefan
  Cc: Tejun Heo, LKML, Cgroups, Davide Libenzi, Aaron Durbin, Greg Thelen

On Sat, Feb 02, 2013 at 02:50:44PM +0800, Li Zefan wrote:
> When an eventfd is closed, a wakeup with POLLHUP will be issued,
> but cgroup wants to issue wakeup explicitly, so when a cgroup is
> removed userspace can be notified.
> 
> Signed-off-by: Li Zefan <lizefan@huawei.com>

Acked-by: Kirill A. Shutemov <kirill@shutemov.name>

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] cgroup: fix cgroup_rmdir() vs close(eventfd) race
  2013-02-02  6:51 ` [PATCH 2/4] cgroup: fix cgroup_rmdir() vs close(eventfd) race Li Zefan
@ 2013-02-02 15:59   ` Kirill A. Shutemov
  0 siblings, 0 replies; 16+ messages in thread
From: Kirill A. Shutemov @ 2013-02-02 15:59 UTC (permalink / raw)
  To: Li Zefan
  Cc: Tejun Heo, LKML, Cgroups, Davide Libenzi, Aaron Durbin, Greg Thelen

On Sat, Feb 02, 2013 at 02:51:15PM +0800, Li Zefan wrote:
> commit 205a872bd6f9a9a09ef035ef1e90185a8245cc58 ("cgroup: fix lockdep
> warning for event_control") sovled a deadlock by introducing a new
> bug.
> 
> We can't access @event without event_list_lock, otherwise we'll race
> with cgroup_event_wake() called when closing the eventfd, and then
> both threads will try to free the same @event.
> 
> CPU0                                  CPU1
> ---------------------------           -----------------------------
> cgroup_rmdir()                        close(eventfd)
>   list_for_each_entry()                 cgroup_event_wake()
>                                           list_del(event)
>     list_del(event)
>     cgroup_event_remove(event)
>                                           cgroup_event_remove(event)
> 
> To fix this, use the new eventfd_signal_hangup() to notify userspace
> cgroup is removed.
> 
> Signed-off-by: Li Zefan <lizefan@huawei.com>
> ---
>  kernel/cgroup.c | 31 +++++++++++++++++++------------
>  1 file changed, 19 insertions(+), 12 deletions(-)
> 
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 3d21adf..a3d361b 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -4302,9 +4302,9 @@ static int cgroup_destroy_locked(struct cgroup *cgrp)
>  	struct dentry *d = cgrp->dentry;
>  	struct cgroup *parent = cgrp->parent;
>  	DEFINE_WAIT(wait);
> -	struct cgroup_event *event, *tmp;
> +	struct eventfd_ctx *ctx;
> +	struct cgroup_event *event;
>  	struct cgroup_subsys *ss;
> -	LIST_HEAD(tmp_list);
>  
>  	lockdep_assert_held(&d->d_inode->i_mutex);
>  	lockdep_assert_held(&cgroup_mutex);
> @@ -4359,20 +4359,27 @@ static int cgroup_destroy_locked(struct cgroup *cgrp)
>  	/*
>  	 * Unregister events and notify userspace.
>  	 * Notify userspace about cgroup removing only after rmdir of cgroup
> +	 * directory to avoid race between userspace and kernelspace. Since
>  	 * cgroup_event_wake() is called with the wait queue head locked,
> +	 * eventfd_signal() cannot be called while holding event_list_lock.
>  	 */
>  	spin_lock(&cgrp->event_list_lock);
> -	list_splice_init(&cgrp->event_list, &tmp_list);
> -	spin_unlock(&cgrp->event_list_lock);
> -	list_for_each_entry_safe(event, tmp, &tmp_list, list) {
> -		list_del_init(&event->list);
> -		remove_wait_queue(event->wqh, &event->wait);
> -		eventfd_signal(event->eventfd, 1);
> -		schedule_work(&event->remove);
> +	while (true) {
> +		if (list_empty(&cgrp->event_list))
> +			break;

while (!list_empty(&cgrp->event_list)) ?

Otherwise:
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>

> +
> +		event = list_first_entry(&cgrp->event_list,
> +					 struct cgroup_event, list);
> +		ctx = eventfd_ctx_get(event->eventfd);
> +		spin_unlock(&cgrp->event_list_lock);
> +
> +		eventfd_signal(ctx, 1);
> +		eventfd_signal_hangup(ctx);
> +		eventfd_ctx_put(ctx);
> +
> +		spin_lock(&cgrp->event_list_lock);
>  	}
> +	spin_unlock(&cgrp->event_list_lock);
>  
>  	return 0;
>  }
> -- 
> 1.8.0.2
> --
> To unsubscribe from this list: send the line "unsubscribe cgroups" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/4] eventfd: make operations on eventfd return -EIDRM if it's hung up
  2013-02-02  6:51 ` [PATCH 3/4] eventfd: make operations on eventfd return -EIDRM if it's hung up Li Zefan
@ 2013-02-02 16:12   ` Kirill A. Shutemov
  2013-02-04  3:15     ` Li Zefan
  0 siblings, 1 reply; 16+ messages in thread
From: Kirill A. Shutemov @ 2013-02-02 16:12 UTC (permalink / raw)
  To: Li Zefan
  Cc: Tejun Heo, LKML, Cgroups, Davide Libenzi, Aaron Durbin, Greg Thelen

On Sat, Feb 02, 2013 at 02:51:30PM +0800, Li Zefan wrote:
> Currently when a cgroup is removed, it calls eventfd_signal() for
> each registered cgroup event, so userspace can be notified and blocked
> reads can be unblocked.
> 
> The problem is, if we have multiple threads blocked on the same eventfd,
> only one of them will be unlocked.
> 
> This patch makes sure all operations on the same eventfd can be unbocked.
> 
> There's another problem, a new cgroup event can be registered while we
> are removing the cgroup, and then reading the eventfd will be blocked
> until the thread is killed. This patch also fixes this issue.
> 
> Signed-off-by: Li Zefan <lizefan@huawei.com>
> ---
>  fs/eventfd.c    | 23 +++++++++++++++++++++--
>  kernel/cgroup.c |  1 -
>  2 files changed, 21 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/eventfd.c b/fs/eventfd.c
> index acf15e3..48de15a 100644
> --- a/fs/eventfd.c
> +++ b/fs/eventfd.c
> @@ -35,6 +35,7 @@ struct eventfd_ctx {
>  	 */
>  	__u64 count;
>  	unsigned int flags;
> +	bool hung_up;
>  };
>  
>  /**
> @@ -71,11 +72,19 @@ EXPORT_SYMBOL_GPL(eventfd_signal);
>   * eventfd_signal_hangup - Notify that this eventfd is hung up.
>   * @ctx: [in] Pointer to the eventfd context.
>   *
> - * Issue a POLLHUP wakeup.
> + * Issue a POLLHUP wakeup. All current blocked reads, writes and polls on
> + * this eventfd will return with -EIDRM. Future operations on it will also
> + * return with -EDIRM.
>   */
>  void eventfd_signal_hangup(struct eventfd_ctx *ctx)
>  {
> -	wake_up_poll(&ctx->wqh, POLLHUP);
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&ctx->wqh.lock, flags);
> +	ctx->hung_up = true;
> +	if (waitqueue_active(&ctx->wqh))
> +		wake_up_locked_poll(&ctx->wqh, POLLHUP);
> +	spin_unlock_irqrestore(&ctx->wqh.lock, flags);
>  }
>  
>  static void eventfd_free_ctx(struct eventfd_ctx *ctx)
> @@ -140,6 +149,8 @@ static unsigned int eventfd_poll(struct file *file, poll_table *wait)
>  		events |= POLLERR;
>  	if (ULLONG_MAX - 1 > ctx->count)
>  		events |= POLLOUT;
> +	if (ctx->hung_up)
> +		events |= POLLHUP;
>  	spin_unlock_irqrestore(&ctx->wqh.lock, flags);
>  
>  	return events;
> @@ -208,6 +219,10 @@ ssize_t eventfd_ctx_read(struct eventfd_ctx *ctx, int no_wait, __u64 *cnt)
>  		__add_wait_queue(&ctx->wqh, &wait);
>  		for (;;) {
>  			set_current_state(TASK_INTERRUPTIBLE);
> +			if (ctx->hung_up) {
> +				res = -EIDRM;
> +				break;
> +			}

Shouldn't we just return indicate end-of-file if ctx->hung_up?

>  			if (ctx->count > 0) {
>  				res = 0;
>  				break;
> @@ -272,6 +287,10 @@ static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t c
>  		__add_wait_queue(&ctx->wqh, &wait);
>  		for (res = 0;;) {
>  			set_current_state(TASK_INTERRUPTIBLE);
> +			if (ctx->hung_up) {
> +				res = -EIDRM;

-EPIPE?

> +				break;
> +			}
>  			if (ULLONG_MAX - ctx->count > ucnt) {
>  				res = sizeof(ucnt);
>  				break;
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index a3d361b..fcb1ab6 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -4373,7 +4373,6 @@ static int cgroup_destroy_locked(struct cgroup *cgrp)
>  		ctx = eventfd_ctx_get(event->eventfd);
>  		spin_unlock(&cgrp->event_list_lock);
>  
> -		eventfd_signal(ctx, 1);
>  		eventfd_signal_hangup(ctx);
>  		eventfd_ctx_put(ctx);
>  
> -- 
> 1.8.0.2
> --
> To unsubscribe from this list: send the line "unsubscribe cgroups" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/4] eventfd: make operations on eventfd return -EIDRM if it's hung up
  2013-02-02 16:12   ` Kirill A. Shutemov
@ 2013-02-04  3:15     ` Li Zefan
  0 siblings, 0 replies; 16+ messages in thread
From: Li Zefan @ 2013-02-04  3:15 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Tejun Heo, LKML, Cgroups, Davide Libenzi, Aaron Durbin, Greg Thelen

On 2013/2/3 0:12, Kirill A. Shutemov wrote:
> On Sat, Feb 02, 2013 at 02:51:30PM +0800, Li Zefan wrote:
>> Currently when a cgroup is removed, it calls eventfd_signal() for
>> each registered cgroup event, so userspace can be notified and blocked
>> reads can be unblocked.
>>
>> The problem is, if we have multiple threads blocked on the same eventfd,
>> only one of them will be unlocked.
>>
>> This patch makes sure all operations on the same eventfd can be unbocked.
>>
>> There's another problem, a new cgroup event can be registered while we
>> are removing the cgroup, and then reading the eventfd will be blocked
>> until the thread is killed. This patch also fixes this issue.
>>
>> Signed-off-by: Li Zefan <lizefan@huawei.com>
>> ---
>>  fs/eventfd.c    | 23 +++++++++++++++++++++--
>>  kernel/cgroup.c |  1 -
>>  2 files changed, 21 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/eventfd.c b/fs/eventfd.c
>> index acf15e3..48de15a 100644
>> --- a/fs/eventfd.c
>> +++ b/fs/eventfd.c
>> @@ -35,6 +35,7 @@ struct eventfd_ctx {
>>  	 */
>>  	__u64 count;
>>  	unsigned int flags;
>> +	bool hung_up;
>>  };
>>  
>>  /**
>> @@ -71,11 +72,19 @@ EXPORT_SYMBOL_GPL(eventfd_signal);
>>   * eventfd_signal_hangup - Notify that this eventfd is hung up.
>>   * @ctx: [in] Pointer to the eventfd context.
>>   *
>> - * Issue a POLLHUP wakeup.
>> + * Issue a POLLHUP wakeup. All current blocked reads, writes and polls on
>> + * this eventfd will return with -EIDRM. Future operations on it will also
>> + * return with -EDIRM.
>>   */
>>  void eventfd_signal_hangup(struct eventfd_ctx *ctx)
>>  {
>> -	wake_up_poll(&ctx->wqh, POLLHUP);
>> +	unsigned long flags;
>> +
>> +	spin_lock_irqsave(&ctx->wqh.lock, flags);
>> +	ctx->hung_up = true;
>> +	if (waitqueue_active(&ctx->wqh))
>> +		wake_up_locked_poll(&ctx->wqh, POLLHUP);
>> +	spin_unlock_irqrestore(&ctx->wqh.lock, flags);
>>  }
>>  
>>  static void eventfd_free_ctx(struct eventfd_ctx *ctx)
>> @@ -140,6 +149,8 @@ static unsigned int eventfd_poll(struct file *file, poll_table *wait)
>>  		events |= POLLERR;
>>  	if (ULLONG_MAX - 1 > ctx->count)
>>  		events |= POLLOUT;
>> +	if (ctx->hung_up)
>> +		events |= POLLHUP;
>>  	spin_unlock_irqrestore(&ctx->wqh.lock, flags);
>>  
>>  	return events;
>> @@ -208,6 +219,10 @@ ssize_t eventfd_ctx_read(struct eventfd_ctx *ctx, int no_wait, __u64 *cnt)
>>  		__add_wait_queue(&ctx->wqh, &wait);
>>  		for (;;) {
>>  			set_current_state(TASK_INTERRUPTIBLE);
>> +			if (ctx->hung_up) {
>> +				res = -EIDRM;
>> +				break;
>> +			}
> 
> Shouldn't we just return indicate end-of-file if ctx->hung_up?
> 

makes sense

>>  			if (ctx->count > 0) {
>>  				res = 0;
>>  				break;
>> @@ -272,6 +287,10 @@ static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t c
>>  		__add_wait_queue(&ctx->wqh, &wait);
>>  		for (res = 0;;) {
>>  			set_current_state(TASK_INTERRUPTIBLE);
>> +			if (ctx->hung_up) {
>> +				res = -EIDRM;
> 
> -EPIPE?
> 

I wan't sure which errno is most appropriate, but EPIPE seems better.

>> +				break;
>> +			}
>>  			if (ULLONG_MAX - ctx->count > ucnt) {
>>  				res = sizeof(ucnt);
>>  				break;
>> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
>> index a3d361b..fcb1ab6 100644
>> --- a/kernel/cgroup.c
>> +++ b/kernel/cgroup.c
>> @@ -4373,7 +4373,6 @@ static int cgroup_destroy_locked(struct cgroup *cgrp)
>>  		ctx = eventfd_ctx_get(event->eventfd);
>>  		spin_unlock(&cgrp->event_list_lock);
>>  
>> -		eventfd_signal(ctx, 1);
>>  		eventfd_signal_hangup(ctx);
>>  		eventfd_ctx_put(ctx);
>>  
>> -- 
>> 1.8.0.2
>> --
>> To unsubscribe from this list: send the line "unsubscribe cgroups" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] eventfd: introduce eventfd_signal_hangup()
  2013-02-02 15:58   ` Kirill A. Shutemov
@ 2013-02-04 10:15     ` Kirill A. Shutemov
  2013-02-05  3:40       ` Li Zefan
  0 siblings, 1 reply; 16+ messages in thread
From: Kirill A. Shutemov @ 2013-02-04 10:15 UTC (permalink / raw)
  To: Li Zefan
  Cc: Tejun Heo, LKML, Cgroups, Davide Libenzi, Aaron Durbin, Greg Thelen

On Sat, Feb 02, 2013 at 05:58:58PM +0200, Kirill A. Shutemov wrote:
> On Sat, Feb 02, 2013 at 02:50:44PM +0800, Li Zefan wrote:
> > When an eventfd is closed, a wakeup with POLLHUP will be issued,
> > but cgroup wants to issue wakeup explicitly, so when a cgroup is
> > removed userspace can be notified.
> > 
> > Signed-off-by: Li Zefan <lizefan@huawei.com>

Hm.. Looks like it will break eventfd semantics:

1. One eventfd can be used for deliver more then one notification from
one or more cgroups. POLLHUP on removing one of cgroups is not valid.

2. It's valid to have eventfd opened only by one userspace application. We
should not close it, just because cgroup is removed.

I think problem with multiple threads waiting an event on eventfd should
be handled in userspace.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/4] cgroup: bug fixes for eventfd
  2013-02-02  6:50 [PATCH 0/4] cgroup: bug fixes for eventfd Li Zefan
                   ` (4 preceding siblings ...)
  2013-02-02  6:59 ` [PATCH 0/4] cgroup: bug fixes for eventfd Li Zefan
@ 2013-02-04 19:27 ` Tejun Heo
  5 siblings, 0 replies; 16+ messages in thread
From: Tejun Heo @ 2013-02-04 19:27 UTC (permalink / raw)
  To: Li Zefan; +Cc: LKML, Cgroups, Davide Libenzi, Aaron Durbin, Greg Thelen

On Sat, Feb 02, 2013 at 02:50:27PM +0800, Li Zefan wrote:
> There're three bugs.

You're gonna send an updated round, right?  Will wait for it.

Thanks!

-- 
tejun

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] eventfd: introduce eventfd_signal_hangup()
  2013-02-04 10:15     ` Kirill A. Shutemov
@ 2013-02-05  3:40       ` Li Zefan
  2013-02-05  8:28         ` Kirill A. Shutemov
  0 siblings, 1 reply; 16+ messages in thread
From: Li Zefan @ 2013-02-05  3:40 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Tejun Heo, LKML, Cgroups, Davide Libenzi, Aaron Durbin, Greg Thelen

On 2013/2/4 18:15, Kirill A. Shutemov wrote:
> On Sat, Feb 02, 2013 at 05:58:58PM +0200, Kirill A. Shutemov wrote:
>> On Sat, Feb 02, 2013 at 02:50:44PM +0800, Li Zefan wrote:
>>> When an eventfd is closed, a wakeup with POLLHUP will be issued,
>>> but cgroup wants to issue wakeup explicitly, so when a cgroup is
>>> removed userspace can be notified.
>>>
>>> Signed-off-by: Li Zefan <lizefan@huawei.com>
> 
> Hm.. Looks like it will break eventfd semantics:
> 
> 1. One eventfd can be used for deliver more then one notification from
> one or more cgroups. POLLHUP on removing one of cgroups is not valid.
> 
> 2. It's valid to have eventfd opened only by one userspace application. We
> should not close it, just because cgroup is removed.
> 
> I think problem with multiple threads waiting an event on eventfd should
> be handled in userspace.
> 

I didn't realize this.. and if a cgroup is removed, the woken thread may not
be the thread that is waiting on this cgroup. How crappy.. I don't know how
userspace is going to deal with all these.

And another bug spotted. We can pass fd of memory.usage_in_bytes of cgroup A
to cgroup.event_control of cgroup B, and then we won't get memory usage
notification from A but B! What's worse, if A and B are in different mount
hierarchy, boom!


Signed-off-by: Li Zefan <lizefan@huawei.com>
---
 kernel/cgroup.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 3d21adf..e496359 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -3825,6 +3825,7 @@ static int cgroup_write_event_control(struct cgroup *cgrp, struct cftype *cft,
 				      const char *buffer)
 {
 	struct cgroup_event *event = NULL;
+	struct cgroup *cgrp_cfile;
 	unsigned int efd, cfd;
 	struct file *efile = NULL;
 	struct file *cfile = NULL;
@@ -3880,6 +3881,16 @@ static int cgroup_write_event_control(struct cgroup *cgrp, struct cftype *cft,
 		goto fail;
 	}
 
+	/*
+	 * The file to be monitored must be in the same cgroup as
+	 * cgroup.event_control is.
+	 */
+	cgrp_cfile = __d_cgrp(cfile->f_dentry->d_parent);
+	if (cgrp_cfile != cgrp) {
+		ret = -EINVAL;
+		goto fail;
+	}
+
 	if (!event->cft->register_event || !event->cft->unregister_event) {
 		ret = -EINVAL;
 		goto fail;
-- 
1.8.0.2



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] eventfd: introduce eventfd_signal_hangup()
  2013-02-05  3:40       ` Li Zefan
@ 2013-02-05  8:28         ` Kirill A. Shutemov
  2013-02-06  1:48           ` Li Zefan
  0 siblings, 1 reply; 16+ messages in thread
From: Kirill A. Shutemov @ 2013-02-05  8:28 UTC (permalink / raw)
  To: Li Zefan
  Cc: Tejun Heo, LKML, Cgroups, Davide Libenzi, Aaron Durbin, Greg Thelen

On Tue, Feb 05, 2013 at 11:40:50AM +0800, Li Zefan wrote:
> On 2013/2/4 18:15, Kirill A. Shutemov wrote:
> > On Sat, Feb 02, 2013 at 05:58:58PM +0200, Kirill A. Shutemov wrote:
> >> On Sat, Feb 02, 2013 at 02:50:44PM +0800, Li Zefan wrote:
> >>> When an eventfd is closed, a wakeup with POLLHUP will be issued,
> >>> but cgroup wants to issue wakeup explicitly, so when a cgroup is
> >>> removed userspace can be notified.
> >>>
> >>> Signed-off-by: Li Zefan <lizefan@huawei.com>
> > 
> > Hm.. Looks like it will break eventfd semantics:
> > 
> > 1. One eventfd can be used for deliver more then one notification from
> > one or more cgroups. POLLHUP on removing one of cgroups is not valid.
> > 
> > 2. It's valid to have eventfd opened only by one userspace application. We
> > should not close it, just because cgroup is removed.
> > 
> > I think problem with multiple threads waiting an event on eventfd should
> > be handled in userspace.
> > 
> 
> I didn't realize this.. and if a cgroup is removed, the woken thread may not
> be the thread that is waiting on this cgroup.

Why? The only threads who read() or poll() the eventfd will be wake up,
won't they? Do you have a code sample to demonstrate the issue?

> How crappy.. I don't know how
> userspace is going to deal with all these.
> 
> And another bug spotted. We can pass fd of memory.usage_in_bytes of cgroup A
> to cgroup.event_control of cgroup B, and then we won't get memory usage
> notification from A but B! What's worse, if A and B are in different mount
> hierarchy, boom!

I think we can ignore which cgroup event_control is belong to, and just
use cgroup of cfile as base. It also means you can use one event_control fd
for registering events to different cgroups. It can be handy.

> Signed-off-by: Li Zefan <lizefan@huawei.com>
> ---
>  kernel/cgroup.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 3d21adf..e496359 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -3825,6 +3825,7 @@ static int cgroup_write_event_control(struct cgroup *cgrp, struct cftype *cft,
>  				      const char *buffer)
>  {
>  	struct cgroup_event *event = NULL;
> +	struct cgroup *cgrp_cfile;
>  	unsigned int efd, cfd;
>  	struct file *efile = NULL;
>  	struct file *cfile = NULL;
> @@ -3880,6 +3881,16 @@ static int cgroup_write_event_control(struct cgroup *cgrp, struct cftype *cft,
>  		goto fail;
>  	}
>  
> +	/*
> +	 * The file to be monitored must be in the same cgroup as
> +	 * cgroup.event_control is.
> +	 */
> +	cgrp_cfile = __d_cgrp(cfile->f_dentry->d_parent);
> +	if (cgrp_cfile != cgrp) {
> +		ret = -EINVAL;
> +		goto fail;
> +	}
> +
>  	if (!event->cft->register_event || !event->cft->unregister_event) {
>  		ret = -EINVAL;
>  		goto fail;
> -- 
> 1.8.0.2
> 
> 

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] eventfd: introduce eventfd_signal_hangup()
  2013-02-05  8:28         ` Kirill A. Shutemov
@ 2013-02-06  1:48           ` Li Zefan
  2013-02-06 14:53             ` Kirill A. Shutemov
  0 siblings, 1 reply; 16+ messages in thread
From: Li Zefan @ 2013-02-06  1:48 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Tejun Heo, LKML, Cgroups, Davide Libenzi, Aaron Durbin, Greg Thelen

On 2013/2/5 16:28, Kirill A. Shutemov wrote:
> On Tue, Feb 05, 2013 at 11:40:50AM +0800, Li Zefan wrote:
>> On 2013/2/4 18:15, Kirill A. Shutemov wrote:
>>> On Sat, Feb 02, 2013 at 05:58:58PM +0200, Kirill A. Shutemov wrote:
>>>> On Sat, Feb 02, 2013 at 02:50:44PM +0800, Li Zefan wrote:
>>>>> When an eventfd is closed, a wakeup with POLLHUP will be issued,
>>>>> but cgroup wants to issue wakeup explicitly, so when a cgroup is
>>>>> removed userspace can be notified.
>>>>>
>>>>> Signed-off-by: Li Zefan <lizefan@huawei.com>
>>>
>>> Hm.. Looks like it will break eventfd semantics:
>>>
>>> 1. One eventfd can be used for deliver more then one notification from
>>> one or more cgroups. POLLHUP on removing one of cgroups is not valid.
>>>
>>> 2. It's valid to have eventfd opened only by one userspace application. We
>>> should not close it, just because cgroup is removed.
>>>
>>> I think problem with multiple threads waiting an event on eventfd should
>>> be handled in userspace.
>>>
>>
>> I didn't realize this.. and if a cgroup is removed, the woken thread may not
>> be the thread that is waiting on this cgroup.
> 
> Why? The only threads who read() or poll() the eventfd will be wake up,
> won't they? Do you have a code sample to demonstrate the issue?
> 

All the threads will be woken up, but one of them will consume the event counter,
and then all other threads will keep waiting.

>> How crappy.. I don't know how
>> userspace is going to deal with all these.
>>
>> And another bug spotted. We can pass fd of memory.usage_in_bytes of cgroup A
>> to cgroup.event_control of cgroup B, and then we won't get memory usage
>> notification from A but B! What's worse, if A and B are in different mount
>> hierarchy, boom!
> 
> I think we can ignore which cgroup event_control is belong to, and just
> use cgroup of cfile as base. It also means you can use one event_control fd
> for registering events to different cgroups. It can be handy.
> 

The most reasonal usage is, cgroup.event_control exists in the root cgroup only,
and it's used to register events to all cgroups. But I don't think we can
change the current interface that each cgroup has a cgroup.event_control, so
we'll restrict event registration as my patch does.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] eventfd: introduce eventfd_signal_hangup()
  2013-02-06  1:48           ` Li Zefan
@ 2013-02-06 14:53             ` Kirill A. Shutemov
  0 siblings, 0 replies; 16+ messages in thread
From: Kirill A. Shutemov @ 2013-02-06 14:53 UTC (permalink / raw)
  To: Li Zefan
  Cc: Tejun Heo, LKML, Cgroups, Davide Libenzi, Aaron Durbin, Greg Thelen

On Wed, Feb 06, 2013 at 09:48:20AM +0800, Li Zefan wrote:
> On 2013/2/5 16:28, Kirill A. Shutemov wrote:
> > On Tue, Feb 05, 2013 at 11:40:50AM +0800, Li Zefan wrote:
> >> On 2013/2/4 18:15, Kirill A. Shutemov wrote:
> >>> On Sat, Feb 02, 2013 at 05:58:58PM +0200, Kirill A. Shutemov wrote:
> >>>> On Sat, Feb 02, 2013 at 02:50:44PM +0800, Li Zefan wrote:
> >>>>> When an eventfd is closed, a wakeup with POLLHUP will be issued,
> >>>>> but cgroup wants to issue wakeup explicitly, so when a cgroup is
> >>>>> removed userspace can be notified.
> >>>>>
> >>>>> Signed-off-by: Li Zefan <lizefan@huawei.com>
> >>>
> >>> Hm.. Looks like it will break eventfd semantics:
> >>>
> >>> 1. One eventfd can be used for deliver more then one notification from
> >>> one or more cgroups. POLLHUP on removing one of cgroups is not valid.
> >>>
> >>> 2. It's valid to have eventfd opened only by one userspace application. We
> >>> should not close it, just because cgroup is removed.
> >>>
> >>> I think problem with multiple threads waiting an event on eventfd should
> >>> be handled in userspace.
> >>>
> >>
> >> I didn't realize this.. and if a cgroup is removed, the woken thread may not
> >> be the thread that is waiting on this cgroup.
> > 
> > Why? The only threads who read() or poll() the eventfd will be wake up,
> > won't they? Do you have a code sample to demonstrate the issue?
> > 
> 
> All the threads will be woken up, but one of them will consume the event counter,
> and then all other threads will keep waiting.

The woken up thread can identify that cgroup had disappeared and notify
other threads (probably with the same eventfd).

For me it looks like userspace mess rather then kernel issue.

> >> How crappy.. I don't know how
> >> userspace is going to deal with all these.
> >>
> >> And another bug spotted. We can pass fd of memory.usage_in_bytes of cgroup A
> >> to cgroup.event_control of cgroup B, and then we won't get memory usage
> >> notification from A but B! What's worse, if A and B are in different mount
> >> hierarchy, boom!
> > 
> > I think we can ignore which cgroup event_control is belong to, and just
> > use cgroup of cfile as base. It also means you can use one event_control fd
> > for registering events to different cgroups. It can be handy.
> > 
> 
> The most reasonal usage is, cgroup.event_control exists in the root cgroup only,
> and it's used to register events to all cgroups. But I don't think we can
> change the current interface that each cgroup has a cgroup.event_control, so
> we'll restrict event registration as my patch does.

Looks good to me.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2013-02-06 14:52 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-02-02  6:50 [PATCH 0/4] cgroup: bug fixes for eventfd Li Zefan
2013-02-02  6:50 ` [PATCH 1/4] eventfd: introduce eventfd_signal_hangup() Li Zefan
2013-02-02 15:58   ` Kirill A. Shutemov
2013-02-04 10:15     ` Kirill A. Shutemov
2013-02-05  3:40       ` Li Zefan
2013-02-05  8:28         ` Kirill A. Shutemov
2013-02-06  1:48           ` Li Zefan
2013-02-06 14:53             ` Kirill A. Shutemov
2013-02-02  6:51 ` [PATCH 2/4] cgroup: fix cgroup_rmdir() vs close(eventfd) race Li Zefan
2013-02-02 15:59   ` Kirill A. Shutemov
2013-02-02  6:51 ` [PATCH 3/4] eventfd: make operations on eventfd return -EIDRM if it's hung up Li Zefan
2013-02-02 16:12   ` Kirill A. Shutemov
2013-02-04  3:15     ` Li Zefan
2013-02-02  6:51 ` [PATCH 4/4] cgroup: adapt to the new way of detecting cgroup removal Li Zefan
2013-02-02  6:59 ` [PATCH 0/4] cgroup: bug fixes for eventfd Li Zefan
2013-02-04 19:27 ` Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).