* [PATCH RFC v0 0/3] cgroup notifications API and memory thresholds
@ 2009-11-26 17:11 ` Kirill A. Shutemov
  0 siblings, 0 replies; 35+ messages in thread
From: Kirill A. Shutemov @ 2009-11-26 17:11 UTC (permalink / raw)
  To: containers, linux-mm
  Cc: Paul Menage, Li Zefan, Andrew Morton, KAMEZAWA Hiroyuki,
	Balbir Singh, Pavel Emelyanov, linux-kernel, Kirill A. Shutemov

This is my first attempt at implementing a cgroup notifications API and
memory thresholds on top of it. The idea for the API was proposed by Paul
Menage.

The series still lacks some important features and needs more testing, but I
want to publish it as soon as possible to get feedback from the community.

TODO:
 - memory thresholds on root cgroup;
 - memsw support;
 - documentation.

Kirill A. Shutemov (3):
  cgroup: implement eventfd-based generic API for notifications
  res_counter: implement thresholds
  memcg: implement memory thresholds

 include/linux/cgroup.h      |    8 ++
 include/linux/res_counter.h |   44 +++++++++++
 kernel/cgroup.c             |  181 ++++++++++++++++++++++++++++++++++++++++++-
 kernel/res_counter.c        |    4 +
 mm/memcontrol.c             |  149 +++++++++++++++++++++++++++++++++++
 5 files changed, 385 insertions(+), 1 deletions(-)


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH RFC v0 1/3] cgroup: implement eventfd-based generic API for notifications
  2009-11-26 17:11 ` Kirill A. Shutemov
@ 2009-11-26 17:11   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 35+ messages in thread
From: Kirill A. Shutemov @ 2009-11-26 17:11 UTC (permalink / raw)
  To: containers, linux-mm
  Cc: Paul Menage, Li Zefan, Andrew Morton, KAMEZAWA Hiroyuki,
	Balbir Singh, Pavel Emelyanov, linux-kernel, Kirill A. Shutemov

This patch introduces a write-only file, "cgroup.event_control", in every
cgroup.

To register a new notification handler you need to:
- create an eventfd;
- open the control file to be monitored; the callbacks register_event() and
  unregister_event() must be defined for that control file;
- write "<event_fd> <control_fd> <args>" to cgroup.event_control; the
  interpretation of args is defined by the control file implementation (see
  the sketch after this list).

The eventfd will be woken up by the control file implementation, or when the
cgroup is removed.

To unregister a notification handler, just close the eventfd.
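
For control file authors, here is a minimal sketch (not part of this patch)
of wiring the two new callbacks into a cftype; every name with the example_
prefix is hypothetical:

	/* Hypothetical controller hooking into the notification API. */
	static int example_register_event(struct cgroup *cgrp, struct cftype *cft,
			struct eventfd_ctx *eventfd, const char *args)
	{
		/* Parse args and stash the eventfd somewhere; later, when
		 * the monitored condition fires, call
		 * eventfd_signal(eventfd, 1). */
		return 0;
	}

	static int example_unregister_event(struct cgroup *cgrp, struct cftype *cft,
			struct eventfd_ctx *eventfd)
	{
		/* Forget the stashed eventfd; the core drops its reference. */
		return 0;
	}

	static struct cftype example_files[] = {
		{
			.name = "state",
			.register_event = example_register_event,
			.unregister_event = example_unregister_event,
		},
	};

Patch 3/3 implements exactly this pattern for memory.usage_in_bytes.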

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
---
 include/linux/cgroup.h |    8 ++
 kernel/cgroup.c        |  181 +++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 188 insertions(+), 1 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 0008dee..285eaff 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -220,6 +220,9 @@ struct cgroup {
 
 	/* For RCU-protected deletion */
 	struct rcu_head rcu_head;
+
+	struct list_head event_list;
+	struct mutex event_list_mutex;
 };
 
 /*
@@ -362,6 +365,11 @@ struct cftype {
 	int (*trigger)(struct cgroup *cgrp, unsigned int event);
 
 	int (*release)(struct inode *inode, struct file *file);
+
+	int (*register_event)(struct cgroup *cgrp, struct cftype *cft,
+			struct eventfd_ctx *eventfd, const char *args);
+	int (*unregister_event)(struct cgroup *cgrp, struct cftype *cft,
+			struct eventfd_ctx *eventfd);
 };
 
 struct cgroup_scanner {
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 0249f4b..5438d46 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4,6 +4,10 @@
  *  Based originally on the cpuset system, extracted by Paul Menage
  *  Copyright (C) 2006 Google, Inc
  *
+ *  Notifications support
+ *  Copyright (C) 2009 Nokia Corporation
+ *  Author: Kirill A. Shutemov
+ *
  *  Copyright notices from the original cpuset code:
  *  --------------------------------------------------
  *  Copyright (C) 2003 BULL SA.
@@ -51,6 +55,8 @@
 #include <linux/pid_namespace.h>
 #include <linux/idr.h>
 #include <linux/vmalloc.h> /* TODO: replace with more sophisticated array */
+#include <linux/eventfd.h>
+#include <linux/poll.h>
 
 #include <asm/atomic.h>
 
@@ -146,6 +152,16 @@ struct css_id {
 	unsigned short stack[0]; /* Array of Length (depth+1) */
 };
 
+struct cgroup_event {
+	struct cgroup *cgrp;
+	struct cftype *cft;
+	struct eventfd_ctx *eventfd;
+	struct list_head list;
+	poll_table pt;
+	wait_queue_head_t *wqh;
+	wait_queue_t wait;
+};
+static int cgroup_event_remove(struct cgroup_event *event);
 
 /* The list of hierarchy roots */
 
@@ -734,14 +750,26 @@ static struct inode *cgroup_new_inode(mode_t mode, struct super_block *sb)
 static int cgroup_call_pre_destroy(struct cgroup *cgrp)
 {
 	struct cgroup_subsys *ss;
+	struct cgroup_event *event, *tmp;
 	int ret = 0;
 
 	for_each_subsys(cgrp->root, ss)
 		if (ss->pre_destroy) {
 			ret = ss->pre_destroy(ss, cgrp);
 			if (ret)
-				break;
+				goto out;
 		}
+
+	mutex_lock(&cgrp->event_list_mutex);
+	list_for_each_entry_safe(event, tmp, &cgrp->event_list, list) {
+		ret = cgroup_event_remove(event);
+		if (ret)
+			break;
+		eventfd_signal(event->eventfd, 1);
+	}
+	mutex_unlock(&cgrp->event_list_mutex);
+
+out:
 	return ret;
 }
 
@@ -1136,6 +1164,8 @@ static void init_cgroup_housekeeping(struct cgroup *cgrp)
 	INIT_LIST_HEAD(&cgrp->release_list);
 	INIT_LIST_HEAD(&cgrp->pidlists);
 	mutex_init(&cgrp->pidlist_mutex);
+	INIT_LIST_HEAD(&cgrp->event_list);
+	mutex_init(&cgrp->event_list_mutex);
 }
 
 static void init_cgroup_root(struct cgroupfs_root *root)
@@ -1935,6 +1965,13 @@ static const struct inode_operations cgroup_dir_inode_operations = {
 	.rename = cgroup_rename,
 };
 
+static inline struct cftype *__file_cft(struct file *file)
+{
+	if (file->f_dentry->d_inode->i_fop != &cgroup_file_operations)
+		return ERR_PTR(-EINVAL);
+	return __d_cft(file->f_dentry);
+}
+
 static int cgroup_create_file(struct dentry *dentry, mode_t mode,
 				struct super_block *sb)
 {
@@ -2789,6 +2826,143 @@ static int cgroup_write_notify_on_release(struct cgroup *cgrp,
 	return 0;
 }
 
+static int cgroup_event_remove(struct cgroup_event *event)
+{
+	struct cgroup *cgrp = event->cgrp;
+	int ret;
+
+	BUG_ON(!mutex_is_locked(&cgrp->event_list_mutex));
+	ret = event->cft->unregister_event(cgrp, event->cft, event->eventfd);
+	eventfd_ctx_put(event->eventfd);
+	remove_wait_queue(event->wqh, &event->wait);
+	list_del(&event->list);
+	kfree(event);
+
+	return ret;
+}
+
+static int cgroup_event_wake(wait_queue_t *wait, unsigned mode,
+		int sync, void *key)
+{
+	struct cgroup_event *event = container_of(wait,
+			struct cgroup_event, wait);
+	struct cgroup *cgrp = event->cgrp;
+	unsigned long flags = (unsigned long)key;
+	int ret;
+
+	if (!(flags & POLLHUP))
+		return 0;
+
+	mutex_lock(&cgrp->event_list_mutex);
+	ret = cgroup_event_remove(event);
+	mutex_unlock(&cgrp->event_list_mutex);
+
+	return ret;
+}
+
+static void cgroup_event_ptable_queue_proc(struct file *file,
+		wait_queue_head_t *wqh, poll_table *pt)
+{
+	struct cgroup_event *event = container_of(pt,
+			struct cgroup_event, pt);
+
+	event->wqh = wqh;
+	add_wait_queue(wqh, &event->wait);
+}
+
+static int cgroup_write_event_control(struct cgroup *cont, struct cftype *cft,
+				      const char *buffer)
+{
+	struct cgroup_event *event = NULL;
+	unsigned int efd, cfd;
+	struct file *efile = NULL;
+	struct file *cfile = NULL;
+	char *endp;
+	int ret;
+
+	efd = simple_strtoul(buffer, &endp, 10);
+	if (*endp != ' ')
+		return -EINVAL;
+	buffer = endp + 1;
+
+	cfd = simple_strtoul(buffer, &endp, 10);
+	if ((*endp != ' ') && (*endp != '\0'))
+		return -EINVAL;
+	buffer = endp + 1;
+
+	event = kzalloc(sizeof(*event), GFP_KERNEL);
+	if (!event)
+		return -ENOMEM;
+	event->cgrp = cont;
+	INIT_LIST_HEAD(&event->list);
+	init_poll_funcptr(&event->pt, cgroup_event_ptable_queue_proc);
+	init_waitqueue_func_entry(&event->wait, cgroup_event_wake);
+
+	efile = eventfd_fget(efd);
+	if (IS_ERR(efile)) {
+		ret = PTR_ERR(efile);
+		goto fail;
+	}
+
+	event->eventfd = eventfd_ctx_fileget(efile);
+	if (IS_ERR(event->eventfd)) {
+		ret = PTR_ERR(event->eventfd);
+		goto fail;
+	}
+
+	cfile = fget(cfd);
+	if (!cfile) {
+		ret = -EBADF;
+		goto fail;
+	}
+
+	ret = file_permission(cfile, MAY_READ);
+	if (ret < 0)
+		goto fail;
+
+	event->cft = __file_cft(cfile);
+	if (IS_ERR(event->cft)) {
+		ret = PTR_ERR(event->cft);
+		goto fail;
+	}
+
+	if (!event->cft->register_event || !event->cft->unregister_event) {
+		ret = -EINVAL;
+		goto fail;
+	}
+
+	ret = event->cft->register_event(cont, event->cft,
+			event->eventfd, buffer);
+	if (ret)
+		goto fail;
+
+	efile->f_op->poll(efile, &event->pt);
+
+	mutex_lock(&cont->event_list_mutex);
+	list_add(&event->list, &cont->event_list);
+	mutex_unlock(&cont->event_list_mutex);
+
+	fput(cfile);
+	fput(efile);
+
+	return 0;
+
+fail:
+	if (cfile && !IS_ERR(cfile))
+		fput(cfile);
+
+	if (event && event->eventfd && !IS_ERR(event->eventfd))
+		eventfd_ctx_put(event->eventfd);
+
+	if (!IS_ERR(efile))
+		fput(efile);
+
+	if (event)
+		kfree(event);
+
+	return ret;
+}
+
 /*
  * for the common functions, 'private' gives the type of file
  */
@@ -2814,6 +2988,11 @@ static struct cftype files[] = {
 		.read_u64 = cgroup_read_notify_on_release,
 		.write_u64 = cgroup_write_notify_on_release,
 	},
+	{
+		.name = CGROUP_FILE_GENERIC_PREFIX "event_control",
+		.write_string = cgroup_write_event_control,
+		.mode = S_IWUGO,
+	},
 };
 
 static struct cftype cft_release_agent = {
-- 
1.6.5.3


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH RFC v0 2/3] res_counter: implement thresholds
  2009-11-26 17:11   ` Kirill A. Shutemov
@ 2009-11-26 17:11     ` Kirill A. Shutemov
  -1 siblings, 0 replies; 35+ messages in thread
From: Kirill A. Shutemov @ 2009-11-26 17:11 UTC (permalink / raw)
  To: containers, linux-mm
  Cc: Paul Menage, Li Zefan, Andrew Morton, KAMEZAWA Hiroyuki,
	Balbir Singh, Pavel Emelyanov, linux-kernel, Kirill A. Shutemov

This patch allows setting up two thresholds on a res_counter: one above the
current usage and one below it. The threshold_notifier() callback is invoked
whenever either threshold is crossed.
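
A minimal kernel-side sketch (not from this patch; my_notifier and
my_counter_setup are hypothetical names) of how a controller would consume
the new interface:

	static void my_notifier(struct res_counter *counter,
			unsigned long long usage,
			unsigned long long threshold)
	{
		/* Called under counter->lock from the charge/uncharge paths
		 * whenever usage crosses one of the armed thresholds. */
	}

	static void my_counter_setup(struct res_counter *cnt)
	{
		res_counter_init(cnt, NULL);
		cnt->threshold_notifier = my_notifier;
		/* Fails with -EINVAL unless the current usage lies within
		 * [threshold_below, threshold_above). */
		res_counter_set_thresholds(cnt, 64ULL << 20, 0);
	}

Patch 3/3 uses this interface to drive the memcg eventfd notifications.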

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
---
 include/linux/res_counter.h |   44 +++++++++++++++++++++++++++++++++++++++++++
 kernel/res_counter.c        |    4 +++
 2 files changed, 48 insertions(+), 0 deletions(-)

diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
index fcb9884..bca99a5 100644
--- a/include/linux/res_counter.h
+++ b/include/linux/res_counter.h
@@ -9,6 +9,10 @@
  *
  * Author: Pavel Emelianov <xemul@openvz.org>
  *
+ * Thresholds support
+ * Copyright (C) 2009 Nokia Corporation
+ * Author: Kirill A. Shutemov
+ *
  * See Documentation/cgroups/resource_counter.txt for more
  * info about what this counter is.
  */
@@ -42,6 +46,13 @@ struct res_counter {
 	 * the number of unsuccessful attempts to consume the resource
 	 */
 	unsigned long long failcnt;
+
+	unsigned long long threshold_above;
+	unsigned long long threshold_below;
+	void (*threshold_notifier)(struct res_counter *counter,
+			unsigned long long usage,
+			unsigned long long threshold);
+
 	/*
 	 * the lock to protect all of the above.
 	 * the routines below consider this to be IRQ-safe
@@ -145,6 +156,20 @@ static inline bool res_counter_soft_limit_check_locked(struct res_counter *cnt)
 	return false;
 }
 
+static inline void res_counter_threshold_notify_locked(struct res_counter *cnt)
+{
+	if (cnt->usage >= cnt->threshold_above) {
+		cnt->threshold_notifier(cnt, cnt->usage, cnt->threshold_above);
+		return;
+	}
+
+	if (cnt->usage < cnt->threshold_below) {
+		cnt->threshold_notifier(cnt, cnt->usage, cnt->threshold_below);
+		return;
+	}
+}
+
+
 /**
  * Get the difference between the usage and the soft limit
  * @cnt: The counter
@@ -238,4 +263,23 @@ res_counter_set_soft_limit(struct res_counter *cnt,
 	return 0;
 }
 
+static inline int
+res_counter_set_thresholds(struct res_counter *cnt,
+		unsigned long long threshold_above,
+		unsigned long long threshold_below)
+{
+	unsigned long flags;
+	int ret = -EINVAL;
+
+	spin_lock_irqsave(&cnt->lock, flags);
+	if ((cnt->usage < threshold_above) &&
+			(cnt->usage >= threshold_below)) {
+		cnt->threshold_above = threshold_above;
+		cnt->threshold_below = threshold_below;
+		ret = 0;
+	}
+	spin_unlock_irqrestore(&cnt->lock, flags);
+	return ret;
+}
+
 #endif
diff --git a/kernel/res_counter.c b/kernel/res_counter.c
index bcdabf3..646c29c 100644
--- a/kernel/res_counter.c
+++ b/kernel/res_counter.c
@@ -20,6 +20,8 @@ void res_counter_init(struct res_counter *counter, struct res_counter *parent)
 	spin_lock_init(&counter->lock);
 	counter->limit = RESOURCE_MAX;
 	counter->soft_limit = RESOURCE_MAX;
+	counter->threshold_above = RESOURCE_MAX;
+	counter->threshold_below = 0ULL;
 	counter->parent = parent;
 }
 
@@ -33,6 +35,7 @@ int res_counter_charge_locked(struct res_counter *counter, unsigned long val)
 	counter->usage += val;
 	if (counter->usage > counter->max_usage)
 		counter->max_usage = counter->usage;
+	res_counter_threshold_notify_locked(counter);
 	return 0;
 }
 
@@ -73,6 +76,7 @@ void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val)
 		val = counter->usage;
 
 	counter->usage -= val;
+	res_counter_threshold_notify_locked(counter);
 }
 
 void res_counter_uncharge(struct res_counter *counter, unsigned long val)
-- 
1.6.5.3


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH RFC v0 3/3] memcg: implement memory thresholds
  2009-11-26 17:11     ` Kirill A. Shutemov
@ 2009-11-26 17:11       ` Kirill A. Shutemov
  -1 siblings, 0 replies; 35+ messages in thread
From: Kirill A. Shutemov @ 2009-11-26 17:11 UTC (permalink / raw)
  To: containers, linux-mm
  Cc: Paul Menage, Li Zefan, Andrew Morton, KAMEZAWA Hiroyuki,
	Balbir Singh, Pavel Emelyanov, linux-kernel, Kirill A. Shutemov

This patch allows registering multiple memory thresholds and getting a
notification through an eventfd whenever usage crosses one of them.

To register a threshold, an application needs to:
- create an eventfd;
- open the memory.usage_in_bytes file of a cgroup;
- write the string "<event_fd> <fd of memory.usage_in_bytes> <threshold>" to
  cgroup.event_control (see the userspace sketch after this list).

The application will then be notified through the eventfd whenever memory
usage crosses the threshold in either direction.
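
A hedged userspace sketch of that sequence; the mount point /cgroups/foo and
the 50M threshold are assumptions, and error handling is omitted:

	#include <fcntl.h>
	#include <stdint.h>
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>
	#include <sys/eventfd.h>

	int main(void)
	{
		char buf[64];
		uint64_t hits;

		int efd = eventfd(0, 0);
		int cfd = open("/cgroups/foo/memory.usage_in_bytes", O_RDONLY);
		int ctl = open("/cgroups/foo/cgroup.event_control", O_WRONLY);

		/* "<event_fd> <fd of memory.usage_in_bytes> <threshold>" */
		snprintf(buf, sizeof(buf), "%d %d 50M", efd, cfd);
		write(ctl, buf, strlen(buf));

		/* Each crossing of the 50M line, in either direction, adds
		 * one to the eventfd counter; read() blocks until then. */
		read(efd, &hits, sizeof(hits));
		printf("threshold crossed (%llu events)\n",
				(unsigned long long)hits);
		return 0;
	}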

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
---
 mm/memcontrol.c |  149 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 149 insertions(+), 0 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f99f599..af1af0b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6,6 +6,10 @@
  * Copyright 2007 OpenVZ SWsoft Inc
  * Author: Pavel Emelianov <xemul@openvz.org>
  *
+ * Memory thresholds
+ * Copyright (C) 2009 Nokia Corporation
+ * Author: Kirill A. Shutemov
+ *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
  * the Free Software Foundation; either version 2 of the License, or
@@ -38,6 +42,7 @@
 #include <linux/vmalloc.h>
 #include <linux/mm_inline.h>
 #include <linux/page_cgroup.h>
+#include <linux/eventfd.h>
 #include "internal.h"
 
 #include <asm/uaccess.h>
@@ -174,6 +179,12 @@ struct mem_cgroup_tree {
 
 static struct mem_cgroup_tree soft_limit_tree __read_mostly;
 
+struct mem_cgroup_threshold {
+	struct list_head list;
+	struct eventfd_ctx *eventfd;
+	u64 threshold;
+};
+
 /*
  * The memory controller data structure. The memory controller controls both
  * page cache and RSS per cgroup. We would eventually like to provide
@@ -225,6 +236,9 @@ struct mem_cgroup {
 	/* set when res.limit == memsw.limit */
 	bool		memsw_is_minimum;
 
+	struct list_head thresholds;
+	struct mem_cgroup_threshold *current_threshold;
+
 	/*
 	 * statistics. This must be placed at the end of memcg.
 	 */
@@ -2839,12 +2853,119 @@ static int mem_cgroup_swappiness_write(struct cgroup *cgrp, struct cftype *cft,
 	return 0;
 }
 
+static inline void mem_cgroup_set_thresholds(struct res_counter *counter,
+		u64 above, u64 below)
+{
+	BUG_ON(res_counter_set_thresholds(counter, above, below));
+}
+
+static void mem_cgroup_threshold(struct res_counter *counter, u64 usage,
+		u64 threshold)
+{
+	struct mem_cgroup *memcg = container_of(counter,
+			struct mem_cgroup,res);
+	struct mem_cgroup_threshold *above, *below;
+
+	above = below = memcg->current_threshold;
+
+	if (threshold <= usage) {
+		list_for_each_entry_continue(above, &memcg->thresholds,
+				list) {
+			if (above->threshold > usage)
+				break;
+			below = above;
+			eventfd_signal(below->eventfd, 1);
+		}
+	} else {
+		list_for_each_entry_continue_reverse(below,
+				&memcg->thresholds, list) {
+			eventfd_signal(above->eventfd, 1);
+			if (below->threshold <= usage)
+				break;
+			above = below;
+		}
+	}
+
+	mem_cgroup_set_thresholds(&memcg->res, above->threshold,
+			below->threshold);
+	memcg->current_threshold = below;
+}
+
+static void mem_cgroup_invalidate_thresholds(struct cgroup *cgrp)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgrp);
+	struct mem_cgroup_threshold *tmp, *prev = NULL;
+	u64 usage = memcg->res.usage;
+
+	list_for_each_entry(tmp, &memcg->thresholds, list) {
+		if (tmp->threshold > usage) {
+			BUG_ON(!prev);
+			memcg->current_threshold = prev;
+			break;
+		}
+		prev = tmp;
+	}
+
+	mem_cgroup_set_thresholds(&memcg->res, tmp->threshold,
+			prev->threshold);
+}
+
+static int mem_cgroup_register_event(struct cgroup *cgrp, struct cftype *cft,
+		struct eventfd_ctx *eventfd, const char *args)
+{
+	u64 threshold;
+	struct mem_cgroup_threshold *new, *tmp;
+	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgrp);
+	int ret;
+
+	/* TODO: Root cgroup is a special case */
+	if (mem_cgroup_is_root(memcg))
+		return -ENOSYS;
+
+	ret = res_counter_memparse_write_strategy(args, &threshold);
+	if (ret)
+		return ret;
+
+	new = kmalloc(sizeof(*new), GFP_KERNEL);
+	if (!new)
+		return -ENOMEM;
+	INIT_LIST_HEAD(&new->list);
+	new->eventfd = eventfd;
+	new->threshold = threshold;
+
+	list_for_each_entry(tmp, &memcg->thresholds, list)
+		if (new->threshold < tmp->threshold) {
+			list_add_tail(&new->list, &tmp->list);
+			break;
+		}
+	mem_cgroup_invalidate_thresholds(cgrp);
+
+	return 0;
+}
+
+static int mem_cgroup_unregister_event(struct cgroup *cgrp, struct cftype *cft,
+		struct eventfd_ctx *eventfd)
+{
+	struct mem_cgroup_threshold *threshold, *tmp;
+	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgrp);
+
+	list_for_each_entry_safe(threshold, tmp, &memcg->thresholds, list)
+		if (threshold->eventfd == eventfd) {
+			list_del(&threshold->list);
+			kfree(threshold);
+		}
+	mem_cgroup_invalidate_thresholds(cgrp);
+
+	return 0;
+}
 
 static struct cftype mem_cgroup_files[] = {
 	{
 		.name = "usage_in_bytes",
 		.private = MEMFILE_PRIVATE(_MEM, RES_USAGE),
 		.read_u64 = mem_cgroup_read,
+		.register_event = mem_cgroup_register_event,
+		.unregister_event = mem_cgroup_unregister_event,
 	},
 	{
 		.name = "max_usage_in_bytes",
@@ -3080,6 +3201,32 @@ static int mem_cgroup_soft_limit_tree_init(void)
 	return 0;
 }
 
+static int mem_cgroup_thresholds_init(struct mem_cgroup *mem)
+{
+	struct mem_cgroup_threshold *new;
+
+	mem->res.threshold_notifier = mem_cgroup_threshold;
+	INIT_LIST_HEAD(&mem->thresholds);
+
+	new = kmalloc(sizeof(*new), GFP_KERNEL);
+	if (!new)
+		return -ENOMEM;
+	INIT_LIST_HEAD(&new->list);
+	new->threshold = 0ULL;
+	list_add(&new->list, &mem->thresholds);
+
+	mem->current_threshold = new;
+
+	new = kmalloc(sizeof(*new), GFP_KERNEL);
+	if (!new)
+		return -ENOMEM;
+	INIT_LIST_HEAD(&new->list);
+	new->threshold = RESOURCE_MAX;
+	list_add_tail(&new->list, &mem->thresholds);
+
+	return 0;
+}
+
 static struct cgroup_subsys_state * __ref
 mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 {
@@ -3125,6 +3272,8 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 	mem->last_scanned_child = 0;
 	spin_lock_init(&mem->reclaim_param_lock);
 
+	mem_cgroup_thresholds_init(mem);
+
 	if (parent)
 		mem->swappiness = get_swappiness(parent);
 	atomic_set(&mem->refcnt, 1);
-- 
1.6.5.3


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH RFC v0 3/3] memcg: implement memory thresholds
@ 2009-11-26 17:11       ` Kirill A. Shutemov
  0 siblings, 0 replies; 35+ messages in thread
From: Kirill A. Shutemov @ 2009-11-26 17:11 UTC (permalink / raw)
  To: containers, linux-mm
  Cc: Paul Menage, Li Zefan, Andrew Morton, KAMEZAWA Hiroyuki,
	Balbir Singh, Pavel Emelyanov, linux-kernel, Kirill A. Shutemov

It allows to register multiple memory thresholds and gets notifications
when it crosses.

To register a threshold application need:
- create an eventfd;
- open file memory.usage_in_bytes of a cgroup
- write string "<event_fd> <memory.usage_in_bytes> <threshold>" to
  cgroup.event_control.

Application will be notified through eventfd when memory usage crosses
threshold in any direction.

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
---
 mm/memcontrol.c |  149 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 149 insertions(+), 0 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f99f599..af1af0b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6,6 +6,10 @@
  * Copyright 2007 OpenVZ SWsoft Inc
  * Author: Pavel Emelianov <xemul@openvz.org>
  *
+ * Memory thresholds
+ * Copyright (C) 2009 Nokia Corporation
+ * Author: Kirill A. Shutemov
+ *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
  * the Free Software Foundation; either version 2 of the License, or
@@ -38,6 +42,7 @@
 #include <linux/vmalloc.h>
 #include <linux/mm_inline.h>
 #include <linux/page_cgroup.h>
+#include <linux/eventfd.h>
 #include "internal.h"
 
 #include <asm/uaccess.h>
@@ -174,6 +179,12 @@ struct mem_cgroup_tree {
 
 static struct mem_cgroup_tree soft_limit_tree __read_mostly;
 
+struct mem_cgroup_threshold {
+	struct list_head list;
+	struct eventfd_ctx *eventfd;
+	u64 threshold;
+};
+
 /*
  * The memory controller data structure. The memory controller controls both
  * page cache and RSS per cgroup. We would eventually like to provide
@@ -225,6 +236,9 @@ struct mem_cgroup {
 	/* set when res.limit == memsw.limit */
 	bool		memsw_is_minimum;
 
+	struct list_head thresholds;
+	struct mem_cgroup_threshold *current_threshold;
+
 	/*
 	 * statistics. This must be placed at the end of memcg.
 	 */
@@ -2839,12 +2853,119 @@ static int mem_cgroup_swappiness_write(struct cgroup *cgrp, struct cftype *cft,
 	return 0;
 }
 
+static inline void mem_cgroup_set_thresholds(struct res_counter *counter,
+		u64 above, u64 below)
+{
+	BUG_ON(res_counter_set_thresholds(counter, above, below));
+}
+
+static void mem_cgroup_threshold(struct res_counter *counter, u64 usage,
+		u64 threshold)
+{
+	struct mem_cgroup *memcg = container_of(counter,
+			struct mem_cgroup,res);
+	struct mem_cgroup_threshold *above, *below;
+
+	above = below = memcg->current_threshold;
+
+	if (threshold <= usage) {
+		list_for_each_entry_continue(above, &memcg->thresholds,
+				list) {
+			if (above->threshold > usage)
+				break;
+			below = above;
+			eventfd_signal(below->eventfd, 1);
+		}
+	} else {
+		list_for_each_entry_continue_reverse(below,
+				&memcg->thresholds, list) {
+			eventfd_signal(above->eventfd, 1);
+			if (below->threshold <= usage)
+				break;
+			above = below;
+		}
+	}
+
+	mem_cgroup_set_thresholds(&memcg->res, above->threshold,
+			below->threshold);
+	memcg->current_threshold = below;
+}
+
+static void mem_cgroup_invalidate_thresholds(struct cgroup *cgrp)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgrp);
+	struct mem_cgroup_threshold *tmp, *prev = NULL;
+	u64 usage = memcg->res.usage;
+
+	list_for_each_entry(tmp, &memcg->thresholds, list) {
+		if (tmp->threshold > usage) {
+			BUG_ON(!prev);
+			memcg->current_threshold = prev;
+			break;
+		}
+		prev = tmp;
+	}
+
+	mem_cgroup_set_thresholds(&memcg->res, tmp->threshold,
+			prev->threshold);
+}
+
+static int mem_cgroup_register_event(struct cgroup *cgrp, struct cftype *cft,
+		struct eventfd_ctx *eventfd, const char *args)
+{
+	u64 threshold;
+	struct mem_cgroup_threshold *new, *tmp;
+	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgrp);
+	int ret;
+
+	/* TODO: Root cgroup is a special case */
+	if (mem_cgroup_is_root(memcg))
+		return -ENOSYS;
+
+	ret = res_counter_memparse_write_strategy(args, &threshold);
+	if (ret)
+		return ret;
+
+	new = kmalloc(sizeof(*new), GFP_KERNEL);
+	if (!new)
+		return -ENOMEM;
+	INIT_LIST_HEAD(&new->list);
+	new->eventfd = eventfd;
+	new->threshold = threshold;
+
+	list_for_each_entry(tmp, &memcg->thresholds, list)
+		if (new->threshold < tmp->threshold) {
+			list_add_tail(&new->list, &tmp->list);
+			break;
+		}
+	mem_cgroup_invalidate_thresholds(cgrp);
+
+	return 0;
+}
+
+static int mem_cgroup_unregister_event(struct cgroup *cgrp, struct cftype *cft,
+		struct eventfd_ctx *eventfd)
+{
+	struct mem_cgroup_threshold *threshold, *tmp;
+	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgrp);
+
+	list_for_each_entry_safe(threshold, tmp, &memcg->thresholds, list)
+		if (threshold->eventfd == eventfd) {
+			list_del(&threshold->list);
+			kfree(threshold);
+		}
+	mem_cgroup_invalidate_thresholds(cgrp);
+
+	return 0;
+}
 
 static struct cftype mem_cgroup_files[] = {
 	{
 		.name = "usage_in_bytes",
 		.private = MEMFILE_PRIVATE(_MEM, RES_USAGE),
 		.read_u64 = mem_cgroup_read,
+		.register_event = mem_cgroup_register_event,
+		.unregister_event = mem_cgroup_unregister_event,
 	},
 	{
 		.name = "max_usage_in_bytes",
@@ -3080,6 +3201,32 @@ static int mem_cgroup_soft_limit_tree_init(void)
 	return 0;
 }
 
+static int mem_cgroup_thresholds_init(struct mem_cgroup *mem)
+{
+	struct mem_cgroup_threshold *new;
+
+	mem->res.threshold_notifier = mem_cgroup_threshold;
+	INIT_LIST_HEAD(&mem->thresholds);
+
+	new = kmalloc(sizeof(*new), GFP_KERNEL);
+	if (!new)
+		return -ENOMEM;
+	INIT_LIST_HEAD(&new->list);
+	new->threshold = 0ULL;
+	list_add(&new->list, &mem->thresholds);
+
+	mem->current_threshold = new;
+
+	new = kmalloc(sizeof(*new), GFP_KERNEL);
+	if (!new)
+		return -ENOMEM;
+	INIT_LIST_HEAD(&new->list);
+	new->threshold = RESOURCE_MAX;
+	list_add_tail(&new->list, &mem->thresholds);
+
+	return 0;
+}
+
 static struct cgroup_subsys_state * __ref
 mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 {
@@ -3125,6 +3272,8 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 	mem->last_scanned_child = 0;
 	spin_lock_init(&mem->reclaim_param_lock);
 
+	mem_cgroup_thresholds_init(mem);
+
 	if (parent)
 		mem->swappiness = get_swappiness(parent);
 	atomic_set(&mem->refcnt, 1);
-- 
1.6.5.3

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH RFC v0 2/3] res_counter: implement thresholds
  2009-11-26 17:11     ` Kirill A. Shutemov
@ 2009-11-27  0:20       ` Daisuke Nishimura
  -1 siblings, 0 replies; 35+ messages in thread
From: Daisuke Nishimura @ 2009-11-27  0:20 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: containers, linux-mm, Paul Menage, Li Zefan, Andrew Morton,
	KAMEZAWA Hiroyuki, Balbir Singh, Pavel Emelyanov, linux-kernel,
	Daisuke Nishimura

Hi.

On Thu, 26 Nov 2009 19:11:16 +0200, "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> It allows to setup two thresholds: one above current usage and one
> below. Callback threshold_notifier() will be called if a threshold is
> crossed.
> 
> Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
> ---
>  include/linux/res_counter.h |   44 +++++++++++++++++++++++++++++++++++++++++++
>  kernel/res_counter.c        |    4 +++
>  2 files changed, 48 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
> index fcb9884..bca99a5 100644
> --- a/include/linux/res_counter.h
> +++ b/include/linux/res_counter.h
> @@ -9,6 +9,10 @@
>   *
>   * Author: Pavel Emelianov <xemul@openvz.org>
>   *
> + * Thresholds support
> + * Copyright (C) 2009 Nokia Corporation
> + * Author: Kirill A. Shutemov
> + *
>   * See Documentation/cgroups/resource_counter.txt for more
>   * info about what this counter is.
>   */
> @@ -42,6 +46,13 @@ struct res_counter {
>  	 * the number of unsuccessful attempts to consume the resource
>  	 */
>  	unsigned long long failcnt;
> +
> +	unsigned long long threshold_above;
> +	unsigned long long threshold_below;
> +	void (*threshold_notifier)(struct res_counter *counter,
> +			unsigned long long usage,
> +			unsigned long long threshold);
> +
>  	/*
>  	 * the lock to protect all of the above.
>  	 * the routines below consider this to be IRQ-safe
> @@ -145,6 +156,20 @@ static inline bool res_counter_soft_limit_check_locked(struct res_counter *cnt)
>  	return false;
>  }
>  
> +static inline void res_counter_threshold_notify_locked(struct res_counter *cnt)
> +{
> +	if (cnt->usage >= cnt->threshold_above) {
> +		cnt->threshold_notifier(cnt, cnt->usage, cnt->threshold_above);
> +		return;
> +	}
> +
> +	if (cnt->usage < cnt->threshold_below) {
> +		cnt->threshold_notifier(cnt, cnt->usage, cnt->threshold_below);
> +		return;
> +	}
> +}
> +
> +
>  /**
>   * Get the difference between the usage and the soft limit
>   * @cnt: The counter
> @@ -238,4 +263,23 @@ res_counter_set_soft_limit(struct res_counter *cnt,
>  	return 0;
>  }
>  
> +static inline int
> +res_counter_set_thresholds(struct res_counter *cnt,
> +		unsigned long long threshold_above,
> +		unsigned long long threshold_below)
> +{
> +	unsigned long flags;
> +	int ret = -EINVAL;
> +
> +	spin_lock_irqsave(&cnt->lock, flags);
> +	if ((cnt->usage < threshold_above) &&
> +			(cnt->usage >= threshold_below)) {
> +		cnt->threshold_above = threshold_above;
> +		cnt->threshold_below = threshold_below;
> +		ret = 0;
> +	}
> +	spin_unlock_irqrestore(&cnt->lock, flags);
> +	return ret;
> +}
> +
>  #endif
> diff --git a/kernel/res_counter.c b/kernel/res_counter.c
> index bcdabf3..646c29c 100644
> --- a/kernel/res_counter.c
> +++ b/kernel/res_counter.c
> @@ -20,6 +20,8 @@ void res_counter_init(struct res_counter *counter, struct res_counter *parent)
>  	spin_lock_init(&counter->lock);
>  	counter->limit = RESOURCE_MAX;
>  	counter->soft_limit = RESOURCE_MAX;
> +	counter->threshold_above = RESOURCE_MAX;
> +	counter->threshold_below = 0ULL;
>  	counter->parent = parent;
>  }
>  
> @@ -33,6 +35,7 @@ int res_counter_charge_locked(struct res_counter *counter, unsigned long val)
>  	counter->usage += val;
>  	if (counter->usage > counter->max_usage)
>  		counter->max_usage = counter->usage;
> +	res_counter_threshold_notify_locked(counter);
>  	return 0;
>  }
>  
> @@ -73,6 +76,7 @@ void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val)
>  		val = counter->usage;
>  
>  	counter->usage -= val;
> +	res_counter_threshold_notify_locked(counter);
>  }
>  
hmm.. this adds new checks to the hot path of the process life cycle.

Do you have any numbers on the performance impact of these patches (w/o setting any threshold)?
IMHO, it might be small enough to be ignored, because KAMEZAWA-san's coalesced charge/uncharge
patches have reduced the number of charge/uncharge operations on the res_counter itself, but I want to know just to make sure.


Regards,
Daisuke Nishimura.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH RFC v0 2/3] res_counter: implement thresholds
  2009-11-27  0:20       ` Daisuke Nishimura
@ 2009-11-27  2:45         ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 35+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-11-27  2:45 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: Kirill A. Shutemov, containers, linux-mm, Paul Menage, Li Zefan,
	Andrew Morton, Balbir Singh, Pavel Emelyanov, linux-kernel

On Fri, 27 Nov 2009 09:20:35 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> Hi.
> >  
> > @@ -73,6 +76,7 @@ void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val)
> >  		val = counter->usage;
> >  
> >  	counter->usage -= val;
> > +	res_counter_threshold_notify_locked(counter);
> >  }
> >  
> hmm.. this adds new checks to the hot path of the process life cycle.
> 
> Do you have any numbers on the performance impact of these patches (w/o setting any threshold)?
> IMHO, it might be small enough to be ignored, because KAMEZAWA-san's coalesced charge/uncharge
> patches have reduced the number of charge/uncharge operations on the res_counter itself, but I want to know just to make sure.
> 
Another concern is supporting the root cgroup: you need another notifier hook in
memcg, because the root cgroup doesn't use res_counter now.

Can't this be implemented in a way similar to the softlimit check?
Filtering by the number of events would also be good for the notifier behavior,
to avoid too many wakeups.
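
A minimal sketch of such filtering (the nr_events field and the target value
are hypothetical here, not part of these patches):

#define THRESHOLDS_EVENTS_TARGET	128

/*
 * Count charge/uncharge events and report "due" only once per
 * THRESHOLDS_EVENTS_TARGET events, so the thresholds are not re-walked
 * (and eventfds not woken) on every counter change.
 * Caller holds cnt->lock, like res_counter_threshold_notify_locked().
 */
static bool res_counter_threshold_check_due(struct res_counter *cnt)
{
	if (++cnt->nr_events < THRESHOLDS_EVENTS_TARGET)
		return false;
	cnt->nr_events = 0;
	return true;
}

res_counter_threshold_notify_locked() could then return early unless the
check is due, at the cost of notifications arriving up to 127 events late.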

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH RFC v0 2/3] res_counter: implement thresholds
  2009-11-27  2:45         ` KAMEZAWA Hiroyuki
@ 2009-11-27  3:08           ` Balbir Singh
  -1 siblings, 0 replies; 35+ messages in thread
From: Balbir Singh @ 2009-11-27  3:08 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Daisuke Nishimura, Kirill A. Shutemov, containers, linux-mm,
	Paul Menage, Li Zefan, Andrew Morton, Pavel Emelyanov,
	linux-kernel, Dan Malek, Vladislav Buzov

On Fri, Nov 27, 2009 at 8:15 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Fri, 27 Nov 2009 09:20:35 +0900
> Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
>
>> Hi.
>> >
>> > @@ -73,6 +76,7 @@ void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val)
>> >             val = counter->usage;
>> >
>> >     counter->usage -= val;
>> > +   res_counter_threshold_notify_locked(counter);
>> >  }
>> >
>> hmm.. this adds new checks to the hot path of the process life cycle.
>>
>> Do you have any numbers on the performance impact of these patches (w/o setting any threshold)?
>> IMHO, it might be small enough to be ignored, because KAMEZAWA-san's coalesced charge/uncharge
>> patches have reduced the number of charge/uncharge operations on the res_counter itself, but I want to know just to make sure.
>>
> Another concern is supporting the root cgroup: you need another notifier hook in
> memcg, because the root cgroup doesn't use res_counter now.
>
> Can't this be implemented in a way similar to the softlimit check?
> Filtering by the number of events would also be good for the notifier behavior,
> to avoid too many wakeups.

I guess the semantics would vary then; they would become activity
semantics. I think we should avoid threshold notifications for root,
since we have no limits in root anymore.

BTW, Kirill, I've been meaning to write this layer on top of
cgroupstats; is there anything that prevents us from using that today?
CC'ing Dan Malek and Vladislav Buzov, who worked on similar patches
earlier.

Balbir Singh.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH RFC v0 2/3] res_counter: implement thresholds
  2009-11-27  3:08           ` Balbir Singh
@ 2009-11-27  7:08             ` Kirill A. Shutemov
  -1 siblings, 0 replies; 35+ messages in thread
From: Kirill A. Shutemov @ 2009-11-27  7:08 UTC (permalink / raw)
  To: Balbir Singh, KAMEZAWA Hiroyuki, Daisuke Nishimura
  Cc: containers, linux-mm, Paul Menage, Li Zefan, Andrew Morton,
	Pavel Emelyanov, linux-kernel, Dan Malek, Vladislav Buzov

On Fri, Nov 27, 2009 at 5:08 AM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> On Fri, Nov 27, 2009 at 8:15 AM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>> On Fri, 27 Nov 2009 09:20:35 +0900
>> Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
>>
>>> Hi.
>>> >
>>> > @@ -73,6 +76,7 @@ void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val)
>>> >             val = counter->usage;
>>> >
>>> >     counter->usage -= val;
>>> > +   res_counter_threshold_notify_locked(counter);
>>> >  }
>>> >
>>> hmm.. this adds new checks to the hot path of the process life cycle.
>>>
>>> Do you have any numbers on the performance impact of these patches (w/o setting any threshold)?

No, I don't. I did only functional testing at this stage.

>>> IMHO, it might be small enough to be ignored, because KAMEZAWA-san's coalesced charge/uncharge
>>> patches have reduced the number of charge/uncharge operations on the res_counter itself, but I want to know just to make sure.
>>>
>> Another concern is supporting the root cgroup: you need another notifier hook in
>> memcg, because the root cgroup doesn't use res_counter now.
>>
>> Can't this be implemented in a way similar to the softlimit check?

I'll investigate it.

>> Filtering by the number of events would also be good for the notifier behavior,
>> to avoid too many wakeups.

Good idea, thanks.

> I guess the semantics would vary then; they would become activity
> semantics. I think we should avoid threshold notifications for root,
> since we have no limits in root anymore.

Threshold notifications for the root cgroup are really needed on embedded
systems to avoid the OOM killer.

>
> BTW, Kirill, I've been meaning to write this layer on top of
> cgroupstats; is there anything that prevents us from using that today?

I'll investigate it.

> CC'ing Dan Malek and Vladislav Buzov, who worked on similar patches
> earlier.
>
> Balbir Singh.
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH RFC v1 1/3] cgroup: implement eventfd-based generic API for notifications
       [not found] ` <cover.1259321503.git.kirill@shutemov.name>
@ 2009-11-27 11:55     ` Kirill A. Shutemov
  2009-11-27 11:55     ` Kirill A. Shutemov
  2009-11-27 11:55     ` Kirill A. Shutemov
  2 siblings, 0 replies; 35+ messages in thread
From: Kirill A. Shutemov @ 2009-11-27 11:55 UTC (permalink / raw)
  To: containers, linux-mm
  Cc: Paul Menage, Li Zefan, Andrew Morton, KAMEZAWA Hiroyuki,
	Balbir Singh, Pavel Emelyanov, Dan Malek, Vladislav Buzov,
	Daisuke Nishimura, linux-kernel, Kirill A. Shutemov

This patch introduces write-only file "cgroup.event_control" in every
cgroup.

To register a new notification handler you need to:
- create an eventfd;
- open a control file to be monitored. Callbacks register_event() and
  unregister_event() must be defined for the control file;
- write "<event_fd> <control_fd> <args>" to cgroup.event_control.
  Interpretation of args is defined by the control file implementation.

The eventfd will be woken up by the control file implementation or when the
cgroup is removed.

To unregister a notification handler, just close the eventfd.
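
A minimal kernel-side sketch of a controller wiring up these callbacks (the
foo_* names are hypothetical, for illustration only):

static u64 foo_read(struct cgroup *cgrp, struct cftype *cft)
{
	return 0;	/* the value userspace monitors */
}

static int foo_register_event(struct cgroup *cgrp, struct cftype *cft,
		struct eventfd_ctx *eventfd, const char *args)
{
	/* parse args, remember eventfd; report later via eventfd_signal() */
	return 0;
}

static int foo_unregister_event(struct cgroup *cgrp, struct cftype *cft,
		struct eventfd_ctx *eventfd)
{
	/* forget eventfd; called on explicit unregistration or rmdir */
	return 0;
}

static struct cftype foo_files[] = {
	{
		.name = "stat",
		.read_u64 = foo_read,
		.register_event = foo_register_event,
		.unregister_event = foo_unregister_event,
	},
};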

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
---
 include/linux/cgroup.h |    8 ++
 kernel/cgroup.c        |  181 +++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 188 insertions(+), 1 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 0008dee..285eaff 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -220,6 +220,9 @@ struct cgroup {
 
 	/* For RCU-protected deletion */
 	struct rcu_head rcu_head;
+
+	struct list_head event_list;
+	struct mutex event_list_mutex;
 };
 
 /*
@@ -362,6 +365,11 @@ struct cftype {
 	int (*trigger)(struct cgroup *cgrp, unsigned int event);
 
 	int (*release)(struct inode *inode, struct file *file);
+
+	int (*register_event)(struct cgroup *cgrp, struct cftype *cft,
+			struct eventfd_ctx *eventfd, const char *args);
+	int (*unregister_event)(struct cgroup *cgrp, struct cftype *cft,
+			struct eventfd_ctx *eventfd);
 };
 
 struct cgroup_scanner {
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 0249f4b..5438d46 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4,6 +4,10 @@
  *  Based originally on the cpuset system, extracted by Paul Menage
  *  Copyright (C) 2006 Google, Inc
  *
+ *  Notifications support
+ *  Copyright (C) 2009 Nokia Corporation
+ *  Author: Kirill A. Shutemov
+ *
  *  Copyright notices from the original cpuset code:
  *  --------------------------------------------------
  *  Copyright (C) 2003 BULL SA.
@@ -51,6 +55,8 @@
 #include <linux/pid_namespace.h>
 #include <linux/idr.h>
 #include <linux/vmalloc.h> /* TODO: replace with more sophisticated array */
+#include <linux/eventfd.h>
+#include <linux/poll.h>
 
 #include <asm/atomic.h>
 
@@ -146,6 +152,16 @@ struct css_id {
 	unsigned short stack[0]; /* Array of Length (depth+1) */
 };
 
+struct cgroup_event {
+	struct cgroup *cgrp;
+	struct cftype *cft;
+	struct eventfd_ctx *eventfd;
+	struct list_head list;
+	poll_table pt;
+	wait_queue_head_t *wqh;
+	wait_queue_t wait;
+};
+static int cgroup_event_remove(struct cgroup_event *event);
 
 /* The list of hierarchy roots */
 
@@ -734,14 +750,26 @@ static struct inode *cgroup_new_inode(mode_t mode, struct super_block *sb)
 static int cgroup_call_pre_destroy(struct cgroup *cgrp)
 {
 	struct cgroup_subsys *ss;
+	struct cgroup_event *event, *tmp;
 	int ret = 0;
 
 	for_each_subsys(cgrp->root, ss)
 		if (ss->pre_destroy) {
 			ret = ss->pre_destroy(ss, cgrp);
 			if (ret)
-				break;
+				goto out;
 		}
+
+	mutex_lock(&cgrp->event_list_mutex);
+	list_for_each_entry_safe(event, tmp, &cgrp->event_list, list) {
+		ret = cgroup_event_remove(event);
+		if (ret)
+			break;
+		eventfd_signal(event->eventfd, 1);
+	}
+	mutex_unlock(&cgrp->event_list_mutex);
+
+out:
 	return ret;
 }
 
@@ -1136,6 +1164,8 @@ static void init_cgroup_housekeeping(struct cgroup *cgrp)
 	INIT_LIST_HEAD(&cgrp->release_list);
 	INIT_LIST_HEAD(&cgrp->pidlists);
 	mutex_init(&cgrp->pidlist_mutex);
+	INIT_LIST_HEAD(&cgrp->event_list);
+	mutex_init(&cgrp->event_list_mutex);
 }
 
 static void init_cgroup_root(struct cgroupfs_root *root)
@@ -1935,6 +1965,13 @@ static const struct inode_operations cgroup_dir_inode_operations = {
 	.rename = cgroup_rename,
 };
 
+static inline struct cftype *__file_cft(struct file *file)
+{
+	if (file->f_dentry->d_inode->i_fop != &cgroup_file_operations)
+		return ERR_PTR(-EINVAL);
+	return __d_cft(file->f_dentry);
+}
+
 static int cgroup_create_file(struct dentry *dentry, mode_t mode,
 				struct super_block *sb)
 {
@@ -2789,6 +2826,143 @@ static int cgroup_write_notify_on_release(struct cgroup *cgrp,
 	return 0;
 }
 
+static int cgroup_event_remove(struct cgroup_event *event)
+{
+	struct cgroup *cgrp = event->cgrp;
+	int ret;
+
+	BUG_ON(!mutex_is_locked(&cgrp->event_list_mutex));
+	ret = event->cft->unregister_event(cgrp, event->cft, event->eventfd);
+	eventfd_ctx_put(event->eventfd);
+	remove_wait_queue(event->wqh, &event->wait);
+	list_del(&event->list);
+	kfree(event);
+
+	return ret;
+}
+
+static int cgroup_event_wake(wait_queue_t *wait, unsigned mode,
+		int sync, void *key)
+{
+	struct cgroup_event *event = container_of(wait,
+			struct cgroup_event, wait);
+	struct cgroup *cgrp = event->cgrp;
+	unsigned long flags = (unsigned long)key;
+	int ret;
+
+	if (!(flags & POLLHUP))
+		return 0;
+
+	mutex_lock(&cgrp->event_list_mutex);
+	ret = cgroup_event_remove(event);
+	mutex_unlock(&cgrp->event_list_mutex);
+
+	return ret;
+}
+
+static void cgroup_event_ptable_queue_proc(struct file *file,
+		wait_queue_head_t *wqh, poll_table *pt)
+{
+	struct cgroup_event *event = container_of(pt,
+			struct cgroup_event, pt);
+
+	event->wqh = wqh;
+	add_wait_queue(wqh, &event->wait);
+}
+
+static int cgroup_write_event_control(struct cgroup *cont, struct cftype *cft,
+				      const char *buffer)
+{
+	struct cgroup_event *event = NULL;
+	unsigned int efd, cfd;
+	struct file *efile = NULL;
+	struct file *cfile = NULL;
+	char *endp;
+	int ret;
+
+	efd = simple_strtoul(buffer, &endp, 10);
+	if (*endp != ' ')
+		return -EINVAL;
+	buffer = endp + 1;
+
+	cfd = simple_strtoul(buffer, &endp, 10);
+	if ((*endp != ' ') && (*endp != '\0'))
+		return -EINVAL;
+	buffer = endp + 1;
+
+	event = kzalloc(sizeof(*event), GFP_KERNEL);
+	if (!event)
+		return -ENOMEM;
+	event->cgrp = cont;
+	INIT_LIST_HEAD(&event->list);
+	init_poll_funcptr(&event->pt, cgroup_event_ptable_queue_proc);
+	init_waitqueue_func_entry(&event->wait, cgroup_event_wake);
+
+	efile = eventfd_fget(efd);
+	if (IS_ERR(efile)) {
+		ret = PTR_ERR(efile);
+		goto fail;
+	}
+
+	event->eventfd = eventfd_ctx_fileget(efile);
+	if (IS_ERR(event->eventfd)) {
+		ret = PTR_ERR(event->eventfd);
+		goto fail;
+	}
+
+	cfile = fget(cfd);
+	if (!cfile) {
+		ret = -EBADF;
+		goto fail;
+	}
+
+	ret = file_permission(cfile, MAY_READ);
+	if (ret < 0)
+		goto fail;
+
+	event->cft = __file_cft(cfile);
+	if (IS_ERR(event->cft)) {
+		ret = PTR_ERR(event->cft);
+		goto fail;
+	}
+
+	if (!event->cft->register_event || !event->cft->unregister_event) {
+		ret = -EINVAL;
+		goto fail;
+	}
+
+	ret = event->cft->register_event(cont, event->cft,
+			event->eventfd, buffer);
+	if (ret)
+		goto fail;
+
+	efile->f_op->poll(efile, &event->pt);
+
+	mutex_lock(&cont->event_list_mutex);
+	list_add(&event->list, &cont->event_list);
+	mutex_unlock(&cont->event_list_mutex);
+
+	fput(cfile);
+	fput(efile);
+
+	return 0;
+
+fail:
+	if (!IS_ERR(cfile))
+		fput(cfile);
+
+	if (event && event->eventfd && !IS_ERR(event->eventfd))
+		eventfd_ctx_put(event->eventfd);
+
+	if (!IS_ERR(efile))
+		fput(efile);
+
+	if (event)
+		kfree(event);
+
+	return ret;
+}
+
 /*
  * for the common functions, 'private' gives the type of file
  */
@@ -2814,6 +2988,11 @@ static struct cftype files[] = {
 		.read_u64 = cgroup_read_notify_on_release,
 		.write_u64 = cgroup_write_notify_on_release,
 	},
+	{
+		.name = CGROUP_FILE_GENERIC_PREFIX "event_control",
+		.write_string = cgroup_write_event_control,
+		.mode = S_IWUGO,
+	},
 };
 
 static struct cftype cft_release_agent = {
-- 
1.6.5.3


^ permalink raw reply related	[flat|nested] 35+ messages in thread

 		.write_u64 = cgroup_write_notify_on_release,
 	},
+	{
+		.name = CGROUP_FILE_GENERIC_PREFIX "event_control",
+		.write_string = cgroup_write_event_control,
+		.mode = S_IWUGO,
+	},
 };
 
 static struct cftype cft_release_agent = {
-- 
1.6.5.3


* [PATCH RFC v1 2/3] res_counter: implement thresholds
       [not found] ` <cover.1259321503.git.kirill@shutemov.name>
@ 2009-11-27 11:55     ` Kirill A. Shutemov
  2009-11-27 11:55     ` Kirill A. Shutemov
  2009-11-27 11:55     ` Kirill A. Shutemov
  2 siblings, 0 replies; 35+ messages in thread
From: Kirill A. Shutemov @ 2009-11-27 11:55 UTC (permalink / raw)
  To: containers, linux-mm
  Cc: Paul Menage, Li Zefan, Andrew Morton, KAMEZAWA Hiroyuki,
	Balbir Singh, Pavel Emelyanov, Dan Malek, Vladislav Buzov,
	Daisuke Nishimura, linux-kernel, Kirill A. Shutemov

It allows setting up two thresholds: one above the current usage and one
below. The callback threshold_notifier() will be called when a threshold is
crossed.
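
As an illustration, a counter user hooks the callback and arms the window
roughly like this (my_notifier is a made-up name; the callback is invoked
under the counter's spinlock, so it must not sleep):

	static void my_notifier(struct res_counter *counter,
			unsigned long long usage,
			unsigned long long threshold)
	{
		/* usage just crossed threshold; kick whoever cares */
	}

	struct res_counter cnt;
	int err;

	res_counter_init(&cnt, NULL);
	cnt.threshold_notifier = my_notifier;
	/* watch the [4M, 8M) window: arguments are (counter, above, below);
	 * returns -EINVAL if current usage is already outside the window */
	err = res_counter_set_thresholds(&cnt, 8ULL << 20, 4ULL << 20);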

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
---
 include/linux/res_counter.h |   44 +++++++++++++++++++++++++++++++++++++++++++
 kernel/res_counter.c        |    4 +++
 2 files changed, 48 insertions(+), 0 deletions(-)

diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
index fcb9884..bca99a5 100644
--- a/include/linux/res_counter.h
+++ b/include/linux/res_counter.h
@@ -9,6 +9,10 @@
  *
  * Author: Pavel Emelianov <xemul@openvz.org>
  *
+ * Thresholds support
+ * Copyright (C) 2009 Nokia Corporation
+ * Author: Kirill A. Shutemov
+ *
  * See Documentation/cgroups/resource_counter.txt for more
  * info about what this counter is.
  */
@@ -42,6 +46,13 @@ struct res_counter {
 	 * the number of unsuccessful attempts to consume the resource
 	 */
 	unsigned long long failcnt;
+
+	unsigned long long threshold_above;
+	unsigned long long threshold_below;
+	void (*threshold_notifier)(struct res_counter *counter,
+			unsigned long long usage,
+			unsigned long long threshold);
+
 	/*
 	 * the lock to protect all of the above.
 	 * the routines below consider this to be IRQ-safe
@@ -145,6 +156,20 @@ static inline bool res_counter_soft_limit_check_locked(struct res_counter *cnt)
 	return false;
 }
 
+static inline void res_counter_threshold_notify_locked(struct res_counter *cnt)
+{
+	if (cnt->usage >= cnt->threshold_above) {
+		cnt->threshold_notifier(cnt, cnt->usage, cnt->threshold_above);
+		return;
+	}
+
+	if (cnt->usage < cnt->threshold_below) {
+		cnt->threshold_notifier(cnt, cnt->usage, cnt->threshold_below);
+		return;
+	}
+}
+
+
 /**
  * Get the difference between the usage and the soft limit
  * @cnt: The counter
@@ -238,4 +263,23 @@ res_counter_set_soft_limit(struct res_counter *cnt,
 	return 0;
 }
 
+static inline int
+res_counter_set_thresholds(struct res_counter *cnt,
+		unsigned long long threshold_above,
+		unsigned long long threshold_below)
+{
+	unsigned long flags;
+	int ret = -EINVAL;
+
+	spin_lock_irqsave(&cnt->lock, flags);
+	if ((cnt->usage < threshold_above) &&
+			(cnt->usage >= threshold_below)) {
+		cnt->threshold_above = threshold_above;
+		cnt->threshold_below = threshold_below;
+		ret = 0;
+	}
+	spin_unlock_irqrestore(&cnt->lock, flags);
+	return ret;
+}
+
 #endif
diff --git a/kernel/res_counter.c b/kernel/res_counter.c
index bcdabf3..646c29c 100644
--- a/kernel/res_counter.c
+++ b/kernel/res_counter.c
@@ -20,6 +20,8 @@ void res_counter_init(struct res_counter *counter, struct res_counter *parent)
 	spin_lock_init(&counter->lock);
 	counter->limit = RESOURCE_MAX;
 	counter->soft_limit = RESOURCE_MAX;
+	counter->threshold_above = RESOURCE_MAX;
+	counter->threshold_below = 0ULL;
 	counter->parent = parent;
 }
 
@@ -33,6 +35,7 @@ int res_counter_charge_locked(struct res_counter *counter, unsigned long val)
 	counter->usage += val;
 	if (counter->usage > counter->max_usage)
 		counter->max_usage = counter->usage;
+	res_counter_threshold_notify_locked(counter);
 	return 0;
 }
 
@@ -73,6 +76,7 @@ void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val)
 		val = counter->usage;
 
 	counter->usage -= val;
+	res_counter_threshold_notify_locked(counter);
 }
 
 void res_counter_uncharge(struct res_counter *counter, unsigned long val)
-- 
1.6.5.3





* [PATCH RFC v1 3/3] memcg: implement memory thresholds
       [not found] ` <cover.1259321503.git.kirill@shutemov.name>
@ 2009-11-27 11:55     ` Kirill A. Shutemov
  2009-11-27 11:55     ` Kirill A. Shutemov
  2009-11-27 11:55     ` Kirill A. Shutemov
  2 siblings, 0 replies; 35+ messages in thread
From: Kirill A. Shutemov @ 2009-11-27 11:55 UTC (permalink / raw)
  To: containers, linux-mm
  Cc: Paul Menage, Li Zefan, Andrew Morton, KAMEZAWA Hiroyuki,
	Balbir Singh, Pavel Emelyanov, Dan Malek, Vladislav Buzov,
	Daisuke Nishimura, linux-kernel, Kirill A. Shutemov

It allows registering multiple memory and memsw thresholds and getting
notifications when usage crosses any of them.

To register a threshold, an application needs to:
- create an eventfd;
- open memory.usage_in_bytes or memory.memsw.usage_in_bytes;
- write a string like "<event_fd> <fd of memory.usage_in_bytes> <threshold>"
  to cgroup.event_control.

The application will be notified through the eventfd when memory usage
crosses the threshold in either direction.
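
For clarity, a minimal userspace sketch of the flow above (the cgroup path
and the 32M threshold are made up; error handling is trimmed):

	#include <stdio.h>
	#include <stdint.h>
	#include <string.h>
	#include <fcntl.h>
	#include <unistd.h>
	#include <sys/eventfd.h>

	int main(void)
	{
		int efd = eventfd(0, 0);
		int cfd = open("/cgroups/0/memory.usage_in_bytes", O_RDONLY);
		int ecfd = open("/cgroups/0/cgroup.event_control", O_WRONLY);
		char line[64];
		uint64_t hits;

		if (efd < 0 || cfd < 0 || ecfd < 0)
			return 1;

		/* "<event_fd> <control_fd> <threshold in bytes>" */
		snprintf(line, sizeof(line), "%d %d %llu",
			 efd, cfd, 32ULL << 20);
		if (write(ecfd, line, strlen(line)) < 0)
			return 1;

		/* blocks until usage crosses 32M in either direction */
		read(efd, &hits, sizeof(hits));
		printf("threshold crossed %llu time(s)\n",
		       (unsigned long long)hits);
		return 0;
	}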

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
---
 mm/memcontrol.c |  224 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 224 insertions(+), 0 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f99f599..333f67e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6,6 +6,10 @@
  * Copyright 2007 OpenVZ SWsoft Inc
  * Author: Pavel Emelianov <xemul@openvz.org>
  *
+ * Memory thresholds
+ * Copyright (C) 2009 Nokia Corporation
+ * Author: Kirill A. Shutemov
+ *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
  * the Free Software Foundation; either version 2 of the License, or
@@ -38,6 +42,7 @@
 #include <linux/vmalloc.h>
 #include <linux/mm_inline.h>
 #include <linux/page_cgroup.h>
+#include <linux/eventfd.h>
 #include "internal.h"
 
 #include <asm/uaccess.h>
@@ -174,6 +179,12 @@ struct mem_cgroup_tree {
 
 static struct mem_cgroup_tree soft_limit_tree __read_mostly;
 
+struct mem_cgroup_threshold {
+	struct list_head list;
+	struct eventfd_ctx *eventfd;
+	u64 threshold;
+};
+
 /*
  * The memory controller data structure. The memory controller controls both
  * page cache and RSS per cgroup. We would eventually like to provide
@@ -225,6 +236,12 @@ struct mem_cgroup {
 	/* set when res.limit == memsw.limit */
 	bool		memsw_is_minimum;
 
+	struct list_head thresholds;
+	struct mem_cgroup_threshold *current_threshold;
+
+	struct list_head memsw_thresholds;
+	struct mem_cgroup_threshold *memsw_current_threshold;
+
 	/*
 	 * statistics. This must be placed at the end of memcg.
 	 */
@@ -2839,12 +2856,184 @@ static int mem_cgroup_swappiness_write(struct cgroup *cgrp, struct cftype *cft,
 	return 0;
 }
 
+static inline void mem_cgroup_set_thresholds(struct res_counter *counter,
+		u64 above, u64 below)
+{
+	BUG_ON(res_counter_set_thresholds(counter, above, below));
+}
+
+static void mem_cgroup_threshold(struct mem_cgroup *memcg,
+		struct res_counter *counter, u64 usage, u64 threshold)
+{
+	struct mem_cgroup_threshold *above, *below;
+	struct list_head *thresholds;
+	struct mem_cgroup_threshold **current_threshold;
+
+	if (&memcg->res == counter) {
+		thresholds = &memcg->thresholds;
+		current_threshold = &memcg->current_threshold;
+	} else if (&memcg->memsw == counter) {
+		thresholds = &memcg->memsw_thresholds;
+		current_threshold = &memcg->memsw_current_threshold;
+	} else
+		BUG();
+
+	above = below = *current_threshold;
+
+	if (threshold <= usage) {
+		list_for_each_entry_continue(above, thresholds, list) {
+			if (above->threshold > usage)
+				break;
+			below = above;
+			eventfd_signal(below->eventfd, 1);
+		}
+	} else {
+		list_for_each_entry_continue_reverse(below, thresholds, list) {
+			eventfd_signal(above->eventfd, 1);
+			if (below->threshold <= usage)
+				break;
+			above = below;
+		}
+	}
+
+	mem_cgroup_set_thresholds(counter, above->threshold, below->threshold);
+	*current_threshold = below;
+}
+
+static void mem_cgroup_mem_threshold(struct res_counter *counter, u64 usage,
+		u64 threshold)
+{
+	struct mem_cgroup *memcg = container_of(counter, struct mem_cgroup,
+			res);
+
+	mem_cgroup_threshold(memcg, counter, usage, threshold);
+}
+
+static void mem_cgroup_memsw_threshold(struct res_counter *counter, u64 usage,
+		u64 threshold)
+{
+	struct mem_cgroup *memcg = container_of(counter, struct mem_cgroup,
+			memsw);
+
+	mem_cgroup_threshold(memcg, counter, usage, threshold);
+}
+
+static void mem_cgroup_invalidate_thresholds(struct res_counter *counter,
+		struct list_head *thresholds,
+		struct mem_cgroup_threshold **current_threshold)
+{
+	struct mem_cgroup_threshold *tmp, *prev = NULL;
+
+	list_for_each_entry(tmp, thresholds, list) {
+		if (tmp->threshold > counter->usage) {
+			BUG_ON(!prev);
+			*current_threshold = prev;
+			break;
+		}
+		prev = tmp;
+	}
+
+	mem_cgroup_set_thresholds(counter, tmp->threshold, prev->threshold);
+}
+
+static int mem_cgroup_register_event(struct cgroup *cgrp, struct cftype *cft,
+		struct eventfd_ctx *eventfd, const char *args)
+{
+	u64 threshold;
+	struct mem_cgroup_threshold *new, *tmp;
+	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgrp);
+	struct list_head *thresholds;
+	struct mem_cgroup_threshold **current_threshold;
+	struct res_counter *counter;
+	int type = MEMFILE_TYPE(cft->private);
+	int ret;
+
+	/* XXX: Should we implement thresholds for the root cgroup? */
+	if (mem_cgroup_is_root(memcg))
+		return -EINVAL;
+
+	ret = res_counter_memparse_write_strategy(args, &threshold);
+	if (ret)
+		return ret;
+
+	new = kmalloc(sizeof(*new), GFP_KERNEL);
+	if (!new)
+		return -ENOMEM;
+	INIT_LIST_HEAD(&new->list);
+	new->eventfd = eventfd;
+	new->threshold = threshold;
+
+	switch (type) {
+	case _MEM:
+		thresholds = &memcg->thresholds;
+		current_threshold = &memcg->current_threshold;
+		counter = &memcg->res;
+		break;
+	case _MEMSWAP:
+		thresholds = &memcg->memsw_thresholds;
+		current_threshold = &memcg->memsw_current_threshold;
+		counter = &memcg->memsw;
+		break;
+	default:
+		BUG();
+		break;
+	}
+
+	list_for_each_entry(tmp, thresholds, list)
+		if (new->threshold < tmp->threshold) {
+			list_add_tail(&new->list, &tmp->list);
+			break;
+		}
+	mem_cgroup_invalidate_thresholds(counter, thresholds,
+			current_threshold);
+
+	return 0;
+}
+
+static int mem_cgroup_unregister_event(struct cgroup *cgrp, struct cftype *cft,
+		struct eventfd_ctx *eventfd)
+{
+	struct mem_cgroup_threshold *threshold, *tmp;
+	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgrp);
+	struct list_head *thresholds;
+	struct mem_cgroup_threshold **current_threshold;
+	struct res_counter *counter;
+	int type = MEMFILE_TYPE(cft->private);
+
+	switch (type) {
+	case _MEM:
+		thresholds = &memcg->thresholds;
+		current_threshold = &memcg->current_threshold;
+		counter = &memcg->res;
+		break;
+	case _MEMSWAP:
+		thresholds = &memcg->memsw_thresholds;
+		current_threshold = &memcg->memsw_current_threshold;
+		counter = &memcg->memsw;
+		break;
+	default:
+		BUG();
+		break;
+	}
+
+	list_for_each_entry_safe(threshold, tmp, thresholds, list)
+		if (threshold->eventfd == eventfd) {
+			list_del(&threshold->list);
+			kfree(threshold);
+		}
+	mem_cgroup_invalidate_thresholds(counter, thresholds,
+			current_threshold);
+
+	return 0;
+}
 
 static struct cftype mem_cgroup_files[] = {
 	{
 		.name = "usage_in_bytes",
 		.private = MEMFILE_PRIVATE(_MEM, RES_USAGE),
 		.read_u64 = mem_cgroup_read,
+		.register_event = mem_cgroup_register_event,
+		.unregister_event = mem_cgroup_unregister_event,
 	},
 	{
 		.name = "max_usage_in_bytes",
@@ -2896,6 +3085,8 @@ static struct cftype memsw_cgroup_files[] = {
 		.name = "memsw.usage_in_bytes",
 		.private = MEMFILE_PRIVATE(_MEMSWAP, RES_USAGE),
 		.read_u64 = mem_cgroup_read,
+		.register_event = mem_cgroup_register_event,
+		.unregister_event = mem_cgroup_unregister_event,
 	},
 	{
 		.name = "memsw.max_usage_in_bytes",
@@ -3080,6 +3271,33 @@ static int mem_cgroup_soft_limit_tree_init(void)
 	return 0;
 }
 
+
+static int mem_cgroup_thresholds_init(struct list_head *thresholds,
+		struct mem_cgroup_threshold **current_threshold)
+{
+	struct mem_cgroup_threshold *new;
+
+	INIT_LIST_HEAD(thresholds);
+
+	new = kmalloc(sizeof(*new), GFP_KERNEL);
+	if (!new)
+		return -ENOMEM;
+	INIT_LIST_HEAD(&new->list);
+	new->threshold = 0ULL;
+	list_add(&new->list, thresholds);
+
+	*current_threshold = new;
+
+	new = kmalloc(sizeof(*new), GFP_KERNEL);
+	if (!new)
+		return -ENOMEM;
+	INIT_LIST_HEAD(&new->list);
+	new->threshold = RESOURCE_MAX;
+	list_add_tail(&new->list, thresholds);
+
+	return 0;
+}
+
 static struct cgroup_subsys_state * __ref
 mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 {
@@ -3125,6 +3343,12 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 	mem->last_scanned_child = 0;
 	spin_lock_init(&mem->reclaim_param_lock);
 
+	mem->res.threshold_notifier = mem_cgroup_mem_threshold;
+	mem->memsw.threshold_notifier = mem_cgroup_memsw_threshold;
+	mem_cgroup_thresholds_init(&mem->thresholds, &mem->current_threshold);
+	mem_cgroup_thresholds_init(&mem->memsw_thresholds,
+			&mem->memsw_current_threshold);
+
 	if (parent)
 		mem->swappiness = get_swappiness(parent);
 	atomic_set(&mem->refcnt, 1);
-- 
1.6.5.3





* [PATCH RFC v0 2/3] res_counter: implement thresholds
  2009-11-26 16:27 ` [PATCH RFC v0 1/3] cgroup: implement eventfd-based generic API for notifications Kirill A. Shutemov
@ 2009-11-26 16:27     ` Kirill A. Shutemov
       [not found]   ` <cover.1259248846.git.kirill-oKw7cIdHH8eLwutG50LtGA@public.gmane.org>
  1 sibling, 0 replies; 35+ messages in thread
From: Kirill A. Shutemov @ 2009-11-26 16:27 UTC (permalink / raw)
  To: containers, linux-mm
  Cc: Paul Menage, Li Zefan, Andrew Morton, KAMEZAWA Hiroyuki,
	Balbir Singh, Pavel Emelyanov, linux-kernel, Kirill A. Shutemov

It allows setting up two thresholds: one above the current usage and one
below. The callback threshold_notifier() will be called when a threshold is
crossed.

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>

-- 
1.6.5.3




end of thread

Thread overview: 35+ messages
2009-11-26 17:11 [PATCH RFC v0 0/3] cgroup notifications API and memory thresholds Kirill A. Shutemov
2009-11-26 17:11 ` Kirill A. Shutemov
2009-11-26 17:11 ` [PATCH RFC v0 1/3] cgroup: implement eventfd-based generic API for notifications Kirill A. Shutemov
2009-11-26 17:11   ` Kirill A. Shutemov
2009-11-26 17:11   ` [PATCH RFC v0 2/3] res_counter: implement thresholds Kirill A. Shutemov
2009-11-26 17:11     ` Kirill A. Shutemov
2009-11-26 17:11     ` [PATCH RFC v0 3/3] memcg: implement memory thresholds Kirill A. Shutemov
2009-11-26 17:11       ` Kirill A. Shutemov
2009-11-27  0:20     ` [PATCH RFC v0 2/3] res_counter: implement thresholds Daisuke Nishimura
2009-11-27  0:20       ` Daisuke Nishimura
2009-11-27  2:45       ` KAMEZAWA Hiroyuki
2009-11-27  2:45         ` KAMEZAWA Hiroyuki
     [not found]         ` <20091127114511.bbb43d5a.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2009-11-27  3:08           ` Balbir Singh
2009-11-27  3:08         ` Balbir Singh
2009-11-27  3:08           ` Balbir Singh
2009-11-27  7:08           ` Kirill A. Shutemov
2009-11-27  7:08             ` Kirill A. Shutemov
     [not found]           ` <661de9470911261908i4bb51e91v649025e6c75bd91b-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-11-27  7:08             ` Kirill A. Shutemov
     [not found]       ` <20091127092035.bbf2efdc.nishimura-YQH0OdQVrdy45+QrQBaojngSJqDPrsil@public.gmane.org>
2009-11-27  2:45         ` KAMEZAWA Hiroyuki
     [not found] ` <cover.1259255307.git.kirill-oKw7cIdHH8eLwutG50LtGA@public.gmane.org>
2009-11-26 17:11   ` [PATCH RFC v0 1/3] cgroup: implement eventfd-based generic API for notifications Kirill A. Shutemov
     [not found]   ` <bc4dc055a7307c8667da85a4d4d9d5d189af27d5.1259255307.git.kirill-oKw7cIdHH8eLwutG50LtGA@public.gmane.org>
2009-11-26 17:11     ` [PATCH RFC v0 2/3] res_counter: implement thresholds Kirill A. Shutemov
     [not found]   ` <8524ba285f6dd59cda939c28da523f344cdab3da.1259255307.git.kirill-oKw7cIdHH8eLwutG50LtGA@public.gmane.org>
2009-11-26 17:11     ` [PATCH RFC v0 3/3] memcg: implement memory thresholds Kirill A. Shutemov
2009-11-27  0:20     ` [PATCH RFC v0 2/3] res_counter: implement thresholds Daisuke Nishimura
2009-11-27 11:55   ` [PATCH RFC v1 1/3] cgroup: implement eventfd-based generic API for notifications Kirill A. Shutemov
     [not found] ` <cover.1259321503.git.kirill-oKw7cIdHH8eLwutG50LtGA@public.gmane.org>
     [not found]   ` <bc4dc055a7307c8667da85a4d4d9d5d189af27d5.1259321503.git.kirill-oKw7cIdHH8eLwutG50LtGA@public.gmane.org>
2009-11-27 11:55     ` [PATCH RFC v1 2/3] res_counter: implement thresholds Kirill A. Shutemov
     [not found]   ` <8524ba285f6dd59cda939c28da523f344cdab3da.1259321503.git.kirill-oKw7cIdHH8eLwutG50LtGA@public.gmane.org>
2009-11-27 11:55     ` [PATCH RFC v1 3/3] memcg: implement memory thresholds Kirill A. Shutemov
     [not found] ` <cover.1259321503.git.kirill@shutemov.name>
2009-11-27 11:55   ` [PATCH RFC v1 1/3] cgroup: implement eventfd-based generic API for notifications Kirill A. Shutemov
2009-11-27 11:55     ` Kirill A. Shutemov
2009-11-27 11:55   ` [PATCH RFC v1 2/3] res_counter: implement thresholds Kirill A. Shutemov
2009-11-27 11:55     ` Kirill A. Shutemov
2009-11-27 11:55   ` [PATCH RFC v1 3/3] memcg: implement memory thresholds Kirill A. Shutemov
2009-11-27 11:55     ` Kirill A. Shutemov
  -- strict thread matches above, loose matches on Subject: below --
2009-11-26 16:27 [PATCH RFC v0 0/3] cgroup notifications API and " Kirill A. Shutemov
2009-11-26 16:27 ` [PATCH RFC v0 1/3] cgroup: implement eventfd-based generic API for notifications Kirill A. Shutemov
2009-11-26 16:27   ` [PATCH RFC v0 2/3] res_counter: implement thresholds Kirill A. Shutemov
2009-11-26 16:27     ` Kirill A. Shutemov
     [not found]   ` <cover.1259248846.git.kirill-oKw7cIdHH8eLwutG50LtGA@public.gmane.org>
     [not found]     ` <bc4dc055a7307c8667da85a4d4d9d5d189af27d5.1259248846.git.kirill-oKw7cIdHH8eLwutG50LtGA@public.gmane.org>
2009-11-26 16:27       ` Kirill A. Shutemov
