All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] reuse vmpressure for in-kernel events
@ 2013-04-23  8:22 ` Glauber Costa
  0 siblings, 0 replies; 26+ messages in thread
From: Glauber Costa @ 2013-04-23  8:22 UTC (permalink / raw)
  To: linux-mm; +Cc: cgroups, Andrew Morton

During the past weeks, it became clear to us that the shrinker interface we
have right now works very well for some particular types of users, but not that
well for others. The later are usually people interested in one-shot
notifications, that were forced to adapt themselves to the count+scan behavior
of shrinkers. To do so, they had no choice than to greatly abuse the shrinker
interface producing little monsters all over.

During LSF/MM, one of the proposals that popped out during our session was to
reuse Anton Voronstsov's vmpressure for this. They are designed for userspace
consumption, but also provide a well-stablished, cgroup-aware entry point for
notifications.

I am demonstrating this interface by registering a memcg event that tries to
get rid of all dead caches in the system. We never used a shrinker for that
because it didn't feel too natural. The new interface integrates quite nicely,
though.

Please note that due to my lack of understanding of each shrinker user, I will
stay away from converting the actual users, you are all welcome to do so.

Glauber Costa (2):
  vmpressure: in-kernel notifications
  memcg: reap dead memcgs under pressure

 include/linux/vmpressure.h |  6 ++++
 mm/memcontrol.c            | 81 ++++++++++++++++++++++++++++++++++++++++++++--
 mm/vmpressure.c            | 48 ++++++++++++++++++++++++---
 3 files changed, 128 insertions(+), 7 deletions(-)

-- 
1.8.1.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 0/2] reuse vmpressure for in-kernel events
@ 2013-04-23  8:22 ` Glauber Costa
  0 siblings, 0 replies; 26+ messages in thread
From: Glauber Costa @ 2013-04-23  8:22 UTC (permalink / raw)
  To: linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, Andrew Morton

During the past weeks, it became clear to us that the shrinker interface we
have right now works very well for some particular types of users, but not that
well for others. The later are usually people interested in one-shot
notifications, that were forced to adapt themselves to the count+scan behavior
of shrinkers. To do so, they had no choice than to greatly abuse the shrinker
interface producing little monsters all over.

During LSF/MM, one of the proposals that popped out during our session was to
reuse Anton Voronstsov's vmpressure for this. They are designed for userspace
consumption, but also provide a well-stablished, cgroup-aware entry point for
notifications.

I am demonstrating this interface by registering a memcg event that tries to
get rid of all dead caches in the system. We never used a shrinker for that
because it didn't feel too natural. The new interface integrates quite nicely,
though.

Please note that due to my lack of understanding of each shrinker user, I will
stay away from converting the actual users, you are all welcome to do so.

Glauber Costa (2):
  vmpressure: in-kernel notifications
  memcg: reap dead memcgs under pressure

 include/linux/vmpressure.h |  6 ++++
 mm/memcontrol.c            | 81 ++++++++++++++++++++++++++++++++++++++++++++--
 mm/vmpressure.c            | 48 ++++++++++++++++++++++++---
 3 files changed, 128 insertions(+), 7 deletions(-)

-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/2] vmpressure: in-kernel notifications
@ 2013-04-23  8:22   ` Glauber Costa
  0 siblings, 0 replies; 26+ messages in thread
From: Glauber Costa @ 2013-04-23  8:22 UTC (permalink / raw)
  To: linux-mm
  Cc: cgroups, Andrew Morton, Glauber Costa, Dave Chinner,
	Anton Vorontsov, John Stultz, Joonsoo Kim, Michal Hocko,
	Kamezawa Hiroyuki, Johannes Weiner

From: Glauber Costa <glommer@parallels.com>

During the past weeks, it became clear to us that the shrinker interface
we have right now works very well for some particular types of users,
but not that well for others. The later are usually people interested in
one-shot notifications, that were forced to adapt themselves to the
count+scan behavior of shrinkers. To do so, they had no choice than to
greatly abuse the shrinker interface producing little monsters all over.

During LSF/MM, one of the proposals that popped out during our session
was to reuse Anton Voronstsov's vmpressure for this. They are designed
for userspace consumption, but also provide a well-stablished,
cgroup-aware entry point for notifications.

This patch extends that to also support in-kernel users. Events that
should be generated for in-kernel consumption will be marked as such,
and for those, we will call a registered function instead of triggering
an eventfd notification.

Please note that due to my lack of understanding of each shrinker user,
I will stay away from converting the actual users, you are all welcome
to do so.

Signed-off-by: Glauber Costa <glommer@openvz.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/vmpressure.h |  6 ++++++
 mm/vmpressure.c            | 48 ++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 50 insertions(+), 4 deletions(-)

diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h
index 76be077..1862012 100644
--- a/include/linux/vmpressure.h
+++ b/include/linux/vmpressure.h
@@ -19,6 +19,9 @@ struct vmpressure {
 	/* Have to grab the lock on events traversal or modifications. */
 	struct mutex events_lock;
 
+	/* false if only kernel users want to be notified, true otherwise */
+	bool notify_userspace;
+
 	struct work_struct work;
 };
 
@@ -36,6 +39,9 @@ extern struct vmpressure *css_to_vmpressure(struct cgroup_subsys_state *css);
 extern int vmpressure_register_event(struct cgroup *cg, struct cftype *cft,
 				     struct eventfd_ctx *eventfd,
 				     const char *args);
+
+extern int vmpressure_register_kernel_event(struct cgroup *cg,
+					    void (*fn)(void));
 extern void vmpressure_unregister_event(struct cgroup *cg, struct cftype *cft,
 					struct eventfd_ctx *eventfd);
 #else
diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index 736a601..8d77ad0 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -135,8 +135,12 @@ static enum vmpressure_levels vmpressure_calc_level(unsigned long scanned,
 }
 
 struct vmpressure_event {
-	struct eventfd_ctx *efd;
+	union {
+		struct eventfd_ctx *efd;
+		void (*fn)(void);
+	};
 	enum vmpressure_levels level;
+	bool kernel_event;
 	struct list_head node;
 };
 
@@ -152,7 +156,9 @@ static bool vmpressure_event(struct vmpressure *vmpr,
 	mutex_lock(&vmpr->events_lock);
 
 	list_for_each_entry(ev, &vmpr->events, node) {
-		if (level >= ev->level) {
+		if (ev->kernel_event)
+			ev->fn();
+		else if (vmpr->notify_userspace && (level >= ev->level)) {
 			eventfd_signal(ev->efd, 1);
 			signalled = true;
 		}
@@ -227,7 +233,7 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
 	 * we account it too.
 	 */
 	if (!(gfp & (__GFP_HIGHMEM | __GFP_MOVABLE | __GFP_IO | __GFP_FS)))
-		return;
+		goto schedule;
 
 	/*
 	 * If we got here with no pages scanned, then that is an indicator
@@ -238,14 +244,16 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
 	 * through vmpressure_prio(). But so far, keep calm.
 	 */
 	if (!scanned)
-		return;
+		goto schedule;
 
 	mutex_lock(&vmpr->sr_lock);
 	vmpr->scanned += scanned;
 	vmpr->reclaimed += reclaimed;
+	vmpr->notify_userspace = true;
 	scanned = vmpr->scanned;
 	mutex_unlock(&vmpr->sr_lock);
 
+schedule:
 	if (scanned < vmpressure_win || work_pending(&vmpr->work))
 		return;
 	schedule_work(&vmpr->work);
@@ -328,6 +336,38 @@ int vmpressure_register_event(struct cgroup *cg, struct cftype *cft,
 }
 
 /**
+ * vmpressure_register_kernel_event() - Register kernel-side notification
+ * @cg:		cgroup that is interested in vmpressure notifications
+ * @fn:		function to be called when pressure happens
+ *
+ * This function register in-kernel users interested in receiving notifications
+ * about pressure conditions. Pressure notifications will be triggered at the
+ * same time as userspace notifications (with no particular ordering relative
+ * to it).
+ *
+ * Pressure notifications are a alternative method to shrinkers and will serve
+ * well users that are interested in a one-shot notification, with a
+ * well-defined cgroup aware interface.
+ */
+int vmpressure_register_kernel_event(struct cgroup *cg, void (*fn)(void))
+{
+	struct vmpressure *vmpr = cg_to_vmpressure(cg);
+	struct vmpressure_event *ev;
+
+	ev = kzalloc(sizeof(*ev), GFP_KERNEL);
+	if (!ev)
+		return -ENOMEM;
+
+	ev->kernel_event = true;
+	ev->fn = fn;
+
+	mutex_lock(&vmpr->events_lock);
+	list_add(&ev->node, &vmpr->events);
+	mutex_unlock(&vmpr->events_lock);
+	return 0;
+}
+
+/**
  * vmpressure_unregister_event() - Unbind eventfd from vmpressure
  * @cg:		cgroup handle
  * @cft:	cgroup control files handle
-- 
1.8.1.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 1/2] vmpressure: in-kernel notifications
@ 2013-04-23  8:22   ` Glauber Costa
  0 siblings, 0 replies; 26+ messages in thread
From: Glauber Costa @ 2013-04-23  8:22 UTC (permalink / raw)
  To: linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, Andrew Morton, Glauber Costa,
	Dave Chinner, Anton Vorontsov, John Stultz, Joonsoo Kim,
	Michal Hocko, Kamezawa Hiroyuki, Johannes Weiner

From: Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>

During the past weeks, it became clear to us that the shrinker interface
we have right now works very well for some particular types of users,
but not that well for others. The later are usually people interested in
one-shot notifications, that were forced to adapt themselves to the
count+scan behavior of shrinkers. To do so, they had no choice than to
greatly abuse the shrinker interface producing little monsters all over.

During LSF/MM, one of the proposals that popped out during our session
was to reuse Anton Voronstsov's vmpressure for this. They are designed
for userspace consumption, but also provide a well-stablished,
cgroup-aware entry point for notifications.

This patch extends that to also support in-kernel users. Events that
should be generated for in-kernel consumption will be marked as such,
and for those, we will call a registered function instead of triggering
an eventfd notification.

Please note that due to my lack of understanding of each shrinker user,
I will stay away from converting the actual users, you are all welcome
to do so.

Signed-off-by: Glauber Costa <glommer-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
Cc: Dave Chinner <david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org>
Cc: Anton Vorontsov <anton.vorontsov-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
Cc: John Stultz <john.stultz-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
Cc: Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Cc: Joonsoo Kim <js1304-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
Cc: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
---
 include/linux/vmpressure.h |  6 ++++++
 mm/vmpressure.c            | 48 ++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 50 insertions(+), 4 deletions(-)

diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h
index 76be077..1862012 100644
--- a/include/linux/vmpressure.h
+++ b/include/linux/vmpressure.h
@@ -19,6 +19,9 @@ struct vmpressure {
 	/* Have to grab the lock on events traversal or modifications. */
 	struct mutex events_lock;
 
+	/* false if only kernel users want to be notified, true otherwise */
+	bool notify_userspace;
+
 	struct work_struct work;
 };
 
@@ -36,6 +39,9 @@ extern struct vmpressure *css_to_vmpressure(struct cgroup_subsys_state *css);
 extern int vmpressure_register_event(struct cgroup *cg, struct cftype *cft,
 				     struct eventfd_ctx *eventfd,
 				     const char *args);
+
+extern int vmpressure_register_kernel_event(struct cgroup *cg,
+					    void (*fn)(void));
 extern void vmpressure_unregister_event(struct cgroup *cg, struct cftype *cft,
 					struct eventfd_ctx *eventfd);
 #else
diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index 736a601..8d77ad0 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -135,8 +135,12 @@ static enum vmpressure_levels vmpressure_calc_level(unsigned long scanned,
 }
 
 struct vmpressure_event {
-	struct eventfd_ctx *efd;
+	union {
+		struct eventfd_ctx *efd;
+		void (*fn)(void);
+	};
 	enum vmpressure_levels level;
+	bool kernel_event;
 	struct list_head node;
 };
 
@@ -152,7 +156,9 @@ static bool vmpressure_event(struct vmpressure *vmpr,
 	mutex_lock(&vmpr->events_lock);
 
 	list_for_each_entry(ev, &vmpr->events, node) {
-		if (level >= ev->level) {
+		if (ev->kernel_event)
+			ev->fn();
+		else if (vmpr->notify_userspace && (level >= ev->level)) {
 			eventfd_signal(ev->efd, 1);
 			signalled = true;
 		}
@@ -227,7 +233,7 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
 	 * we account it too.
 	 */
 	if (!(gfp & (__GFP_HIGHMEM | __GFP_MOVABLE | __GFP_IO | __GFP_FS)))
-		return;
+		goto schedule;
 
 	/*
 	 * If we got here with no pages scanned, then that is an indicator
@@ -238,14 +244,16 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
 	 * through vmpressure_prio(). But so far, keep calm.
 	 */
 	if (!scanned)
-		return;
+		goto schedule;
 
 	mutex_lock(&vmpr->sr_lock);
 	vmpr->scanned += scanned;
 	vmpr->reclaimed += reclaimed;
+	vmpr->notify_userspace = true;
 	scanned = vmpr->scanned;
 	mutex_unlock(&vmpr->sr_lock);
 
+schedule:
 	if (scanned < vmpressure_win || work_pending(&vmpr->work))
 		return;
 	schedule_work(&vmpr->work);
@@ -328,6 +336,38 @@ int vmpressure_register_event(struct cgroup *cg, struct cftype *cft,
 }
 
 /**
+ * vmpressure_register_kernel_event() - Register kernel-side notification
+ * @cg:		cgroup that is interested in vmpressure notifications
+ * @fn:		function to be called when pressure happens
+ *
+ * This function register in-kernel users interested in receiving notifications
+ * about pressure conditions. Pressure notifications will be triggered at the
+ * same time as userspace notifications (with no particular ordering relative
+ * to it).
+ *
+ * Pressure notifications are a alternative method to shrinkers and will serve
+ * well users that are interested in a one-shot notification, with a
+ * well-defined cgroup aware interface.
+ */
+int vmpressure_register_kernel_event(struct cgroup *cg, void (*fn)(void))
+{
+	struct vmpressure *vmpr = cg_to_vmpressure(cg);
+	struct vmpressure_event *ev;
+
+	ev = kzalloc(sizeof(*ev), GFP_KERNEL);
+	if (!ev)
+		return -ENOMEM;
+
+	ev->kernel_event = true;
+	ev->fn = fn;
+
+	mutex_lock(&vmpr->events_lock);
+	list_add(&ev->node, &vmpr->events);
+	mutex_unlock(&vmpr->events_lock);
+	return 0;
+}
+
+/**
  * vmpressure_unregister_event() - Unbind eventfd from vmpressure
  * @cg:		cgroup handle
  * @cft:	cgroup control files handle
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/2] memcg: reap dead memcgs under pressure
@ 2013-04-23  8:22   ` Glauber Costa
  0 siblings, 0 replies; 26+ messages in thread
From: Glauber Costa @ 2013-04-23  8:22 UTC (permalink / raw)
  To: linux-mm
  Cc: cgroups, Andrew Morton, Glauber Costa, Dave Chinner,
	Anton Vorontsov, John Stultz, Joonsoo Kim, Michal Hocko,
	KAMEZAWA Hiroyuki, Johannes Weiner

From: Glauber Costa <glommer@parallels.com>

When we delete kmem-enabled memcgs, they can still be zombieing around
for a while. The reason is that the objects may still be alive, and we
won't be able to delete them at destruction time.

My initial patchset included a custom sleep-and-wake mechanism that
would keep trying to destroy them, but that was frowned upon, with the
argument that the pressure system should handle that.

The only entry point for that, though, are the shrinkers. The shrinker
interface, however, is not exactly tailored to our needs. It could be a
little bit better by using the API Dave Chinner proposed, but it is
still not ideal since we aren't really a count-and-scan event, but more
a one-off flush-all-you-can event that would have to abuse that somehow.

My in-flight shinkers patchset would eventually introduce a custom point
for dead memcg reapings, but during LSF/MM, we started to consider using
the newly introduced vmpressure events for this. With that, we will
blend well into a one shot kind of event.

Signed-off-by: Glauber Costa <glommer@openvz.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>

---
Andrew, should you consider merging this, you will conflict with the
debugging patch "memcg: debugging facility to access dangling memcgs".
This is because I am reusing its infrastructure to build this.
Please remove it, and I will provide a new version of that patch that
adds just the debugging file.
---
 mm/memcontrol.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 78 insertions(+), 3 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c92bcfc..33b118c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -319,8 +319,16 @@ struct mem_cgroup {
 	/* thresholds for mem+swap usage. RCU-protected */
 	struct mem_cgroup_thresholds memsw_thresholds;
 
-	/* For oom notifier event fd */
-	struct list_head oom_notify;
+	union {
+		/* For oom notifier event fd */
+		struct list_head oom_notify;
+		/*
+		 * we can only trigger an oom event if the memcg is alive.
+		 * so we will reuse this field to hook the memcg in the list
+		 * of dead memcgs.
+		 */
+		struct list_head dead;
+	};
 
 	/*
 	 * Should we move charges of a task when a task is moved into this
@@ -380,6 +388,24 @@ static size_t memcg_size(void)
 		nr_node_ids * sizeof(struct mem_cgroup_per_node);
 }
 
+static LIST_HEAD(dangling_memcgs);
+static DEFINE_MUTEX(dangling_memcgs_mutex);
+
+static inline void memcg_dangling_free(struct mem_cgroup *memcg)
+{
+	mutex_lock(&dangling_memcgs_mutex);
+	list_del(&memcg->dead);
+	mutex_unlock(&dangling_memcgs_mutex);
+}
+
+static inline void memcg_dangling_add(struct mem_cgroup *memcg)
+{
+	INIT_LIST_HEAD(&memcg->dead);
+	mutex_lock(&dangling_memcgs_mutex);
+	list_add(&memcg->dead, &dangling_memcgs);
+	mutex_unlock(&dangling_memcgs_mutex);
+}
+
 /* internal only representation about the status of kmem accounting. */
 enum {
 	KMEM_ACCOUNTED_ACTIVE = 0, /* accounted by this cgroup itself */
@@ -5856,6 +5882,46 @@ static void kmem_cgroup_destroy(struct mem_cgroup *memcg)
 	if (memcg_kmem_test_and_clear_dead(memcg))
 		mem_cgroup_put(memcg);
 }
+
+static void memcg_vmpressure_shrink_dead(void)
+{
+	struct memcg_cache_params *params, *tmp;
+	struct kmem_cache *cachep;
+	struct mem_cgroup *memcg;
+
+	mutex_lock(&dangling_memcgs_mutex);
+	list_for_each_entry(memcg, &dangling_memcgs, dead) {
+
+		mem_cgroup_get(memcg);
+		mutex_lock(&memcg->slab_caches_mutex);
+		/* The element may go away as an indirect result of shrink */
+		list_for_each_entry_safe(params, tmp,
+					 &memcg->memcg_slab_caches, list) {
+
+			cachep = memcg_params_to_cache(params);
+			/*
+			 * the cpu_hotplug lock is taken in kmem_cache_create
+			 * outside the slab_caches_mutex manipulation. It will
+			 * be taken by kmem_cache_shrink to flush the cache.
+			 * So we need to drop the lock. It is all right because
+			 * the lock only protects elements moving in and out the
+			 * list.
+			 */
+			mutex_unlock(&memcg->slab_caches_mutex);
+			kmem_cache_shrink(cachep);
+			mutex_lock(&memcg->slab_caches_mutex);
+		}
+		mutex_unlock(&memcg->slab_caches_mutex);
+		mem_cgroup_put(memcg);
+	}
+	mutex_unlock(&dangling_memcgs_mutex);
+}
+
+static void memcg_register_kmem_events(struct cgroup *cont)
+{
+	vmpressure_register_kernel_event(cont, memcg_vmpressure_shrink_dead);
+}
+
 #else
 static int memcg_init_kmem(struct mem_cgroup *memcg, struct cgroup_subsys *ss)
 {
@@ -5865,6 +5931,10 @@ static int memcg_init_kmem(struct mem_cgroup *memcg, struct cgroup_subsys *ss)
 static void kmem_cgroup_destroy(struct mem_cgroup *memcg)
 {
 }
+
+static void memcg_register_kmem_events(struct mem_cgroup *memcg)
+{
+}
 #endif
 
 static struct cftype mem_cgroup_files[] = {
@@ -6121,6 +6191,8 @@ static void free_work(struct work_struct *work)
 	struct mem_cgroup *memcg;
 
 	memcg = container_of(work, struct mem_cgroup, work_freeing);
+
+	memcg_dangling_free(memcg);
 	__mem_cgroup_free(memcg);
 }
 
@@ -6231,8 +6303,10 @@ mem_cgroup_css_online(struct cgroup *cont)
 	struct mem_cgroup *memcg, *parent;
 	int error = 0;
 
-	if (!cont->parent)
+	if (!cont->parent) {
+		memcg_register_kmem_events(cont);
 		return 0;
+	}
 
 	mutex_lock(&memcg_create_mutex);
 	memcg = mem_cgroup_from_cont(cont);
@@ -6315,6 +6389,7 @@ static void mem_cgroup_css_free(struct cgroup *cont)
 
 	kmem_cgroup_destroy(memcg);
 
+	memcg_dangling_add(memcg);
 	mem_cgroup_put(memcg);
 }
 
-- 
1.8.1.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/2] memcg: reap dead memcgs under pressure
@ 2013-04-23  8:22   ` Glauber Costa
  0 siblings, 0 replies; 26+ messages in thread
From: Glauber Costa @ 2013-04-23  8:22 UTC (permalink / raw)
  To: linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, Andrew Morton, Glauber Costa,
	Dave Chinner, Anton Vorontsov, John Stultz, Joonsoo Kim,
	Michal Hocko, KAMEZAWA Hiroyuki, Johannes Weiner

From: Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>

When we delete kmem-enabled memcgs, they can still be zombieing around
for a while. The reason is that the objects may still be alive, and we
won't be able to delete them at destruction time.

My initial patchset included a custom sleep-and-wake mechanism that
would keep trying to destroy them, but that was frowned upon, with the
argument that the pressure system should handle that.

The only entry point for that, though, are the shrinkers. The shrinker
interface, however, is not exactly tailored to our needs. It could be a
little bit better by using the API Dave Chinner proposed, but it is
still not ideal since we aren't really a count-and-scan event, but more
a one-off flush-all-you-can event that would have to abuse that somehow.

My in-flight shinkers patchset would eventually introduce a custom point
for dead memcg reapings, but during LSF/MM, we started to consider using
the newly introduced vmpressure events for this. With that, we will
blend well into a one shot kind of event.

Signed-off-by: Glauber Costa <glommer-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
Cc: Dave Chinner <david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org>
Cc: Anton Vorontsov <anton.vorontsov-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
Cc: John Stultz <john.stultz-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
Cc: Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Cc: Joonsoo Kim <js1304-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
Cc: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>

---
Andrew, should you consider merging this, you will conflict with the
debugging patch "memcg: debugging facility to access dangling memcgs".
This is because I am reusing its infrastructure to build this.
Please remove it, and I will provide a new version of that patch that
adds just the debugging file.
---
 mm/memcontrol.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 78 insertions(+), 3 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c92bcfc..33b118c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -319,8 +319,16 @@ struct mem_cgroup {
 	/* thresholds for mem+swap usage. RCU-protected */
 	struct mem_cgroup_thresholds memsw_thresholds;
 
-	/* For oom notifier event fd */
-	struct list_head oom_notify;
+	union {
+		/* For oom notifier event fd */
+		struct list_head oom_notify;
+		/*
+		 * we can only trigger an oom event if the memcg is alive.
+		 * so we will reuse this field to hook the memcg in the list
+		 * of dead memcgs.
+		 */
+		struct list_head dead;
+	};
 
 	/*
 	 * Should we move charges of a task when a task is moved into this
@@ -380,6 +388,24 @@ static size_t memcg_size(void)
 		nr_node_ids * sizeof(struct mem_cgroup_per_node);
 }
 
+static LIST_HEAD(dangling_memcgs);
+static DEFINE_MUTEX(dangling_memcgs_mutex);
+
+static inline void memcg_dangling_free(struct mem_cgroup *memcg)
+{
+	mutex_lock(&dangling_memcgs_mutex);
+	list_del(&memcg->dead);
+	mutex_unlock(&dangling_memcgs_mutex);
+}
+
+static inline void memcg_dangling_add(struct mem_cgroup *memcg)
+{
+	INIT_LIST_HEAD(&memcg->dead);
+	mutex_lock(&dangling_memcgs_mutex);
+	list_add(&memcg->dead, &dangling_memcgs);
+	mutex_unlock(&dangling_memcgs_mutex);
+}
+
 /* internal only representation about the status of kmem accounting. */
 enum {
 	KMEM_ACCOUNTED_ACTIVE = 0, /* accounted by this cgroup itself */
@@ -5856,6 +5882,46 @@ static void kmem_cgroup_destroy(struct mem_cgroup *memcg)
 	if (memcg_kmem_test_and_clear_dead(memcg))
 		mem_cgroup_put(memcg);
 }
+
+static void memcg_vmpressure_shrink_dead(void)
+{
+	struct memcg_cache_params *params, *tmp;
+	struct kmem_cache *cachep;
+	struct mem_cgroup *memcg;
+
+	mutex_lock(&dangling_memcgs_mutex);
+	list_for_each_entry(memcg, &dangling_memcgs, dead) {
+
+		mem_cgroup_get(memcg);
+		mutex_lock(&memcg->slab_caches_mutex);
+		/* The element may go away as an indirect result of shrink */
+		list_for_each_entry_safe(params, tmp,
+					 &memcg->memcg_slab_caches, list) {
+
+			cachep = memcg_params_to_cache(params);
+			/*
+			 * the cpu_hotplug lock is taken in kmem_cache_create
+			 * outside the slab_caches_mutex manipulation. It will
+			 * be taken by kmem_cache_shrink to flush the cache.
+			 * So we need to drop the lock. It is all right because
+			 * the lock only protects elements moving in and out the
+			 * list.
+			 */
+			mutex_unlock(&memcg->slab_caches_mutex);
+			kmem_cache_shrink(cachep);
+			mutex_lock(&memcg->slab_caches_mutex);
+		}
+		mutex_unlock(&memcg->slab_caches_mutex);
+		mem_cgroup_put(memcg);
+	}
+	mutex_unlock(&dangling_memcgs_mutex);
+}
+
+static void memcg_register_kmem_events(struct cgroup *cont)
+{
+	vmpressure_register_kernel_event(cont, memcg_vmpressure_shrink_dead);
+}
+
 #else
 static int memcg_init_kmem(struct mem_cgroup *memcg, struct cgroup_subsys *ss)
 {
@@ -5865,6 +5931,10 @@ static int memcg_init_kmem(struct mem_cgroup *memcg, struct cgroup_subsys *ss)
 static void kmem_cgroup_destroy(struct mem_cgroup *memcg)
 {
 }
+
+static void memcg_register_kmem_events(struct mem_cgroup *memcg)
+{
+}
 #endif
 
 static struct cftype mem_cgroup_files[] = {
@@ -6121,6 +6191,8 @@ static void free_work(struct work_struct *work)
 	struct mem_cgroup *memcg;
 
 	memcg = container_of(work, struct mem_cgroup, work_freeing);
+
+	memcg_dangling_free(memcg);
 	__mem_cgroup_free(memcg);
 }
 
@@ -6231,8 +6303,10 @@ mem_cgroup_css_online(struct cgroup *cont)
 	struct mem_cgroup *memcg, *parent;
 	int error = 0;
 
-	if (!cont->parent)
+	if (!cont->parent) {
+		memcg_register_kmem_events(cont);
 		return 0;
+	}
 
 	mutex_lock(&memcg_create_mutex);
 	memcg = mem_cgroup_from_cont(cont);
@@ -6315,6 +6389,7 @@ static void mem_cgroup_css_free(struct cgroup *cont)
 
 	kmem_cgroup_destroy(memcg);
 
+	memcg_dangling_add(memcg);
 	mem_cgroup_put(memcg);
 }
 
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] vmpressure: in-kernel notifications
@ 2013-04-23 17:11     ` Anton Vorontsov
  0 siblings, 0 replies; 26+ messages in thread
From: Anton Vorontsov @ 2013-04-23 17:11 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-mm, cgroups, Andrew Morton, Dave Chinner, John Stultz,
	Joonsoo Kim, Michal Hocko, Kamezawa Hiroyuki, Johannes Weiner

On Tue, Apr 23, 2013 at 12:22:08PM +0400, Glauber Costa wrote:
> From: Glauber Costa <glommer@parallels.com>
[...]
> This patch extends that to also support in-kernel users.

Yup, that is the next logical step. ;-) The patches look good to me, just
one question...

> @@ -227,7 +233,7 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
>  	 * we account it too.
>  	 */
>  	if (!(gfp & (__GFP_HIGHMEM | __GFP_MOVABLE | __GFP_IO | __GFP_FS)))

I wonder if we want to let kernel users to specify the gfp mask here? The
current mask is good for userspace notifications, but in-kernel users
might be interested in including (or excluding) different types of
allocations, e.g. watch only for DMA allocations pressure?

Thanks!

Anton

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] vmpressure: in-kernel notifications
@ 2013-04-23 17:11     ` Anton Vorontsov
  0 siblings, 0 replies; 26+ messages in thread
From: Anton Vorontsov @ 2013-04-23 17:11 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Andrew Morton, Dave Chinner, John Stultz, Joonsoo Kim,
	Michal Hocko, Kamezawa Hiroyuki, Johannes Weiner

On Tue, Apr 23, 2013 at 12:22:08PM +0400, Glauber Costa wrote:
> From: Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
[...]
> This patch extends that to also support in-kernel users.

Yup, that is the next logical step. ;-) The patches look good to me, just
one question...

> @@ -227,7 +233,7 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
>  	 * we account it too.
>  	 */
>  	if (!(gfp & (__GFP_HIGHMEM | __GFP_MOVABLE | __GFP_IO | __GFP_FS)))

I wonder if we want to let kernel users to specify the gfp mask here? The
current mask is good for userspace notifications, but in-kernel users
might be interested in including (or excluding) different types of
allocations, e.g. watch only for DMA allocations pressure?

Thanks!

Anton

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] vmpressure: in-kernel notifications
  2013-04-23 17:11     ` Anton Vorontsov
  (?)
@ 2013-04-23 18:17     ` Glauber Costa
  -1 siblings, 0 replies; 26+ messages in thread
From: Glauber Costa @ 2013-04-23 18:17 UTC (permalink / raw)
  To: Anton Vorontsov
  Cc: Glauber Costa, linux-mm, cgroups, Andrew Morton, Dave Chinner,
	John Stultz, Joonsoo Kim, Michal Hocko, Kamezawa Hiroyuki,
	Johannes Weiner

On 04/23/2013 10:11 AM, Anton Vorontsov wrote:
> On Tue, Apr 23, 2013 at 12:22:08PM +0400, Glauber Costa wrote:
>> From: Glauber Costa <glommer@parallels.com>
> [...]
>> This patch extends that to also support in-kernel users.
>
> Yup, that is the next logical step. ;-) The patches look good to me, just
> one question...
>
>> @@ -227,7 +233,7 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
>>   	 * we account it too.
>>   	 */
>>   	if (!(gfp & (__GFP_HIGHMEM | __GFP_MOVABLE | __GFP_IO | __GFP_FS)))
>
> I wonder if we want to let kernel users to specify the gfp mask here? The
> current mask is good for userspace notifications, but in-kernel users
> might be interested in including (or excluding) different types of
> allocations, e.g. watch only for DMA allocations pressure?
>

That is outside of the scope of this patch anyway. For this one, if you 
believe it is good, could I have your tag? =)

But answering your question regardless of the scope, I believe the 
context of the allocation is an implementation detail of the kernel - 
regardless of how widely understood it is. The thing I like the most 
about your work, is precisely the fact that is hides the implementation 
details so well.

So unless there is a strong use case that would benefit from it, I am 
inclined to say this is not wanted.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] vmpressure: in-kernel notifications
@ 2013-04-23 19:13     ` Pekka Enberg
  0 siblings, 0 replies; 26+ messages in thread
From: Pekka Enberg @ 2013-04-23 19:13 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-mm, <cgroups@vger.kernel.org>,
	Andrew Morton, Dave Chinner, Anton Vorontsov, John Stultz,
	Joonsoo Kim, Michal Hocko, Kamezawa Hiroyuki, Johannes Weiner

On Tue, Apr 23, 2013 at 11:22 AM, Glauber Costa <glommer@openvz.org> wrote:
> From: Glauber Costa <glommer@parallels.com>
>
> During the past weeks, it became clear to us that the shrinker interface
> we have right now works very well for some particular types of users,
> but not that well for others. The later are usually people interested in
> one-shot notifications, that were forced to adapt themselves to the
> count+scan behavior of shrinkers. To do so, they had no choice than to
> greatly abuse the shrinker interface producing little monsters all over.
>
> During LSF/MM, one of the proposals that popped out during our session
> was to reuse Anton Voronstsov's vmpressure for this. They are designed
> for userspace consumption, but also provide a well-stablished,
> cgroup-aware entry point for notifications.
>
> This patch extends that to also support in-kernel users. Events that
> should be generated for in-kernel consumption will be marked as such,
> and for those, we will call a registered function instead of triggering
> an eventfd notification.
>
> Please note that due to my lack of understanding of each shrinker user,
> I will stay away from converting the actual users, you are all welcome
> to do so.
>
> Signed-off-by: Glauber Costa <glommer@openvz.org>

Looks good to me.

Acked-by: Pekka Enberg <penberg@kernel.org>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] vmpressure: in-kernel notifications
@ 2013-04-23 19:13     ` Pekka Enberg
  0 siblings, 0 replies; 26+ messages in thread
From: Pekka Enberg @ 2013-04-23 19:13 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	<cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Andrew Morton, Dave Chinner, Anton Vorontsov, John Stultz,
	Joonsoo Kim, Michal Hocko, Kamezawa Hiroyuki, Johannes Weiner

On Tue, Apr 23, 2013 at 11:22 AM, Glauber Costa <glommer-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> wrote:
> From: Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
>
> During the past weeks, it became clear to us that the shrinker interface
> we have right now works very well for some particular types of users,
> but not that well for others. The later are usually people interested in
> one-shot notifications, that were forced to adapt themselves to the
> count+scan behavior of shrinkers. To do so, they had no choice than to
> greatly abuse the shrinker interface producing little monsters all over.
>
> During LSF/MM, one of the proposals that popped out during our session
> was to reuse Anton Voronstsov's vmpressure for this. They are designed
> for userspace consumption, but also provide a well-stablished,
> cgroup-aware entry point for notifications.
>
> This patch extends that to also support in-kernel users. Events that
> should be generated for in-kernel consumption will be marked as such,
> and for those, we will call a registered function instead of triggering
> an eventfd notification.
>
> Please note that due to my lack of understanding of each shrinker user,
> I will stay away from converting the actual users, you are all welcome
> to do so.
>
> Signed-off-by: Glauber Costa <glommer-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>

Looks good to me.

Acked-by: Pekka Enberg <penberg-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] vmpressure: in-kernel notifications
  2013-04-23  8:22   ` Glauber Costa
                     ` (2 preceding siblings ...)
  (?)
@ 2013-04-23 20:24   ` Anton Vorontsov
  2013-04-23 21:01       ` Anton Vorontsov
                       ` (2 more replies)
  -1 siblings, 3 replies; 26+ messages in thread
From: Anton Vorontsov @ 2013-04-23 20:24 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-mm, cgroups, Andrew Morton, Dave Chinner, John Stultz,
	Joonsoo Kim, Michal Hocko, Kamezawa Hiroyuki, Johannes Weiner

On Tue, Apr 23, 2013 at 12:22:08PM +0400, Glauber Costa wrote:
> From: Glauber Costa <glommer@parallels.com>
> 
> This patch extends that to also support in-kernel users. Events that
> should be generated for in-kernel consumption will be marked as such,
> and for those, we will call a registered function instead of triggering
> an eventfd notification.

Just a couple more questions... :-)

[...]
> @@ -238,14 +244,16 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
>  	 * through vmpressure_prio(). But so far, keep calm.
>  	 */
>  	if (!scanned)
> -		return;
> +		goto schedule;
>  
>  	mutex_lock(&vmpr->sr_lock);
>  	vmpr->scanned += scanned;
>  	vmpr->reclaimed += reclaimed;
> +	vmpr->notify_userspace = true;

Setting the variable on every event seems a bit wasteful... does it make
sense to set it in vmpressure_register_event()? We'll have to make it a
counter, but the good thing is that we won't need any additional locks for
the counter.

>  /**
> + * vmpressure_register_kernel_event() - Register kernel-side notification

Why don't we need the unregister function? I see that the memcg portion
deals with dangling memcgs, but do they dangle forver?

Oh, and a few cosmetic changes down below...

Other than that, this particular patch looks perfect, feel free to add my:

	Acked-by: Anton Vorontsov <anton@enomsg.org>

Thanks!

Anton


diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h
index 1862012..3131e72 100644
--- a/include/linux/vmpressure.h
+++ b/include/linux/vmpressure.h
@@ -19,7 +19,7 @@ struct vmpressure {
 	/* Have to grab the lock on events traversal or modifications. */
 	struct mutex events_lock;
 
-	/* false if only kernel users want to be notified, true otherwise */
+	/* False if only kernel users want to be notified, true otherwise. */
 	bool notify_userspace;
 
 	struct work_struct work;
diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index 8d77ad0..acd3e66 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -156,9 +156,9 @@ static bool vmpressure_event(struct vmpressure *vmpr,
 	mutex_lock(&vmpr->events_lock);
 
 	list_for_each_entry(ev, &vmpr->events, node) {
-		if (ev->kernel_event)
+		if (ev->kernel_event) {
 			ev->fn();
-		else if (vmpr->notify_userspace && (level >= ev->level)) {
+		} else if (vmpr->notify_userspace && level >= ev->level) {
 			eventfd_signal(ev->efd, 1);
 			signalled = true;
 		}

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] vmpressure: in-kernel notifications
  2013-04-23 20:24   ` Anton Vorontsov
@ 2013-04-23 21:01       ` Anton Vorontsov
  2013-04-24  6:26     ` Glauber Costa
  2013-04-24 11:20     ` Glauber Costa
  2 siblings, 0 replies; 26+ messages in thread
From: Anton Vorontsov @ 2013-04-23 21:01 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-mm, cgroups, Andrew Morton, Dave Chinner, John Stultz,
	Joonsoo Kim, Michal Hocko, Kamezawa Hiroyuki, Johannes Weiner

On Tue, Apr 23, 2013 at 04:24:46PM -0400, Anton Vorontsov wrote:
[...]
> >  /**
> > + * vmpressure_register_kernel_event() - Register kernel-side notification
> 
> Why don't we need the unregister function? I see that the memcg portion
> deals with dangling memcgs, but do they dangle forver?

Oh, I got it. vmpressure_unregister_event() will unregister all the events
anyway. Cool.

Thanks!

Anton

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] vmpressure: in-kernel notifications
@ 2013-04-23 21:01       ` Anton Vorontsov
  0 siblings, 0 replies; 26+ messages in thread
From: Anton Vorontsov @ 2013-04-23 21:01 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Andrew Morton, Dave Chinner, John Stultz, Joonsoo Kim,
	Michal Hocko, Kamezawa Hiroyuki, Johannes Weiner

On Tue, Apr 23, 2013 at 04:24:46PM -0400, Anton Vorontsov wrote:
[...]
> >  /**
> > + * vmpressure_register_kernel_event() - Register kernel-side notification
> 
> Why don't we need the unregister function? I see that the memcg portion
> deals with dangling memcgs, but do they dangle forver?

Oh, I got it. vmpressure_unregister_event() will unregister all the events
anyway. Cool.

Thanks!

Anton

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] vmpressure: in-kernel notifications
  2013-04-23 20:24   ` Anton Vorontsov
  2013-04-23 21:01       ` Anton Vorontsov
@ 2013-04-24  6:26     ` Glauber Costa
  2013-04-24 11:20     ` Glauber Costa
  2 siblings, 0 replies; 26+ messages in thread
From: Glauber Costa @ 2013-04-24  6:26 UTC (permalink / raw)
  To: Anton Vorontsov
  Cc: Glauber Costa, linux-mm, cgroups, Andrew Morton, Dave Chinner,
	John Stultz, Joonsoo Kim, Michal Hocko, Kamezawa Hiroyuki,
	Johannes Weiner

On 04/24/2013 12:24 AM, Anton Vorontsov wrote:
> On Tue, Apr 23, 2013 at 12:22:08PM +0400, Glauber Costa wrote:
>> From: Glauber Costa <glommer@parallels.com>
>>
>> This patch extends that to also support in-kernel users. Events that
>> should be generated for in-kernel consumption will be marked as such,
>> and for those, we will call a registered function instead of triggering
>> an eventfd notification.
>
> Just a couple more questions... :-)
>
> [...]
>> @@ -238,14 +244,16 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
>>   	 * through vmpressure_prio(). But so far, keep calm.
>>   	 */
>>   	if (!scanned)
>> -		return;
>> +		goto schedule;
>>
>>   	mutex_lock(&vmpr->sr_lock);
>>   	vmpr->scanned += scanned;
>>   	vmpr->reclaimed += reclaimed;
>> +	vmpr->notify_userspace = true;
>
> Setting the variable on every event seems a bit wasteful... does it make
> sense to set it in vmpressure_register_event()? We'll have to make it a
> counter, but the good thing is that we won't need any additional locks for
> the counter.
>
Yes, vmpressure_register_event would be a better place for it. I will 
change and keep the acks, since it does not change the spirit of the 
patch too much.

I will also apply the cosmetics you attached. Thanks

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] vmpressure: in-kernel notifications
  2013-04-23  8:22   ` Glauber Costa
                     ` (3 preceding siblings ...)
  (?)
@ 2013-04-24  7:21   ` Greg Thelen
  2013-04-24  8:36     ` Glauber Costa
  -1 siblings, 1 reply; 26+ messages in thread
From: Greg Thelen @ 2013-04-24  7:21 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-mm, cgroups, Andrew Morton, Dave Chinner, Anton Vorontsov,
	John Stultz, Joonsoo Kim, Michal Hocko, Kamezawa Hiroyuki,
	Johannes Weiner

On Tue, Apr 23 2013, Glauber Costa wrote:

> From: Glauber Costa <glommer@parallels.com>
>
> During the past weeks, it became clear to us that the shrinker interface
> we have right now works very well for some particular types of users,
> but not that well for others. The later are usually people interested in
> one-shot notifications, that were forced to adapt themselves to the
> count+scan behavior of shrinkers. To do so, they had no choice than to
> greatly abuse the shrinker interface producing little monsters all over.
>
> During LSF/MM, one of the proposals that popped out during our session
> was to reuse Anton Voronstsov's vmpressure for this. They are designed
> for userspace consumption, but also provide a well-stablished,
> cgroup-aware entry point for notifications.
>
> This patch extends that to also support in-kernel users. Events that
> should be generated for in-kernel consumption will be marked as such,
> and for those, we will call a registered function instead of triggering
> an eventfd notification.
>
> Please note that due to my lack of understanding of each shrinker user,
> I will stay away from converting the actual users, you are all welcome
> to do so.
>
> Signed-off-by: Glauber Costa <glommer@openvz.org>
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Joonsoo Kim <js1304@gmail.com>
> Cc: Michal Hocko <mhocko@suse.cz>
> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  include/linux/vmpressure.h |  6 ++++++
>  mm/vmpressure.c            | 48 ++++++++++++++++++++++++++++++++++++++++++----
>  2 files changed, 50 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h
> index 76be077..1862012 100644
> --- a/include/linux/vmpressure.h
> +++ b/include/linux/vmpressure.h
> @@ -19,6 +19,9 @@ struct vmpressure {
>  	/* Have to grab the lock on events traversal or modifications. */
>  	struct mutex events_lock;
>  
> +	/* false if only kernel users want to be notified, true otherwise */
> +	bool notify_userspace;
> +
>  	struct work_struct work;
>  };
>  
> @@ -36,6 +39,9 @@ extern struct vmpressure *css_to_vmpressure(struct cgroup_subsys_state *css);
>  extern int vmpressure_register_event(struct cgroup *cg, struct cftype *cft,
>  				     struct eventfd_ctx *eventfd,
>  				     const char *args);
> +
> +extern int vmpressure_register_kernel_event(struct cgroup *cg,
> +					    void (*fn)(void));
>  extern void vmpressure_unregister_event(struct cgroup *cg, struct cftype *cft,
>  					struct eventfd_ctx *eventfd);
>  #else
> diff --git a/mm/vmpressure.c b/mm/vmpressure.c
> index 736a601..8d77ad0 100644
> --- a/mm/vmpressure.c
> +++ b/mm/vmpressure.c
> @@ -135,8 +135,12 @@ static enum vmpressure_levels vmpressure_calc_level(unsigned long scanned,
>  }
>  
>  struct vmpressure_event {
> -	struct eventfd_ctx *efd;
> +	union {
> +		struct eventfd_ctx *efd;
> +		void (*fn)(void);
> +	};
>  	enum vmpressure_levels level;
> +	bool kernel_event;
>  	struct list_head node;
>  };
>  
> @@ -152,7 +156,9 @@ static bool vmpressure_event(struct vmpressure *vmpr,
>  	mutex_lock(&vmpr->events_lock);
>  
>  	list_for_each_entry(ev, &vmpr->events, node) {
> -		if (level >= ev->level) {
> +		if (ev->kernel_event)
> +			ev->fn();
> +		else if (vmpr->notify_userspace && (level >= ev->level)) {
>  			eventfd_signal(ev->efd, 1);
>  			signalled = true;
>  		}
> @@ -227,7 +233,7 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
>  	 * we account it too.
>  	 */
>  	if (!(gfp & (__GFP_HIGHMEM | __GFP_MOVABLE | __GFP_IO | __GFP_FS)))
> -		return;
> +		goto schedule;
>  
>  	/*
>  	 * If we got here with no pages scanned, then that is an indicator
> @@ -238,14 +244,16 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
>  	 * through vmpressure_prio(). But so far, keep calm.
>  	 */
>  	if (!scanned)
> -		return;
> +		goto schedule;

If goto schedule is taken here then scanned==0.  Then
scanned<vmpressure_win (below), so this function would always simply
return.  So this change seems like a no-op.  Should the schedule: below
be just before schedule_work(&vmpr->work)?  But this wouldn't do much
either because vmpressure_work_fn() would immediately return if
vmpr->scanned==0.  Presumable the idea is to avoid notifying user space
or kernel callbacks if lru pages are not scanned - at least until
vmpressure_prio() is called with a priority more desperate than
vmpressure_level_critical_prio at which time this function's scanned!=0.

>  
>  	mutex_lock(&vmpr->sr_lock);
>  	vmpr->scanned += scanned;
>  	vmpr->reclaimed += reclaimed;
> +	vmpr->notify_userspace = true;
>  	scanned = vmpr->scanned;
>  	mutex_unlock(&vmpr->sr_lock);
>  
> +schedule:
>  	if (scanned < vmpressure_win || work_pending(&vmpr->work))
>  		return;
>  	schedule_work(&vmpr->work);
> @@ -328,6 +336,38 @@ int vmpressure_register_event(struct cgroup *cg, struct cftype *cft,
>  }
>  
>  /**
> + * vmpressure_register_kernel_event() - Register kernel-side notification
> + * @cg:		cgroup that is interested in vmpressure notifications
> + * @fn:		function to be called when pressure happens
> + *
> + * This function register in-kernel users interested in receiving notifications
> + * about pressure conditions. Pressure notifications will be triggered at the
> + * same time as userspace notifications (with no particular ordering relative
> + * to it).
> + *
> + * Pressure notifications are a alternative method to shrinkers and will serve
> + * well users that are interested in a one-shot notification, with a
> + * well-defined cgroup aware interface.
> + */
> +int vmpressure_register_kernel_event(struct cgroup *cg, void (*fn)(void))

It seems useful to include the "struct cgroup *" as a parameter to fn.
This would allow for fn to shrink objects it's caching in the cgroup.

Also, why not allow level specification for kernel events?

It might be neat if vmpressure_register_event() used
vmpressure_register_kernel_event() with a callback function calls
eventfd_signal().  This would allow for a uniform event notification
type which is agnostic of user vs kernel.  However, as proposed there
are different signaling conditions.  So I'm not sure it's worth the time
to combine the even types.  So feel free to ignore this paragraph.

> +{
> +	struct vmpressure *vmpr = cg_to_vmpressure(cg);
> +	struct vmpressure_event *ev;
> +
> +	ev = kzalloc(sizeof(*ev), GFP_KERNEL);
> +	if (!ev)
> +		return -ENOMEM;
> +
> +	ev->kernel_event = true;
> +	ev->fn = fn;
> +
> +	mutex_lock(&vmpr->events_lock);
> +	list_add(&ev->node, &vmpr->events);
> +	mutex_unlock(&vmpr->events_lock);
> +	return 0;
> +}
> +
> +/**
>   * vmpressure_unregister_event() - Unbind eventfd from vmpressure
>   * @cg:		cgroup handle
>   * @cft:	cgroup control files handle

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] vmpressure: in-kernel notifications
  2013-04-24  7:21   ` Greg Thelen
@ 2013-04-24  8:36     ` Glauber Costa
  2013-04-24 19:35       ` Greg Thelen
  0 siblings, 1 reply; 26+ messages in thread
From: Glauber Costa @ 2013-04-24  8:36 UTC (permalink / raw)
  To: Greg Thelen
  Cc: Glauber Costa, linux-mm, cgroups, Andrew Morton, Dave Chinner,
	Anton Vorontsov, John Stultz, Joonsoo Kim, Michal Hocko,
	Kamezawa Hiroyuki, Johannes Weiner

On 04/24/2013 11:21 AM, Greg Thelen wrote:
> On Tue, Apr 23 2013, Glauber Costa wrote:
> 
>> From: Glauber Costa <glommer@parallels.com>
>>
>> During the past weeks, it became clear to us that the shrinker interface
>> we have right now works very well for some particular types of users,
>> but not that well for others. The later are usually people interested in
>> one-shot notifications, that were forced to adapt themselves to the
>> count+scan behavior of shrinkers. To do so, they had no choice than to
>> greatly abuse the shrinker interface producing little monsters all over.
>>
>> During LSF/MM, one of the proposals that popped out during our session
>> was to reuse Anton Voronstsov's vmpressure for this. They are designed
>> for userspace consumption, but also provide a well-stablished,
>> cgroup-aware entry point for notifications.
>>
>> This patch extends that to also support in-kernel users. Events that
>> should be generated for in-kernel consumption will be marked as such,
>> and for those, we will call a registered function instead of triggering
>> an eventfd notification.
>>
>> Please note that due to my lack of understanding of each shrinker user,
>> I will stay away from converting the actual users, you are all welcome
>> to do so.
>>
>> Signed-off-by: Glauber Costa <glommer@openvz.org>
>> Cc: Dave Chinner <david@fromorbit.com>
>> Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
>> Cc: John Stultz <john.stultz@linaro.org>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Joonsoo Kim <js1304@gmail.com>
>> Cc: Michal Hocko <mhocko@suse.cz>
>> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>> ---
>>  include/linux/vmpressure.h |  6 ++++++
>>  mm/vmpressure.c            | 48 ++++++++++++++++++++++++++++++++++++++++++----
>>  2 files changed, 50 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h
>> index 76be077..1862012 100644
>> --- a/include/linux/vmpressure.h
>> +++ b/include/linux/vmpressure.h
>> @@ -19,6 +19,9 @@ struct vmpressure {
>>  	/* Have to grab the lock on events traversal or modifications. */
>>  	struct mutex events_lock;
>>  
>> +	/* false if only kernel users want to be notified, true otherwise */
>> +	bool notify_userspace;
>> +
>>  	struct work_struct work;
>>  };
>>  
>> @@ -36,6 +39,9 @@ extern struct vmpressure *css_to_vmpressure(struct cgroup_subsys_state *css);
>>  extern int vmpressure_register_event(struct cgroup *cg, struct cftype *cft,
>>  				     struct eventfd_ctx *eventfd,
>>  				     const char *args);
>> +
>> +extern int vmpressure_register_kernel_event(struct cgroup *cg,
>> +					    void (*fn)(void));
>>  extern void vmpressure_unregister_event(struct cgroup *cg, struct cftype *cft,
>>  					struct eventfd_ctx *eventfd);
>>  #else
>> diff --git a/mm/vmpressure.c b/mm/vmpressure.c
>> index 736a601..8d77ad0 100644
>> --- a/mm/vmpressure.c
>> +++ b/mm/vmpressure.c
>> @@ -135,8 +135,12 @@ static enum vmpressure_levels vmpressure_calc_level(unsigned long scanned,
>>  }
>>  
>>  struct vmpressure_event {
>> -	struct eventfd_ctx *efd;
>> +	union {
>> +		struct eventfd_ctx *efd;
>> +		void (*fn)(void);
>> +	};
>>  	enum vmpressure_levels level;
>> +	bool kernel_event;
>>  	struct list_head node;
>>  };
>>  
>> @@ -152,7 +156,9 @@ static bool vmpressure_event(struct vmpressure *vmpr,
>>  	mutex_lock(&vmpr->events_lock);
>>  
>>  	list_for_each_entry(ev, &vmpr->events, node) {
>> -		if (level >= ev->level) {
>> +		if (ev->kernel_event)
>> +			ev->fn();
>> +		else if (vmpr->notify_userspace && (level >= ev->level)) {
>>  			eventfd_signal(ev->efd, 1);
>>  			signalled = true;
>>  		}
>> @@ -227,7 +233,7 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
>>  	 * we account it too.
>>  	 */
>>  	if (!(gfp & (__GFP_HIGHMEM | __GFP_MOVABLE | __GFP_IO | __GFP_FS)))
>> -		return;
>> +		goto schedule;
>>  
>>  	/*
>>  	 * If we got here with no pages scanned, then that is an indicator
>> @@ -238,14 +244,16 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
>>  	 * through vmpressure_prio(). But so far, keep calm.
>>  	 */
>>  	if (!scanned)
>> -		return;
>> +		goto schedule;
> 
> If goto schedule is taken here then scanned==0.  Then
> scanned<vmpressure_win (below), so this function would always simply
> return.  So this change seems like a no-op.  Should the schedule: below
> be just before schedule_work(&vmpr->work)?  But this wouldn't do much
> either because vmpressure_work_fn() would immediately return if
> vmpr->scanned==0.  Presumable the idea is to avoid notifying user space
> or kernel callbacks if lru pages are not scanned - at least until
> vmpressure_prio() is called with a priority more desperate than
> vmpressure_level_critical_prio at which time this function's scanned!=0.
> 

Yes, the idea is to avoid calling the callbacks. I can just return at
this point if you prefer. I figured that jumping to the common entry
point would be more consistent, only that. I don't care either way.

>>  
>>  	mutex_lock(&vmpr->sr_lock);
>>  	vmpr->scanned += scanned;
>>  	vmpr->reclaimed += reclaimed;
>> +	vmpr->notify_userspace = true;
>>  	scanned = vmpr->scanned;
>>  	mutex_unlock(&vmpr->sr_lock);
>>  
>> +schedule:
>>  	if (scanned < vmpressure_win || work_pending(&vmpr->work))
>>  		return;
>>  	schedule_work(&vmpr->work);
>> @@ -328,6 +336,38 @@ int vmpressure_register_event(struct cgroup *cg, struct cftype *cft,
>>  }
>>  
>>  /**
>> + * vmpressure_register_kernel_event() - Register kernel-side notification
>> + * @cg:		cgroup that is interested in vmpressure notifications
>> + * @fn:		function to be called when pressure happens
>> + *
>> + * This function register in-kernel users interested in receiving notifications
>> + * about pressure conditions. Pressure notifications will be triggered at the
>> + * same time as userspace notifications (with no particular ordering relative
>> + * to it).
>> + *
>> + * Pressure notifications are a alternative method to shrinkers and will serve
>> + * well users that are interested in a one-shot notification, with a
>> + * well-defined cgroup aware interface.
>> + */
>> +int vmpressure_register_kernel_event(struct cgroup *cg, void (*fn)(void))
> 
> It seems useful to include the "struct cgroup *" as a parameter to fn.
> This would allow for fn to shrink objects it's caching in the cgroup.
> 
> Also, why not allow level specification for kernel events?
> 
Because I don't want to overdesign. This is a in-kernel API, so we can
change it if we want to. There is only one user, and that is called from
the root cgroup, without level distinction.

The cgroup argument makes sense, but I would rather leave it as is for
now. As for levels, it might make sense as well, but I would much rather
leave the implementation to someone actually using them - specially
since this is not a simple parameter passing.

> It might be neat if vmpressure_register_event() used
> vmpressure_register_kernel_event() with a callback function calls
> eventfd_signal().  This would allow for a uniform event notification
> type which is agnostic of user vs kernel.  However, as proposed there
> are different signaling conditions.  So I'm not sure it's worth the time
> to combine the even types.  So feel free to ignore this paragraph.
> 

I don't think it is worth it.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] vmpressure: in-kernel notifications
  2013-04-23 20:24   ` Anton Vorontsov
  2013-04-23 21:01       ` Anton Vorontsov
  2013-04-24  6:26     ` Glauber Costa
@ 2013-04-24 11:20     ` Glauber Costa
  2 siblings, 0 replies; 26+ messages in thread
From: Glauber Costa @ 2013-04-24 11:20 UTC (permalink / raw)
  To: Anton Vorontsov
  Cc: Glauber Costa, linux-mm, cgroups, Andrew Morton, Dave Chinner,
	John Stultz, Joonsoo Kim, Michal Hocko, Kamezawa Hiroyuki,
	Johannes Weiner

On 04/24/2013 12:24 AM, Anton Vorontsov wrote:
> Setting the variable on every event seems a bit wasteful... does it make
> sense to set it in vmpressure_register_event()? We'll have to make it a
> counter, but the good thing is that we won't need any additional locks for
> the counter.
My bad, I was not looking at the code.

There are two variables here:

One of them is an event variable: kernel_event. It is set to true upon
registration when we are registering a kernel event. We use it to decide
whether we should call a function or signal eventfd for that event.

The other, is a vmpr event and should indeed, be set in all pressure
isntances. It tells us if we should only trigger kernel events (if
false), or userspace events as well (if true).

This is because, for instance, userspace events may not always be
triggered (for instance, due to flags mismatch)


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] vmpressure: in-kernel notifications
  2013-04-24  8:36     ` Glauber Costa
@ 2013-04-24 19:35       ` Greg Thelen
  0 siblings, 0 replies; 26+ messages in thread
From: Greg Thelen @ 2013-04-24 19:35 UTC (permalink / raw)
  To: Glauber Costa
  Cc: Glauber Costa, linux-mm, cgroups, Andrew Morton, Dave Chinner,
	Anton Vorontsov, John Stultz, Joonsoo Kim, Michal Hocko,
	Kamezawa Hiroyuki, Johannes Weiner

On Wed, Apr 24 2013, Glauber Costa wrote:

> On 04/24/2013 11:21 AM, Greg Thelen wrote:
>> On Tue, Apr 23 2013, Glauber Costa wrote:
>> 
>>> From: Glauber Costa <glommer@parallels.com>
>>>
>>> During the past weeks, it became clear to us that the shrinker interface
>>> we have right now works very well for some particular types of users,
>>> but not that well for others. The later are usually people interested in
>>> one-shot notifications, that were forced to adapt themselves to the
>>> count+scan behavior of shrinkers. To do so, they had no choice than to
>>> greatly abuse the shrinker interface producing little monsters all over.
>>>
>>> During LSF/MM, one of the proposals that popped out during our session
>>> was to reuse Anton Voronstsov's vmpressure for this. They are designed
>>> for userspace consumption, but also provide a well-stablished,
>>> cgroup-aware entry point for notifications.
>>>
>>> This patch extends that to also support in-kernel users. Events that
>>> should be generated for in-kernel consumption will be marked as such,
>>> and for those, we will call a registered function instead of triggering
>>> an eventfd notification.
>>>
>>> Please note that due to my lack of understanding of each shrinker user,
>>> I will stay away from converting the actual users, you are all welcome
>>> to do so.
>>>
>>> Signed-off-by: Glauber Costa <glommer@openvz.org>
>>> Cc: Dave Chinner <david@fromorbit.com>
>>> Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
>>> Cc: John Stultz <john.stultz@linaro.org>
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Cc: Joonsoo Kim <js1304@gmail.com>
>>> Cc: Michal Hocko <mhocko@suse.cz>
>>> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>>> ---
>>>  include/linux/vmpressure.h |  6 ++++++
>>>  mm/vmpressure.c            | 48 ++++++++++++++++++++++++++++++++++++++++++----
>>>  2 files changed, 50 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h
>>> index 76be077..1862012 100644
>>> --- a/include/linux/vmpressure.h
>>> +++ b/include/linux/vmpressure.h
>>> @@ -19,6 +19,9 @@ struct vmpressure {
>>>  	/* Have to grab the lock on events traversal or modifications. */
>>>  	struct mutex events_lock;
>>>  
>>> +	/* false if only kernel users want to be notified, true otherwise */
>>> +	bool notify_userspace;
>>> +
>>>  	struct work_struct work;
>>>  };
>>>  
>>> @@ -36,6 +39,9 @@ extern struct vmpressure *css_to_vmpressure(struct cgroup_subsys_state *css);
>>>  extern int vmpressure_register_event(struct cgroup *cg, struct cftype *cft,
>>>  				     struct eventfd_ctx *eventfd,
>>>  				     const char *args);
>>> +
>>> +extern int vmpressure_register_kernel_event(struct cgroup *cg,
>>> +					    void (*fn)(void));
>>>  extern void vmpressure_unregister_event(struct cgroup *cg, struct cftype *cft,
>>>  					struct eventfd_ctx *eventfd);
>>>  #else
>>> diff --git a/mm/vmpressure.c b/mm/vmpressure.c
>>> index 736a601..8d77ad0 100644
>>> --- a/mm/vmpressure.c
>>> +++ b/mm/vmpressure.c
>>> @@ -135,8 +135,12 @@ static enum vmpressure_levels vmpressure_calc_level(unsigned long scanned,
>>>  }
>>>  
>>>  struct vmpressure_event {
>>> -	struct eventfd_ctx *efd;
>>> +	union {
>>> +		struct eventfd_ctx *efd;
>>> +		void (*fn)(void);
>>> +	};
>>>  	enum vmpressure_levels level;
>>> +	bool kernel_event;
>>>  	struct list_head node;
>>>  };
>>>  
>>> @@ -152,7 +156,9 @@ static bool vmpressure_event(struct vmpressure *vmpr,
>>>  	mutex_lock(&vmpr->events_lock);
>>>  
>>>  	list_for_each_entry(ev, &vmpr->events, node) {
>>> -		if (level >= ev->level) {
>>> +		if (ev->kernel_event)
>>> +			ev->fn();
>>> +		else if (vmpr->notify_userspace && (level >= ev->level)) {
>>>  			eventfd_signal(ev->efd, 1);
>>>  			signalled = true;
>>>  		}
>>> @@ -227,7 +233,7 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
>>>  	 * we account it too.
>>>  	 */
>>>  	if (!(gfp & (__GFP_HIGHMEM | __GFP_MOVABLE | __GFP_IO | __GFP_FS)))
>>> -		return;
>>> +		goto schedule;
>>>  
>>>  	/*
>>>  	 * If we got here with no pages scanned, then that is an indicator
>>> @@ -238,14 +244,16 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
>>>  	 * through vmpressure_prio(). But so far, keep calm.
>>>  	 */
>>>  	if (!scanned)
>>> -		return;
>>> +		goto schedule;
>> 
>> If goto schedule is taken here then scanned==0.  Then
>> scanned<vmpressure_win (below), so this function would always simply
>> return.  So this change seems like a no-op.  Should the schedule: below
>> be just before schedule_work(&vmpr->work)?  But this wouldn't do much
>> either because vmpressure_work_fn() would immediately return if
>> vmpr->scanned==0.  Presumable the idea is to avoid notifying user space
>> or kernel callbacks if lru pages are not scanned - at least until
>> vmpressure_prio() is called with a priority more desperate than
>> vmpressure_level_critical_prio at which time this function's scanned!=0.
>> 
>
> Yes, the idea is to avoid calling the callbacks. I can just return at
> this point if you prefer. I figured that jumping to the common entry
> point would be more consistent, only that. I don't care either way.

Leave it as is.  I don't really care either way.

>>>  	mutex_lock(&vmpr->sr_lock);
>>>  	vmpr->scanned += scanned;
>>>  	vmpr->reclaimed += reclaimed;
>>> +	vmpr->notify_userspace = true;
>>>  	scanned = vmpr->scanned;
>>>  	mutex_unlock(&vmpr->sr_lock);
>>>  
>>> +schedule:
>>>  	if (scanned < vmpressure_win || work_pending(&vmpr->work))
>>>  		return;
>>>  	schedule_work(&vmpr->work);
>>> @@ -328,6 +336,38 @@ int vmpressure_register_event(struct cgroup *cg, struct cftype *cft,
>>>  }
>>>  
>>>  /**
>>> + * vmpressure_register_kernel_event() - Register kernel-side notification
>>> + * @cg:		cgroup that is interested in vmpressure notifications
>>> + * @fn:		function to be called when pressure happens
>>> + *
>>> + * This function register in-kernel users interested in receiving notifications
>>> + * about pressure conditions. Pressure notifications will be triggered at the
>>> + * same time as userspace notifications (with no particular ordering relative
>>> + * to it).
>>> + *
>>> + * Pressure notifications are a alternative method to shrinkers and will serve
>>> + * well users that are interested in a one-shot notification, with a
>>> + * well-defined cgroup aware interface.
>>> + */
>>> +int vmpressure_register_kernel_event(struct cgroup *cg, void (*fn)(void))
>> 
>> It seems useful to include the "struct cgroup *" as a parameter to fn.
>> This would allow for fn to shrink objects it's caching in the cgroup.
>> 
>> Also, why not allow level specification for kernel events?
>> 
> Because I don't want to overdesign. This is a in-kernel API, so we can
> change it if we want to. There is only one user, and that is called from
> the root cgroup, without level distinction.
>
> The cgroup argument makes sense, but I would rather leave it as is for
> now. As for levels, it might make sense as well, but I would much rather
> leave the implementation to someone actually using them - specially
> since this is not a simple parameter passing.

If there's going to be a single global listener for now, then I agree.
Leave your change as-is.

>> It might be neat if vmpressure_register_event() used
>> vmpressure_register_kernel_event() with a callback function calls
>> eventfd_signal().  This would allow for a uniform event notification
>> type which is agnostic of user vs kernel.  However, as proposed there
>> are different signaling conditions.  So I'm not sure it's worth the time
>> to combine the even types.  So feel free to ignore this paragraph.
>> 
>
> I don't think it is worth it.

Fine with me.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] vmpressure: in-kernel notifications
@ 2013-04-24 19:42     ` Greg Thelen
  0 siblings, 0 replies; 26+ messages in thread
From: Greg Thelen @ 2013-04-24 19:42 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-mm, cgroups, Andrew Morton, Dave Chinner, Anton Vorontsov,
	John Stultz, Joonsoo Kim, Michal Hocko, Kamezawa Hiroyuki,
	Johannes Weiner

On Tue, Apr 23 2013, Glauber Costa wrote:

> From: Glauber Costa <glommer@parallels.com>
>
> During the past weeks, it became clear to us that the shrinker interface
> we have right now works very well for some particular types of users,
> but not that well for others. The later are usually people interested in
> one-shot notifications, that were forced to adapt themselves to the
> count+scan behavior of shrinkers. To do so, they had no choice than to
> greatly abuse the shrinker interface producing little monsters all over.
>
> During LSF/MM, one of the proposals that popped out during our session
> was to reuse Anton Voronstsov's vmpressure for this. They are designed
> for userspace consumption, but also provide a well-stablished,
> cgroup-aware entry point for notifications.
>
> This patch extends that to also support in-kernel users. Events that
> should be generated for in-kernel consumption will be marked as such,
> and for those, we will call a registered function instead of triggering
> an eventfd notification.
>
> Please note that due to my lack of understanding of each shrinker user,
> I will stay away from converting the actual users, you are all welcome
> to do so.
>
> Signed-off-by: Glauber Costa <glommer@openvz.org>
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Joonsoo Kim <js1304@gmail.com>
> Cc: Michal Hocko <mhocko@suse.cz>
> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  include/linux/vmpressure.h |  6 ++++++
>  mm/vmpressure.c            | 48 ++++++++++++++++++++++++++++++++++++++++++----
>  2 files changed, 50 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h
> index 76be077..1862012 100644
> --- a/include/linux/vmpressure.h
> +++ b/include/linux/vmpressure.h
> @@ -19,6 +19,9 @@ struct vmpressure {
>  	/* Have to grab the lock on events traversal or modifications. */
>  	struct mutex events_lock;
>  
> +	/* false if only kernel users want to be notified, true otherwise */
> +	bool notify_userspace;
> +
>  	struct work_struct work;
>  };
>  
> @@ -36,6 +39,9 @@ extern struct vmpressure *css_to_vmpressure(struct cgroup_subsys_state *css);
>  extern int vmpressure_register_event(struct cgroup *cg, struct cftype *cft,
>  				     struct eventfd_ctx *eventfd,
>  				     const char *args);
> +
> +extern int vmpressure_register_kernel_event(struct cgroup *cg,
> +					    void (*fn)(void));
>  extern void vmpressure_unregister_event(struct cgroup *cg, struct cftype *cft,
>  					struct eventfd_ctx *eventfd);
>  #else
> diff --git a/mm/vmpressure.c b/mm/vmpressure.c
> index 736a601..8d77ad0 100644
> --- a/mm/vmpressure.c
> +++ b/mm/vmpressure.c
> @@ -135,8 +135,12 @@ static enum vmpressure_levels vmpressure_calc_level(unsigned long scanned,
>  }
>  
>  struct vmpressure_event {
> -	struct eventfd_ctx *efd;
> +	union {
> +		struct eventfd_ctx *efd;
> +		void (*fn)(void);
> +	};
>  	enum vmpressure_levels level;
> +	bool kernel_event;
>  	struct list_head node;
>  };
>  
> @@ -152,7 +156,9 @@ static bool vmpressure_event(struct vmpressure *vmpr,
>  	mutex_lock(&vmpr->events_lock);
>  
>  	list_for_each_entry(ev, &vmpr->events, node) {
> -		if (level >= ev->level) {
> +		if (ev->kernel_event)
> +			ev->fn();
> +		else if (vmpr->notify_userspace && (level >= ev->level)) {
>  			eventfd_signal(ev->efd, 1);
>  			signalled = true;
>  		}
> @@ -227,7 +233,7 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
>  	 * we account it too.
>  	 */
>  	if (!(gfp & (__GFP_HIGHMEM | __GFP_MOVABLE | __GFP_IO | __GFP_FS)))
> -		return;
> +		goto schedule;
>  
>  	/*
>  	 * If we got here with no pages scanned, then that is an indicator
> @@ -238,14 +244,16 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
>  	 * through vmpressure_prio(). But so far, keep calm.
>  	 */
>  	if (!scanned)
> -		return;
> +		goto schedule;
>  
>  	mutex_lock(&vmpr->sr_lock);
>  	vmpr->scanned += scanned;
>  	vmpr->reclaimed += reclaimed;
> +	vmpr->notify_userspace = true;

Should notify_userspace get cleared sometime?  Seems like we might need
to clear or decrement notify_userspace in vmpressure_event() when
calling eventfd_signal().

>  	scanned = vmpr->scanned;
>  	mutex_unlock(&vmpr->sr_lock);
>  
> +schedule:
>  	if (scanned < vmpressure_win || work_pending(&vmpr->work))
>  		return;
>  	schedule_work(&vmpr->work);
> @@ -328,6 +336,38 @@ int vmpressure_register_event(struct cgroup *cg, struct cftype *cft,
>  }
>  
>  /**
> + * vmpressure_register_kernel_event() - Register kernel-side notification
> + * @cg:		cgroup that is interested in vmpressure notifications
> + * @fn:		function to be called when pressure happens
> + *
> + * This function register in-kernel users interested in receiving notifications
> + * about pressure conditions. Pressure notifications will be triggered at the
> + * same time as userspace notifications (with no particular ordering relative
> + * to it).
> + *
> + * Pressure notifications are a alternative method to shrinkers and will serve
> + * well users that are interested in a one-shot notification, with a
> + * well-defined cgroup aware interface.
> + */
> +int vmpressure_register_kernel_event(struct cgroup *cg, void (*fn)(void))
> +{
> +	struct vmpressure *vmpr = cg_to_vmpressure(cg);
> +	struct vmpressure_event *ev;
> +
> +	ev = kzalloc(sizeof(*ev), GFP_KERNEL);
> +	if (!ev)
> +		return -ENOMEM;
> +
> +	ev->kernel_event = true;
> +	ev->fn = fn;
> +
> +	mutex_lock(&vmpr->events_lock);
> +	list_add(&ev->node, &vmpr->events);
> +	mutex_unlock(&vmpr->events_lock);
> +	return 0;
> +}
> +
> +/**
>   * vmpressure_unregister_event() - Unbind eventfd from vmpressure
>   * @cg:		cgroup handle
>   * @cft:	cgroup control files handle

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] vmpressure: in-kernel notifications
@ 2013-04-24 19:42     ` Greg Thelen
  0 siblings, 0 replies; 26+ messages in thread
From: Greg Thelen @ 2013-04-24 19:42 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Andrew Morton, Dave Chinner, Anton Vorontsov, John Stultz,
	Joonsoo Kim, Michal Hocko, Kamezawa Hiroyuki, Johannes Weiner

On Tue, Apr 23 2013, Glauber Costa wrote:

> From: Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
>
> During the past weeks, it became clear to us that the shrinker interface
> we have right now works very well for some particular types of users,
> but not that well for others. The later are usually people interested in
> one-shot notifications, that were forced to adapt themselves to the
> count+scan behavior of shrinkers. To do so, they had no choice than to
> greatly abuse the shrinker interface producing little monsters all over.
>
> During LSF/MM, one of the proposals that popped out during our session
> was to reuse Anton Voronstsov's vmpressure for this. They are designed
> for userspace consumption, but also provide a well-stablished,
> cgroup-aware entry point for notifications.
>
> This patch extends that to also support in-kernel users. Events that
> should be generated for in-kernel consumption will be marked as such,
> and for those, we will call a registered function instead of triggering
> an eventfd notification.
>
> Please note that due to my lack of understanding of each shrinker user,
> I will stay away from converting the actual users, you are all welcome
> to do so.
>
> Signed-off-by: Glauber Costa <glommer-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
> Cc: Dave Chinner <david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org>
> Cc: Anton Vorontsov <anton.vorontsov-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
> Cc: John Stultz <john.stultz-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
> Cc: Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
> Cc: Joonsoo Kim <js1304-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Cc: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
> Cc: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
> ---
>  include/linux/vmpressure.h |  6 ++++++
>  mm/vmpressure.c            | 48 ++++++++++++++++++++++++++++++++++++++++++----
>  2 files changed, 50 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h
> index 76be077..1862012 100644
> --- a/include/linux/vmpressure.h
> +++ b/include/linux/vmpressure.h
> @@ -19,6 +19,9 @@ struct vmpressure {
>  	/* Have to grab the lock on events traversal or modifications. */
>  	struct mutex events_lock;
>  
> +	/* false if only kernel users want to be notified, true otherwise */
> +	bool notify_userspace;
> +
>  	struct work_struct work;
>  };
>  
> @@ -36,6 +39,9 @@ extern struct vmpressure *css_to_vmpressure(struct cgroup_subsys_state *css);
>  extern int vmpressure_register_event(struct cgroup *cg, struct cftype *cft,
>  				     struct eventfd_ctx *eventfd,
>  				     const char *args);
> +
> +extern int vmpressure_register_kernel_event(struct cgroup *cg,
> +					    void (*fn)(void));
>  extern void vmpressure_unregister_event(struct cgroup *cg, struct cftype *cft,
>  					struct eventfd_ctx *eventfd);
>  #else
> diff --git a/mm/vmpressure.c b/mm/vmpressure.c
> index 736a601..8d77ad0 100644
> --- a/mm/vmpressure.c
> +++ b/mm/vmpressure.c
> @@ -135,8 +135,12 @@ static enum vmpressure_levels vmpressure_calc_level(unsigned long scanned,
>  }
>  
>  struct vmpressure_event {
> -	struct eventfd_ctx *efd;
> +	union {
> +		struct eventfd_ctx *efd;
> +		void (*fn)(void);
> +	};
>  	enum vmpressure_levels level;
> +	bool kernel_event;
>  	struct list_head node;
>  };
>  
> @@ -152,7 +156,9 @@ static bool vmpressure_event(struct vmpressure *vmpr,
>  	mutex_lock(&vmpr->events_lock);
>  
>  	list_for_each_entry(ev, &vmpr->events, node) {
> -		if (level >= ev->level) {
> +		if (ev->kernel_event)
> +			ev->fn();
> +		else if (vmpr->notify_userspace && (level >= ev->level)) {
>  			eventfd_signal(ev->efd, 1);
>  			signalled = true;
>  		}
> @@ -227,7 +233,7 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
>  	 * we account it too.
>  	 */
>  	if (!(gfp & (__GFP_HIGHMEM | __GFP_MOVABLE | __GFP_IO | __GFP_FS)))
> -		return;
> +		goto schedule;
>  
>  	/*
>  	 * If we got here with no pages scanned, then that is an indicator
> @@ -238,14 +244,16 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
>  	 * through vmpressure_prio(). But so far, keep calm.
>  	 */
>  	if (!scanned)
> -		return;
> +		goto schedule;
>  
>  	mutex_lock(&vmpr->sr_lock);
>  	vmpr->scanned += scanned;
>  	vmpr->reclaimed += reclaimed;
> +	vmpr->notify_userspace = true;

Should notify_userspace get cleared sometime?  Seems like we might need
to clear or decrement notify_userspace in vmpressure_event() when
calling eventfd_signal().

>  	scanned = vmpr->scanned;
>  	mutex_unlock(&vmpr->sr_lock);
>  
> +schedule:
>  	if (scanned < vmpressure_win || work_pending(&vmpr->work))
>  		return;
>  	schedule_work(&vmpr->work);
> @@ -328,6 +336,38 @@ int vmpressure_register_event(struct cgroup *cg, struct cftype *cft,
>  }
>  
>  /**
> + * vmpressure_register_kernel_event() - Register kernel-side notification
> + * @cg:		cgroup that is interested in vmpressure notifications
> + * @fn:		function to be called when pressure happens
> + *
> + * This function register in-kernel users interested in receiving notifications
> + * about pressure conditions. Pressure notifications will be triggered at the
> + * same time as userspace notifications (with no particular ordering relative
> + * to it).
> + *
> + * Pressure notifications are a alternative method to shrinkers and will serve
> + * well users that are interested in a one-shot notification, with a
> + * well-defined cgroup aware interface.
> + */
> +int vmpressure_register_kernel_event(struct cgroup *cg, void (*fn)(void))
> +{
> +	struct vmpressure *vmpr = cg_to_vmpressure(cg);
> +	struct vmpressure_event *ev;
> +
> +	ev = kzalloc(sizeof(*ev), GFP_KERNEL);
> +	if (!ev)
> +		return -ENOMEM;
> +
> +	ev->kernel_event = true;
> +	ev->fn = fn;
> +
> +	mutex_lock(&vmpr->events_lock);
> +	list_add(&ev->node, &vmpr->events);
> +	mutex_unlock(&vmpr->events_lock);
> +	return 0;
> +}
> +
> +/**
>   * vmpressure_unregister_event() - Unbind eventfd from vmpressure
>   * @cg:		cgroup handle
>   * @cft:	cgroup control files handle

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] vmpressure: in-kernel notifications
  2013-04-24 19:42     ` Greg Thelen
  (?)
@ 2013-04-24 20:04     ` Glauber Costa
  -1 siblings, 0 replies; 26+ messages in thread
From: Glauber Costa @ 2013-04-24 20:04 UTC (permalink / raw)
  To: Greg Thelen
  Cc: Glauber Costa, linux-mm, cgroups, Andrew Morton, Dave Chinner,
	Anton Vorontsov, John Stultz, Joonsoo Kim, Michal Hocko,
	Kamezawa Hiroyuki, Johannes Weiner

On 04/24/2013 11:42 PM, Greg Thelen wrote:
> On Tue, Apr 23 2013, Glauber Costa wrote:
> 
>> From: Glauber Costa <glommer@parallels.com>
>>
>> During the past weeks, it became clear to us that the shrinker interface
>> we have right now works very well for some particular types of users,
>> but not that well for others. The later are usually people interested in
>> one-shot notifications, that were forced to adapt themselves to the
>> count+scan behavior of shrinkers. To do so, they had no choice than to
>> greatly abuse the shrinker interface producing little monsters all over.
>>
>> During LSF/MM, one of the proposals that popped out during our session
>> was to reuse Anton Voronstsov's vmpressure for this. They are designed
>> for userspace consumption, but also provide a well-stablished,
>> cgroup-aware entry point for notifications.
>>
>> This patch extends that to also support in-kernel users. Events that
>> should be generated for in-kernel consumption will be marked as such,
>> and for those, we will call a registered function instead of triggering
>> an eventfd notification.
>>
>> Please note that due to my lack of understanding of each shrinker user,
>> I will stay away from converting the actual users, you are all welcome
>> to do so.
>>
>> Signed-off-by: Glauber Costa <glommer@openvz.org>
>> Cc: Dave Chinner <david@fromorbit.com>
>> Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
>> Cc: John Stultz <john.stultz@linaro.org>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Joonsoo Kim <js1304@gmail.com>
>> Cc: Michal Hocko <mhocko@suse.cz>
>> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>> ---
>>  include/linux/vmpressure.h |  6 ++++++
>>  mm/vmpressure.c            | 48 ++++++++++++++++++++++++++++++++++++++++++----
>>  2 files changed, 50 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h
>> index 76be077..1862012 100644
>> --- a/include/linux/vmpressure.h
>> +++ b/include/linux/vmpressure.h
>> @@ -19,6 +19,9 @@ struct vmpressure {
>>  	/* Have to grab the lock on events traversal or modifications. */
>>  	struct mutex events_lock;
>>  
>> +	/* false if only kernel users want to be notified, true otherwise */
>> +	bool notify_userspace;
>> +
>>  	struct work_struct work;
>>  };
>>  
>> @@ -36,6 +39,9 @@ extern struct vmpressure *css_to_vmpressure(struct cgroup_subsys_state *css);
>>  extern int vmpressure_register_event(struct cgroup *cg, struct cftype *cft,
>>  				     struct eventfd_ctx *eventfd,
>>  				     const char *args);
>> +
>> +extern int vmpressure_register_kernel_event(struct cgroup *cg,
>> +					    void (*fn)(void));
>>  extern void vmpressure_unregister_event(struct cgroup *cg, struct cftype *cft,
>>  					struct eventfd_ctx *eventfd);
>>  #else
>> diff --git a/mm/vmpressure.c b/mm/vmpressure.c
>> index 736a601..8d77ad0 100644
>> --- a/mm/vmpressure.c
>> +++ b/mm/vmpressure.c
>> @@ -135,8 +135,12 @@ static enum vmpressure_levels vmpressure_calc_level(unsigned long scanned,
>>  }
>>  
>>  struct vmpressure_event {
>> -	struct eventfd_ctx *efd;
>> +	union {
>> +		struct eventfd_ctx *efd;
>> +		void (*fn)(void);
>> +	};
>>  	enum vmpressure_levels level;
>> +	bool kernel_event;
>>  	struct list_head node;
>>  };
>>  
>> @@ -152,7 +156,9 @@ static bool vmpressure_event(struct vmpressure *vmpr,
>>  	mutex_lock(&vmpr->events_lock);
>>  
>>  	list_for_each_entry(ev, &vmpr->events, node) {
>> -		if (level >= ev->level) {
>> +		if (ev->kernel_event)
>> +			ev->fn();
>> +		else if (vmpr->notify_userspace && (level >= ev->level)) {
>>  			eventfd_signal(ev->efd, 1);
>>  			signalled = true;
>>  		}
>> @@ -227,7 +233,7 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
>>  	 * we account it too.
>>  	 */
>>  	if (!(gfp & (__GFP_HIGHMEM | __GFP_MOVABLE | __GFP_IO | __GFP_FS)))
>> -		return;
>> +		goto schedule;
>>  
>>  	/*
>>  	 * If we got here with no pages scanned, then that is an indicator
>> @@ -238,14 +244,16 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
>>  	 * through vmpressure_prio(). But so far, keep calm.
>>  	 */
>>  	if (!scanned)
>> -		return;
>> +		goto schedule;
>>  
>>  	mutex_lock(&vmpr->sr_lock);
>>  	vmpr->scanned += scanned;
>>  	vmpr->reclaimed += reclaimed;
>> +	vmpr->notify_userspace = true;
> 
> Should notify_userspace get cleared sometime?  Seems like we might need
> to clear or decrement notify_userspace in vmpressure_event() when
> calling eventfd_signal().
> 

Hhummm, I was kind of assuming that it would always reach this point
zeroed. But looking at the code again, I am wrong, and you are right: it
should be cleared as soon as the notifications are fired.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] vmpressure: in-kernel notifications
  2013-04-24 19:42     ` Greg Thelen
  (?)
  (?)
@ 2013-04-25 10:50     ` Glauber Costa
  2013-04-25 18:34       ` Greg Thelen
  -1 siblings, 1 reply; 26+ messages in thread
From: Glauber Costa @ 2013-04-25 10:50 UTC (permalink / raw)
  To: Greg Thelen
  Cc: Glauber Costa, linux-mm, cgroups, Andrew Morton, Dave Chinner,
	Anton Vorontsov, John Stultz, Joonsoo Kim, Michal Hocko,
	Kamezawa Hiroyuki, Johannes Weiner

[-- Attachment #1: Type: text/plain, Size: 399 bytes --]

On 04/24/2013 11:42 PM, Greg Thelen wrote:
>> +	vmpr->notify_userspace = true;
> Should notify_userspace get cleared sometime?  Seems like we might need
> to clear or decrement notify_userspace in vmpressure_event() when
> calling eventfd_signal().
> 
I am folding the attached patch and keeping the acks unless the ackers
oppose.

Greg, any other problem you spot here? Thanks for the review BTW.


[-- Attachment #2: vmpressure.diff --]
[-- Type: text/x-patch, Size: 789 bytes --]

diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index 1a082a0..e16256e 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -164,6 +164,7 @@ static bool vmpressure_event(struct vmpressure *vmpr,
 		}
 	}
 
+	vmpr->notify_userspace = false;
 	mutex_unlock(&vmpr->events_lock);
 
 	return signalled;
@@ -249,8 +250,13 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
 	mutex_lock(&vmpr->sr_lock);
 	vmpr->scanned += scanned;
 	vmpr->reclaimed += reclaimed;
-	vmpr->notify_userspace = true;
 	scanned = vmpr->scanned;
+	/*
+	 * If we didn't reach this point, only kernel events will be triggered.
+	 * It is the job of the worker thread to clean this up once the
+	 * notifications are all delivered.
+	 */
+	vmpr->notify_userspace = true;
 	mutex_unlock(&vmpr->sr_lock);
 
 schedule:

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] memcg: reap dead memcgs under pressure
  2013-04-23  8:22   ` Glauber Costa
  (?)
@ 2013-04-25 12:50   ` Li Zefan
  2013-04-26  7:38     ` Glauber Costa
  -1 siblings, 1 reply; 26+ messages in thread
From: Li Zefan @ 2013-04-25 12:50 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-mm, cgroups, Andrew Morton, Dave Chinner, Anton Vorontsov,
	John Stultz, Joonsoo Kim, Michal Hocko, KAMEZAWA Hiroyuki,
	Johannes Weiner

> +static void memcg_vmpressure_shrink_dead(void)
> +{
> +	struct memcg_cache_params *params, *tmp;
> +	struct kmem_cache *cachep;
> +	struct mem_cgroup *memcg;
> +
> +	mutex_lock(&dangling_memcgs_mutex);
> +	list_for_each_entry(memcg, &dangling_memcgs, dead) {
> +
> +		mem_cgroup_get(memcg);

This mem_cgroup_get() looks redundant to me, because you're iterating the list
and never release dangling_memcgs_mutex in the middle.

> +		mutex_lock(&memcg->slab_caches_mutex);
> +		/* The element may go away as an indirect result of shrink */
> +		list_for_each_entry_safe(params, tmp,
> +					 &memcg->memcg_slab_caches, list) {
> +
> +			cachep = memcg_params_to_cache(params);
> +			/*
> +			 * the cpu_hotplug lock is taken in kmem_cache_create
> +			 * outside the slab_caches_mutex manipulation. It will
> +			 * be taken by kmem_cache_shrink to flush the cache.
> +			 * So we need to drop the lock. It is all right because
> +			 * the lock only protects elements moving in and out the
> +			 * list.
> +			 */
> +			mutex_unlock(&memcg->slab_caches_mutex);
> +			kmem_cache_shrink(cachep);
> +			mutex_lock(&memcg->slab_caches_mutex);
> +		}
> +		mutex_unlock(&memcg->slab_caches_mutex);
> +		mem_cgroup_put(memcg);
> +	}
> +	mutex_unlock(&dangling_memcgs_mutex);
> +}

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] vmpressure: in-kernel notifications
  2013-04-25 10:50     ` Glauber Costa
@ 2013-04-25 18:34       ` Greg Thelen
  0 siblings, 0 replies; 26+ messages in thread
From: Greg Thelen @ 2013-04-25 18:34 UTC (permalink / raw)
  To: Glauber Costa
  Cc: Glauber Costa, linux-mm, cgroups, Andrew Morton, Dave Chinner,
	Anton Vorontsov, John Stultz, Joonsoo Kim, Michal Hocko,
	Kamezawa Hiroyuki, Johannes Weiner

On Thu, Apr 25 2013, Glauber Costa wrote:

> On 04/24/2013 11:42 PM, Greg Thelen wrote:
>>> +	vmpr->notify_userspace = true;
>> Should notify_userspace get cleared sometime?  Seems like we might need
>> to clear or decrement notify_userspace in vmpressure_event() when
>> calling eventfd_signal().
>> 
> I am folding the attached patch and keeping the acks unless the ackers
> oppose.
>
> Greg, any other problem you spot here? Thanks for the review BTW.

Looks good to me.  Feel free to add my tag to the patch with
vmpressure.diff folded in.

Reviewed-by: Greg Thelen <gthelen@google.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] memcg: reap dead memcgs under pressure
  2013-04-25 12:50   ` Li Zefan
@ 2013-04-26  7:38     ` Glauber Costa
  0 siblings, 0 replies; 26+ messages in thread
From: Glauber Costa @ 2013-04-26  7:38 UTC (permalink / raw)
  To: Li Zefan
  Cc: Glauber Costa, linux-mm, cgroups, Andrew Morton, Dave Chinner,
	Anton Vorontsov, John Stultz, Joonsoo Kim, Michal Hocko,
	KAMEZAWA Hiroyuki, Johannes Weiner

On 04/25/2013 04:50 PM, Li Zefan wrote:
>> +static void memcg_vmpressure_shrink_dead(void)
>> +{
>> +	struct memcg_cache_params *params, *tmp;
>> +	struct kmem_cache *cachep;
>> +	struct mem_cgroup *memcg;
>> +
>> +	mutex_lock(&dangling_memcgs_mutex);
>> +	list_for_each_entry(memcg, &dangling_memcgs, dead) {
>> +
>> +		mem_cgroup_get(memcg);
> 
> This mem_cgroup_get() looks redundant to me, because you're iterating the list
> and never release dangling_memcgs_mutex in the middle.
> 
You are right. We will never go all the way through free because
memcg_dangling_free is called before free, and needs the mutex.

Thanks

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2013-04-26  7:38 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-04-23  8:22 [PATCH 0/2] reuse vmpressure for in-kernel events Glauber Costa
2013-04-23  8:22 ` Glauber Costa
2013-04-23  8:22 ` [PATCH 1/2] vmpressure: in-kernel notifications Glauber Costa
2013-04-23  8:22   ` Glauber Costa
2013-04-23 17:11   ` Anton Vorontsov
2013-04-23 17:11     ` Anton Vorontsov
2013-04-23 18:17     ` Glauber Costa
2013-04-23 19:13   ` Pekka Enberg
2013-04-23 19:13     ` Pekka Enberg
2013-04-23 20:24   ` Anton Vorontsov
2013-04-23 21:01     ` Anton Vorontsov
2013-04-23 21:01       ` Anton Vorontsov
2013-04-24  6:26     ` Glauber Costa
2013-04-24 11:20     ` Glauber Costa
2013-04-24  7:21   ` Greg Thelen
2013-04-24  8:36     ` Glauber Costa
2013-04-24 19:35       ` Greg Thelen
2013-04-24 19:42   ` Greg Thelen
2013-04-24 19:42     ` Greg Thelen
2013-04-24 20:04     ` Glauber Costa
2013-04-25 10:50     ` Glauber Costa
2013-04-25 18:34       ` Greg Thelen
2013-04-23  8:22 ` [PATCH 2/2] memcg: reap dead memcgs under pressure Glauber Costa
2013-04-23  8:22   ` Glauber Costa
2013-04-25 12:50   ` Li Zefan
2013-04-26  7:38     ` Glauber Costa

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.