* workqueue usage in vmpressure
@ 2013-07-10 18:42 Tejun Heo
       [not found] ` <20130710184254.GA16979-9pTldWuhBndy/B6EtB590w@public.gmane.org>
  0 siblings, 1 reply; 40+ messages in thread
From: Tejun Heo @ 2013-07-10 18:42 UTC (permalink / raw)
  To: Anton Vorontsov; +Cc: Michal Hocko, cgroups-u79uwXL29TY76Z2rM5mHXA, Li Zefan

Hello, guys.

While auditing cgroup_subsys_state() usages, I noticed something weird
in vmpressure.  vmpressure() schedules vmpr->work where vmpr is
embedded in mem_cgroup but I can't find where it's flushed / canceled?
What prevents the memcg going away while the work item is pending or
in flight?
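
IOW, the sequence I'm worried about looks something like this (rough
sketch, not code from the tree):

    vmpressure()
      schedule_work(&vmpr->work)    /* vmpr is embedded in mem_cgroup */
                 ... last css reference dropped, mem_cgroup freed ...
    vmpressure_work_fn()
      work_to_vmpressure(work)      /* points into freed memory? */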

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: workqueue usage in vmpressure
       [not found] ` <20130710184254.GA16979-9pTldWuhBndy/B6EtB590w@public.gmane.org>
@ 2013-07-11  8:31   ` Michal Hocko
       [not found]     ` <20130711083110.GC21667-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
  0 siblings, 1 reply; 40+ messages in thread
From: Michal Hocko @ 2013-07-11  8:31 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Anton Vorontsov, cgroups-u79uwXL29TY76Z2rM5mHXA, Li Zefan

On Wed 10-07-13 11:42:54, Tejun Heo wrote:
> Hello, guys.
> 
> While auditing cgroup_subsys_state() usages, I noticed something weird
> in vmpressure.  vmpressure() schedules vmpr->work where vmpr is
> embedded in mem_cgroup but I can't find where it's flushed / canceled?
> What prevents the memcg going away while the work item is pending or
> in flight?

Well spotted, Tejun! If there are no registered listeners then the memcg
could really have vanished in the meantime, because nothing pins either
the dentry or the memcg.

What about the following patch?
---
From 95bfa2ae02e5a1e9d0a49524b323a0d7b7b42cef Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
Date: Thu, 11 Jul 2013 10:30:31 +0200
Subject: [PATCH] vmpressure: make sure memcg stays alive until all users are
 signaled

vmpressure() is called synchronously from reclaim, where the
target_memcg is guaranteed to be alive, but the eventfd is signaled
from workqueue context, which may run after the memcg reference has
been released, so there is no guarantee it will still be alive if
there are no listeners registered on the eventfd.

This patch adds two helpers, vmpressure_{un}pin_memcg, which take and
release the memcg css reference count and are used by vmpressure and
vmpressure_work_fn to fix this issue. Please note that root_mem_cgroup
doesn't need any reference counting as it is guaranteed to be alive.

We have to use these helpers rather than touching memcg->css directly
because mem_cgroup is not defined outside of memcontrol.c.

Reported-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Signed-off-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
---
 include/linux/vmpressure.h |  3 +++
 mm/memcontrol.c            | 20 ++++++++++++++++++++
 mm/vmpressure.c            | 13 ++++++++++++-
 3 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h
index 76be077..d301c6a 100644
--- a/include/linux/vmpressure.h
+++ b/include/linux/vmpressure.h
@@ -38,6 +38,9 @@ extern int vmpressure_register_event(struct cgroup *cg, struct cftype *cft,
 				     const char *args);
 extern void vmpressure_unregister_event(struct cgroup *cg, struct cftype *cft,
 					struct eventfd_ctx *eventfd);
+
+extern void vmpressure_pin_memcg(struct vmpressure *vmpr);
+extern void vmpressure_unpin_memcg(struct vmpressure *vmpr);
 #else
 static inline void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
 			      unsigned long scanned, unsigned long reclaimed) {}
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6e120e4..0f2c909 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6195,6 +6195,26 @@ struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg)
 }
 EXPORT_SYMBOL(parent_mem_cgroup);
 
+static inline
+struct mem_cgroup *vmpressure_to_mem_cgroup(struct vmpressure *vmpr)
+{
+	return container_of(vmpr, struct mem_cgroup, vmpressure);
+}
+
+void vmpressure_pin_memcg(struct vmpressure *vmpr)
+{
+	struct mem_cgroup *memcg = vmpressure_to_mem_cgroup(vmpr);
+	if (memcg != root_mem_cgroup)
+		css_get(&memcg->css);
+}
+
+void vmpressure_unpin_memcg(struct vmpressure *vmpr)
+{
+	struct mem_cgroup *memcg = vmpressure_to_mem_cgroup(vmpr);
+	if (memcg != root_mem_cgroup)
+		css_put(&memcg->css);
+}
+
 static void __init mem_cgroup_soft_limit_tree_init(void)
 {
 	struct mem_cgroup_tree_per_node *rtpn;
diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index 736a601..89865e5 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -22,6 +22,7 @@
 #include <linux/swap.h>
 #include <linux/printk.h>
 #include <linux/vmpressure.h>
+#include <linux/memcontrol.h>
 
 /*
  * The window size (vmpressure_win) is the number of scanned pages before
@@ -178,7 +179,7 @@ static void vmpressure_work_fn(struct work_struct *work)
 	 * vmpr->reclaimed is in sync.
 	 */
 	if (!vmpr->scanned)
-		return;
+		goto out;
 
 	mutex_lock(&vmpr->sr_lock);
 	scanned = vmpr->scanned;
@@ -195,6 +196,13 @@ static void vmpressure_work_fn(struct work_struct *work)
 		 * hierarchy.
 		 */
 	} while ((vmpr = vmpressure_parent(vmpr)));
+out:
+	/*
+	 * The reference was taken in vmpressure(). The loop above may
+	 * have advanced vmpr, so unpin via the work item; the memcg
+	 * which embeds it may go away after this call.
+	 */
+	vmpressure_unpin_memcg(work_to_vmpressure(work));
 }
 
 /**
@@ -248,6 +256,9 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
 
 	if (scanned < vmpressure_win || work_pending(&vmpr->work))
 		return;
+
+	/* Reference will be released in vmpressure_work_fn */
+	vmpressure_pin_memcg(vmpr);
 	schedule_work(&vmpr->work);
 }
 
-- 
1.8.3.2

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: workqueue usage in vmpressure
       [not found]     ` <20130711083110.GC21667-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
@ 2013-07-11  8:43       ` Li Zefan
       [not found]         ` <51DE701C.6010800-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
  2013-07-11  8:45       ` workqueue usage in vmpressure Li Zefan
  1 sibling, 1 reply; 40+ messages in thread
From: Li Zefan @ 2013-07-11  8:43 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Tejun Heo, Anton Vorontsov, cgroups-u79uwXL29TY76Z2rM5mHXA

> @@ -178,7 +179,7 @@ static void vmpressure_work_fn(struct work_struct *work)
>  	 * vmpr->reclaimed is in sync.
>  	 */
>  	if (!vmpr->scanned)
> -		return;
> +		goto out;
>  
>  	mutex_lock(&vmpr->sr_lock);
>  	scanned = vmpr->scanned;
> @@ -195,6 +196,13 @@ static void vmpressure_work_fn(struct work_struct *work)
>  		 * hierarchy.
>  		 */
>  	} while ((vmpr = vmpressure_parent(vmpr)));
> +out:
> +	/*
> +	 * The reference was taken in vmpressure(). The loop above may
> +	 * have advanced vmpr, so unpin via the work item; the memcg
> +	 * which embeds it may go away after this call.
> +	 */
> +	vmpressure_unpin_memcg(work_to_vmpressure(work));
>  }
>  
>  /**
> @@ -248,6 +256,9 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
>  
>  	if (scanned < vmpressure_win || work_pending(&vmpr->work))
>  		return;
> +
> +	/* Reference will be released in vmpressure_work_fn */
> +	vmpressure_pin_memcg(vmpr);
>  	schedule_work(&vmpr->work);

It looks like the work can be queued right after the above work_pending()
returns 0, so we should do this:

    if (schedule_work(&vmpr->work))
	vmpressure_pin_memcg(vmpr);

>  }
>  
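
With the patch as posted that window leaks a reference, I think (rough
timeline):

    CPU0                                  CPU1
    work_pending() returns 0
                                          work_pending() returns 0
                                          vmpressure_pin_memcg()
                                          schedule_work() queues the work
    vmpressure_pin_memcg()
    schedule_work() sees the work
    already queued
    ...
    vmpressure_work_fn() runs once and
    drops only one of the two references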

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: workqueue usage in vmpressure
       [not found]     ` <20130711083110.GC21667-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
  2013-07-11  8:43       ` Li Zefan
@ 2013-07-11  8:45       ` Li Zefan
  1 sibling, 0 replies; 40+ messages in thread
From: Li Zefan @ 2013-07-11  8:45 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Tejun Heo, Anton Vorontsov, cgroups-u79uwXL29TY76Z2rM5mHXA

> +void vmpressure_pin_memcg(struct vmpressure *vmpr)
> +{
> +	struct mem_cgroup *memcg = vmpressure_to_mem_cgroup(vmpr);
> +	if (memcg != root_mem_cgroup)

This check is redundant. See the definition of css_get():

static inline void css_get(struct cgroup_subsys_state *css)
{
        /* We don't need to reference count the root state */
        if (!(css->flags & CSS_ROOT))
                __css_get(css, 1);
}

> +		css_get(&memcg->css);
> +}
> +
> +void vmpressure_unpin_memcg(struct vmpressure *vmpr)
> +{
> +	struct mem_cgroup *memcg = vmpressure_to_mem_cgroup(vmpr);
> +	if (memcg != root_mem_cgroup)

ditto

> +		css_put(&memcg->css);
> +}


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: workqueue usage in vmpressure
       [not found]         ` <51DE701C.6010800-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2013-07-11  9:25           ` Michal Hocko
       [not found]             ` <20130711092542.GD21667-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
  0 siblings, 1 reply; 40+ messages in thread
From: Michal Hocko @ 2013-07-11  9:25 UTC (permalink / raw)
  To: Li Zefan; +Cc: Tejun Heo, Anton Vorontsov, cgroups-u79uwXL29TY76Z2rM5mHXA

On Thu 11-07-13 16:43:08, Li Zefan wrote:
[...]
> > @@ -248,6 +256,9 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
> >  
> >  	if (scanned < vmpressure_win || work_pending(&vmpr->work))
> >  		return;
> > +
> > +	/* Reference will be released in vmpressure_work_fn */
> > +	vmpressure_pin_memcg(vmpr);
> >  	schedule_work(&vmpr->work);
> 
> It looks like the work can be queued right after the above work_pending()
> returns 0, so we should do this:
> 
>     if (schedule_work(&vmpr->work))
> 	vmpressure_pin_memcg(vmpr);

But isn't this racy? Wouldn't the following be safer?
---
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 0f2c909..8329262 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6204,15 +6204,15 @@ struct mem_cgroup *vmpressure_to_mem_cgroup(struct vmpressure *vmpr)
 void vmpressure_pin_memcg(struct vmpressure *vmpr)
 {
 	struct mem_cgroup *memcg = vmpressure_to_mem_cgroup(vmpr);
-	if (memcg != root_mem_cgroup)
-		css_get(&memcg->css);
+
+	css_get(&memcg->css);
 }
 
 void vmpressure_unpin_memcg(struct vmpressure *vmpr)
 {
 	struct mem_cgroup *memcg = vmpressure_to_mem_cgroup(vmpr);
-	if (memcg != root_mem_cgroup)
-		css_put(&memcg->css);
+
+	css_put(&memcg->css);
 }
 
 static void __init mem_cgroup_soft_limit_tree_init(void)
diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index 89865e5..3a955412 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -259,7 +259,8 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
 
 	/* Reference will be released in vmpressure_work_fn */
 	vmpressure_pin_memcg(vmpr);
-	schedule_work(&vmpr->work);
+	if (!schedule_work(&vmpr->work))
+		vmpressure_unpin_memcg(vmpr);
 }
 
 /**
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: workqueue usage in vmpressure
       [not found]             ` <20130711092542.GD21667-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
@ 2013-07-11  9:28               ` Li Zefan
  2013-07-11  9:33                   ` Michal Hocko
  0 siblings, 1 reply; 40+ messages in thread
From: Li Zefan @ 2013-07-11  9:28 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Tejun Heo, Anton Vorontsov, cgroups-u79uwXL29TY76Z2rM5mHXA

On 2013/7/11 17:25, Michal Hocko wrote:
> On Thu 11-07-13 16:43:08, Li Zefan wrote:
> [...]
>>> @@ -248,6 +256,9 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
>>>  
>>>  	if (scanned < vmpressure_win || work_pending(&vmpr->work))
>>>  		return;
>>> +
>>> +	/* Reference will be released in vmpressure_work_fn */
>>> +	vmpressure_pin_memcg(vmpr);
>>>  	schedule_work(&vmpr->work);
>>
>> It looks like the work can be queued right after the above work_pending()
>> returns 0, so we should do this:
>>
>>     if (schedule_work(&vmpr->work))
>> 	vmpressure_pin_memcg(vmpr);
> 
> But isn't this racy? Wouldn't the following be safer?

Absolutely! Looks good to me.

> ---
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 0f2c909..8329262 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6204,15 +6204,15 @@ struct mem_cgroup *vmpressure_to_mem_cgroup(struct vmpressure *vmpr)
>  void vmpressure_pin_memcg(struct vmpressure *vmpr)
>  {
>  	struct mem_cgroup *memcg = vmpressure_to_mem_cgroup(vmpr);
> -	if (memcg != root_mem_cgroup)
> -		css_get(&memcg->css);
> +
> +	css_get(&memcg->css);
>  }
>  
>  void vmpressure_unpin_memcg(struct vmpressure *vmpr)
>  {
>  	struct mem_cgroup *memcg = vmpressure_to_mem_cgroup(vmpr);
> -	if (memcg != root_mem_cgroup)
> -		css_put(&memcg->css);
> +
> +	css_put(&memcg->css);
>  }
>  
>  static void __init mem_cgroup_soft_limit_tree_init(void)
> diff --git a/mm/vmpressure.c b/mm/vmpressure.c
> index 89865e5..3a955412 100644
> --- a/mm/vmpressure.c
> +++ b/mm/vmpressure.c
> @@ -259,7 +259,8 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
>  
>  	/* Reference will be released in vmpressure_work_fn */
>  	vmpressure_pin_memcg(vmpr);
> -	schedule_work(&vmpr->work);
> +	if (!schedule_work(&vmpr->work))
> +		vmpressure_unpin_memcg(vmpr);
>  }
>  
>  /**
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v2] vmpressure: make sure memcg stays alive until all users are signaled
@ 2013-07-11  9:33                   ` Michal Hocko
  0 siblings, 0 replies; 40+ messages in thread
From: Michal Hocko @ 2013-07-11  9:33 UTC (permalink / raw)
  To: Li Zefan
  Cc: Tejun Heo, Anton Vorontsov, cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Johannes Weiner,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro

OK, here we go with v2. Thanks a lot for catching these issues, Li!
I am also CCing the other memcg guys and linux-mm.
---
From 0c2ae555132594fa3557ede99fc4bf5c22827253 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
Date: Thu, 11 Jul 2013 10:30:31 +0200
Subject: [PATCH] vmpressure: make sure memcg stays alive until all users are
 signaled

vmpressure() is called synchronously from reclaim, where the
target_memcg is guaranteed to be alive, but the eventfd is signaled
from workqueue context, which may run after the memcg reference has
been released, so there is no guarantee it will still be alive if
there are no listeners registered on the eventfd.

This patch adds two helpers, vmpressure_{un}pin_memcg, which take and
release the memcg css reference count and are used by vmpressure and
vmpressure_work_fn to fix this issue.

We have to use these helpers rather than touching memcg->css directly
because mem_cgroup is not defined outside of memcontrol.c.

[lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org: no need to check for root_mem_cgroup explicitly]
[lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org: handle schedule_work returning false properly]
Reported-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Signed-off-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
---
 include/linux/vmpressure.h |  3 +++
 mm/memcontrol.c            | 20 ++++++++++++++++++++
 mm/vmpressure.c            | 16 ++++++++++++++--
 3 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h
index 76be077..d301c6a 100644
--- a/include/linux/vmpressure.h
+++ b/include/linux/vmpressure.h
@@ -38,6 +38,9 @@ extern int vmpressure_register_event(struct cgroup *cg, struct cftype *cft,
 				     const char *args);
 extern void vmpressure_unregister_event(struct cgroup *cg, struct cftype *cft,
 					struct eventfd_ctx *eventfd);
+
+extern void vmpressure_pin_memcg(struct vmpressure *vmpr);
+extern void vmpressure_unpin_memcg(struct vmpressure *vmpr);
 #else
 static inline void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
 			      unsigned long scanned, unsigned long reclaimed) {}
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6e120e4..8329262 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6195,6 +6195,26 @@ struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg)
 }
 EXPORT_SYMBOL(parent_mem_cgroup);
 
+static inline
+struct mem_cgroup *vmpressure_to_mem_cgroup(struct vmpressure *vmpr)
+{
+	return container_of(vmpr, struct mem_cgroup, vmpressure);
+}
+
+void vmpressure_pin_memcg(struct vmpressure *vmpr)
+{
+	struct mem_cgroup *memcg = vmpressure_to_mem_cgroup(vmpr);
+
+	css_get(&memcg->css);
+}
+
+void vmpressure_unpin_memcg(struct vmpressure *vmpr)
+{
+	struct mem_cgroup *memcg = vmpressure_to_mem_cgroup(vmpr);
+
+	css_put(&memcg->css);
+}
+
 static void __init mem_cgroup_soft_limit_tree_init(void)
 {
 	struct mem_cgroup_tree_per_node *rtpn;
diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index 736a601..3a955412 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -22,6 +22,7 @@
 #include <linux/swap.h>
 #include <linux/printk.h>
 #include <linux/vmpressure.h>
+#include <linux/memcontrol.h>
 
 /*
  * The window size (vmpressure_win) is the number of scanned pages before
@@ -178,7 +179,7 @@ static void vmpressure_work_fn(struct work_struct *work)
 	 * vmpr->reclaimed is in sync.
 	 */
 	if (!vmpr->scanned)
-		return;
+		goto out;
 
 	mutex_lock(&vmpr->sr_lock);
 	scanned = vmpr->scanned;
@@ -195,6 +196,13 @@ static void vmpressure_work_fn(struct work_struct *work)
 		 * hierarchy.
 		 */
 	} while ((vmpr = vmpressure_parent(vmpr)));
+out:
+	/*
+	 * The reference was taken in vmpressure(). The loop above may
+	 * have advanced vmpr, so unpin via the work item; the memcg
+	 * which embeds it may go away after this call.
+	 */
+	vmpressure_unpin_memcg(work_to_vmpressure(work));
 }
 
 /**
@@ -248,7 +256,11 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
 
 	if (scanned < vmpressure_win || work_pending(&vmpr->work))
 		return;
-	schedule_work(&vmpr->work);
+
+	/* Reference will be released in vmpressure_work_fn */
+	vmpressure_pin_memcg(vmpr);
+	if (!schedule_work(&vmpr->work))
+		vmpressure_unpin_memcg(vmpr);
 }
 
 /**
-- 
1.8.3.2

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] vmpressure: make sure memcg stays alive until all users are signaled
  2013-07-11  9:33                   ` Michal Hocko
  (?)
@ 2013-07-11 15:44                   ` Tejun Heo
  2013-07-11 16:22                       ` Michal Hocko
  2013-07-15 10:30                       ` Michal Hocko
  -1 siblings, 2 replies; 40+ messages in thread
From: Tejun Heo @ 2013-07-11 15:44 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Li Zefan, Anton Vorontsov, cgroups, linux-mm, Johannes Weiner,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro

Hello, Michal.

On Thu, Jul 11, 2013 at 11:33:00AM +0200, Michal Hocko wrote:
> +static inline
> +struct mem_cgroup *vmpressure_to_mem_cgroup(struct vmpressure *vmpr)
> +{
> +	return container_of(vmpr, struct mem_cgroup, vmpressure);
> +}
> +
> +void vmpressure_pin_memcg(struct vmpressure *vmpr)
> +{
> +	struct mem_cgroup *memcg = vmpressure_to_mem_cgroup(vmpr);
> +
> +	css_get(&memcg->css);
> +}
> +
> +void vmpressure_unpin_memcg(struct vmpressure *vmpr)
> +{
> +	struct mem_cgroup *memcg = vmpressure_to_mem_cgroup(vmpr);
> +
> +	css_put(&memcg->css);
> +}

So, while this *should* work, can't we just cancel/flush the work item
from offline?  There don't seem to be any possible deadlocks from my
shallow glance, and those mutexes don't seem to be held for long (do
they actually need to be mutexes?  what blocks inside them?).
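
Something like the following is what I have in mind (untested sketch;
vmpressure_cleanup() is a made-up name):

    /* mm/vmpressure.c */
    void vmpressure_cleanup(struct vmpressure *vmpr)
    {
	/*
	 * Make sure the work item is neither pending nor running
	 * before the memcg which embeds vmpr can be freed.
	 */
	cancel_work_sync(&vmpr->work);
    }

plus a call to it from the memcg offline/free path in mm/memcontrol.c.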

Also, while at it, can you please remove the work_pending() check?
They're almost always spurious or racy and should be avoided in
general.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] vmpressure: make sure memcg stays alive until all users are signaled
@ 2013-07-11 16:22                       ` Michal Hocko
  0 siblings, 0 replies; 40+ messages in thread
From: Michal Hocko @ 2013-07-11 16:22 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, Anton Vorontsov, cgroups, linux-mm, Johannes Weiner,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro

On Thu 11-07-13 08:44:08, Tejun Heo wrote:
> Hello, Michal.
> 
> On Thu, Jul 11, 2013 at 11:33:00AM +0200, Michal Hocko wrote:
> > +static inline
> > +struct mem_cgroup *vmpressure_to_mem_cgroup(struct vmpressure *vmpr)
> > +{
> > +	return container_of(vmpr, struct mem_cgroup, vmpressure);
> > +}
> > +
> > +void vmpressure_pin_memcg(struct vmpressure *vmpr)
> > +{
> > +	struct mem_cgroup *memcg = vmpressure_to_mem_cgroup(vmpr);
> > +
> > +	css_get(&memcg->css);
> > +}
> > +
> > +void vmpressure_unpin_memcg(struct vmpressure *vmpr)
> > +{
> > +	struct mem_cgroup *memcg = vmpressure_to_mem_cgroup(vmpr);
> > +
> > +	css_put(&memcg->css);
> > +}
> 
> So, while this *should* work, can't we just cancel/flush the work item
> from offline? 

I would rather not put vmpressure cleanup code into memcg offlining.
We have reference counting for exactly this purpose so it feels strange
to bypass it like that.
Besides, wouldn't that be racy? The work item could already be
executing and preempted, and we do not want vmpr to disappear from
under our feet. I know this is _highly_ unlikely but why play games?

> There don't seem to be any possible deadlocks from my
> shallow glance, and those mutexes don't seem to be held for long (do
> they actually need to be mutexes?  what blocks inside them?).

Dunno, to be honest. From a quick look they both can be turned into
spinlocks, but events_lock might cause long preempt-disabled periods
when zillions of events are registered.

> Also, while at it, can you please remove the work_pending() check?
> They're almost always spurious or racy and should be avoided in
> general.

sure, this really looks bogus. I will cook patches for both issues and
send them tomorrow.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] vmpressure: make sure memcg stays alive until all users are signaled
  2013-07-11 16:22                       ` Michal Hocko
  (?)
@ 2013-07-11 16:32                       ` Tejun Heo
  2013-07-12  8:40                         ` Michal Hocko
  -1 siblings, 1 reply; 40+ messages in thread
From: Tejun Heo @ 2013-07-11 16:32 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Li Zefan, Anton Vorontsov, cgroups, linux-mm, Johannes Weiner,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro

Hello,

On Thu, Jul 11, 2013 at 06:22:15PM +0200, Michal Hocko wrote:
> I would rather not put vmpressure clean up code into memcg offlining.
> We have reference counting for exactly this purposes so it feels strange
> to overcome it like that.

It's not black and white, but for things which can be made
trivially synchronous, it usually is better to do it that way,
especially while shutting down objects as they enter a lingering stage
where they're de-registered but not destroyed and you should be
careful which parts of the object are still accessible.  I haven't
read it carefully but here I'm not sure whether it's safe to do event
related operations after removal.  From cgroup core side, event list
is shut down synchronously from cgroup_destroy_locked().  It doesn't
seem like that part is explicitly built to remain accessible
afterwards.

> Besides, wouldn't that be racy? The work item could already be
> executing and preempted, and we do not want vmpr to disappear from
> under our feet. I know this is _highly_ unlikely but why play games?

Hmmm?  flush_work() and cancel_work_sync() guarantee that the work
item isn't executing on return unless it's being requeued.  There's no
race condition.
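
IOW, the usual shutdown pattern is (minimal illustration, not
vmpressure code):

	struct obj {
		struct work_struct work;
		/* ... */
	};

	void obj_destroy(struct obj *obj)
	{
		/*
		 * After this returns the work item is neither pending
		 * nor running (assuming nothing can requeue it anymore),
		 * so freeing the embedding object is safe.
		 */
		cancel_work_sync(&obj->work);
		kfree(obj);
	}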

> Dunno, to be honest. From a quick look they both can be turned into
> spinlocks, but events_lock might cause long preempt-disabled periods
> when zillions of events are registered.

I see.

> > Also, while at it, can you please remove the work_pending() check?
> > They're almost always spurious or racy and should be avoided in
> > general.
> 
> sure, this really looks bogus. I will cook patches for both issues and
> send them tomorrow.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] vmpressure: make sure memcg stays alive until all users are signaled
@ 2013-07-12  6:03                         ` Li Zefan
  0 siblings, 0 replies; 40+ messages in thread
From: Li Zefan @ 2013-07-12  6:03 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tejun Heo, Anton Vorontsov, cgroups, linux-mm, Johannes Weiner,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro

On 2013/7/12 0:22, Michal Hocko wrote:
> On Thu 11-07-13 08:44:08, Tejun Heo wrote:
>> Hello, Michal.
>>
>> On Thu, Jul 11, 2013 at 11:33:00AM +0200, Michal Hocko wrote:
>>> +static inline
>>> +struct mem_cgroup *vmpressure_to_mem_cgroup(struct vmpressure *vmpr)
>>> +{
>>> +	return container_of(vmpr, struct mem_cgroup, vmpressure);
>>> +}
>>> +
>>> +void vmpressure_pin_memcg(struct vmpressure *vmpr)
>>> +{
>>> +	struct mem_cgroup *memcg = vmpressure_to_mem_cgroup(vmpr);
>>> +
>>> +	css_get(&memcg->css);
>>> +}
>>> +
>>> +void vmpressure_unpin_memcg(struct vmpressure *vmpr)
>>> +{
>>> +	struct mem_cgroup *memcg = vmpressure_to_mem_cgroup(vmpr);
>>> +
>>> +	css_put(&memcg->css);
>>> +}
>>
>> So, while this *should* work, can't we just cancel/flush the work item
>> from offline? 
> 
> I would rather not put vmpressure cleanup code into memcg offlining.
> We have reference counting for exactly this purpose so it feels strange
> to bypass it like that.

I'd agree with Tejun here. Doing this asynchronously should be avoided if not necessary,
and the change would be simpler. There's already a vmpressure_init() in
mem_cgroup_css_alloc(), so it doesn't seem bad to do vmpressure cleanup
in memcg.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] vmpressure: make sure memcg stays alive until all users are signaled
  2013-07-11 16:32                       ` Tejun Heo
@ 2013-07-12  8:40                         ` Michal Hocko
  2013-07-12  9:20                             ` Li Zefan
                                             ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Michal Hocko @ 2013-07-12  8:40 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, Anton Vorontsov, cgroups, linux-mm, Johannes Weiner,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro

On Thu 11-07-13 09:32:38, Tejun Heo wrote:
> Hello,
> 
> On Thu, Jul 11, 2013 at 06:22:15PM +0200, Michal Hocko wrote:
> > I would rather not put vmpressure clean up code into memcg offlining.
> > We have reference counting for exactly this purposes so it feels strange
> > to overcome it like that.
> 
> It's not black and white, but for things which can be made
> trivially synchronous, it usually is better to do it that way,

True in general, but it is also true (in general) that once we have
reference counting controlling the life cycle of an object we should
not bypass it.

> especially while shutting down objects as they enter a lingering stage
> where they're de-registered but not destroyed and you should be
> careful which parts of the object are still accessible.  I haven't
> read it carefully but here I'm not sure whether it's safe to do event
> related operations after removal.  From cgroup core side, event list
> is shut down synchronously from cgroup_destroy_locked().  It doesn't
> seem like that part is explicitly built to remain accessible
> afterwards.

/me goes and checks the code

vmpressure_event sends signals to _registered_ events but those are
unregistered from the work queue context by cgroup_event_remove (via
vmpressure_unregister_event) queued from cgroup_destroy_locked.

I am not sure what the ordering guarantees on the workqueue are, but
this all suggests that either vmpressure_event sees an empty vmpr->events
or it can safely send signals as cgroup_event_remove is pending on the
queue.

cgroup_event_remove drops a reference to cgrp->dentry after everything
is unregistered and event->wait removed from the wait queue so
cgroup_free_fn couldn't have been called yet and so memcg is still
alive. This means that even css_get/put is not necessary.

So I guess we are safe with the code as is but this all is really
_tricky_ and deserves a fat comment. So rather than adding flushing work
item code we should document it properly.

Or am I missing something?

> > Besides that wouldn't be that racy? The work item could be already
> > executing and preempted and we do not want vmpr to disappear from under
> > our feet. I know this is _highly_ unlikely but why to play games?
> 
> Hmmm?  flush_work() and cancel_work_sync() guarantee that the work
> item isn't executing on return unless it's being requeued.  There's no
> race condition.

OK, I hadn't realized the call waits for the work to finish. /me is not
a regular workqueue user...
[...]
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] vmpressure: make sure memcg stays alive until all users are signaled
@ 2013-07-12  9:20                             ` Li Zefan
  0 siblings, 0 replies; 40+ messages in thread
From: Li Zefan @ 2013-07-12  9:20 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tejun Heo, Anton Vorontsov, cgroups, linux-mm, Johannes Weiner,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro

>> especially while shutting down objects as they enter a lingering stage
>> where they're de-registered but not destroyed and you should be
>> careful which parts of the object are still accessible.  I haven't
>> read it carefully but here I'm not sure whether it's safe to do event
>> related operations after removal.  From cgroup core side, event list
>> is shut down synchronously from cgroup_destroy_locked().  It doesn't
>> seem like that part is explicitly built to remain accessible
>> afterwards.
> 
> /me goes and checks the code
> 
> vmpressure_event sends signals to _registered_ events but those are
> unregistered from the work queue context by cgroup_event_remove (via
> vmpressure_unregister_event) queued from cgroup_destroy_locked.
> 
> I am not sure what the ordering guarantees on the workqueue are, but
> this all suggests that either vmpressure_event sees an empty vmpr->events
> or it can safely send signals as cgroup_event_remove is pending on the
> queue.
> 
> cgroup_event_remove drops a reference to cgrp->dentry after everything
> is unregistered and event->wait removed from the wait queue so
> cgroup_free_fn couldn't have been called yet and so memcg is still
> alive. This means that even css_get/put is not necessary.
> 
> So I guess we are safe with the code as is but this all is really
> _tricky_ and deserves a fat comment. So rather than adding flushing work
> item code we should document it properly.
> 
> Or am I missing something?
> 

But if I read the code correctly, even if no one registers a vmpressure
event, vmpressure() still runs and queues the work item.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 1/3] vmpressure: document why css_get/put is not necessary for work queue based signaling
  2013-07-12  8:40                         ` Michal Hocko
  2013-07-12  9:20                             ` Li Zefan
@ 2013-07-12  9:24                           ` Michal Hocko
  2013-07-12  9:24                             ` [PATCH 2/3] vmpressure: change vmpressure::sr_lock to spinlock Michal Hocko
                                               ` (2 more replies)
  2013-07-12 18:34                           ` [PATCH v2] vmpressure: make sure memcg stays alive until all users are signaled Tejun Heo
  2 siblings, 3 replies; 40+ messages in thread
From: Michal Hocko @ 2013-07-12  9:24 UTC (permalink / raw)
  To: cgroups
  Cc: Tejun Heo, Li Zefan, Anton Vorontsov, Johannes Weiner,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro, linux-mm

Tejun raised a concern that vmpressure() doesn't take any reference to
the memcg which embeds the vmpressure structure when it schedules the
work item, so there is no guarantee that the memcg (and thus vmpr) is
still valid when the work item is processed.

Normally we should take a css reference on the memcg before scheduling
the work and release it in vmpressure_work_fn, but this doesn't seem to
be necessary because of the way eventfd is implemented at the cgroup
level.

Cgroup events are unregistered from the workqueue context by
cgroup_event_remove scheduled by cgroup_destroy_locked (when a cgroup is
removed by rmdir).

cgroup_event_remove removes the eventfd wait entry from the wait
queue, then it unregisters all the registered events and finally
puts a reference to the cgroup dentry. css_free, which triggers memcg
deallocation, is called after the last reference is dropped.

The scheduled vmpressure work item either happens before
cgroup_event_remove or it is not triggered at all, so it always happens
_before_ the last dput and thus before css_free.

This patch just documents this trickiness.

Brought-up-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 mm/vmpressure.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index 736a601..eb2bcf9 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -165,6 +165,13 @@ static bool vmpressure_event(struct vmpressure *vmpr,
 
 static void vmpressure_work_fn(struct work_struct *work)
 {
+	/*
+	 * vmpr, which is embedded inside the memcg, is safe to use from
+	 * this context because cgroup_event_remove, which unregisters
+	 * vmpressure events and removes the work item from the queue,
+	 * is called before dput on the cgroup, so css_free is called
+	 * later. css_get/put on the memcg is therefore not necessary.
+	 */
 	struct vmpressure *vmpr = work_to_vmpressure(work);
 	unsigned long scanned;
 	unsigned long reclaimed;
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 2/3] vmpressure: change vmpressure::sr_lock to spinlock
  2013-07-12  9:24                           ` [PATCH 1/3] vmpressure: document why css_get/put is not necessary for work queue based signaling Michal Hocko
@ 2013-07-12  9:24                             ` Michal Hocko
  2013-07-12  9:24                               ` Michal Hocko
  2013-07-12 18:48                             ` [PATCH 1/3] vmpressure: document why css_get/put is not necessary for work queue based signaling Tejun Heo
  2 siblings, 0 replies; 40+ messages in thread
From: Michal Hocko @ 2013-07-12  9:24 UTC (permalink / raw)
  To: cgroups
  Cc: Tejun Heo, Li Zefan, Anton Vorontsov, Johannes Weiner,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro, linux-mm

There is nothing that can sleep inside the critical sections protected
by this lock, and those sections are really small, so it doesn't make
much sense to use a mutex for them. Change the lock to a spinlock.

Brought-up-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 include/linux/vmpressure.h |  2 +-
 mm/vmpressure.c            | 10 +++++-----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h
index 76be077..2081680 100644
--- a/include/linux/vmpressure.h
+++ b/include/linux/vmpressure.h
@@ -12,7 +12,7 @@ struct vmpressure {
 	unsigned long scanned;
 	unsigned long reclaimed;
 	/* The lock is used to keep the scanned/reclaimed above in sync. */
-	struct mutex sr_lock;
+	spinlock_t sr_lock;
 
 	/* The list of vmpressure_event structs. */
 	struct list_head events;
diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index eb2bcf9..b8e16fe 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -187,12 +187,12 @@ static void vmpressure_work_fn(struct work_struct *work)
 	if (!vmpr->scanned)
 		return;
 
-	mutex_lock(&vmpr->sr_lock);
+	spin_lock(&vmpr->sr_lock);
 	scanned = vmpr->scanned;
 	reclaimed = vmpr->reclaimed;
 	vmpr->scanned = 0;
 	vmpr->reclaimed = 0;
-	mutex_unlock(&vmpr->sr_lock);
+	spin_unlock(&vmpr->sr_lock);
 
 	do {
 		if (vmpressure_event(vmpr, scanned, reclaimed))
@@ -247,11 +247,11 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
 	if (!scanned)
 		return;
 
-	mutex_lock(&vmpr->sr_lock);
+	spin_lock(&vmpr->sr_lock);
 	vmpr->scanned += scanned;
 	vmpr->reclaimed += reclaimed;
 	scanned = vmpr->scanned;
-	mutex_unlock(&vmpr->sr_lock);
+	spin_unlock(&vmpr->sr_lock);
 
 	if (scanned < vmpressure_win || work_pending(&vmpr->work))
 		return;
@@ -374,7 +374,7 @@ void vmpressure_unregister_event(struct cgroup *cg, struct cftype *cft,
  */
 void vmpressure_init(struct vmpressure *vmpr)
 {
-	mutex_init(&vmpr->sr_lock);
+	spin_lock_init(&vmpr->sr_lock);
 	mutex_init(&vmpr->events_lock);
 	INIT_LIST_HEAD(&vmpr->events);
 	INIT_WORK(&vmpr->work, vmpressure_work_fn);
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 3/3] vmpressure: do not check for pending work before queuing new work
@ 2013-07-12  9:24                               ` Michal Hocko
  0 siblings, 0 replies; 40+ messages in thread
From: Michal Hocko @ 2013-07-12  9:24 UTC (permalink / raw)
  To: cgroups
  Cc: Tejun Heo, Li Zefan, Anton Vorontsov, Johannes Weiner,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro, linux-mm

because it is racy and it doesn't give us much anyway, as schedule_work
handles this case already.
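
schedule_work() already tests and sets the pending bit atomically,
roughly:

	if (test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work)))
		return false;	/* already queued, nothing to do */

so the extra work_pending() test only adds a racy fast path.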

Brought-up-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 mm/vmpressure.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index b8e16fe..581a7d7 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -253,7 +253,7 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
 	scanned = vmpr->scanned;
 	spin_unlock(&vmpr->sr_lock);
 
-	if (scanned < vmpressure_win || work_pending(&vmpr->work))
+	if (scanned < vmpressure_win)
 		return;
 	schedule_work(&vmpr->work);
 }
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] vmpressure: make sure memcg stays alive until all users are signaled
@ 2013-07-12  9:29                               ` Michal Hocko
  0 siblings, 0 replies; 40+ messages in thread
From: Michal Hocko @ 2013-07-12  9:29 UTC (permalink / raw)
  To: Li Zefan
  Cc: Tejun Heo, Anton Vorontsov, cgroups, linux-mm, Johannes Weiner,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro

On Fri 12-07-13 17:20:09, Li Zefan wrote:
[...]
> But if I read the code correctly, even if no one registers a vmpressure
> event, vmpressure() still runs and queues the work item.

True, but checking whether there is somebody registered is rather
impractical. First we would have to take the events_lock to check this
and then drop it after scheduling the work, which doesn't guarantee
that the registered event wouldn't go away.
And even trickier, we would have to do the same for all parents up the
hierarchy.
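
Something along these lines would be needed (hypothetical sketch, and
it would have to be repeated for each memcg up the hierarchy):

	bool empty;

	mutex_lock(&vmpr->events_lock);
	empty = list_empty(&vmpr->events);
	mutex_unlock(&vmpr->events_lock);
	if (empty)
		return;
	/* a listener might still be unregistered right here */
	schedule_work(&vmpr->work);
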
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] vmpressure: make sure memcg stays alive until all users are signaled
@ 2013-07-12  9:54                                 ` Li Zefan
  0 siblings, 0 replies; 40+ messages in thread
From: Li Zefan @ 2013-07-12  9:54 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tejun Heo, Anton Vorontsov, cgroups, linux-mm, Johannes Weiner,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro

On 2013/7/12 17:29, Michal Hocko wrote:
> On Fri 12-07-13 17:20:09, Li Zefan wrote:
> [...]
>> But if I read the code correctly, even if no one registers a vmpressure
>> event, vmpressure() still runs and queues the work item.
> 
> True, but checking whether there is somebody registered is rather
> impractical. First we would have to take the events_lock to check this
> and then drop it after scheduling the work, which doesn't guarantee
> that the registered event wouldn't go away.
> And even trickier, we would have to do the same for all parents up the
> hierarchy.
> 

The thing is, we can forget about eventfd. eventfd is checked in
vmpressure_work_fn(), while vmpressure() is always called no matter what.

vmpressure()
  queue_work()
                      cgroup_diput()
                        call_rcu(cgroup_free_rcu)                        
                      ...
                      queue_work(destroy_cgroup)
                      ...
                      cgroup_free_fn()
                        mem_cgroup_destroy()
  ...
  vmpress_work_fn()

There's no guarantee that the vmpressure work runs before the cgroup
destroy work, I think.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] vmpressure: make sure memcg stays alive until all users are signaled
  2013-07-12  9:54                                 ` Li Zefan
  (?)
@ 2013-07-12 10:37                                 ` Michal Hocko
  2013-07-15  3:07                                     ` Li Zefan
  -1 siblings, 1 reply; 40+ messages in thread
From: Michal Hocko @ 2013-07-12 10:37 UTC (permalink / raw)
  To: Li Zefan
  Cc: Tejun Heo, Anton Vorontsov, cgroups, linux-mm, Johannes Weiner,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro

On Fri 12-07-13 17:54:27, Li Zefan wrote:
> On 2013/7/12 17:29, Michal Hocko wrote:
> > On Fri 12-07-13 17:20:09, Li Zefan wrote:
> > [...]
> >> But if I read the code correctly, even if no one registers a vmpressure
> >> event, vmpressure() still runs and queues the work item.
> > 
> > True, but checking whether there is somebody registered is rather
> > impractical. First we would have to take the events_lock to check this
> > and then drop it after scheduling the work, which doesn't guarantee
> > that the registered event wouldn't go away.
> > And even trickier, we would have to do the same for all parents up the
> > hierarchy.
> > 
> 
> The thing is, we can forget about eventfd. eventfd is checked in
> vmpressure_work_fn(), while vmpressure() is always called no matter what.

But vmpressure is called only for an existing memcg. This means that
it cannot be called past css_offline, so it must happen _before_ the
cgroup eventfd cleanup code.

Or am I missing something?

> vmpressure()
>   queue_work()
>                       cgroup_diput()
>                         call_rcu(cgroup_free_rcu)                        
>                       ...
>                       queue_work(destroy_cgroup)
>                       ...
>                       cgroup_free_fn()
>                         mem_cgroup_destroy()
>   ...
>   vmpress_work_fn()
> 
> There's no guarantee that the vmpressure work runs before the cgroup
> destroy work, I think.
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] vmpressure: make sure memcg stays alive until all users are signaled
  2013-07-12  8:40                         ` Michal Hocko
  2013-07-12  9:20                             ` Li Zefan
  2013-07-12  9:24                           ` [PATCH 1/3] vmpressure: document why css_get/put is not necessary for work queue based signaling Michal Hocko
@ 2013-07-12 18:34                           ` Tejun Heo
  2013-07-12 18:40                             ` Tejun Heo
  2 siblings, 1 reply; 40+ messages in thread
From: Tejun Heo @ 2013-07-12 18:34 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Li Zefan, Anton Vorontsov, cgroups, linux-mm, Johannes Weiner,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro

Hello, Michal.

On Fri, Jul 12, 2013 at 10:40:39AM +0200, Michal Hocko wrote:
> > It's not black and white, but for things which can be made
> > trivially synchronous, it usually is better to do it that way,
> 
> True in general, but it is also true (in general) that once we have
> reference counting controlling the life cycle of an object we should
> not bypass it.

When you have a reference count with staged destruction like cgroup,
the object goes through two stages before being released.  It first
gets deactivated or removed from further access and the base ref is
dropped.  This state persists until the left-over references drain.
When all the references are dropped, the object is actually released.
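
As a generic sketch of the pattern (obj_unhash() standing in for
whatever removes the object from lookup structures):

	struct obj {
		struct kref ref;
		/* ... */
	};

	static void obj_release(struct kref *ref)
	{
		/* stage two: the last reference is gone, actually free */
		kfree(container_of(ref, struct obj, ref));
	}

	static void obj_destroy(struct obj *obj)
	{
		/* stage one: make it unreachable, drop the base ref */
		obj_unhash(obj);
		kref_put(&obj->ref, obj_release);
		/* left-over ref holders may still use parts of obj */
	}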

Now, between the initiation of destruction and actual release, the
object often isn't in the same state as before the destruction
started.  It depends on the specifics of the subsystem, but usually
only a subset of the original functionality is accessible.  At times,
it becomes tricky discerning what's safe and what's not and we do end
up with latent subtle bugs which are only triggered when tricky
conditions are met - things like certain operations stretching across
destruction point.

It is important to mark out the boundary of what remains accessible
after the destruction point, and it is often easier and less dangerous
to avoid expanding that set unless necessary, so it is
not "bypassing" an existing mechanism at all.  It is an inherent part
of that model and various kernel subsystems have been doing that
forever.
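
To make the two-stage model concrete, here is a minimal sketch in kref
terms - all names are hypothetical illustration, not actual cgroup code:

	struct obj {
		struct kref ref;
		bool dead;	/* set when destruction is initiated */
	};

	static void obj_release(struct kref *ref)
	{
		struct obj *o = container_of(ref, struct obj, ref);
		kfree(o);	/* actual release, last reference gone */
	}

	static void obj_destroy(struct obj *o)
	{
		o->dead = true;			/* deactivate: no new lookups */
		kref_put(&o->ref, obj_release);	/* drop the base ref */
	}

	/*
	 * Between obj_destroy() and obj_release(), left-over references
	 * may still be held, and only the subset of functionality that
	 * is valid on a dead object may be used.
	 */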

> So I guess we are safe with the code as is but this all is really
> _tricky_ and deserves a fat comment. So rather than adding flushing work
> item code we should document it properly.
> 
> Or am I missing something?

Two things.

* I have no idea what would prevent the work item execution from kicking
  in way after the inode is released.  The execution is not flushed and
  the work item isn't pinning the underlying data structure.  There's
  nothing preventing the work item execution from happening after, or
  stretching across, inode release.

* You're turning something which has a clearly established pattern of
  usage - work item shutdown which is usually done by disabling
  issuance and then flushing / canceling - into something really
  tricky and deserving a fat comment.  In general, kernel engineering
  is never about finding the most optimal / minimal / ingenious
  solution for each encountered instance of a problem.  In most cases,
  it's about identifying established patterns and applying
  them so that the code is not only safe and correct but also remains
  accessible to fellow kernel developers.

  Here, people who are familiar with workqueue usage would
  automatically look for the shutdown sequence, as the work item is
  contained in a dynamic object.  There are cases where
  synchronization happens externally and you don't necessarily need
  flush / cancel, but even then, if you consider readability and
  maintainability, following the established pattern is often the
  right thing to do.  It makes the code easier to follow and verify
  for others, and it avoids unintentional breakage later: work item
  draining stays separated from the external synchronization, which
  may change later on without anyone noticing how it's interlocked with
  work item draining.
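
  The pattern in question, sketched minimally (the object and the
  unpublish step are hypothetical, not vmpressure code):

	struct foo {
		struct work_struct work;
		/* ... */
	};

	static void foo_destroy(struct foo *foo)
	{
		/* 1. disable issuance: make sure nothing can queue
		 *    foo->work anymore (hypothetical unpublish step) */
		unpublish_foo(foo);

		/* 2. flush / cancel: wait for any pending or running
		 *    instance of the work item to finish */
		cancel_work_sync(&foo->work);

		/* 3. only now is it safe to free the container */
		kfree(foo);
	}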

> OK, I haven't realized the action waits for finishing. /me is not
> regular work_queue user...

So, please, follow the established patterns.  This really isn't the
place for ingenuity.

Thanks.

-- 
tejun


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] vmpressure: make sure memcg stays alive until all users are signaled
  2013-07-12 18:34                           ` [PATCH v2] vmpressure: make sure memcg stays alive until all users are signaled Tejun Heo
@ 2013-07-12 18:40                             ` Tejun Heo
  0 siblings, 0 replies; 40+ messages in thread
From: Tejun Heo @ 2013-07-12 18:40 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Li Zefan, Anton Vorontsov, cgroups, linux-mm, Johannes Weiner,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro

On Fri, Jul 12, 2013 at 11:34:04AM -0700, Tejun Heo wrote:
> not "bypassing" an existing mechanism at all.  It is an inherent part
> of that model and various kernel subsystems have been doing that
> forever.

Just to clarify, I was talking about two staged object release where
the initial phase shuts down parts which aren't necessary for the
draining stage.

Thanks.

-- 
tejun


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/3] vmpressure: document why css_get/put is not necessary for work queue based signaling
  2013-07-12  9:24                           ` [PATCH 1/3] vmpressure: document why css_get/put is not necessary for work queue based signaling Michal Hocko
  2013-07-12  9:24                             ` [PATCH 2/3] vmpressure: change vmpressure::sr_lock to spinlock Michal Hocko
  2013-07-12  9:24                               ` Michal Hocko
@ 2013-07-12 18:48                             ` Tejun Heo
  2013-07-15 10:27                                 ` Michal Hocko
  2 siblings, 1 reply; 40+ messages in thread
From: Tejun Heo @ 2013-07-12 18:48 UTC (permalink / raw)
  To: Michal Hocko
  Cc: cgroups, Li Zefan, Anton Vorontsov, Johannes Weiner,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro, linux-mm

On Fri, Jul 12, 2013 at 11:24:56AM +0200, Michal Hocko wrote:
> Cgroup events are unregistered from the workqueue context by
> cgroup_event_remove scheduled by cgroup_destroy_locked (when a cgroup is
> removed by rmdir).
> 
> cgroup_event_remove removes the eventfd wait queue from the work
> queue, then it unregisters all the registered events and finally
> puts a reference to the cgroup dentry. css_free which triggers memcg
> deallocation is called after the last reference is dropped.
> 
> The scheduled vmpressure work item either happens before
> cgroup_event_remove or it is not triggered at all, so it always happens
> _before_ the last dput and thus css_free.

I don't follow what the above has to do with ensuring work item
execution is finished before the underlying data structure is
released.  How are the above relevant?  What am I missing here?

> This patch just documents this trickiness.

This doesn't have to be tricky at all.  It's a *completely* routine
thing.  Would you please stop making it one?

-- 
tejun


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] vmpressure: make sure memcg stays alive until all users are signaled
@ 2013-07-15  3:07                                     ` Li Zefan
  0 siblings, 0 replies; 40+ messages in thread
From: Li Zefan @ 2013-07-15  3:07 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tejun Heo, Anton Vorontsov, cgroups, linux-mm, Johannes Weiner,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro

On 2013/7/12 18:37, Michal Hocko wrote:
> On Fri 12-07-13 17:54:27, Li Zefan wrote:
>> On 2013/7/12 17:29, Michal Hocko wrote:
>>> On Fri 12-07-13 17:20:09, Li Zefan wrote:
>>> [...]
>>>> But if I read the code correctly, even if no one registers a vmpressure event,
>>>> vmpressure() is always running and queues the work item.
>>>
>>> True but checking there is somebody is rather impractical. First we
>>> would have to take a events_lock to check this and then drop it after
>>> scheduling the work. Which doesn't guarantee that the registered event
>>> wouldn't go away.
>>> And even trickier, we would have to do the same for all parents up the
>>> hierarchy.
>>>
>>
>> The thing is, we can forget about eventfd. eventfd is checked in
>> vmpressure_work_fn(), while vmpressure() is always called no matter what.
> 
> But vmpressure is called only for an existing memcg. This means that
> it cannot be called past css_offline so it must happen _before_ cgroup
> eventfd cleanup code.
> 
> Or am I missing something?
> 

Yeah.

The vmpressure work item is queued whenever we sense some memory pressure, no
matter whether any eventfd was ever registered. This is the point.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] vmpressure: make sure memcg stays alive until all users are signaled
  2013-07-15  3:07                                     ` Li Zefan
  (?)
@ 2013-07-15  9:20                                     ` Michal Hocko
  2013-07-15  9:53                                         ` Li Zefan
  -1 siblings, 1 reply; 40+ messages in thread
From: Michal Hocko @ 2013-07-15  9:20 UTC (permalink / raw)
  To: Li Zefan
  Cc: Tejun Heo, Anton Vorontsov, cgroups, linux-mm, Johannes Weiner,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro

On Mon 15-07-13 11:07:52, Li Zefan wrote:
> On 2013/7/12 18:37, Michal Hocko wrote:
> > On Fri 12-07-13 17:54:27, Li Zefan wrote:
> >> On 2013/7/12 17:29, Michal Hocko wrote:
> >>> On Fri 12-07-13 17:20:09, Li Zefan wrote:
> >>> [...]
> >>>> But if I read the code correctly, even if no one registers a vmpressure event,
> >>>> vmpressure() is always running and queues the work item.
> >>>
> >>> True but checking there is somebody is rather impractical. First we
> >>> would have to take a events_lock to check this and then drop it after
> >>> scheduling the work. Which doesn't guarantee that the registered event
> >>> wouldn't go away.
> >>> And even trickier, we would have to do the same for all parents up the
> >>> hierarchy.
> >>>
> >>
> >> The thing is, we can forget about eventfd. eventfd is checked in
> >> vmpressure_work_fn(), while vmpressure() is always called no matter what.
> > 
> > But vmpressure is called only for an existing memcg. This means that
> > it cannot be called past css_offline so it must happen _before_ cgroup
> > eventfd cleanup code.
> > 
> > Or am I missing something?
> > 
> 
> Yeah.
> 
> The vmpressure work item is queued whenever we sense some memory pressure, no
> matter whether any eventfd was ever registered. This is the point.

But it is queued on vmpr which is embedded in the memcg which is the
_target_ of the reclaim. There is _no reclaim_ for a memcg after css has
been deactivated which happens _before_ css_offline.

/me confused.

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] vmpressure: make sure memcg stays alive until all users are signaled
@ 2013-07-15  9:53                                         ` Li Zefan
  0 siblings, 0 replies; 40+ messages in thread
From: Li Zefan @ 2013-07-15  9:53 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tejun Heo, Anton Vorontsov, cgroups, linux-mm, Johannes Weiner,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro

On 2013/7/15 17:20, Michal Hocko wrote:
> On Mon 15-07-13 11:07:52, Li Zefan wrote:
>> On 2013/7/12 18:37, Michal Hocko wrote:
>>> On Fri 12-07-13 17:54:27, Li Zefan wrote:
>>>> On 2013/7/12 17:29, Michal Hocko wrote:
>>>>> On Fri 12-07-13 17:20:09, Li Zefan wrote:
>>>>> [...]
>>>>>> But if I read the code correctly, even if no one registers a vmpressure event,
>>>>>> vmpressure() is always running and queues the work item.
>>>>>
>>>>> True but checking there is somebody is rather impractical. First we
>>>>> would have to take a events_lock to check this and then drop it after
>>>>> scheduling the work. Which doesn't guarantee that the registered event
>>>>> wouldn't go away.
>>>>> And even trickier, we would have to do the same for all parents up the
>>>>> hierarchy.
>>>>>
>>>>
>>>> The thing is, we can forget about eventfd. eventfd is checked in
>>>> vmpressure_work_fn(), while vmpressure() is always called no matter what.
>>>
>>> But vmpressure is called only for an existing memcg. This means that
>>> it cannot be called past css_offline so it must happen _before_ cgroup
>>> eventfd cleanup code.
>>>
>>> Or am I missing something?
>>>
>>
>> Yeah.
>>
>> The vmpressure work item is queued whenever we sense some memory pressure, no
>> matter whether any eventfd was ever registered. This is the point.
> 
> But it is queued on vmpr which is embedded in the memcg which is the
> _target_ of the reclaim. There is _no reclaim_ for a memcg after css has
> been deactivated which happens _before_ css_offline.
> 

1. vmpressure() is called, and the work is queued.
2. then we rmdir the cgroup, and struct mem_cgroup is finally freed.
3. the workqueue schedules the work to run:

static void vmpressure_work_fn(struct work_struct *work)
{
        struct vmpressure *vmpr = work_to_vmpressure(work);
...

As vmpr is embedded in struct mem_cgroup, and the memcg has been freed, this
leads to an invalid memory access.

NOTE: no one ever registered an eventfd!
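
For reference, work_to_vmpressure() in mm/vmpressure.c is just a
container_of() wrapper, so the handler computes a pointer straight into
the (here already freed) containing structure:

	static struct vmpressure *work_to_vmpressure(struct work_struct *work)
	{
		return container_of(work, struct vmpressure, work);
	}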


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/3] vmpressure: document why css_get/put is not necessary for work queue based signaling
@ 2013-07-15 10:27                                 ` Michal Hocko
  0 siblings, 0 replies; 40+ messages in thread
From: Michal Hocko @ 2013-07-15 10:27 UTC (permalink / raw)
  To: Tejun Heo
  Cc: cgroups, Li Zefan, Anton Vorontsov, Johannes Weiner,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro, linux-mm

On Fri 12-07-13 11:48:36, Tejun Heo wrote:
> On Fri, Jul 12, 2013 at 11:24:56AM +0200, Michal Hocko wrote:
> > Cgroup events are unregistered from the workqueue context by
> > cgroup_event_remove scheduled by cgroup_destroy_locked (when a cgroup is
> > removed by rmdir).
> > 
> > cgroup_event_remove removes the eventfd wait queue from the work
> > queue, then it unregisters all the registered events and finally
> > puts a reference to the cgroup dentry. css_free which triggers memcg
> > deallocation is called after the last reference is dropped.
> > 
> > The scheduled vmpressure work item either happens before
> > cgroup_event_remove or it is not triggered at all, so it always happens
> > _before_ the last dput and thus css_free.
> 
> I don't follow what the above has to do with ensuring work item
> execution is finished before the underlying data structure is
> released.  How are the above relevant?  What am I missing here?

OK, it seems I managed to confuse myself. I thought that
remove_wait_queue(event->wqh, &event->wait) called from
cgroup_event_remove guarantees that the vmpr event would go away with that
workqueue. But now that I am looking at it, vmpr->work seems to be
living in a completely independent queue.
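
Spelling out the two independent mechanisms (rough sketch):

	/* eventfd side, torn down by cgroup_event_remove(): only the
	 * poll waiter is detached from the eventfd wait queue head */
	remove_wait_queue(event->wqh, &event->wait);

	/* vmpressure side: the work item is queued on the system
	 * workqueue and is unaffected by the removal above */
	INIT_WORK(&vmpr->work, vmpressure_work_fn);	/* at init time */
	schedule_work(&vmpr->work);			/* at reclaim time */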

> > This patch just documents this trickiness.
> 
> This doesn't have to be tricky at all.  It's a *completely* routine
> thing.  Would you please stop making it one?

Fair enough. I will repost the series shortly.
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v3 1/3] vmpressure: change vmpressure::sr_lock to spinlock
@ 2013-07-15 10:30                       ` Michal Hocko
  0 siblings, 0 replies; 40+ messages in thread
From: Michal Hocko @ 2013-07-15 10:30 UTC (permalink / raw)
  To: cgroups
  Cc: Tejun Heo, Li Zefan, Anton Vorontsov, Johannes Weiner,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro, linux-mm

There is nothing that can sleep inside the critical sections protected by
this lock, and those sections are really small, so it doesn't make much
sense to use a mutex for them. Change the lock to a spinlock.

Brought-up-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 include/linux/vmpressure.h |  2 +-
 mm/vmpressure.c            | 10 +++++-----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h
index 76be077..2081680 100644
--- a/include/linux/vmpressure.h
+++ b/include/linux/vmpressure.h
@@ -12,7 +12,7 @@ struct vmpressure {
 	unsigned long scanned;
 	unsigned long reclaimed;
 	/* The lock is used to keep the scanned/reclaimed above in sync. */
-	struct mutex sr_lock;
+	struct spinlock sr_lock;
 
 	/* The list of vmpressure_event structs. */
 	struct list_head events;
diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index 736a601..f4ee6a1 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -180,12 +180,12 @@ static void vmpressure_work_fn(struct work_struct *work)
 	if (!vmpr->scanned)
 		return;
 
-	mutex_lock(&vmpr->sr_lock);
+	spin_lock(&vmpr->sr_lock);
 	scanned = vmpr->scanned;
 	reclaimed = vmpr->reclaimed;
 	vmpr->scanned = 0;
 	vmpr->reclaimed = 0;
-	mutex_unlock(&vmpr->sr_lock);
+	spin_unlock(&vmpr->sr_lock);
 
 	do {
 		if (vmpressure_event(vmpr, scanned, reclaimed))
@@ -240,11 +240,11 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
 	if (!scanned)
 		return;
 
-	mutex_lock(&vmpr->sr_lock);
+	spin_lock(&vmpr->sr_lock);
 	vmpr->scanned += scanned;
 	vmpr->reclaimed += reclaimed;
 	scanned = vmpr->scanned;
-	mutex_unlock(&vmpr->sr_lock);
+	spin_unlock(&vmpr->sr_lock);
 
 	if (scanned < vmpressure_win || work_pending(&vmpr->work))
 		return;
@@ -367,7 +367,7 @@ void vmpressure_unregister_event(struct cgroup *cg, struct cftype *cft,
  */
 void vmpressure_init(struct vmpressure *vmpr)
 {
-	mutex_init(&vmpr->sr_lock);
+	spin_lock_init(&vmpr->sr_lock);
 	mutex_init(&vmpr->events_lock);
 	INIT_LIST_HEAD(&vmpr->events);
 	INIT_WORK(&vmpr->work, vmpressure_work_fn);
-- 
1.8.3.2


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v3 2/3] vmpressure: do not check for pending work to prevent from new work
  2013-07-15 10:30                       ` Michal Hocko
  (?)
@ 2013-07-15 10:30                       ` Michal Hocko
  -1 siblings, 0 replies; 40+ messages in thread
From: Michal Hocko @ 2013-07-15 10:30 UTC (permalink / raw)
  To: cgroups
  Cc: Tejun Heo, Li Zefan, Anton Vorontsov, Johannes Weiner,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro, linux-mm

because it is racy and it doesn't give us much anyway, as schedule_work()
already handles this case.

Brought-up-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 mm/vmpressure.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index f4ee6a1..192f973 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -246,7 +246,7 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
 	scanned = vmpr->scanned;
 	spin_unlock(&vmpr->sr_lock);
 
-	if (scanned < vmpressure_win || work_pending(&vmpr->work))
+	if (scanned < vmpressure_win)
 		return;
 	schedule_work(&vmpr->work);
 }
-- 
1.8.3.2
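
For reference, the "schedule_work() already handles this case" above is
the PENDING bit test inside queue_work(); schematically (simplified, not
a verbatim copy of kernel/workqueue.c):

	bool queue_work(struct workqueue_struct *wq, struct work_struct *work)
	{
		if (test_and_set_bit(WORK_STRUCT_PENDING_BIT,
				     work_data_bits(work)))
			return false;	/* already pending, nothing to do */
		/* ... actually queue the item ... */
		return true;
	}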


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v3 3/3] vmpressure: Make sure there are no events queued after memcg is offlined
  2013-07-15 10:30                       ` Michal Hocko
  (?)
  (?)
@ 2013-07-15 10:30                       ` Michal Hocko
  -1 siblings, 0 replies; 40+ messages in thread
From: Michal Hocko @ 2013-07-15 10:30 UTC (permalink / raw)
  To: cgroups
  Cc: Tejun Heo, Li Zefan, Anton Vorontsov, Johannes Weiner,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro, linux-mm

vmpressure is called synchronously from the reclaim path, where the
target_memcg is guaranteed to be alive, but the eventfd is signaled from
the work queue context. This means that the memcg (along with the
vmpressure structure which is embedded into it) might go away while the
work item is pending, which would result in a use-after-free bug.

We have two possible ways to fix this: either vmpressure pins the memcg
before it schedules vmpr->work and unpins it in vmpressure_work_fn, or
we explicitly flush the work item from the css_offline context (as
suggested by Tejun).

This patch implements the latter and introduces vmpressure_cleanup,
which flushes the vmpressure work queue item. It hooks into
mem_cgroup_css_offline after the memcg itself is cleaned up.

Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 include/linux/vmpressure.h |  1 +
 mm/memcontrol.c            |  1 +
 mm/vmpressure.c            | 16 ++++++++++++++++
 3 files changed, 18 insertions(+)

diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h
index 2081680..0c9bc9a 100644
--- a/include/linux/vmpressure.h
+++ b/include/linux/vmpressure.h
@@ -30,6 +30,7 @@ extern void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
 extern void vmpressure_prio(gfp_t gfp, struct mem_cgroup *memcg, int prio);
 
 extern void vmpressure_init(struct vmpressure *vmpr);
+extern void vmpressure_cleanup(struct vmpressure *vmpr);
 extern struct vmpressure *memcg_to_vmpressure(struct mem_cgroup *memcg);
 extern struct cgroup_subsys_state *vmpressure_to_css(struct vmpressure *vmpr);
 extern struct vmpressure *css_to_vmpressure(struct cgroup_subsys_state *css);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6e120e4..198759c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6326,6 +6326,7 @@ static void mem_cgroup_css_offline(struct cgroup *cont)
 	mem_cgroup_invalidate_reclaim_iterators(memcg);
 	mem_cgroup_reparent_charges(memcg);
 	mem_cgroup_destroy_all_caches(memcg);
+	vmpressure_cleanup(&memcg->vmpressure);
 }
 
 static void mem_cgroup_css_free(struct cgroup *cont)
diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index 192f973..0c1e37d 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -372,3 +372,19 @@ void vmpressure_init(struct vmpressure *vmpr)
 	INIT_LIST_HEAD(&vmpr->events);
 	INIT_WORK(&vmpr->work, vmpressure_work_fn);
 }
+
+/**
+ * vmpressure_cleanup() - shuts down vmpressure control structure
+ * @vmpr:	Structure to be cleaned up
+ *
+ * This function should be called before the structure in which it is
+ * embedded is cleaned up.
+ */
+void vmpressure_cleanup(struct vmpressure *vmpr)
+{
+	/*
+	 * Make sure there is no pending work before eventfd infrastructure
+	 * goes away.
+	 */
+	flush_work(&vmpr->work);
+}
-- 
1.8.3.2
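
With this in place the teardown ordering becomes (sketch, condensed
from the hunks above):

	mem_cgroup_css_offline(cont)
	  ...
	  vmpressure_cleanup(&memcg->vmpressure)
	    flush_work(&vmpr->work)	/* waits for a pending or running
					 * vmpressure_work_fn to finish */

	/* reclaim no longer targets an offlined memcg, so nothing
	 * re-queues the work afterwards; css_free and the final
	 * freeing of the memcg can then proceed safely */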


^ permalink raw reply related	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2013-07-15 10:30 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-10 18:42 workqueue usage in vmpressure Tejun Heo
     [not found] ` <20130710184254.GA16979-9pTldWuhBndy/B6EtB590w@public.gmane.org>
2013-07-11  8:31   ` Michal Hocko
     [not found]     ` <20130711083110.GC21667-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2013-07-11  8:43       ` Li Zefan
     [not found]         ` <51DE701C.6010800-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2013-07-11  9:25           ` Michal Hocko
     [not found]             ` <20130711092542.GD21667-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2013-07-11  9:28               ` Li Zefan
2013-07-11  9:33                 ` [PATCH v2] vmpressure: make sure memcg stays alive until all users are signaled Michal Hocko
2013-07-11  9:33                   ` Michal Hocko
2013-07-11 15:44                   ` Tejun Heo
2013-07-11 16:22                     ` Michal Hocko
2013-07-11 16:22                       ` Michal Hocko
2013-07-11 16:32                       ` Tejun Heo
2013-07-12  8:40                         ` Michal Hocko
2013-07-12  9:20                           ` Li Zefan
2013-07-12  9:20                             ` Li Zefan
2013-07-12  9:29                             ` Michal Hocko
2013-07-12  9:29                               ` Michal Hocko
2013-07-12  9:54                               ` Li Zefan
2013-07-12  9:54                                 ` Li Zefan
2013-07-12 10:37                                 ` Michal Hocko
2013-07-15  3:07                                   ` Li Zefan
2013-07-15  3:07                                     ` Li Zefan
2013-07-15  9:20                                     ` Michal Hocko
2013-07-15  9:53                                       ` Li Zefan
2013-07-15  9:53                                         ` Li Zefan
2013-07-12  9:24                           ` [PATCH 1/3] vmpressure: document why css_get/put is not necessary for work queue based signaling Michal Hocko
2013-07-12  9:24                             ` [PATCH 2/3] vmpressure: change vmpressure::sr_lock to spinlock Michal Hocko
2013-07-12  9:24                             ` [PATCH 3/3] vmpressure: do not check for pending work to prevent from new work Michal Hocko
2013-07-12  9:24                               ` Michal Hocko
2013-07-12 18:48                             ` [PATCH 1/3] vmpressure: document why css_get/put is not necessary for work queue based signaling Tejun Heo
2013-07-15 10:27                               ` Michal Hocko
2013-07-15 10:27                                 ` Michal Hocko
2013-07-12 18:34                           ` [PATCH v2] vmpressure: make sure memcg stays alive until all users are signaled Tejun Heo
2013-07-12 18:40                             ` Tejun Heo
2013-07-12  6:03                       ` Li Zefan
2013-07-12  6:03                         ` Li Zefan
2013-07-15 10:30                     ` [PATCH v3 1/3] vmpressure: change vmpressure::sr_lock to spinlock Michal Hocko
2013-07-15 10:30                       ` Michal Hocko
2013-07-15 10:30                       ` [PATCH v3 2/3] vmpressure: do not check for pending work to prevent from new work Michal Hocko
2013-07-15 10:30                       ` [PATCH v3 3/3] vmpressure: Make sure there are no events queued after memcg is offlined Michal Hocko
2013-07-11  8:45       ` workqueue usage in vmpressure Li Zefan
