* + hugetlb-cgroup-simplify-pre_destroy-callback.patch added to -mm tree
@ 2012-07-18 21:26 akpm
  2012-07-19 11:39   ` Michal Hocko
  0 siblings, 1 reply; 40+ messages in thread
From: akpm @ 2012-07-18 21:26 UTC (permalink / raw)
  To: mm-commits; +Cc: aneesh.kumar, kamezawa.hiroyu, liwanp, mhocko


The patch titled
     Subject: hugetlb/cgroup: simplify pre_destroy callback
has been added to the -mm tree.  Its filename is
     hugetlb-cgroup-simplify-pre_destroy-callback.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/SubmitChecklist when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Subject: hugetlb/cgroup: simplify pre_destroy callback

Since we cannot fail in hugetlb_cgroup_move_parent(), we don't really need
to check whether the cgroup has any charge left after that.  Also skip those
hstates for which we don't have any charge in this cgroup.

Based on an earlier patch from Wanpeng Li.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/hugetlb_cgroup.c |   49 ++++++++++++++++++------------------------
 1 file changed, 21 insertions(+), 28 deletions(-)

diff -puN mm/hugetlb_cgroup.c~hugetlb-cgroup-simplify-pre_destroy-callback mm/hugetlb_cgroup.c
--- a/mm/hugetlb_cgroup.c~hugetlb-cgroup-simplify-pre_destroy-callback
+++ a/mm/hugetlb_cgroup.c
@@ -65,18 +65,6 @@ static inline struct hugetlb_cgroup *par
 	return hugetlb_cgroup_from_cgroup(cg->parent);
 }
 
-static inline bool hugetlb_cgroup_have_usage(struct cgroup *cg)
-{
-	int idx;
-	struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cg);
-
-	for (idx = 0; idx < hugetlb_max_hstate; idx++) {
-		if ((res_counter_read_u64(&h_cg->hugepage[idx], RES_USAGE)) > 0)
-			return true;
-	}
-	return false;
-}
-
 static struct cgroup_subsys_state *hugetlb_cgroup_create(struct cgroup *cgroup)
 {
 	int idx;
@@ -159,24 +147,29 @@ static int hugetlb_cgroup_pre_destroy(st
 {
 	struct hstate *h;
 	struct page *page;
-	int ret = 0, idx = 0;
+	int ret = 0, idx;
+	struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cgroup);
 
-	do {
-		if (cgroup_task_count(cgroup) ||
-		    !list_empty(&cgroup->children)) {
-			ret = -EBUSY;
-			goto out;
-		}
-		for_each_hstate(h) {
-			spin_lock(&hugetlb_lock);
-			list_for_each_entry(page, &h->hugepage_activelist, lru)
-				hugetlb_cgroup_move_parent(idx, cgroup, page);
 
-			spin_unlock(&hugetlb_lock);
-			idx++;
-		}
-		cond_resched();
-	} while (hugetlb_cgroup_have_usage(cgroup));
+	if (cgroup_task_count(cgroup) ||
+	    !list_empty(&cgroup->children)) {
+		ret = -EBUSY;
+		goto out;
+	}
+
+	for_each_hstate(h) {
+		/*
+		 * if we don't have any charge, skip this hstate
+		 */
+		idx = hstate_index(h);
+		if (res_counter_read_u64(&h_cg->hugepage[idx], RES_USAGE) == 0)
+			continue;
+		spin_lock(&hugetlb_lock);
+		list_for_each_entry(page, &h->hugepage_activelist, lru)
+			hugetlb_cgroup_move_parent(idx, cgroup, page);
+		spin_unlock(&hugetlb_lock);
+		VM_BUG_ON(res_counter_read_u64(&h_cg->hugepage[idx], RES_USAGE));
+	}
 out:
 	return ret;
 }
_
Subject: hugetlb/cgroup: simplify pre_destroy callback

Patches currently in -mm which might be from aneesh.kumar@linux.vnet.ibm.com are

linux-next.patch
hugetlb-rename-max_hstate-to-hugetlb_max_hstate.patch
hugetlb-dont-use-err_ptr-with-vm_fault-values.patch
hugetlb-add-an-inline-helper-for-finding-hstate-index.patch
hugetlb-use-mmu_gather-instead-of-a-temporary-linked-list-for-accumulating-pages.patch
hugetlb-avoid-taking-i_mmap_mutex-in-unmap_single_vma-for-hugetlb.patch
hugetlb-simplify-migrate_huge_page.patch
hugetlb-add-a-list-for-tracking-in-use-hugetlb-pages.patch
hugetlb-make-some-static-variables-global.patch
hugetlb-make-some-static-variables-global-mark-hugelb_max_hstate-__read_mostly.patch
mm-hugetlb-add-new-hugetlb-cgroup.patch
mm-hugetlb-add-new-hugetlb-cgroup-fix.patch
mm-hugetlb-add-new-hugetlb-cgroup-fix-fix.patch
mm-hugetlb-add-new-hugetlb-cgroup-fix-3.patch
mm-hugetlb-add-new-hugetlb-cgroup-mark-root_h_cgroup-static.patch
hugetlb-cgroup-add-the-cgroup-pointer-to-page-lru.patch
hugetlb-cgroup-add-charge-uncharge-routines-for-hugetlb-cgroup.patch
hugetlb-cgroup-add-charge-uncharge-routines-for-hugetlb-cgroup-fix.patch
hugetlb-cgroup-add-charge-uncharge-routines-for-hugetlb-cgroup-add-huge_page_order-check-to-avoid-incorrectly-uncharge.patch
hugetlb-cgroup-add-support-for-cgroup-removal.patch
hugetlb-cgroup-add-hugetlb-cgroup-control-files.patch
hugetlb-cgroup-add-hugetlb-cgroup-control-files-fix.patch
hugetlb-cgroup-add-hugetlb-cgroup-control-files-fix-fix.patch
hugetlb-cgroup-migrate-hugetlb-cgroup-info-from-oldpage-to-new-page-during-migration.patch
hugetlb-cgroup-add-hugetlb-controller-documentation.patch
hugetlb-move-all-the-in-use-pages-to-active-list.patch
hugetlb-cgroup-assign-the-page-hugetlb-cgroup-when-we-move-the-page-to-active-list.patch
hugetlb-cgroup-remove-exclude-and-wakeup-rmdir-calls-from-migrate.patch
hugetlb-cgroup-simplify-pre_destroy-callback.patch
memcg-rename-config-variables.patch
memcg-rename-config-variables-fix.patch
memcg-rename-config-variables-fix-fix.patch



* Re: + hugetlb-cgroup-simplify-pre_destroy-callback.patch added to -mm tree
@ 2012-07-19 11:39   ` Michal Hocko
  0 siblings, 0 replies; 40+ messages in thread
From: Michal Hocko @ 2012-07-19 11:39 UTC (permalink / raw)
  To: akpm
  Cc: mm-commits, aneesh.kumar, kamezawa.hiroyu, liwanp, Tejun Heo,
	Li Zefan, cgroups mailinglist, linux-mm

On Wed 18-07-12 14:26:36, Andrew Morton wrote:
> 
> The patch titled
>      Subject: hugetlb/cgroup: simplify pre_destroy callback
> has been added to the -mm tree.  Its filename is
>      hugetlb-cgroup-simplify-pre_destroy-callback.patch
> 
> Before you just go and hit "reply", please:
>    a) Consider who else should be cc'ed
>    b) Prefer to cc a suitable mailing list as well
>    c) Ideally: find the original patch on the mailing list and do a
>       reply-to-all to that, adding suitable additional cc's
> 
> *** Remember to use Documentation/SubmitChecklist when testing your code ***
> 
> The -mm tree is included into linux-next and is updated
> there every 3-4 working days
> 
> ------------------------------------------------------
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> Subject: hugetlb/cgroup: simplify pre_destroy callback
> 
> Since we cannot fail in hugetlb_cgroup_move_parent(), we don't really need
> to check whether the cgroup has any charge left after that.  Also skip those
> hstates for which we don't have any charge in this cgroup.

IIUC this depends on a non-existent (cgroup) patch. I guess something
like the patch at the end should address it. I haven't tested it though
so it is not signed-off-by yet.

> Based on an earlier patch from Wanpeng Li.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Michal Hocko <mhocko@suse.cz>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
>  mm/hugetlb_cgroup.c |   49 ++++++++++++++++++------------------------
>  1 file changed, 21 insertions(+), 28 deletions(-)
> 
> diff -puN mm/hugetlb_cgroup.c~hugetlb-cgroup-simplify-pre_destroy-callback mm/hugetlb_cgroup.c
> --- a/mm/hugetlb_cgroup.c~hugetlb-cgroup-simplify-pre_destroy-callback
> +++ a/mm/hugetlb_cgroup.c
> @@ -65,18 +65,6 @@ static inline struct hugetlb_cgroup *par
>  	return hugetlb_cgroup_from_cgroup(cg->parent);
>  }
>  
> -static inline bool hugetlb_cgroup_have_usage(struct cgroup *cg)
> -{
> -	int idx;
> -	struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cg);
> -
> -	for (idx = 0; idx < hugetlb_max_hstate; idx++) {
> -		if ((res_counter_read_u64(&h_cg->hugepage[idx], RES_USAGE)) > 0)
> -			return true;
> -	}
> -	return false;
> -}
> -
>  static struct cgroup_subsys_state *hugetlb_cgroup_create(struct cgroup *cgroup)
>  {
>  	int idx;
> @@ -159,24 +147,29 @@ static int hugetlb_cgroup_pre_destroy(st
>  {
>  	struct hstate *h;
>  	struct page *page;
> -	int ret = 0, idx = 0;
> +	int ret = 0, idx;
> +	struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cgroup);
>  
> -	do {
> -		if (cgroup_task_count(cgroup) ||
> -		    !list_empty(&cgroup->children)) {
> -			ret = -EBUSY;
> -			goto out;
> -		}
> -		for_each_hstate(h) {
> -			spin_lock(&hugetlb_lock);
> -			list_for_each_entry(page, &h->hugepage_activelist, lru)
> -				hugetlb_cgroup_move_parent(idx, cgroup, page);
>  
> -			spin_unlock(&hugetlb_lock);
> -			idx++;
> -		}
> -		cond_resched();
> -	} while (hugetlb_cgroup_have_usage(cgroup));
> +	if (cgroup_task_count(cgroup) ||
> +	    !list_empty(&cgroup->children)) {
> +		ret = -EBUSY;
> +		goto out;
> +	}
> +
> +	for_each_hstate(h) {
> +		/*
> +		 * if we don't have any charge, skip this hstate
> +		 */
> +		idx = hstate_index(h);
> +		if (res_counter_read_u64(&h_cg->hugepage[idx], RES_USAGE) == 0)
> +			continue;
> +		spin_lock(&hugetlb_lock);
> +		list_for_each_entry(page, &h->hugepage_activelist, lru)
> +			hugetlb_cgroup_move_parent(idx, cgroup, page);
> +		spin_unlock(&hugetlb_lock);
> +		VM_BUG_ON(res_counter_read_u64(&h_cg->hugepage[idx], RES_USAGE));
> +	}
>  out:
>  	return ret;
>  }
> _

---
From 621ed1c9dab63bd82205bd5266eb9974f86a0a3f Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.cz>
Date: Thu, 19 Jul 2012 13:23:23 +0200
Subject: [PATCH] cgroup: keep cgroup_mutex locked for pre_destroy

3fa59dfb (cgroup: fix potential deadlock in pre_destroy) dropped the
cgroup_mutex lock while calling pre_destroy callbacks because memory
controller could deadlock because force_empty triggered reclaim.
Since "memcg: move charges to root cgroup if use_hierarchy=0" there is
no reclaim going on from mem_cgroup_force_empty though so we can safely
keep the cgroup_mutex locked. This has an advantage that no tasks might
be added during pre_destroy callback and so the handlers don't have to
consider races when new tasks add new charges. This simplifies the
implementation.
---
 kernel/cgroup.c |    2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 0f3527d..9dba05d 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4181,7 +4181,6 @@ again:
 		mutex_unlock(&cgroup_mutex);
 		return -EBUSY;
 	}
-	mutex_unlock(&cgroup_mutex);
 
 	/*
 	 * In general, subsystem has no css->refcnt after pre_destroy(). But
@@ -4204,7 +4203,6 @@ again:
 		return ret;
 	}
 
-	mutex_lock(&cgroup_mutex);
 	parent = cgrp->parent;
 	if (atomic_read(&cgrp->count) || !list_empty(&cgrp->children)) {
 		clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
-- 
1.7.10.4

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic


* Re: + hugetlb-cgroup-simplify-pre_destroy-callback.patch added to -mm tree
  2012-07-19 11:39   ` Michal Hocko
@ 2012-07-19 12:21   ` Aneesh Kumar K.V
  2012-07-19 12:38       ` Michal Hocko
  0 siblings, 1 reply; 40+ messages in thread
From: Aneesh Kumar K.V @ 2012-07-19 12:21 UTC (permalink / raw)
  To: Michal Hocko, akpm
  Cc: mm-commits, kamezawa.hiroyu, liwanp, Tejun Heo, Li Zefan,
	cgroups mailinglist, linux-mm

Michal Hocko <mhocko@suse.cz> writes:

> On Wed 18-07-12 14:26:36, Andrew Morton wrote:
>> 
>> The patch titled
>>      Subject: hugetlb/cgroup: simplify pre_destroy callback
>> has been added to the -mm tree.  Its filename is
>>      hugetlb-cgroup-simplify-pre_destroy-callback.patch
>> 
>> Before you just go and hit "reply", please:
>>    a) Consider who else should be cc'ed
>>    b) Prefer to cc a suitable mailing list as well
>>    c) Ideally: find the original patch on the mailing list and do a
>>       reply-to-all to that, adding suitable additional cc's
>> 
>> *** Remember to use Documentation/SubmitChecklist when testing your code ***
>> 
>> The -mm tree is included into linux-next and is updated
>> there every 3-4 working days
>> 
>> ------------------------------------------------------
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> Subject: hugetlb/cgroup: simplify pre_destroy callback
>> 
>> Since we cannot fail in hugetlb_cgroup_move_parent(), we don't really need
>> to check whether the cgroup has any charge left after that.  Also skip those
>> hstates for which we don't have any charge in this cgroup.
>
> IIUC this depends on a non-existent (cgroup) patch. I guess something
> like the patch at the end should address it. I haven't tested it though
> so it is not signed-off-by yet.
>
>> Based on an earlier patch from Wanpeng Li.
>> 
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>> Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
>> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> Cc: Michal Hocko <mhocko@suse.cz>
>> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>> ---
>> 
>>  mm/hugetlb_cgroup.c |   49 ++++++++++++++++++------------------------
>>  1 file changed, 21 insertions(+), 28 deletions(-)
>> 
>> diff -puN mm/hugetlb_cgroup.c~hugetlb-cgroup-simplify-pre_destroy-callback mm/hugetlb_cgroup.c
>> --- a/mm/hugetlb_cgroup.c~hugetlb-cgroup-simplify-pre_destroy-callback
>> +++ a/mm/hugetlb_cgroup.c
>> @@ -65,18 +65,6 @@ static inline struct hugetlb_cgroup *par
>>  	return hugetlb_cgroup_from_cgroup(cg->parent);
>>  }
>>  
>> -static inline bool hugetlb_cgroup_have_usage(struct cgroup *cg)
>> -{
>> -	int idx;
>> -	struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cg);
>> -
>> -	for (idx = 0; idx < hugetlb_max_hstate; idx++) {
>> -		if ((res_counter_read_u64(&h_cg->hugepage[idx], RES_USAGE)) > 0)
>> -			return true;
>> -	}
>> -	return false;
>> -}
>> -
>>  static struct cgroup_subsys_state *hugetlb_cgroup_create(struct cgroup *cgroup)
>>  {
>>  	int idx;
>> @@ -159,24 +147,29 @@ static int hugetlb_cgroup_pre_destroy(st
>>  {
>>  	struct hstate *h;
>>  	struct page *page;
>> -	int ret = 0, idx = 0;
>> +	int ret = 0, idx;
>> +	struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cgroup);
>>  
>> -	do {
>> -		if (cgroup_task_count(cgroup) ||
>> -		    !list_empty(&cgroup->children)) {
>> -			ret = -EBUSY;
>> -			goto out;
>> -		}
>> -		for_each_hstate(h) {
>> -			spin_lock(&hugetlb_lock);
>> -			list_for_each_entry(page, &h->hugepage_activelist, lru)
>> -				hugetlb_cgroup_move_parent(idx, cgroup, page);
>>  
>> -			spin_unlock(&hugetlb_lock);
>> -			idx++;
>> -		}
>> -		cond_resched();
>> -	} while (hugetlb_cgroup_have_usage(cgroup));
>> +	if (cgroup_task_count(cgroup) ||
>> +	    !list_empty(&cgroup->children)) {
>> +		ret = -EBUSY;
>> +		goto out;
>> +	}
>> +
>> +	for_each_hstate(h) {
>> +		/*
>> +		 * if we don't have any charge, skip this hstate
>> +		 */
>> +		idx = hstate_index(h);
>> +		if (res_counter_read_u64(&h_cg->hugepage[idx], RES_USAGE) == 0)
>> +			continue;
>> +		spin_lock(&hugetlb_lock);
>> +		list_for_each_entry(page, &h->hugepage_activelist, lru)
>> +			hugetlb_cgroup_move_parent(idx, cgroup, page);
>> +		spin_unlock(&hugetlb_lock);
>> +		VM_BUG_ON(res_counter_read_u64(&h_cg->hugepage[idx], RES_USAGE));
>> +	}
>>  out:
>>  	return ret;
>>  }
>> _
>
> ---
> From 621ed1c9dab63bd82205bd5266eb9974f86a0a3f Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.cz>
> Date: Thu, 19 Jul 2012 13:23:23 +0200
> Subject: [PATCH] cgroup: keep cgroup_mutex locked for pre_destroy
>
> 3fa59dfb (cgroup: fix potential deadlock in pre_destroy) dropped the
> cgroup_mutex lock while calling pre_destroy callbacks because memory
> controller could deadlock because force_empty triggered reclaim.
> Since "memcg: move charges to root cgroup if use_hierarchy=0" there is
> no reclaim going on from mem_cgroup_force_empty though so we can safely
> keep the cgroup_mutex locked. This has an advantage that no tasks might
> be added during pre_destroy callback and so the handlers don't have to
> consider races when new tasks add new charges. This simplifies the
> implementation.
> ---
>  kernel/cgroup.c |    2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 0f3527d..9dba05d 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -4181,7 +4181,6 @@ again:
>  		mutex_unlock(&cgroup_mutex);
>  		return -EBUSY;
>  	}
> -	mutex_unlock(&cgroup_mutex);
>
>  	/*
>  	 * In general, subsystem has no css->refcnt after pre_destroy(). But
> @@ -4204,7 +4203,6 @@ again:
>  		return ret;
>  	}
>
> -	mutex_lock(&cgroup_mutex);
>  	parent = cgrp->parent;
>  	if (atomic_read(&cgrp->count) || !list_empty(&cgrp->children)) {
>  		clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);

mem_cgroup_force_empty still calls 

lru_add_drain_all 
   ->schedule_on_each_cpu
        -> get_online_cpus
           ->mutex_lock(&cpu_hotplug.lock);

So won't we deadlock?

-aneesh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: + hugetlb-cgroup-simplify-pre_destroy-callback.patch added to -mm tree
@ 2012-07-19 12:38       ` Michal Hocko
  0 siblings, 0 replies; 40+ messages in thread
From: Michal Hocko @ 2012-07-19 12:38 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: akpm, mm-commits, kamezawa.hiroyu, liwanp, Tejun Heo, Li Zefan,
	cgroups mailinglist, linux-mm

On Thu 19-07-12 17:51:05, Aneesh Kumar K.V wrote:
> Michal Hocko <mhocko@suse.cz> writes:
> 
> > From 621ed1c9dab63bd82205bd5266eb9974f86a0a3f Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.cz>
> > Date: Thu, 19 Jul 2012 13:23:23 +0200
> > Subject: [PATCH] cgroup: keep cgroup_mutex locked for pre_destroy
> >
> > 3fa59dfb (cgroup: fix potential deadlock in pre_destroy) dropped the
> > cgroup_mutex lock while calling pre_destroy callbacks because memory
> > controller could deadlock because force_empty triggered reclaim.
> > Since "memcg: move charges to root cgroup if use_hierarchy=0" there is
> > no reclaim going on from mem_cgroup_force_empty though so we can safely
> > keep the cgroup_mutex locked. This has an advantage that no tasks might
> > be added during pre_destroy callback and so the handlers don't have to
> > consider races when new tasks add new charges. This simplifies the
> > implementation.
> > ---
> >  kernel/cgroup.c |    2 --
> >  1 file changed, 2 deletions(-)
> >
> > diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> > index 0f3527d..9dba05d 100644
> > --- a/kernel/cgroup.c
> > +++ b/kernel/cgroup.c
> > @@ -4181,7 +4181,6 @@ again:
> >  		mutex_unlock(&cgroup_mutex);
> >  		return -EBUSY;
> >  	}
> > -	mutex_unlock(&cgroup_mutex);
> >
> >  	/*
> >  	 * In general, subsystem has no css->refcnt after pre_destroy(). But
> > @@ -4204,7 +4203,6 @@ again:
> >  		return ret;
> >  	}
> >
> > -	mutex_lock(&cgroup_mutex);
> >  	parent = cgrp->parent;
> >  	if (atomic_read(&cgrp->count) || !list_empty(&cgrp->children)) {
> >  		clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
> 
> mem_cgroup_force_empty still calls 
> 
> lru_add_drain_all 
>    ->schedule_on_each_cpu
>         -> get_online_cpus
>            ->mutex_lock(&cpu_hotplug.lock);
> 
> So won't we deadlock?

Yes, you are right. I got it wrong. I thought that reclaim was the
main problem. It won't be that easy then, and the original -mm patch
(hugetlb-cgroup-simplify-pre_destroy-callback.patch) still needs a fix
or has to be dropped.

> 
> -aneesh

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic


* Re: + hugetlb-cgroup-simplify-pre_destroy-callback.patch added to -mm tree
@ 2012-07-19 13:48         ` Aneesh Kumar K.V
  0 siblings, 0 replies; 40+ messages in thread
From: Aneesh Kumar K.V @ 2012-07-19 13:48 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, mm-commits, kamezawa.hiroyu, liwanp, Tejun Heo, Li Zefan,
	cgroups mailinglist, linux-mm

Michal Hocko <mhocko@suse.cz> writes:

> On Thu 19-07-12 17:51:05, Aneesh Kumar K.V wrote:
>> Michal Hocko <mhocko@suse.cz> writes:
>> 
>> > From 621ed1c9dab63bd82205bd5266eb9974f86a0a3f Mon Sep 17 00:00:00 2001
>> > From: Michal Hocko <mhocko@suse.cz>
>> > Date: Thu, 19 Jul 2012 13:23:23 +0200
>> > Subject: [PATCH] cgroup: keep cgroup_mutex locked for pre_destroy
>> >
>> > 3fa59dfb (cgroup: fix potential deadlock in pre_destroy) dropped the
>> > cgroup_mutex lock while calling pre_destroy callbacks because memory
>> > controller could deadlock because force_empty triggered reclaim.
>> > Since "memcg: move charges to root cgroup if use_hierarchy=0" there is
>> > no reclaim going on from mem_cgroup_force_empty though so we can safely
>> > keep the cgroup_mutex locked. This has an advantage that no tasks might
>> > be added during pre_destroy callback and so the handlers don't have to
>> > consider races when new tasks add new charges. This simplifies the
>> > implementation.
>> > ---
>> >  kernel/cgroup.c |    2 --
>> >  1 file changed, 2 deletions(-)
>> >
>> > diff --git a/kernel/cgroup.c b/kernel/cgroup.c
>> > index 0f3527d..9dba05d 100644
>> > --- a/kernel/cgroup.c
>> > +++ b/kernel/cgroup.c
>> > @@ -4181,7 +4181,6 @@ again:
>> >  		mutex_unlock(&cgroup_mutex);
>> >  		return -EBUSY;
>> >  	}
>> > -	mutex_unlock(&cgroup_mutex);
>> >
>> >  	/*
>> >  	 * In general, subsystem has no css->refcnt after pre_destroy(). But
>> > @@ -4204,7 +4203,6 @@ again:
>> >  		return ret;
>> >  	}
>> >
>> > -	mutex_lock(&cgroup_mutex);
>> >  	parent = cgrp->parent;
>> >  	if (atomic_read(&cgrp->count) || !list_empty(&cgrp->children)) {
>> >  		clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
>> 
>> mem_cgroup_force_empty still calls 
>> 
>> lru_add_drain_all 
>>    ->schedule_on_each_cpu
>>         -> get_online_cpus
>>            ->mutex_lock(&cpu_hotplug.lock);
>> 
>> So won't we deadlock?
>
> Yes, you are right. I got it wrong. I thought that reclaim was the
> main problem. It won't be that easy then, and the original -mm patch
> (hugetlb-cgroup-simplify-pre_destroy-callback.patch) still needs a fix
> or has to be dropped.

We just need to remove the VM_BUG_ON(), right? The rest of the patch is
good, right? Otherwise, how about the below?

NOTE: Do we want to do s/mutex_[un]lock(&cgroup_mutex)/cgroup_[un]lock()/ ?

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 7981850..01c67f4 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4151,7 +4151,6 @@ again:
 		mutex_unlock(&cgroup_mutex);
 		return -EBUSY;
 	}
-	mutex_unlock(&cgroup_mutex);
 
 	/*
 	 * In general, subsystem has no css->refcnt after pre_destroy(). But
@@ -4171,10 +4170,10 @@ again:
 	ret = cgroup_call_pre_destroy(cgrp);
 	if (ret) {
 		clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
+		mutex_unlock(&cgroup_mutex);
 		return ret;
 	}
 
-	mutex_lock(&cgroup_mutex);
 	parent = cgrp->parent;
 	if (atomic_read(&cgrp->count) || !list_empty(&cgrp->children)) {
 		clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e8ddc00..91c96df 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4993,9 +4993,18 @@ free_out:
 
 static int mem_cgroup_pre_destroy(struct cgroup *cont)
 {
+	int ret;
 	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
 
-	return mem_cgroup_force_empty(memcg, false);
+	cgroup_unlock();
+	/*
+	 * we call lru_add_drain_all, which ends up taking
+	 * mutex_lock(&cpu_hotplug.lock), but cpuset takes these locks
+	 * in the reverse order, so drop the cgroup lock here.
+	 */
+	ret = mem_cgroup_force_empty(memcg, false);
+	cgroup_lock();
+	return ret;
 }
 
 static void mem_cgroup_destroy(struct cgroup *cont)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* [PATCH] cgroup: Don't drop the cgroup_mutex in cgroup_rmdir
@ 2012-07-19 14:09           ` Aneesh Kumar K.V
  0 siblings, 0 replies; 40+ messages in thread
From: Aneesh Kumar K.V @ 2012-07-19 14:09 UTC (permalink / raw)
  To: akpm, mhocko, kamezawa.hiroyu, liwanp, htejun, lizefan, cgroups,
	linux-mm
  Cc: Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

We dropped the cgroup mutex because of a deadlock between memcg and cpuset:
cpuset takes the hotplug lock followed by cgroup_mutex, whereas memcg
pre_destroy did lru_add_drain_all(), which takes the hotplug lock while
already holding cgroup_mutex. The deadlock is explained in commit
3fa59dfbc3b223f02c26593be69ce6fc9a940405. But dropping cgroup_mutex in
cgroup_rmdir also means tasks could get added to a cgroup while we are in
pre_destroy, which makes error handling in pre_destroy complex. So move the
unlock/lock to the memcg pre_destroy callback; core cgroup will now call
pre_destroy with cgroup_mutex held.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 kernel/cgroup.c |    3 +--
 mm/memcontrol.c |   11 ++++++++++-
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 7981850..01c67f4 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4151,7 +4151,6 @@ again:
 		mutex_unlock(&cgroup_mutex);
 		return -EBUSY;
 	}
-	mutex_unlock(&cgroup_mutex);
 
 	/*
 	 * In general, subsystem has no css->refcnt after pre_destroy(). But
@@ -4171,10 +4170,10 @@ again:
 	ret = cgroup_call_pre_destroy(cgrp);
 	if (ret) {
 		clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
+		mutex_unlock(&cgroup_mutex);
 		return ret;
 	}
 
-	mutex_lock(&cgroup_mutex);
 	parent = cgrp->parent;
 	if (atomic_read(&cgrp->count) || !list_empty(&cgrp->children)) {
 		clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e8ddc00..9bd56ee 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4993,9 +4993,18 @@ free_out:
 
 static int mem_cgroup_pre_destroy(struct cgroup *cont)
 {
+	int ret;
 	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
 
-	return mem_cgroup_force_empty(memcg, false);
+	cgroup_unlock();
+	/*
+	 * We call lru_add_drain_all(), which ends up taking
+	 * mutex_lock(&cpu_hotplug.lock), but cpuset takes these
+	 * locks in the reverse order. So drop the cgroup lock.
+	 */
+	ret = mem_cgroup_force_empty(memcg, false);
+	cgroup_lock();
+	return ret;
 }
 
 static void mem_cgroup_destroy(struct cgroup *cont)
-- 
1.7.10


* Re: [PATCH] cgroup: Don't drop the cgroup_mutex in cgroup_rmdir
@ 2012-07-19 16:50             ` Tejun Heo
  0 siblings, 0 replies; 40+ messages in thread
From: Tejun Heo @ 2012-07-19 16:50 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: akpm, mhocko, kamezawa.hiroyu, liwanp, lizefan, cgroups,
	linux-mm, Peter Zijlstra, glommer

On Thu, Jul 19, 2012 at 07:39:32PM +0530, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> We dropped cgroup mutex, because of a deadlock between memcg and cpuset.
> cpuset took hotplug lock followed by cgroup_mutex, where as memcg pre_destroy
> did lru_add_drain_all() which took hotplug lock while already holding
> cgroup_mutex. The deadlock is explained in 3fa59dfbc3b223f02c26593be69ce6fc9a940405
> But dropping cgroup_mutex in cgroup_rmdir also means tasks could get
> added to cgroup while we are in pre_destroy. This makes error handling in
> pre_destroy complex. So move the unlock/lock to memcg pre_destroy callback.
> Core cgroup will now call pre_destroy with cgroup_mutex held.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

I generally think cgroup_mutex shouldn't be dropped across any cgroup
hierarchy changing operation and thus agree with the cgroup core
change.

>  static int mem_cgroup_pre_destroy(struct cgroup *cont)
>  {
> +	int ret;
>  	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
>  
> -	return mem_cgroup_force_empty(memcg, false);
> +	cgroup_unlock();
> +	/*
> +	 * we call lru_add_drain_all, which end up taking
> +	 * mutex_lock(&cpu_hotplug.lock), But cpuset have
> +	 * the reverse order. So drop the cgroup lock
> +	 */
> +	ret = mem_cgroup_force_empty(memcg, false);
> +	cgroup_lock();
> +	return ret;
>  }

This reverse dependency from cpuset is the same problem Glauber
reported a while ago.  I don't know why / how cgroup_mutex got
exported to outside world but this is asking for trouble.  cgroup
mutex protects cgroup hierarchies.  There are many core subsystems
which implement cgroup controllers.  Controller callbacks for
hierarchy-changing operations need to synchronize with the rest of the
core subsystems.  So, by design, in locking hierarchy, cgroup_mutex
has to be one of the outermost locks.  If somebody tries to grab it
from inside other core subsystem locks, there of course will be
circular locking dependencies.

So, Peter, why does cpuset mangle with cgroup_mutex?  What guarantees
does it need?  Why can't it work on "changed" notification while
caching the current css like blkcg does?

Let's please unexport cgroup_mutex.  It's moronic to expose that
outside cgroup.

Thanks.

-- 
tejun


* Re: + hugetlb-cgroup-simplify-pre_destroy-callback.patch added to -mm tree
@ 2012-07-20  1:05           ` Kamezawa Hiroyuki
  0 siblings, 0 replies; 40+ messages in thread
From: Kamezawa Hiroyuki @ 2012-07-20  1:05 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Michal Hocko, akpm, mm-commits, liwanp, Tejun Heo, Li Zefan,
	cgroups mailinglist, linux-mm

(2012/07/19 22:48), Aneesh Kumar K.V wrote:
> Michal Hocko <mhocko@suse.cz> writes:
>
>> On Thu 19-07-12 17:51:05, Aneesh Kumar K.V wrote:
>>> Michal Hocko <mhocko@suse.cz> writes:
>>>
>>>>  From 621ed1c9dab63bd82205bd5266eb9974f86a0a3f Mon Sep 17 00:00:00 2001
>>>> From: Michal Hocko <mhocko@suse.cz>
>>>> Date: Thu, 19 Jul 2012 13:23:23 +0200
>>>> Subject: [PATCH] cgroup: keep cgroup_mutex locked for pre_destroy
>>>>
>>>> 3fa59dfb (cgroup: fix potential deadlock in pre_destroy) dropped the
>>>> cgroup_mutex lock while calling pre_destroy callbacks because memory
>>>> controller could deadlock because force_empty triggered reclaim.
>>>> Since "memcg: move charges to root cgroup if use_hierarchy=0" there is
>>>> no reclaim going on from mem_cgroup_force_empty though so we can safely
>>>> keep the cgroup_mutex locked. This has an advantage that no tasks might
>>>> be added during pre_destroy callback and so the handlers don't have to
>>>> consider races when new tasks add new charges. This simplifies the
>>>> implementation.
>>>> ---
>>>>   kernel/cgroup.c |    2 --
>>>>   1 file changed, 2 deletions(-)
>>>>
>>>> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
>>>> index 0f3527d..9dba05d 100644
>>>> --- a/kernel/cgroup.c
>>>> +++ b/kernel/cgroup.c
>>>> @@ -4181,7 +4181,6 @@ again:
>>>>   		mutex_unlock(&cgroup_mutex);
>>>>   		return -EBUSY;
>>>>   	}
>>>> -	mutex_unlock(&cgroup_mutex);
>>>>
>>>>   	/*
>>>>   	 * In general, subsystem has no css->refcnt after pre_destroy(). But
>>>> @@ -4204,7 +4203,6 @@ again:
>>>>   		return ret;
>>>>   	}
>>>>
>>>> -	mutex_lock(&cgroup_mutex);
>>>>   	parent = cgrp->parent;
>>>>   	if (atomic_read(&cgrp->count) || !list_empty(&cgrp->children)) {
>>>>   		clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
>>>
>>> mem_cgroup_force_empty still calls
>>>
>>> lru_add_drain_all
>>>     ->schedule_on_each_cpu
>>>          -> get_online_cpus
>>>             ->mutex_lock(&cpu_hotplug.lock);
>>>
>>> So wont we deadlock ?
>>
>> Yes you are right. I got it wrong. I thought that the reclaim is the
>> main problem. It won't be that easy then and the origin mm patch
>> (hugetlb-cgroup-simplify-pre_destroy-callback.patch) still needs a fix
>> or to be dropped.
>

Aha, then the problematic schedule_on_each_cpu(), which Andrew pointed out
this month, is in front of us :( ...and drain_all_stock_sync() should be fixed too.
Hmm...

> We just need to remove the VM_BUG_ON() right ? The rest of the patch is
> good right ? Otherwise how about the below
>

I'm personally okay with it, but... ugly?

Thanks,
-Kame
> NOTE: Do we want to do s/mutex_[un]lock(&cgroup_mutex)/cgroup_[un]lock()/  ?
>
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 7981850..01c67f4 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -4151,7 +4151,6 @@ again:
>   		mutex_unlock(&cgroup_mutex);
>   		return -EBUSY;
>   	}
> -	mutex_unlock(&cgroup_mutex);
>
>   	/*
>   	 * In general, subsystem has no css->refcnt after pre_destroy(). But
> @@ -4171,10 +4170,10 @@ again:
>   	ret = cgroup_call_pre_destroy(cgrp);
>   	if (ret) {
>   		clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
> +		mutex_unlock(&cgroup_mutex);
>   		return ret;
>   	}
>
> -	mutex_lock(&cgroup_mutex);
>   	parent = cgrp->parent;
>   	if (atomic_read(&cgrp->count) || !list_empty(&cgrp->children)) {
>   		clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index e8ddc00..91c96df 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4993,9 +4993,18 @@ free_out:
>
>   static int mem_cgroup_pre_destroy(struct cgroup *cont)
>   {
> +	int ret;
>   	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
>
> -	return mem_cgroup_force_empty(memcg, false);
> +	cgroup_unlock();
> +	/*
> +	 * we call lru_add_drain_all, which end up taking
> +	 * mutex_lock(&cpu_hotplug.lock), But cpuset have
> +	 * the reverse order. So drop the cgroup lock
> +	 */
> +	ret = mem_cgroup_force_empty(memcg, false);
> +	cgroup_lock();
> +	return ret;
>   }
>
>   static void mem_cgroup_destroy(struct cgroup *cont)
>




* Re: + hugetlb-cgroup-simplify-pre_destroy-callback.patch added to -mm tree
@ 2012-07-20  1:20             ` Kamezawa Hiroyuki
  0 siblings, 0 replies; 40+ messages in thread
From: Kamezawa Hiroyuki @ 2012-07-20  1:20 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Michal Hocko, akpm, mm-commits, liwanp, Tejun Heo, Li Zefan,
	cgroups mailinglist, linux-mm

(2012/07/20 10:05), Kamezawa Hiroyuki wrote:
> (2012/07/19 22:48), Aneesh Kumar K.V wrote:
>> Michal Hocko <mhocko@suse.cz> writes:
>>
>>> On Thu 19-07-12 17:51:05, Aneesh Kumar K.V wrote:
>>>> Michal Hocko <mhocko@suse.cz> writes:
>>>>
>>>>>  From 621ed1c9dab63bd82205bd5266eb9974f86a0a3f Mon Sep 17 00:00:00 2001
>>>>> From: Michal Hocko <mhocko@suse.cz>
>>>>> Date: Thu, 19 Jul 2012 13:23:23 +0200
>>>>> Subject: [PATCH] cgroup: keep cgroup_mutex locked for pre_destroy
>>>>>
>>>>> 3fa59dfb (cgroup: fix potential deadlock in pre_destroy) dropped the
>>>>> cgroup_mutex lock while calling pre_destroy callbacks because memory
>>>>> controller could deadlock because force_empty triggered reclaim.
>>>>> Since "memcg: move charges to root cgroup if use_hierarchy=0" there is
>>>>> no reclaim going on from mem_cgroup_force_empty though so we can safely
>>>>> keep the cgroup_mutex locked. This has an advantage that no tasks might
>>>>> be added during pre_destroy callback and so the handlers don't have to
>>>>> consider races when new tasks add new charges. This simplifies the
>>>>> implementation.
>>>>> ---
>>>>>   kernel/cgroup.c |    2 --
>>>>>   1 file changed, 2 deletions(-)
>>>>>
>>>>> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
>>>>> index 0f3527d..9dba05d 100644
>>>>> --- a/kernel/cgroup.c
>>>>> +++ b/kernel/cgroup.c
>>>>> @@ -4181,7 +4181,6 @@ again:
>>>>>           mutex_unlock(&cgroup_mutex);
>>>>>           return -EBUSY;
>>>>>       }
>>>>> -    mutex_unlock(&cgroup_mutex);
>>>>>
>>>>>       /*
>>>>>        * In general, subsystem has no css->refcnt after pre_destroy(). But
>>>>> @@ -4204,7 +4203,6 @@ again:
>>>>>           return ret;
>>>>>       }
>>>>>
>>>>> -    mutex_lock(&cgroup_mutex);
>>>>>       parent = cgrp->parent;
>>>>>       if (atomic_read(&cgrp->count) || !list_empty(&cgrp->children)) {
>>>>>           clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
>>>>
>>>> mem_cgroup_force_empty still calls
>>>>
>>>> lru_add_drain_all
>>>>     ->schedule_on_each_cpu
>>>>          -> get_online_cpus
>>>>             ->mutex_lock(&cpu_hotplug.lock);
>>>>
>>>> So wont we deadlock ?
>>>
>>> Yes you are right. I got it wrong. I thought that the reclaim is the
>>> main problem. It won't be that easy then and the origin mm patch
>>> (hugetlb-cgroup-simplify-pre_destroy-callback.patch) still needs a fix
>>> or to be dropped.
>>
>
> Aha, then the problematic schedule_on_each_cpu(), Andrew pointed out in this month,
> is in front of us :( ...and drain_all_stock_sync() should be fixed too.
> Hmm...
>
>> We just need to remove the VM_BUG_ON() right ? The rest of the patch is
>> good right ? Otherwise how about the below
>>
>
> I'm personally okay but....ugly ?
>

Hmm, can't cgroup_lock() be implemented as


void cgroup_lock(void)
{
	get_online_cpus();
	lock_memory_hotplug();
	mutex_lock(&cgroup_mutex);
}


?

Thanks,
-Kame





* Re: [PATCH] cgroup: Don't drop the cgroup_mutex in cgroup_rmdir
@ 2012-07-20  7:51             ` Michal Hocko
  0 siblings, 0 replies; 40+ messages in thread
From: Michal Hocko @ 2012-07-20  7:51 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: akpm, kamezawa.hiroyu, liwanp, htejun, lizefan, cgroups, linux-mm

On Thu 19-07-12 19:39:32, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> We dropped the cgroup mutex because of a deadlock between memcg and cpuset:
> cpuset took the hotplug lock followed by cgroup_mutex, whereas memcg pre_destroy
> did lru_add_drain_all(), which took the hotplug lock while already holding
> cgroup_mutex. The deadlock is explained in 3fa59dfbc3b223f02c26593be69ce6fc9a940405.
> But dropping cgroup_mutex in cgroup_rmdir also means tasks could get
> added to the cgroup while we are in pre_destroy, which makes error handling in
> pre_destroy complex. So move the unlock/lock to the memcg pre_destroy callback;
> core cgroup will now call pre_destroy with cgroup_mutex held.

OK, this should be good for now, but we should definitely work towards
not messing with cgroup_mutex like that and remove the hack from the
memcg callback in the long term.

> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Acked-by: Michal Hocko <mhocko@suse.cz>

> ---
>  kernel/cgroup.c |    3 +--
>  mm/memcontrol.c |   11 ++++++++++-
>  2 files changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 7981850..01c67f4 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -4151,7 +4151,6 @@ again:
>  		mutex_unlock(&cgroup_mutex);
>  		return -EBUSY;
>  	}
> -	mutex_unlock(&cgroup_mutex);
>  
>  	/*
>  	 * In general, subsystem has no css->refcnt after pre_destroy(). But
> @@ -4171,10 +4170,10 @@ again:
>  	ret = cgroup_call_pre_destroy(cgrp);
>  	if (ret) {
>  		clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
> +		mutex_unlock(&cgroup_mutex);
>  		return ret;
>  	}
>  
> -	mutex_lock(&cgroup_mutex);
>  	parent = cgrp->parent;
>  	if (atomic_read(&cgrp->count) || !list_empty(&cgrp->children)) {
>  		clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index e8ddc00..9bd56ee 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4993,9 +4993,18 @@ free_out:
>  
>  static int mem_cgroup_pre_destroy(struct cgroup *cont)
>  {
> +	int ret;
>  	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
>  
> -	return mem_cgroup_force_empty(memcg, false);
> +	cgroup_unlock();
> +	/*
> +	 * We call lru_add_drain_all(), which ends up taking
> +	 * mutex_lock(&cpu_hotplug.lock), but cpuset takes the
> +	 * locks in the reverse order. So drop the cgroup lock.
> +	 */
> +	ret = mem_cgroup_force_empty(memcg, false);
> +	cgroup_lock();
> +	return ret;
>  }
>  
>  static void mem_cgroup_destroy(struct cgroup *cont)
> -- 
> 1.7.10
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: + hugetlb-cgroup-simplify-pre_destroy-callback.patch added to -mm tree
@ 2012-07-20  8:01               ` Michal Hocko
  0 siblings, 0 replies; 40+ messages in thread
From: Michal Hocko @ 2012-07-20  8:01 UTC (permalink / raw)
  To: Kamezawa Hiroyuki
  Cc: Aneesh Kumar K.V, akpm, mm-commits, liwanp, Tejun Heo, Li Zefan,
	cgroups mailinglist, linux-mm

On Fri 20-07-12 10:20:29, KAMEZAWA Hiroyuki wrote:
[...]
> Hmm, can't cgroup_lock() be implemented as
> 
> 
> void cgroup_lock()
> {
> 	get_online_cpus()
> 	lock_memory_hotplug()
> 	mutex_lock(&cgroup_mutex);
> }

This is really ugly, and it wouldn't help much anyway. The notifier which
takes the cgroup_lock is called while cpu_hotplug.lock is already held.
You would need to call cgroup_lock() before taking the cpu_hotplug.lock
and remove it from notifiers. I think this should be doable but I didn't
have too much time to look deeper into it.
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: + hugetlb-cgroup-simplify-pre_destroy-callback.patch added to -mm tree
  2012-07-19 13:48         ` Aneesh Kumar K.V
                           ` (2 preceding siblings ...)
  (?)
@ 2012-07-20  8:06         ` Michal Hocko
  2012-07-20 19:18           ` Aneesh Kumar K.V
  -1 siblings, 1 reply; 40+ messages in thread
From: Michal Hocko @ 2012-07-20  8:06 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: akpm, mm-commits, kamezawa.hiroyu, liwanp, Tejun Heo, Li Zefan,
	cgroups mailinglist, linux-mm

On Thu 19-07-12 19:18:24, Aneesh Kumar K.V wrote:
> Michal Hocko <mhocko@suse.cz> writes:
> 
> > On Thu 19-07-12 17:51:05, Aneesh Kumar K.V wrote:
> >> Michal Hocko <mhocko@suse.cz> writes:
> >> 
> >> > From 621ed1c9dab63bd82205bd5266eb9974f86a0a3f Mon Sep 17 00:00:00 2001
> >> > From: Michal Hocko <mhocko@suse.cz>
> >> > Date: Thu, 19 Jul 2012 13:23:23 +0200
> >> > Subject: [PATCH] cgroup: keep cgroup_mutex locked for pre_destroy
> >> >
> >> > 3fa59dfb (cgroup: fix potential deadlock in pre_destroy) dropped the
> >> > cgroup_mutex lock while calling pre_destroy callbacks because memory
> >> > controller could deadlock because force_empty triggered reclaim.
> >> > Since "memcg: move charges to root cgroup if use_hierarchy=0" there is
> >> > no reclaim going on from mem_cgroup_force_empty though so we can safely
> >> > keep the cgroup_mutex locked. This has an advantage that no tasks might
> >> > be added during pre_destroy callback and so the handlers don't have to
> >> > consider races when new tasks add new charges. This simplifies the
> >> > implementation.
> >> > ---
> >> >  kernel/cgroup.c |    2 --
> >> >  1 file changed, 2 deletions(-)
> >> >
> >> > diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> >> > index 0f3527d..9dba05d 100644
> >> > --- a/kernel/cgroup.c
> >> > +++ b/kernel/cgroup.c
> >> > @@ -4181,7 +4181,6 @@ again:
> >> >  		mutex_unlock(&cgroup_mutex);
> >> >  		return -EBUSY;
> >> >  	}
> >> > -	mutex_unlock(&cgroup_mutex);
> >> >
> >> >  	/*
> >> >  	 * In general, subsystem has no css->refcnt after pre_destroy(). But
> >> > @@ -4204,7 +4203,6 @@ again:
> >> >  		return ret;
> >> >  	}
> >> >
> >> > -	mutex_lock(&cgroup_mutex);
> >> >  	parent = cgrp->parent;
> >> >  	if (atomic_read(&cgrp->count) || !list_empty(&cgrp->children)) {
> >> >  		clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
> >> 
> >> mem_cgroup_force_empty still calls 
> >> 
> >> lru_add_drain_all 
> >>    ->schedule_on_each_cpu
> >>         -> get_online_cpus
> >>            ->mutex_lock(&cpu_hotplug.lock);
> >> 
> >> So wont we deadlock ?
> >
> > Yes, you are right, I got it wrong. I thought that the reclaim was the
> > main problem. It won't be that easy then, and the original -mm patch
> > (hugetlb-cgroup-simplify-pre_destroy-callback.patch) still needs a fix
> > or to be dropped.
> 
> We just need to remove the VM_BUG_ON(), right? The rest of the patch is
> good, right? Otherwise, how about the below:

You can keep VM_BUG_ON with the patch below and also remove the check
for cgroup_task_count || &cgroup->children because that is checked in
cgroup_rmdir already.

> NOTE: Do we want to do s/mutex_[un]lock(&cgroup_mutex)/cgroup_[un]lock()/  ?
> 
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 7981850..01c67f4 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -4151,7 +4151,6 @@ again:
>  		mutex_unlock(&cgroup_mutex);
>  		return -EBUSY;
>  	}
> -	mutex_unlock(&cgroup_mutex);
>  
>  	/*
>  	 * In general, subsystem has no css->refcnt after pre_destroy(). But
> @@ -4171,10 +4170,10 @@ again:
>  	ret = cgroup_call_pre_destroy(cgrp);
>  	if (ret) {
>  		clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
> +		mutex_unlock(&cgroup_mutex);
>  		return ret;
>  	}
>  
> -	mutex_lock(&cgroup_mutex);
>  	parent = cgrp->parent;
>  	if (atomic_read(&cgrp->count) || !list_empty(&cgrp->children)) {
>  		clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index e8ddc00..91c96df 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4993,9 +4993,18 @@ free_out:
>  
>  static int mem_cgroup_pre_destroy(struct cgroup *cont)
>  {
> +	int ret;
>  	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
>  
> -	return mem_cgroup_force_empty(memcg, false);
> +	cgroup_unlock();
> +	/*
> +	 * We call lru_add_drain_all(), which ends up taking
> +	 * mutex_lock(&cpu_hotplug.lock), but cpuset takes the
> +	 * locks in the reverse order. So drop the cgroup lock.
> +	 */
> +	ret = mem_cgroup_force_empty(memcg, false);
> +	cgroup_lock();
> +	return ret;
>  }
>  
>  static void mem_cgroup_destroy(struct cgroup *cont)
> 
> --
> To unsubscribe from this list: send the line "unsubscribe cgroups" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: + hugetlb-cgroup-simplify-pre_destroy-callback.patch added to -mm tree
  2012-07-20  8:01               ` Michal Hocko
  (?)
@ 2012-07-20  8:08               ` Kamezawa Hiroyuki
  -1 siblings, 0 replies; 40+ messages in thread
From: Kamezawa Hiroyuki @ 2012-07-20  8:08 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Aneesh Kumar K.V, akpm, mm-commits, liwanp, Tejun Heo, Li Zefan,
	cgroups mailinglist, linux-mm

(2012/07/20 17:01), Michal Hocko wrote:
> On Fri 20-07-12 10:20:29, KAMEZAWA Hiroyuki wrote:
> [...]
>> Hmm, can't cgroup_lock() be implemented as
>>
>>
>> void cgroup_lock()
>> {
>> 	get_online_cpus()
>> 	lock_memory_hotplug()
>> 	mutex_lock(&cgroup_mutex);
>> }
>
> This is really ugly and it wouldn't help much anyway. Notifier which
> takes the cgroup_lock is called when cpu_hotplug.lock is held already.

Hm? IIUC, the notifier will not run until put_online_cpus() is called.

> You would need to call cgroup_lock() before taking the cpu_hotplug.lock
> and remove it from notifiers. I think this should be doable but I didn't
> have too much time to look deeper into it.
>




^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] cgroup: Don't drop the cgroup_mutex in cgroup_rmdir
  2012-07-19 16:50             ` Tejun Heo
  (?)
@ 2012-07-20 15:45             ` Peter Zijlstra
  2012-07-20 20:05                 ` Tejun Heo
  -1 siblings, 1 reply; 40+ messages in thread
From: Peter Zijlstra @ 2012-07-20 15:45 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Aneesh Kumar K.V, akpm, mhocko, kamezawa.hiroyu, liwanp, lizefan,
	cgroups, linux-mm, glommer

On Thu, 2012-07-19 at 09:50 -0700, Tejun Heo wrote:
> On Thu, Jul 19, 2012 at 07:39:32PM +0530, Aneesh Kumar K.V wrote:
> > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> > 
> > We dropped the cgroup mutex because of a deadlock between memcg and cpuset:
> > cpuset took the hotplug lock followed by cgroup_mutex, whereas memcg pre_destroy
> > did lru_add_drain_all(), which took the hotplug lock while already holding
> > cgroup_mutex. The deadlock is explained in 3fa59dfbc3b223f02c26593be69ce6fc9a940405.
> > But dropping cgroup_mutex in cgroup_rmdir also means tasks could get
> > added to the cgroup while we are in pre_destroy, which makes error handling in
> > pre_destroy complex. So move the unlock/lock to the memcg pre_destroy callback;
> > core cgroup will now call pre_destroy with cgroup_mutex held.
> > 
> > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> 
> I generally think cgroup_mutex shouldn't be dropped across any cgroup
> hierarchy changing operation and thus agree with the cgroup core
> change.
> 
> >  static int mem_cgroup_pre_destroy(struct cgroup *cont)
> >  {
> > +	int ret;
> >  	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
> >  
> > -	return mem_cgroup_force_empty(memcg, false);
> > +	cgroup_unlock();
> > +	/*
> > +	 * We call lru_add_drain_all(), which ends up taking
> > +	 * mutex_lock(&cpu_hotplug.lock), but cpuset takes the
> > +	 * locks in the reverse order. So drop the cgroup lock.
> > +	 */
> > +	ret = mem_cgroup_force_empty(memcg, false);
> > +	cgroup_lock();
> > +	return ret;
> >  }
> 
> This reverse dependency from cpuset is the same problem Glauber
> reported a while ago.  I don't know why or how cgroup_mutex got
> exported to the outside world, but this is asking for trouble.  cgroup_mutex
> protects cgroup hierarchies.  There are many core subsystems
> which implement cgroup controllers.  Controller callbacks for
> hierarchy-changing operations need to synchronize with the rest of the
> core subsystems.  So, by design, in the locking hierarchy, cgroup_mutex
> has to be one of the outermost locks.  If somebody tries to grab it
> from inside other core subsystem locks, there of course will be
> circular locking dependencies.
> 
> So, Peter, why does cpuset mangle with cgroup_mutex?  What guarantees
> does it need?  Why can't it work on "changed" notification while
> caching the current css like blkcg does?

I've no clue sorry.. /me goes stare at this stuff.. Looks like something
Paul Menage did when he created cgroups. I'll have to have a hard look
at all that to untangle this. Not something obvious to me.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: + hugetlb-cgroup-simplify-pre_destroy-callback.patch added to -mm tree
  2012-07-20  8:06         ` Michal Hocko
@ 2012-07-20 19:18           ` Aneesh Kumar K.V
  2012-07-20 19:56             ` Tejun Heo
  0 siblings, 1 reply; 40+ messages in thread
From: Aneesh Kumar K.V @ 2012-07-20 19:18 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, mm-commits, kamezawa.hiroyu, liwanp, Tejun Heo, Li Zefan,
	cgroups mailinglist, linux-mm

Michal Hocko <mhocko@suse.cz> writes:

> On Thu 19-07-12 19:18:24, Aneesh Kumar K.V wrote:
>> Michal Hocko <mhocko@suse.cz> writes:
>> 
>> > On Thu 19-07-12 17:51:05, Aneesh Kumar K.V wrote:
>> >> Michal Hocko <mhocko@suse.cz> writes:
>> >> 
>> >> > From 621ed1c9dab63bd82205bd5266eb9974f86a0a3f Mon Sep 17 00:00:00 2001
>> >> > From: Michal Hocko <mhocko@suse.cz>
>> >> > Date: Thu, 19 Jul 2012 13:23:23 +0200
>> >> > Subject: [PATCH] cgroup: keep cgroup_mutex locked for pre_destroy
>> >> >
>> >> > 3fa59dfb (cgroup: fix potential deadlock in pre_destroy) dropped the
>> >> > cgroup_mutex lock while calling pre_destroy callbacks because memory
>> >> > controller could deadlock because force_empty triggered reclaim.
>> >> > Since "memcg: move charges to root cgroup if use_hierarchy=0" there is
>> >> > no reclaim going on from mem_cgroup_force_empty though so we can safely
>> >> > keep the cgroup_mutex locked. This has an advantage that no tasks might
>> >> > be added during pre_destroy callback and so the handlers don't have to
>> >> > consider races when new tasks add new charges. This simplifies the
>> >> > implementation.
>> >> > ---
>> >> >  kernel/cgroup.c |    2 --
>> >> >  1 file changed, 2 deletions(-)
>> >> >
>> >> > diff --git a/kernel/cgroup.c b/kernel/cgroup.c
>> >> > index 0f3527d..9dba05d 100644
>> >> > --- a/kernel/cgroup.c
>> >> > +++ b/kernel/cgroup.c
>> >> > @@ -4181,7 +4181,6 @@ again:
>> >> >  		mutex_unlock(&cgroup_mutex);
>> >> >  		return -EBUSY;
>> >> >  	}
>> >> > -	mutex_unlock(&cgroup_mutex);
>> >> >
>> >> >  	/*
>> >> >  	 * In general, subsystem has no css->refcnt after pre_destroy(). But
>> >> > @@ -4204,7 +4203,6 @@ again:
>> >> >  		return ret;
>> >> >  	}
>> >> >
>> >> > -	mutex_lock(&cgroup_mutex);
>> >> >  	parent = cgrp->parent;
>> >> >  	if (atomic_read(&cgrp->count) || !list_empty(&cgrp->children)) {
>> >> >  		clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
>> >> 
>> >> mem_cgroup_force_empty still calls 
>> >> 
>> >> lru_add_drain_all 
>> >>    ->schedule_on_each_cpu
>> >>         -> get_online_cpus
>> >>            ->mutex_lock(&cpu_hotplug.lock);
>> >> 
>> >> So wont we deadlock ?
>> >
>> > Yes you are right. I got it wrong. I thought that the reclaim is the
>> > main problem. It won't be that easy then and the origin mm patch
>> > (hugetlb-cgroup-simplify-pre_destroy-callback.patch) still needs a fix
>> > or to be dropped.
>> 
>> We just need to remove the VM_BUG_ON() right ? The rest of the patch is
>> good right ? Otherwise how about the below
>
> You can keep VM_BUG_ON with the patch below and also remove the check
> for cgroup_task_count || &cgroup->children because that is checked in
> cgroup_rmdir already.
>

Does cgroup_rmdir do a cgroup_task_count check? I do see that it checks
cgroup->children and cgroup->count. But cgroup->count is not the same as
task_count, right?

Maybe we need to push the task_count check also to rmdir so that
pre_destroy doesn't need to check this.

-aneesh


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] cgroup: Don't drop the cgroup_mutex in cgroup_rmdir
@ 2012-07-20 19:49             ` Tejun Heo
  0 siblings, 0 replies; 40+ messages in thread
From: Tejun Heo @ 2012-07-20 19:49 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: akpm, mhocko, kamezawa.hiroyu, liwanp, lizefan, cgroups, linux-mm

On Thu, Jul 19, 2012 at 07:39:32PM +0530, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> We dropped the cgroup mutex because of a deadlock between memcg and cpuset:
> cpuset took the hotplug lock followed by cgroup_mutex, whereas memcg pre_destroy
> did lru_add_drain_all(), which took the hotplug lock while already holding
> cgroup_mutex. The deadlock is explained in 3fa59dfbc3b223f02c26593be69ce6fc9a940405.
> But dropping cgroup_mutex in cgroup_rmdir also means tasks could get
> added to the cgroup while we are in pre_destroy, which makes error handling in
> pre_destroy complex. So move the unlock/lock to the memcg pre_destroy callback;
> core cgroup will now call pre_destroy with cgroup_mutex held.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

So, umm, let's not do this at this point.  Please just fix memcg such
that it doesn't fail ->pre_destroy() and drop
subsys->__DEPRECATED_clear_css_refs.  cgroup core won't give away new
references during or after pre_destroy that way, and memcg is the ONLY
subsystem needing the deprecated behavior, so it's rather
counter-productive to implement a work-around at this point.

 Nacked-And-Please-Drop-The-DEPRECATED-Behavior-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: + hugetlb-cgroup-simplify-pre_destroy-callback.patch added to -mm tree
  2012-07-20 19:18           ` Aneesh Kumar K.V
@ 2012-07-20 19:56             ` Tejun Heo
  2012-07-21  2:14                 ` Kamezawa Hiroyuki
  0 siblings, 1 reply; 40+ messages in thread
From: Tejun Heo @ 2012-07-20 19:56 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Michal Hocko, akpm, mm-commits, kamezawa.hiroyu, liwanp,
	Li Zefan, cgroups mailinglist, linux-mm

On Sat, Jul 21, 2012 at 12:48:34AM +0530, Aneesh Kumar K.V wrote:
> Does cgroup_rmdir do a cgroup_task_count check? I do see that it checks
> cgroup->children and cgroup->count. But cgroup->count is not the same as
> task_count, right?
> 
> Maybe we need to push the task_count check also to rmdir so that
> pre_destroy doesn't need to check this.

A nonzero task_count implies a cgroup refcount, which cgroup core does
check.  No need to worry about that, ->children, or anything else from
memcg.  As soon as the deprecated behavior is gone, everything will be
okay; otherwise, it's a bug in cgroup core.

Thanks.

-- 
tejun


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] cgroup: Don't drop the cgroup_mutex in cgroup_rmdir
  2012-07-20 15:45             ` Peter Zijlstra
@ 2012-07-20 20:05                 ` Tejun Heo
  0 siblings, 0 replies; 40+ messages in thread
From: Tejun Heo @ 2012-07-20 20:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Aneesh Kumar K.V, akpm, mhocko, kamezawa.hiroyu, liwanp, lizefan,
	cgroups, linux-mm, glommer

Hey, Peter.

On Fri, Jul 20, 2012 at 05:45:40PM +0200, Peter Zijlstra wrote:
> > So, Peter, why does cpuset mangle with cgroup_mutex?  What guarantees
> > does it need?  Why can't it work on "changed" notification while
> > caching the current css like blkcg does?
> 
> I've no clue sorry.. /me goes stare at this stuff.. Looks like something
> Paul Menage did when he created cgroups. I'll have to have a hard look
> at all that to untangle this. Not something obvious to me.

Yeah, it would be great if this can be untangled.  I really don't see
any other reasonable way out of this circular locking mess.  If cpuset
needs a stable css association across a certain period, the right thing
to do is to cache the css by holding its ref and to synchronize
modifications to that cache, rather than synchronizing cgroup operations
themselves.

Thanks.

-- 
tejun


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] cgroup: Don't drop the cgroup_mutex in cgroup_rmdir
  2012-07-20 20:05                 ` Tejun Heo
  (?)
@ 2012-07-20 22:07                 ` Glauber Costa
  -1 siblings, 0 replies; 40+ messages in thread
From: Glauber Costa @ 2012-07-20 22:07 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Peter Zijlstra, Aneesh Kumar K.V, akpm, mhocko, kamezawa.hiroyu,
	liwanp, lizefan, cgroups, linux-mm

On 07/20/2012 05:05 PM, Tejun Heo wrote:
> Hey, Peter.
> 
> On Fri, Jul 20, 2012 at 05:45:40PM +0200, Peter Zijlstra wrote:
>>> So, Peter, why does cpuset mangle with cgroup_mutex?  What guarantees
>>> does it need?  Why can't it work on "changed" notification while
>>> caching the current css like blkcg does?
>>
>> I've no clue sorry.. /me goes stare at this stuff.. Looks like something
>> Paul Menage did when he created cgroups. I'll have to have a hard look
>> at all that to untangle this. Not something obvious to me.
> 
> Yeah, it would be great if this could be untangled.  I really don't see
> any other reasonable way out of this circular locking mess.  If cpuset
> needs a stable css association across a certain period, the RTTD is
> to cache the css by holding its ref and synchronize modifications to
> that cache, rather than synchronizing cgroup operations themselves.
> 
> Thanks.
> 
IIRC, cpuset can insert a task into an existing cgroup itself.  Besides
that, it needs to have a stable view of the cpumask used by all tasks
in the cgroup.

But this is what I remember off the top of my head, and I am still
officially on vacation....

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: + hugetlb-cgroup-simplify-pre_destroy-callback.patch added to -mm tree
@ 2012-07-21  2:14                 ` Kamezawa Hiroyuki
  0 siblings, 0 replies; 40+ messages in thread
From: Kamezawa Hiroyuki @ 2012-07-21  2:14 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Aneesh Kumar K.V, Michal Hocko, akpm, mm-commits, liwanp,
	Li Zefan, cgroups mailinglist, linux-mm

(2012/07/21 4:56), Tejun Heo wrote:
> On Sat, Jul 21, 2012 at 12:48:34AM +0530, Aneesh Kumar K.V wrote:
>> Does cgroup_rmdir do a cgroup_task_count check?  I do see that it checks
>> cgroup->children and cgroup->count.  But cgroup->count is not the same as
>> task_count, right?
>>
>> Maybe we need to push the task_count check also into rmdir so that
>> pre_destroy doesn't need to check this.
>
> task_count implies cgroup refcnt which cgroup core does check.  No
> need to worry about that, ->children or whatever from memcg.  As soon
> as the deprecated behavior is gone, everything will be okay;
> otherwise, it's a bug in cgroup core.
>

I'm sorry, I misunderstood.  The problem is the following.

         CPU A                       CPU B
     mutex_unlock()
                                 mutex_lock()
     ->pre_destroy()             attach task
       commit res->usage=0       mutex_unlock()
                                 increase res->usage
                                 detach task
     mutex_lock()
     check css's refcount=0
      ....continue destroy.

Now, I think memcg's check is not enough, but I'd keep the -EBUSY there
so that we don't forget this race.


I think a patch to stop task-attach and child-cgroup creation while
CGRP_WAIT_ON_RMDIR is set is required.  And that's enough.
Thanks,
-Kame







^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: + hugetlb-cgroup-simplify-pre_destroy-callback.patch added to -mm tree
  2012-07-21  2:14                 ` Kamezawa Hiroyuki
  (?)
@ 2012-07-21  2:46                 ` Tejun Heo
  2012-07-21  4:05                   ` Kamezawa Hiroyuki
  -1 siblings, 1 reply; 40+ messages in thread
From: Tejun Heo @ 2012-07-21  2:46 UTC (permalink / raw)
  To: Kamezawa Hiroyuki
  Cc: Aneesh Kumar K.V, Michal Hocko, akpm, mm-commits, liwanp,
	Li Zefan, cgroups mailinglist, linux-mm

Hello, Kamezawa-san.

On Sat, Jul 21, 2012 at 11:14:21AM +0900, Kamezawa Hiroyuki wrote:
> I'm sorry, I misunderstood.  The problem is the following.
> 
>         CPU A                       CPU B
>     mutex_unlock()
>                                 mutex_lock()
>     ->pre_destroy()             attach task
>       commit res->usage=0       mutex_unlock()
>                                 increase res->usage
>                                 detach task
>     mutex_lock()
>     check css's refcount=0
>      ....continue destroy.
> 
> Now, I think memcg's check is not enough, but I'd keep the -EBUSY there
> so that we don't forget this race.
> 
> 
> I think a patch to stop task-attach and child-cgroup creation while
> CGRP_WAIT_ON_RMDIR is set is required.  And that's enough.

The *ONLY* reason we're not marking the cgroup dead after checking
whether the cgroup has children or tasks at the top of cgroup_rmdir()
is that memcg might fail ->pre_destroy() and cancel the cgroup
removal.  We can't commit to removal because memcg might fail.

Now, if memcg drops the deprecated behavior, we can simply commit to
removal *before* starting to call pre_destroy(); it then doesn't matter
at all whether we hold cgroup_mutex across pre_destroy or not, and
cgroup core will simply deny any addition to a cgroup committed to
death.  (and remove a handsome amount of ugly code in the process)

So, the *ONLY* reason this can't be fixed properly from cgroup core is
that memcg's pre_destroy() might fail, and it doesn't make much
sense to me to add a workaround at this point when the whole
problem will go away once memcg's pre_destroy() is updated.

So, please update memcg and drop the __DEPRECATED flag, so that the
cgroup core can drop at least this particular part of misdesign. :(

Thanks.

-- 
tejun


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: + hugetlb-cgroup-simplify-pre_destroy-callback.patch added to -mm tree
  2012-07-21  2:46                 ` Tejun Heo
@ 2012-07-21  4:05                   ` Kamezawa Hiroyuki
  2012-07-22 17:34                       ` Tejun Heo
  0 siblings, 1 reply; 40+ messages in thread
From: Kamezawa Hiroyuki @ 2012-07-21  4:05 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Aneesh Kumar K.V, Michal Hocko, akpm, mm-commits, liwanp,
	Li Zefan, cgroups mailinglist, linux-mm

(2012/07/21 11:46), Tejun Heo wrote:
> Hello, Kamezawa-san.
>
> On Sat, Jul 21, 2012 at 11:14:21AM +0900, Kamezawa Hiroyuki wrote:
>> I'm sorry, I misunderstood.  The problem is the following.
>>
>>          CPU A                       CPU B
>>      mutex_unlock()
>>                                  mutex_lock()
>>      ->pre_destroy()             attach task
>>        commit res->usage=0       mutex_unlock()
>>                                  increase res->usage
>>                                  detach task
>>      mutex_lock()
>>      check css's refcount=0
>>       ....continue destroy.
>>
>> Now, I think memcg's check is not enough, but I'd keep the -EBUSY there
>> so that we don't forget this race.
>>
>>
>> I think a patch to stop task-attach and child-cgroup creation while
>> CGRP_WAIT_ON_RMDIR is set is required.  And that's enough.
>
> The *ONLY* reason we're not marking the cgroup dead after checking
> whether the cgroup has children or tasks at the top of cgroup_rmdir()
> is that memcg might fail ->pre_destroy() and cancel the cgroup
> removal.  We can't commit to removal because memcg might fail.
>
> Now, if memcg drops the deprecated behavior, we can simply commit to
> removal *before* starting to call pre_destroy(); it then doesn't matter
> at all whether we hold cgroup_mutex across pre_destroy or not, and
> cgroup core will simply deny any addition to a cgroup committed to
> death.  (and remove a handsome amount of ugly code in the process)
>
> So, the *ONLY* reason this can't be fixed properly from cgroup core is
> that memcg's pre_destroy() might fail, and it doesn't make much
> sense to me to add a workaround at this point when the whole
> problem will go away once memcg's pre_destroy() is updated.
>
> So, please update memcg and drop the __DEPRECATED flag, so that the
> cgroup core can drop at least this particular part of misdesign. :(
>

Maybe it's better to remove memcg's pre_destroy() entirely and do the job
in an asynchronous thread called by ->destroy().

I'll cook a patch again.

-Kame



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: + hugetlb-cgroup-simplify-pre_destroy-callback.patch added to -mm tree
@ 2012-07-22 17:34                       ` Tejun Heo
  0 siblings, 0 replies; 40+ messages in thread
From: Tejun Heo @ 2012-07-22 17:34 UTC (permalink / raw)
  To: Kamezawa Hiroyuki
  Cc: Aneesh Kumar K.V, Michal Hocko, akpm, mm-commits, liwanp,
	Li Zefan, cgroups mailinglist, linux-mm

Hello, Kamezawa-san.

On Sat, Jul 21, 2012 at 01:05:13PM +0900, Kamezawa Hiroyuki wrote:
> Maybe it's better to remove memcg's pre_destroy() entirely and do the job
> in an asynchronous thread called by ->destroy().

It doesn't really matter as long as ->pre_destroy() doesn't fail.  The
only meaningful difference between them is that pre_destroy() is
called before css refs reach zero (and can be used to drop css refs so
that destruction can happen).

> I'll cook a patch again.

Thanks!

-- 
tejun


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] cgroup: Don't drop the cgroup_mutex in cgroup_rmdir
@ 2012-07-27  6:15                   ` Li Zefan
  0 siblings, 0 replies; 40+ messages in thread
From: Li Zefan @ 2012-07-27  6:15 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Peter Zijlstra, Aneesh Kumar K.V, akpm, mhocko, kamezawa.hiroyu,
	liwanp, cgroups, linux-mm, glommer

On 2012/7/21 4:05, Tejun Heo wrote:

> Hey, Peter.
> 
> On Fri, Jul 20, 2012 at 05:45:40PM +0200, Peter Zijlstra wrote:
>>> So, Peter, why does cpuset mangle with cgroup_mutex?  What guarantees
>>> does it need?  Why can't it work on "changed" notification while
>>> caching the current css like blkcg does?
>>
>> I've no clue sorry.. /me goes stare at this stuff.. Looks like something
>> Paul Menage did when he created cgroups. I'll have to have a hard look
>> at all that to untangle this. Not something obvious to me.
> 
> Yeah, it would be great if this could be untangled.  I really don't see
> any other reasonable way out of this circular locking mess.  If cpuset
> needs a stable css association across a certain period, the RTTD is
> to cache the css by holding its ref and synchronize modifications to
> that cache, rather than synchronizing cgroup operations themselves.
> 


The cgroup core was extracted from cpuset, so they are deeply tangled.

There are several issues to resolve with regard to removing the cgroup lock from cpuset.

- there are places where the cgroup hierarchy is traversed. This should be
easy, as cpuset can be made to maintain its own hierarchy.

- cpuset disallows clearing cpuset.mems/cpuset.cpus if the cgroup is not empty,
which can be guaranteed only by the cgroup lock.

- cpuset disallows a task being attached to a cgroup with empty cpuset.mems/cpuset.cpus,
which again can be guaranteed only by the cgroup lock.

- cpuset may move tasks from one cgroup to another (Glauber mentioned this).

- maybe other cases I overlooked..


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] cgroup: Don't drop the cgroup_mutex in cgroup_rmdir
@ 2012-07-30 18:25                     ` Tejun Heo
  0 siblings, 0 replies; 40+ messages in thread
From: Tejun Heo @ 2012-07-30 18:25 UTC (permalink / raw)
  To: Li Zefan
  Cc: Peter Zijlstra, Aneesh Kumar K.V, akpm, mhocko, kamezawa.hiroyu,
	liwanp, cgroups, linux-mm, glommer

Hello,

On Fri, Jul 27, 2012 at 02:15:12PM +0800, Li Zefan wrote:
> The cgroup core was extracted from cpuset, so they are deeply tangled.
> 
> There are several issues to resolve with regard to removing cgroup lock from cpuset.
> 
> - there are places where the cgroup hierarchy is traversed. This should be
> easy, as cpuset can be made to maintain its own hierarchy.

Or we can expose a limited interface for traversal while protecting the
hierarchy itself with a smaller lock, or make the hierarchy safe to
traverse under an RCU read lock.

> - cpuset disallows clearing cpuset.mems/cpuset.cpus if the cgroup is not empty,
> which can be guaranteed only by the cgroup lock.
>
> - cpuset disallows a task being attached to a cgroup with empty cpuset.mems/cpuset.cpus,
> which again can be guaranteed only by the cgroup lock.

Why can't callbacks enforce the above two?  We can keep track of the
number of tasks in the cgroup and reject operations as necessary.
What's missing?

> - cpuset may move tasks from one cgroup to another (Glauber mentioned this).

No idea about this one.  Why / how does this happen?

Thanks.

-- 
tejun


^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2012-07-30 18:25 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-18 21:26 + hugetlb-cgroup-simplify-pre_destroy-callback.patch added to -mm tree akpm
2012-07-19 11:39 ` Michal Hocko
2012-07-19 11:39   ` Michal Hocko
2012-07-19 12:21   ` Aneesh Kumar K.V
2012-07-19 12:38     ` Michal Hocko
2012-07-19 12:38       ` Michal Hocko
2012-07-19 13:48       ` Aneesh Kumar K.V
2012-07-19 13:48         ` Aneesh Kumar K.V
2012-07-19 14:09         ` [PATCH] cgroup: Don't drop the cgroup_mutex in cgroup_rmdir Aneesh Kumar K.V
2012-07-19 14:09           ` Aneesh Kumar K.V
2012-07-19 16:50           ` Tejun Heo
2012-07-19 16:50             ` Tejun Heo
2012-07-20 15:45             ` Peter Zijlstra
2012-07-20 20:05               ` Tejun Heo
2012-07-20 20:05                 ` Tejun Heo
2012-07-20 22:07                 ` Glauber Costa
2012-07-27  6:15                 ` Li Zefan
2012-07-27  6:15                   ` Li Zefan
2012-07-30 18:25                   ` Tejun Heo
2012-07-30 18:25                     ` Tejun Heo
2012-07-20  7:51           ` Michal Hocko
2012-07-20  7:51             ` Michal Hocko
2012-07-20 19:49           ` Tejun Heo
2012-07-20 19:49             ` Tejun Heo
2012-07-20  1:05         ` + hugetlb-cgroup-simplify-pre_destroy-callback.patch added to -mm tree Kamezawa Hiroyuki
2012-07-20  1:05           ` Kamezawa Hiroyuki
2012-07-20  1:20           ` Kamezawa Hiroyuki
2012-07-20  1:20             ` Kamezawa Hiroyuki
2012-07-20  8:01             ` Michal Hocko
2012-07-20  8:01               ` Michal Hocko
2012-07-20  8:08               ` Kamezawa Hiroyuki
2012-07-20  8:06         ` Michal Hocko
2012-07-20 19:18           ` Aneesh Kumar K.V
2012-07-20 19:56             ` Tejun Heo
2012-07-21  2:14               ` Kamezawa Hiroyuki
2012-07-21  2:14                 ` Kamezawa Hiroyuki
2012-07-21  2:46                 ` Tejun Heo
2012-07-21  4:05                   ` Kamezawa Hiroyuki
2012-07-22 17:34                     ` Tejun Heo
2012-07-22 17:34                       ` Tejun Heo
