* [PATCH 0/4 v2] memcg: cleanup per-cpu charge caches @ 2011-07-22 12:43 Michal Hocko 2011-07-21 7:38 ` [PATCH 1/4] memcg: do not try to drain per-cpu caches without pages Michal Hocko ` (3 more replies) 0 siblings, 4 replies; 56+ messages in thread From: Michal Hocko @ 2011-07-22 12:43 UTC (permalink / raw) To: linux-mm; +Cc: Balbir Singh, KAMEZAWA Hiroyuki, Andrew Morton, linux-kernel Hi, this is the second version of the per-cpu charge draining code cleanup. I have dropped the "fix unnecessary reclaim if there are still cached charges" part because it seems to have some issues and it is not critical at the moment. I think the cleanup makes sense on its own. Changes since v1: - memcg: do not try to drain per-cpu caches without pages uses drain_local_stock for the current CPU - added memcg: add mem_cgroup_same_or_subtree helper - dropped "memcg: prevent from reclaiming if there are per-cpu cached charges" patch Michal Hocko (4): memcg: do not try to drain per-cpu caches without pages memcg: unify sync and async per-cpu charge cache draining memcg: add mem_cgroup_same_or_subtree helper memcg: get rid of percpu_charge_mutex lock mm/memcontrol.c | 110 +++++++++++++++++++++++++++++++------------------------ 1 files changed, 62 insertions(+), 48 deletions(-) -- 1.7.5.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org ^ permalink raw reply [flat|nested] 56+ messages in thread
* [PATCH 1/4] memcg: do not try to drain per-cpu caches without pages 2011-07-22 12:43 [PATCH 0/4 v2] memcg: cleanup per-cpu charge caches Michal Hocko @ 2011-07-21 7:38 ` Michal Hocko 2011-07-25 1:16 ` KAMEZAWA Hiroyuki 2011-07-22 10:24 ` [PATCH 2/4] memcg: unify sync and async per-cpu charge cache draining Michal Hocko ` (2 subsequent siblings) 3 siblings, 1 reply; 56+ messages in thread From: Michal Hocko @ 2011-07-21 7:38 UTC (permalink / raw) To: linux-mm; +Cc: Balbir Singh, KAMEZAWA Hiroyuki, Andrew Morton, linux-kernel drain_all_stock_async tries to optimize the work done on the work queue by excluding any work for the current CPU, because it assumes that the context we are called from has already tried to charge from that cache and failed, so it must be empty already. While the assumption is correct, we can optimize even further by checking the current number of pages in the cache. This will also reduce the work on other CPUs with an empty stock. For the current CPU we can simply call drain_local_stock rather than deferring it to the work queue.
[KAMEZAWA Hiroyuki - use drain_local_stock for current CPU optimization] Signed-off-by: Michal Hocko <mhocko@suse.cz> --- mm/memcontrol.c | 13 +++++++------ 1 files changed, 7 insertions(+), 6 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index f11f198..c012ffe 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2159,11 +2159,8 @@ static void drain_all_stock_async(struct mem_cgroup *root_mem) struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu); struct mem_cgroup *mem; - if (cpu == curcpu) - continue; - mem = stock->cached; - if (!mem) + if (!mem || !stock->nr_pages) continue; if (mem != root_mem) { if (!root_mem->use_hierarchy) @@ -2172,8 +2169,12 @@ static void drain_all_stock_async(struct mem_cgroup *root_mem) if (!css_is_ancestor(&mem->css, &root_mem->css)) continue; } - if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) - schedule_work_on(cpu, &stock->work); + if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) { + if (cpu == curcpu) + drain_local_stock(&stock->work); + else + schedule_work_on(cpu, &stock->work); + } } put_online_cpus(); mutex_unlock(&percpu_charge_mutex); -- 1.7.5.4 ^ permalink raw reply related [flat|nested] 56+ messages in thread
* Re: [PATCH 1/4] memcg: do not try to drain per-cpu caches without pages 2011-07-21 7:38 ` [PATCH 1/4] memcg: do not try to drain per-cpu caches without pages Michal Hocko @ 2011-07-25 1:16 ` KAMEZAWA Hiroyuki 0 siblings, 0 replies; 56+ messages in thread From: KAMEZAWA Hiroyuki @ 2011-07-25 1:16 UTC (permalink / raw) To: Michal Hocko; +Cc: linux-mm, Balbir Singh, Andrew Morton, linux-kernel On Thu, 21 Jul 2011 09:38:00 +0200 Michal Hocko <mhocko@suse.cz> wrote: > drain_all_stock_async tries to optimize a work to be done on the work > queue by excluding any work for the current CPU because it assumes that > the context we are called from already tried to charge from that cache > and it's failed so it must be empty already. > While the assumption is correct we can optimize it even more by checking > the current number of pages in the cache. This will also reduce a work > on other CPUs with an empty stock. > For the current CPU we can simply call drain_local_stock rather than > deferring it to the work queue. > > [KAMEZAWA Hiroyuki - use drain_local_stock for current CPU optimization] > Signed-off-by: Michal Hocko <mhocko@suse.cz> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> ^ permalink raw reply [flat|nested] 56+ messages in thread
* [patch] memcg: pin execution to current cpu while draining stock 2011-07-25 1:16 ` KAMEZAWA Hiroyuki @ 2011-08-17 19:49 ` Johannes Weiner -1 siblings, 0 replies; 56+ messages in thread From: Johannes Weiner @ 2011-08-17 19:49 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Michal Hocko, linux-mm, Balbir Singh, Andrew Morton, linux-kernel Commit d1a05b6 'memcg: do not try to drain per-cpu caches without pages' added a drain_local_stock() call to a preemptible section. The draining task looks up the cpu-local stock twice: first to set the draining flag, then to drain the stock and clear the flag again. If the task is migrated to a different CPU in between, no one will clear the flag on the first stock and it will be forever undrainable. Its charge cannot be recovered and the cgroup cannot be deleted anymore. Properly pin the task to the executing CPU while draining stocks. Signed-off-by: Johannes Weiner <jweiner@redhat.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Michal Hocko <mhocko@suse.cz> --- mm/memcontrol.c | 9 ++------- 1 files changed, 2 insertions(+), 7 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 697a1d5..e9b1206 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2085,13 +2085,7 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync) /* Notify other cpus that system-wide "drain" is running */ get_online_cpus(); - /* - * Get a hint for avoiding draining charges on the current cpu, - * which must be exhausted by our charging. It is not required that - * this be a precise check, so we use raw_smp_processor_id() instead of - * getcpu()/putcpu().
- */ - curcpu = raw_smp_processor_id(); + curcpu = get_cpu(); for_each_online_cpu(cpu) { struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu); struct mem_cgroup *mem; @@ -2108,6 +2102,7 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync) schedule_work_on(cpu, &stock->work); } } + put_cpu(); if (!sync) goto out; -- 1.7.6 ^ permalink raw reply related [flat|nested] 56+ messages in thread
* Re: [patch] memcg: pin execution to current cpu while draining stock 2011-08-17 19:49 ` Johannes Weiner @ 2011-08-18 0:24 ` KAMEZAWA Hiroyuki -1 siblings, 0 replies; 56+ messages in thread From: KAMEZAWA Hiroyuki @ 2011-08-18 0:24 UTC (permalink / raw) To: Johannes Weiner Cc: Michal Hocko, linux-mm, Balbir Singh, Andrew Morton, linux-kernel On Wed, 17 Aug 2011 21:49:27 +0200 Johannes Weiner <jweiner@redhat.com> wrote: > Commit d1a05b6 'memcg: do not try to drain per-cpu caches without > pages' added a drain_local_stock() call to a preemptible section. > > The draining task looks up the cpu-local stock twice to set the > draining-flag, then to drain the stock and clear the flag again. If > the task is migrated to a different CPU in between, noone will clear > the flag on the first stock and it will be forever undrainable. Its > charge can not be recovered and the cgroup can not be deleted anymore. > > Properly pin the task to the executing CPU while draining stocks. > > Signed-off-by: Johannes Weiner <jweiner@redhat.com> > Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com > Cc: Michal Hocko <mhocko@suse.cz> Thanks. I think Shaohua Li reported this. Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> == http://www.spinics.net/lists/linux-mm/msg22635.html I get below warning: BUG: using smp_processor_id() in preemptible [00000000] code: bash/739 caller is drain_local_stock+0x1a/0x55 Pid: 739, comm: bash Tainted: G W 3.0.0+ #255 Call Trace: [<ffffffff813435c6>] debug_smp_processor_id+0xc2/0xdc [<ffffffff8114ae9b>] drain_local_stock+0x1a/0x55 [<ffffffff8114b076>] drain_all_stock+0x98/0x13a [<ffffffff8114f04c>] mem_cgroup_force_empty+0xa3/0x27a [<ffffffff8114ff1d>] ? sys_close+0x38/0x138 [<ffffffff811a7631>] ? environ_read+0x1d/0x159 [<ffffffff8114f253>] mem_cgroup_force_empty_write+0x17/0x19 [<ffffffff810c72fb>] cgroup_file_write+0xa8/0xba [<ffffffff811522ce>] vfs_write+0xb3/0x138 [<ffffffff81152416>] sys_write+0x4a/0x71 [<ffffffff8114ffd5>] ?
sys_close+0xf0/0x138 [<ffffffff8176deab>] system_call_fastpath+0x16/0x1b drain_local_stock() should be run with preempt disabled. == Andrew, could you pull this one , too ? http://www.spinics.net/lists/linux-mm/msg22636.html Thanks, -Kame > --- > mm/memcontrol.c | 9 ++------- > 1 files changed, 2 insertions(+), 7 deletions(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 697a1d5..e9b1206 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -2085,13 +2085,7 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync) > > /* Notify other cpus that system-wide "drain" is running */ > get_online_cpus(); > - /* > - * Get a hint for avoiding draining charges on the current cpu, > - * which must be exhausted by our charging. It is not required that > - * this be a precise check, so we use raw_smp_processor_id() instead of > - * getcpu()/putcpu(). > - */ > - curcpu = raw_smp_processor_id(); > + curcpu = get_cpu(); > for_each_online_cpu(cpu) { > struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu); > struct mem_cgroup *mem; > @@ -2108,6 +2102,7 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync) > schedule_work_on(cpu, &stock->work); > } > } > + put_cpu(); > > if (!sync) > goto out; > -- > 1.7.6 > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [patch] memcg: pin execution to current cpu while draining stock 2011-08-17 19:49 ` Johannes Weiner @ 2011-08-18 5:56 ` Michal Hocko -1 siblings, 0 replies; 56+ messages in thread From: Michal Hocko @ 2011-08-18 5:56 UTC (permalink / raw) To: Johannes Weiner Cc: KAMEZAWA Hiroyuki, linux-mm, Balbir Singh, Andrew Morton, linux-kernel On Wed 17-08-11 21:49:27, Johannes Weiner wrote: > Commit d1a05b6 'memcg: do not try to drain per-cpu caches without > pages' added a drain_local_stock() call to a preemptible section. > > The draining task looks up the cpu-local stock twice to set the > draining-flag, then to drain the stock and clear the flag again. If > the task is migrated to a different CPU in between, noone will clear > the flag on the first stock and it will be forever undrainable. Its > charge can not be recovered and the cgroup can not be deleted anymore. > > Properly pin the task to the executing CPU while draining stocks. > > Signed-off-by: Johannes Weiner <jweiner@redhat.com> > Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com > Cc: Michal Hocko <mhocko@suse.cz> My fault, I didn't realize that drain_local_stock needs preemption disabled. Sorry about that and good work, Johannes. Acked-by: Michal Hocko <mhocko@suse.cz> -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic ^ permalink raw reply [flat|nested] 56+ messages in thread
* [PATCH 2/4] memcg: unify sync and async per-cpu charge cache draining 2011-07-22 12:43 [PATCH 0/4 v2] memcg: cleanup per-cpu charge caches Michal Hocko 2011-07-21 7:38 ` [PATCH 1/4] memcg: do not try to drain per-cpu caches without pages Michal Hocko @ 2011-07-22 10:24 ` Michal Hocko 2011-07-22 10:27 ` [PATCH 3/4] memcg: add mem_cgroup_same_or_subtree helper Michal Hocko 2011-07-22 11:20 ` [PATCH 4/4] memcg: get rid of percpu_charge_mutex lock Michal Hocko 3 siblings, 0 replies; 56+ messages in thread From: Michal Hocko @ 2011-07-22 10:24 UTC (permalink / raw) To: linux-mm; +Cc: Balbir Singh, KAMEZAWA Hiroyuki, Andrew Morton, linux-kernel Currently we have two ways to drain the per-CPU charge caches. drain_all_stock_sync will synchronously drain all caches, while drain_all_stock_async will asynchronously drain only those that refer to a given memory cgroup or its subtree in the hierarchy. Targeted async draining was introduced by 26fe6168 (memcg: fix percpu cached charge draining frequency) to reduce the number of cpu workers. Sync draining is currently triggered only from mem_cgroup_force_empty, which is invoked either from userspace (mem_cgroup_force_empty_write) or when a cgroup is removed (mem_cgroup_pre_destroy). Although these are not usually frequent operations, it still makes sense to do targeted draining there as well, especially if the box has many CPUs. This patch unifies both methods into a single function (drain_all_stock) which relies on the original async implementation and just adds flush_work to wait on all caches that are still under work in the sync mode. We use the FLUSHING_CACHED_CHARGE bit check to avoid waiting on work that we haven't triggered. Please note that both sync and async functions are currently protected by percpu_charge_mutex, so we cannot race with other drainers.
Signed-off-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> --- mm/memcontrol.c | 48 ++++++++++++++++++++++++++++++++++-------------- 1 files changed, 34 insertions(+), 14 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index c012ffe..8852bed 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2133,19 +2133,14 @@ static void refill_stock(struct mem_cgroup *mem, unsigned int nr_pages) } /* - * Tries to drain stocked charges in other cpus. This function is asynchronous - * and just put a work per cpu for draining localy on each cpu. Caller can - * expects some charges will be back to res_counter later but cannot wait for - * it. + * Drains all per-CPU charge caches for given root_mem resp. subtree + * of the hierarchy under it. sync flag says whether we should block + * until the work is done. */ -static void drain_all_stock_async(struct mem_cgroup *root_mem) +static void drain_all_stock(struct mem_cgroup *root_mem, bool sync) { int cpu, curcpu; - /* - * If someone calls draining, avoid adding more kworker runs. - */ - if (!mutex_trylock(&percpu_charge_mutex)) - return; + /* Notify other cpus that system-wide "drain" is running */ get_online_cpus(); /* @@ -2176,17 +2171,42 @@ static void drain_all_stock_async(struct mem_cgroup *root_mem) schedule_work_on(cpu, &stock->work); } } + + if (!sync) + goto out; + + for_each_online_cpu(cpu) { + struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu); + if (test_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) + flush_work(&stock->work); + } +out: put_online_cpus(); +} + +/* + * Tries to drain stocked charges in other cpus. This function is asynchronous + * and just put a work per cpu for draining localy on each cpu. Caller can + * expects some charges will be back to res_counter later but cannot wait for + * it. + */ +static void drain_all_stock_async(struct mem_cgroup *root_mem) +{ + /* + * If someone calls draining, avoid adding more kworker runs. 
+ */ + if (!mutex_trylock(&percpu_charge_mutex)) + return; + drain_all_stock(root_mem, false); mutex_unlock(&percpu_charge_mutex); - /* We don't wait for flush_work */ } /* This is a synchronous drain interface. */ -static void drain_all_stock_sync(void) +static void drain_all_stock_sync(struct mem_cgroup *root_mem) { /* called when force_empty is called */ mutex_lock(&percpu_charge_mutex); - schedule_on_each_cpu(drain_local_stock); + drain_all_stock(root_mem, true); mutex_unlock(&percpu_charge_mutex); } @@ -3835,7 +3855,7 @@ move_account: goto out; /* This is for making all *used* pages to be on LRU. */ lru_add_drain_all(); - drain_all_stock_sync(); + drain_all_stock_sync(mem); ret = 0; mem_cgroup_start_move(mem); for_each_node_state(node, N_HIGH_MEMORY) { -- 1.7.5.4 ^ permalink raw reply related [flat|nested] 56+ messages in thread
* [PATCH 3/4] memcg: add mem_cgroup_same_or_subtree helper 2011-07-22 12:43 [PATCH 0/4 v2] memcg: cleanup per-cpu charge caches Michal Hocko 2011-07-21 7:38 ` [PATCH 1/4] memcg: do not try to drain per-cpu caches without pages Michal Hocko 2011-07-22 10:24 ` [PATCH 2/4] memcg: unify sync and async per-cpu charge cache draining Michal Hocko @ 2011-07-22 10:27 ` Michal Hocko 2011-07-22 11:20 ` [PATCH 4/4] memcg: get rid of percpu_charge_mutex lock Michal Hocko 3 siblings, 0 replies; 56+ messages in thread From: Michal Hocko @ 2011-07-22 10:27 UTC (permalink / raw) To: linux-mm; +Cc: Balbir Singh, KAMEZAWA Hiroyuki, Andrew Morton, linux-kernel We check whether two given groups are the same or at least in the same subtree of a hierarchy in several places. Let's make a helper for it to make the code easier to read. Signed-off-by: Michal Hocko <mhocko@suse.cz> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> --- mm/memcontrol.c | 51 ++++++++++++++++++++++++++------------------------- 1 files changed, 26 insertions(+), 25 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 8852bed..3685107 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1046,6 +1046,21 @@ void mem_cgroup_move_lists(struct page *page, mem_cgroup_add_lru_list(page, to); } +/* + * Checks whether given mem is same or in the root_mem's + * hierarchy subtree + */ +static bool mem_cgroup_same_or_subtree(const struct mem_cgroup *root_mem, + struct mem_cgroup *mem) +{ + if (root_mem != mem) { + return (root_mem->use_hierarchy && + css_is_ancestor(&mem->css, &root_mem->css)); + } + + return true; +} + int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem) { int ret; @@ -1065,10 +1080,7 @@ int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem) * enabled in "curr" and "curr" is a child of "mem" in *cgroup* * hierarchy(even if use_hierarchy is disabled in "mem").
*/ - if (mem->use_hierarchy) - ret = css_is_ancestor(&curr->css, &mem->css); - else - ret = (curr == mem); + ret = mem_cgroup_same_or_subtree(mem, curr); css_put(&curr->css); return ret; } @@ -1404,10 +1416,9 @@ static bool mem_cgroup_under_move(struct mem_cgroup *mem) to = mc.to; if (!from) goto unlock; - if (from == mem || to == mem - || (mem->use_hierarchy && css_is_ancestor(&from->css, &mem->css)) - || (mem->use_hierarchy && css_is_ancestor(&to->css, &mem->css))) - ret = true; + + ret = mem_cgroup_same_or_subtree(mem, from) + || mem_cgroup_same_or_subtree(mem, to); unlock: spin_unlock(&mc.lock); return ret; @@ -1894,25 +1905,20 @@ struct oom_wait_info { static int memcg_oom_wake_function(wait_queue_t *wait, unsigned mode, int sync, void *arg) { - struct mem_cgroup *wake_mem = (struct mem_cgroup *)arg; + struct mem_cgroup *wake_mem = (struct mem_cgroup *)arg, + *oom_wait_mem; struct oom_wait_info *oom_wait_info; oom_wait_info = container_of(wait, struct oom_wait_info, wait); + oom_wait_mem = oom_wait_info->mem; - if (oom_wait_info->mem == wake_mem) - goto wakeup; - /* if no hierarchy, no match */ - if (!oom_wait_info->mem->use_hierarchy || !wake_mem->use_hierarchy) - return 0; /* * Both of oom_wait_info->mem and wake_mem are stable under us. * Then we can use css_is_ancestor without taking care of RCU. 
*/ - if (!css_is_ancestor(&oom_wait_info->mem->css, &wake_mem->css) && - !css_is_ancestor(&wake_mem->css, &oom_wait_info->mem->css)) + if (!mem_cgroup_same_or_subtree(oom_wait_mem, wake_mem) + && !mem_cgroup_same_or_subtree(wake_mem, oom_wait_mem)) return 0; - -wakeup: return autoremove_wake_function(wait, mode, sync, arg); } @@ -2157,13 +2163,8 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync) mem = stock->cached; if (!mem || !stock->nr_pages) continue; - if (mem != root_mem) { - if (!root_mem->use_hierarchy) - continue; - /* check whether "mem" is under tree of "root_mem" */ - if (!css_is_ancestor(&mem->css, &root_mem->css)) - continue; - } + if (!mem_cgroup_same_or_subtree(root_mem, mem)) + continue; if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) { if (cpu == curcpu) drain_local_stock(&stock->work); -- 1.7.5.4 ^ permalink raw reply related [flat|nested] 56+ messages in thread
* [PATCH 4/4] memcg: get rid of percpu_charge_mutex lock 2011-07-22 12:43 [PATCH 0/4 v2] memcg: cleanup per-cpu charge caches Michal Hocko ` (2 preceding siblings ...) 2011-07-22 10:27 ` [PATCH 3/4] memcg: add mem_cgroup_same_or_subtree helper Michal Hocko @ 2011-07-22 11:20 ` Michal Hocko 2011-07-25 1:18 ` KAMEZAWA Hiroyuki 2011-08-08 18:47 ` Johannes Weiner 3 siblings, 2 replies; 56+ messages in thread From: Michal Hocko @ 2011-07-22 11:20 UTC (permalink / raw) To: linux-mm; +Cc: Balbir Singh, KAMEZAWA Hiroyuki, Andrew Morton, linux-kernel percpu_charge_mutex protects against multiple simultaneous drainings of the per-cpu charge caches, because we might otherwise end up with too many work items. At least this was the case until 26fe6168 (memcg: fix percpu cached charge draining frequency), when we introduced more targeted draining for the async mode. Now that sync draining is targeted as well, we can safely remove the mutex because we will not send more work items than the current number of CPUs. FLUSHING_CACHED_CHARGE protects from sending the same work multiple times, and the stock->nr_pages == 0 check protects from pointlessly sending work when there is obviously nothing to be done. This is of course racy, but we can live with it as the race window is really small (we would have to see FLUSHING_CACHED_CHARGE cleared while nr_pages is still non-zero). The only remaining place where we can race is the synchronous mode, where we rely on the FLUSHING_CACHED_CHARGE test, which might have been set by another drainer on the same group, but we should wait in that case as well.
Signed-off-by: Michal Hocko <mhocko@suse.cz> --- mm/memcontrol.c | 12 ++---------- 1 files changed, 2 insertions(+), 10 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 3685107..f8463a0 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2071,7 +2071,6 @@ struct memcg_stock_pcp { #define FLUSHING_CACHED_CHARGE (0) }; static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock); -static DEFINE_MUTEX(percpu_charge_mutex); /* * Try to consume stocked charge on this cpu. If success, one page is consumed @@ -2178,7 +2177,8 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync) for_each_online_cpu(cpu) { struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu); - if (test_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) + if (mem_cgroup_same_or_subtree(root_mem, stock->cached) && + test_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) flush_work(&stock->work); } out: @@ -2193,22 +2193,14 @@ out: */ static void drain_all_stock_async(struct mem_cgroup *root_mem) { - /* - * If someone calls draining, avoid adding more kworker runs. - */ - if (!mutex_trylock(&percpu_charge_mutex)) - return; drain_all_stock(root_mem, false); - mutex_unlock(&percpu_charge_mutex); } /* This is a synchronous drain interface. */ static void drain_all_stock_sync(struct mem_cgroup *root_mem) { /* called when force_empty is called */ - mutex_lock(&percpu_charge_mutex); drain_all_stock(root_mem, true); - mutex_unlock(&percpu_charge_mutex); } /* -- 1.7.5.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 56+ messages in thread
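The core claim of this patch — that the FLUSHING_CACHED_CHARGE bit alone bounds the in-flight drain work to at most one item per CPU, making percpu_charge_mutex redundant — can be sketched as a user-space model. This is a hypothetical illustration, not kernel code: `stock_model`, `stock_try_flush` and `stock_drain_done` are made-up names, and C11 `atomic_fetch_or`/`atomic_fetch_and` stand in for the kernel's `test_and_set_bit()`/`clear_bit()`:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define FLUSHING_CACHED_CHARGE 0

/* User-space stand-in for struct memcg_stock_pcp. */
struct stock_model {
    atomic_ulong flags;       /* models stock->flags */
    unsigned long nr_pages;   /* models stock->nr_pages */
};

/* Returns true if the caller won the right to drain this stock.
 * The old bit value from the atomic fetch_or tells us whether a
 * drain was already in flight -- the same dedup test_and_set_bit()
 * provides in drain_all_stock(). */
static bool stock_try_flush(struct stock_model *s)
{
    unsigned long bit = 1UL << FLUSHING_CACHED_CHARGE;
    return (atomic_fetch_or(&s->flags, bit) & bit) == 0;
}

/* Models drain_local_stock() finishing: empty the cache, then clear
 * the flag so a later drain may be scheduled again. */
static void stock_drain_done(struct stock_model *s)
{
    s->nr_pages = 0;
    atomic_fetch_and(&s->flags, ~(1UL << FLUSHING_CACHED_CHARGE));
}
```

Because a second caller loses the test-and-set while a drain is pending, concurrent drainers cannot queue more than one work item per stock, which is the property the removed mutex used to guarantee globally.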
* Re: [PATCH 4/4] memcg: get rid of percpu_charge_mutex lock 2011-07-22 11:20 ` [PATCH 4/4] memcg: get rid of percpu_charge_mutex lock Michal Hocko @ 2011-07-25 1:18 ` KAMEZAWA Hiroyuki 2011-08-08 18:47 ` Johannes Weiner 1 sibling, 0 replies; 56+ messages in thread From: KAMEZAWA Hiroyuki @ 2011-07-25 1:18 UTC (permalink / raw) To: Michal Hocko; +Cc: linux-mm, Balbir Singh, Andrew Morton, linux-kernel On Fri, 22 Jul 2011 13:20:25 +0200 Michal Hocko <mhocko@suse.cz> wrote: > percpu_charge_mutex protects from multiple simultaneous per-cpu charge > caches draining because we might end up having too many work items. > At least this was the case until 26fe6168 (memcg: fix percpu cached > charge draining frequency) when we introduced a more targeted draining > for async mode. > Now that also sync draining is targeted we can safely remove mutex > because we will not send more work than the current number of CPUs. > FLUSHING_CACHED_CHARGE protects from sending the same work multiple > times and stock->nr_pages == 0 protects from pointless sending a work > if there is obviously nothing to be done. This is of course racy but we > can live with it as the race window is really small (we would have to > see FLUSHING_CACHED_CHARGE cleared while nr_pages would be still > non-zero). > The only remaining place where we can race is synchronous mode when we > rely on FLUSHING_CACHED_CHARGE test which might have been set by other > drainer on the same group but we should wait in that case as well. > > Signed-off-by: Michal Hocko <mhocko@suse.cz> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 4/4] memcg: get rid of percpu_charge_mutex lock 2011-07-22 11:20 ` [PATCH 4/4] memcg: get rid of percpu_charge_mutex lock Michal Hocko @ 2011-08-08 18:47 ` Johannes Weiner 2011-08-08 18:47 ` Johannes Weiner 1 sibling, 0 replies; 56+ messages in thread From: Johannes Weiner @ 2011-08-08 18:47 UTC (permalink / raw) To: Michal Hocko Cc: linux-mm, Balbir Singh, KAMEZAWA Hiroyuki, Andrew Morton, linux-kernel On Fri, Jul 22, 2011 at 01:20:25PM +0200, Michal Hocko wrote: > percpu_charge_mutex protects from multiple simultaneous per-cpu charge > caches draining because we might end up having too many work items. > At least this was the case until 26fe6168 (memcg: fix percpu cached > charge draining frequency) when we introduced a more targeted draining > for async mode. > Now that also sync draining is targeted we can safely remove mutex > because we will not send more work than the current number of CPUs. > FLUSHING_CACHED_CHARGE protects from sending the same work multiple > times and stock->nr_pages == 0 protects from pointless sending a work > if there is obviously nothing to be done. This is of course racy but we > can live with it as the race window is really small (we would have to > see FLUSHING_CACHED_CHARGE cleared while nr_pages would be still > non-zero). > The only remaining place where we can race is synchronous mode when we > rely on FLUSHING_CACHED_CHARGE test which might have been set by other > drainer on the same group but we should wait in that case as well. 
> > Signed-off-by: Michal Hocko <mhocko@suse.cz> > --- > mm/memcontrol.c | 12 ++---------- > 1 files changed, 2 insertions(+), 10 deletions(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 3685107..f8463a0 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -2071,7 +2071,6 @@ struct memcg_stock_pcp { > #define FLUSHING_CACHED_CHARGE (0) > }; > static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock); > -static DEFINE_MUTEX(percpu_charge_mutex); > > /* > * Try to consume stocked charge on this cpu. If success, one page is consumed > @@ -2178,7 +2177,8 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync) > > for_each_online_cpu(cpu) { > struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu); > - if (test_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) > + if (mem_cgroup_same_or_subtree(root_mem, stock->cached) && > + test_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) > flush_work(&stock->work); > } > out: This hunk triggers a crash for me, as the draining is already done and stock->cached reset to NULL when dereferenced here. Oops is attached. We have this loop in drain_all_stock(): for_each_online_cpu(cpu) { struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu); struct mem_cgroup *mem; mem = stock->cached; if (!mem || !stock->nr_pages) continue; if (!mem_cgroup_same_or_subtree(root_mem, mem)) continue; if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) { if (cpu == curcpu) drain_local_stock(&stock->work); else schedule_work_on(cpu, &stock->work); } } The only thing that stabilizes stock->cached is the knowledge that there are still pages accounted to the memcg. Without the mutex serializing this code, can't there be a concurrent execution that leads to stock->cached being drained, becoming empty and freed by someone else between the stock->nr_pages check and the ancestor check, resulting in use after free? What makes stock->cached safe to dereference? 
[ 2313.442944] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 [ 2313.443935] IP: [<ffffffff81083b70>] css_is_ancestor+0x20/0x70 [ 2313.443935] PGD 4ae7a067 PUD 4adc4067 PMD 0 [ 2313.443935] Oops: 0000 [#1] PREEMPT SMP [ 2313.443935] CPU 0 [ 2313.443935] Pid: 19677, comm: rmdir Tainted: G W 3.0.0-mm1-00188-gf38d32b #35 ECS MCP61M-M3/MCP61M-M3 [ 2313.443935] RIP: 0010:[<ffffffff81083b70>] [<ffffffff81083b70>] css_is_ancestor+0x20/0x70 [ 2313.443935] RSP: 0018:ffff880077b09c88 EFLAGS: 00010202 [ 2313.443935] RAX: ffff8800781bb310 RBX: 0000000000000000 RCX: 000000000000003e [ 2313.443935] RDX: 0000000000000000 RSI: ffff8800779f7c00 RDI: 0000000000000000 [ 2313.443935] RBP: ffff880077b09c98 R08: ffffffff818a4e88 R09: 0000000000000000 [ 2313.443935] R10: 0000000000000000 R11: dead000000100100 R12: ffff8800779f7c00 [ 2313.443935] R13: ffff8800779f7c00 R14: 0000000000000000 R15: ffff88007bc0eb80 [ 2313.443935] FS: 00007f5d689ec720(0000) GS:ffff88007bc00000(0000) knlGS:0000000000000000 [ 2313.443935] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 2313.443935] CR2: 0000000000000018 CR3: 000000004ad57000 CR4: 00000000000006f0 [ 2313.443935] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 2313.443935] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 2313.443935] Process rmdir (pid: 19677, threadinfo ffff880077b08000, task ffff8800781bb310) [ 2313.443935] Stack: [ 2313.443935] ffffffff818a4e88 000000000000eb80 ffff880077b09ca8 ffffffff810feba3 [ 2313.443935] ffff880077b09d08 ffffffff810feccf ffff880077b09cf8 0000000000000001 [ 2313.443935] ffff88007bd0eb80 0000000000000001 ffff880077af2000 0000000000000000 [ 2313.443935] Call Trace: [ 2313.443935] [<ffffffff810feba3>] mem_cgroup_same_or_subtree+0x33/0x40 [ 2313.443935] [<ffffffff810feccf>] drain_all_stock+0x11f/0x170 [ 2313.443935] [<ffffffff81103211>] mem_cgroup_force_empty+0x231/0x6d0 [ 2313.443935] [<ffffffff81111872>] ? 
path_put+0x22/0x30 [ 2313.443935] [<ffffffff8111c925>] ? __d_lookup+0xb5/0x170 [ 2313.443935] [<ffffffff811036c4>] mem_cgroup_pre_destroy+0x14/0x20 [ 2313.443935] [<ffffffff81080559>] cgroup_rmdir+0xb9/0x500 [ 2313.443935] [<ffffffff81063990>] ? abort_exclusive_wait+0xb0/0xb0 [ 2313.443935] [<ffffffff81114d26>] vfs_rmdir+0x86/0xe0 [ 2313.443935] [<ffffffff811233d3>] ? mnt_want_write+0x43/0x80 [ 2313.443935] [<ffffffff81114e7b>] do_rmdir+0xfb/0x110 [ 2313.443935] [<ffffffff81114ea6>] sys_rmdir+0x16/0x20 [ 2313.443935] [<ffffffff8154d76b>] system_call_fastpath+0x16/0x1b [ 2313.443935] Code: b7 42 0a 5d c3 66 0f 1f 44 00 00 55 48 89 e5 48 83 ec 10 48 89 5d f0 4c 89 65 f8 66 66 66 66 90 48 89 fb 49 89 f4 e8 10 85 00 00 [ 2313.443935] 8b 43 18 49 8b 54 24 18 48 85 d2 74 05 48 85 c0 75 15 31 db ^ permalink raw reply [flat|nested] 56+ messages in thread
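The ordering problem behind this Oops can be made concrete with a small user-space model: the crash comes from dereferencing `stock->cached` before checking whether a drain is in flight, at a point where the drainer may already have reset it to NULL. If the waiter instead tests the flush bit first and takes a single NULL-tolerant snapshot of the pointer, the already-drained case is skipped rather than dereferenced. All names here are illustrative, and a plain pointer-equality check stands in for `mem_cgroup_same_or_subtree()`:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct memcg_model { int id; };

/* User-space stand-in for struct memcg_stock_pcp. */
struct stock_model {
    atomic_ulong flags;                    /* bit 0 = drain in flight */
    _Atomic(struct memcg_model *) cached;  /* NULL once drained */
};

/* Should the sync drainer flush_work() on this stock?  Check the
 * flag first, then consult exactly one snapshot of ->cached, and
 * treat NULL (already drained and reset) as "nothing to wait for". */
static bool should_wait_for_drain(struct stock_model *s,
                                  struct memcg_model *root)
{
    if (!(atomic_load(&s->flags) & 1UL))
        return false;                       /* no drain in flight */
    struct memcg_model *mem = atomic_load(&s->cached);
    if (!mem)
        return false;                       /* drainer already reset it */
    return mem == root;  /* stands in for mem_cgroup_same_or_subtree() */
}
```

This mirrors the direction taken further down-thread; it does not by itself settle the lifetime question of whether the snapshot can point to an already-freed memcg.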
* Re: [PATCH 4/4] memcg: get rid of percpu_charge_mutex lock 2011-08-08 18:47 ` Johannes Weiner @ 2011-08-08 21:47 ` Michal Hocko -1 siblings, 0 replies; 56+ messages in thread From: Michal Hocko @ 2011-08-08 21:47 UTC (permalink / raw) To: Johannes Weiner Cc: linux-mm, Balbir Singh, KAMEZAWA Hiroyuki, Andrew Morton, linux-kernel On Mon 08-08-11 20:47:38, Johannes Weiner wrote: [...] > > --- a/mm/memcontrol.c > > +++ b/mm/memcontrol.c > > @@ -2071,7 +2071,6 @@ struct memcg_stock_pcp { > > #define FLUSHING_CACHED_CHARGE (0) > > }; > > static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock); > > -static DEFINE_MUTEX(percpu_charge_mutex); > > > > /* > > * Try to consume stocked charge on this cpu. If success, one page is consumed > > @@ -2178,7 +2177,8 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync) > > > > for_each_online_cpu(cpu) { > > struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu); > > - if (test_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) > > + if (mem_cgroup_same_or_subtree(root_mem, stock->cached) && > > + test_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) > > flush_work(&stock->work); > > } > > out: > > This hunk triggers a crash for me, as the draining is already done and > stock->cached reset to NULL when dereferenced here. Oops is attached. Thanks for catching this. We are racing synchronous drain from force_empty and async drain from reclaim, I guess. Sync. checked whether it should wait for the work and the cache got drained and set to NULL. First of all we must not dereference the cached mem without FLUSHING_CACHED_CHARGE bit test. We have to be sure that there is some draining on that cache. stock->cached is set to NULL before we clear the bit (I guess we need to add a barrier into drain_local_stock). So we should see mem either as NULL or still valid (I have to think some more about "still valid" part - maybe we will need rcu_read_lock). 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c index f4ec4e7..626c916 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2197,8 +2197,10 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync) for_each_online_cpu(cpu) { struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu); - if (mem_cgroup_same_or_subtree(root_mem, stock->cached) && - test_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) + struct mem_cgroup *mem = stock->cached; + if (test_bit(FLUSHING_CACHED_CHARGE, &stock->flags) && + mem && mem_cgroup_same_or_subtree(root_mem, mem) + ) flush_work(&stock->work); } out: > > We have this loop in drain_all_stock(): > > for_each_online_cpu(cpu) { > struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu); > struct mem_cgroup *mem; > > mem = stock->cached; > if (!mem || !stock->nr_pages) > continue; > if (!mem_cgroup_same_or_subtree(root_mem, mem)) > continue; > if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) { > if (cpu == curcpu) > drain_local_stock(&stock->work); > else > schedule_work_on(cpu, &stock->work); > } > } > > The only thing that stabilizes stock->cached is the knowledge that > there are still pages accounted to the memcg. Yes you are right we have to set FLUSHING_CACHED_CHARGE before nr_pages check (and do the appropriate cleanup on the continue paths). This looks quite ugly, though. 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c index f4ec4e7..eca46141 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2179,17 +2179,23 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync) struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu); struct mem_cgroup *mem; + /* Try to lock the cache */ + if(test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) + continue; + mem = stock->cached; if (!mem || !stock->nr_pages) - continue; + goto unlock_cache; if (!mem_cgroup_same_or_subtree(root_mem, mem)) - continue; - if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) { - if (cpu == curcpu) - drain_local_stock(&stock->work); - else - schedule_work_on(cpu, &stock->work); - } + goto unlock_cache; + + if (cpu == curcpu) + drain_local_stock(&stock->work); + else + schedule_work_on(cpu, &stock->work); + continue; +unlock_cache: + clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags); ^^^^^ need a barrier? } if (!sync) > Without the mutex serializing this code, can't there be a concurrent > execution that leads to stock->cached being drained, becoming empty > and freed by someone else between the stock->nr_pages check and the > ancestor check, resulting in use after free? > > What makes stock->cached safe to dereference? We are using FLUSHING_CACHED_CHARGE as a lock for local draining. I guess it should be sufficient. mutex which was used previously caused that async draining was exclusive so a root_mem that has potentially many relevant caches has to back off because other mem wants to clear the cache on the same CPU. I will think about this tomorrow (with fresh eyes). I think we should be able to be without mutex. Anyway thanks for the really good report! 
> > [ 2313.442944] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 > [ 2313.443935] IP: [<ffffffff81083b70>] css_is_ancestor+0x20/0x70 > [ 2313.443935] PGD 4ae7a067 PUD 4adc4067 PMD 0 > [ 2313.443935] Oops: 0000 [#1] PREEMPT SMP > [ 2313.443935] CPU 0 > [ 2313.443935] Pid: 19677, comm: rmdir Tainted: G W 3.0.0-mm1-00188-gf38d32b #35 ECS MCP61M-M3/MCP61M-M3 > [ 2313.443935] RIP: 0010:[<ffffffff81083b70>] [<ffffffff81083b70>] css_is_ancestor+0x20/0x70 > [ 2313.443935] RSP: 0018:ffff880077b09c88 EFLAGS: 00010202 > [ 2313.443935] RAX: ffff8800781bb310 RBX: 0000000000000000 RCX: 000000000000003e > [ 2313.443935] RDX: 0000000000000000 RSI: ffff8800779f7c00 RDI: 0000000000000000 > [ 2313.443935] RBP: ffff880077b09c98 R08: ffffffff818a4e88 R09: 0000000000000000 > [ 2313.443935] R10: 0000000000000000 R11: dead000000100100 R12: ffff8800779f7c00 > [ 2313.443935] R13: ffff8800779f7c00 R14: 0000000000000000 R15: ffff88007bc0eb80 > [ 2313.443935] FS: 00007f5d689ec720(0000) GS:ffff88007bc00000(0000) knlGS:0000000000000000 > [ 2313.443935] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > [ 2313.443935] CR2: 0000000000000018 CR3: 000000004ad57000 CR4: 00000000000006f0 > [ 2313.443935] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 2313.443935] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > [ 2313.443935] Process rmdir (pid: 19677, threadinfo ffff880077b08000, task ffff8800781bb310) > [ 2313.443935] Stack: > [ 2313.443935] ffffffff818a4e88 000000000000eb80 ffff880077b09ca8 ffffffff810feba3 > [ 2313.443935] ffff880077b09d08 ffffffff810feccf ffff880077b09cf8 0000000000000001 > [ 2313.443935] ffff88007bd0eb80 0000000000000001 ffff880077af2000 0000000000000000 > [ 2313.443935] Call Trace: > [ 2313.443935] [<ffffffff810feba3>] mem_cgroup_same_or_subtree+0x33/0x40 > [ 2313.443935] [<ffffffff810feccf>] drain_all_stock+0x11f/0x170 > [ 2313.443935] [<ffffffff81103211>] mem_cgroup_force_empty+0x231/0x6d0 > [ 2313.443935] 
[<ffffffff81111872>] ? path_put+0x22/0x30 > [ 2313.443935] [<ffffffff8111c925>] ? __d_lookup+0xb5/0x170 > [ 2313.443935] [<ffffffff811036c4>] mem_cgroup_pre_destroy+0x14/0x20 > [ 2313.443935] [<ffffffff81080559>] cgroup_rmdir+0xb9/0x500 > [ 2313.443935] [<ffffffff81063990>] ? abort_exclusive_wait+0xb0/0xb0 > [ 2313.443935] [<ffffffff81114d26>] vfs_rmdir+0x86/0xe0 > [ 2313.443935] [<ffffffff811233d3>] ? mnt_want_write+0x43/0x80 > [ 2313.443935] [<ffffffff81114e7b>] do_rmdir+0xfb/0x110 > [ 2313.443935] [<ffffffff81114ea6>] sys_rmdir+0x16/0x20 > [ 2313.443935] [<ffffffff8154d76b>] system_call_fastpath+0x16/0x1b > [ 2313.443935] Code: b7 42 0a 5d c3 66 0f 1f 44 00 00 55 48 89 e5 48 83 ec 10 48 89 5d f0 4c 89 65 f8 66 66 66 66 90 48 89 fb 49 89 f4 e8 10 85 00 00 > [ 2313.443935] 8b 43 18 49 8b 54 24 18 48 85 d2 74 05 48 85 c0 75 15 31 db > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic ^ permalink raw reply related [flat|nested] 56+ messages in thread
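The "lock first, validate, unlock on mismatch" rework of the scheduling loop proposed above can be modelled the same way in user space. Again a sketch with made-up names, single-threaded for testability and ignoring the memory-barrier question raised in the patch: the point is only that `->cached` and `->nr_pages` are inspected while the per-stock flag is held, and the flag is released on the skip paths:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct memcg_model { int id; };

/* User-space stand-in for struct memcg_stock_pcp. */
struct stock_model {
    atomic_ulong flags;          /* bit 0 = FLUSHING_CACHED_CHARGE */
    unsigned long nr_pages;
    struct memcg_model *cached;
};

/* One iteration of the reworked drain_all_stock() loop: take the
 * per-stock "lock" first so ->cached is stable while we validate it,
 * and drop it again on every bail-out path (the unlock_cache label). */
static bool try_schedule_drain(struct stock_model *s,
                               struct memcg_model *root)
{
    if (atomic_fetch_or(&s->flags, 1UL) & 1UL)
        return false;            /* another drainer owns this stock */
    if (!s->cached || !s->nr_pages || s->cached != root) {
        /* unlock_cache: nothing to drain for this root */
        atomic_fetch_and(&s->flags, ~1UL);
        return false;
    }
    /* the real code would schedule_work_on(cpu, &stock->work) here;
     * the flag stays set until drain_local_stock() clears it */
    return true;
}
```

The pointer-equality check stands in for `mem_cgroup_same_or_subtree()`; the key behavioral difference from the original loop is that a foreign or empty stock leaves the flag clear for other drainers.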
* Re: [PATCH 4/4] memcg: get rid of percpu_charge_mutex lock @ 2011-08-08 21:47 ` Michal Hocko 0 siblings, 0 replies; 56+ messages in thread From: Michal Hocko @ 2011-08-08 21:47 UTC (permalink / raw) To: Johannes Weiner Cc: linux-mm, Balbir Singh, KAMEZAWA Hiroyuki, Andrew Morton, linux-kernel On Mon 08-08-11 20:47:38, Johannes Weiner wrote: [...] > > --- a/mm/memcontrol.c > > +++ b/mm/memcontrol.c > > @@ -2071,7 +2071,6 @@ struct memcg_stock_pcp { > > #define FLUSHING_CACHED_CHARGE (0) > > }; > > static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock); > > -static DEFINE_MUTEX(percpu_charge_mutex); > > > > /* > > * Try to consume stocked charge on this cpu. If success, one page is consumed > > @@ -2178,7 +2177,8 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync) > > > > for_each_online_cpu(cpu) { > > struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu); > > - if (test_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) > > + if (mem_cgroup_same_or_subtree(root_mem, stock->cached) && > > + test_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) > > flush_work(&stock->work); > > } > > out: > > This hunk triggers a crash for me, as the draining is already done and > stock->cached reset to NULL when dereferenced here. Oops is attached. Thanks for catching this. We are racing synchronous drain from force_empty and async drain from reclaim, I guess. Sync. checked whether it should wait for the work and the cache got drained and set to NULL. First of all we must not dereference the cached mem without FLUSHING_CACHED_CHARGE bit test. We have to be sure that there is some draining on that cache. stock->cached is set to NULL before we clear the bit (I guess we need to add a barrier into drain_local_stock). So we should see mem either as NULL or still valid (I have to think some more about "still valid" part - maybe we will need rcu_read_lock). 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c index f4ec4e7..626c916 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2197,8 +2197,10 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync) for_each_online_cpu(cpu) { struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu); - if (mem_cgroup_same_or_subtree(root_mem, stock->cached) && - test_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) + struct mem_cgroup *mem = stock->cached; + if (test_bit(FLUSHING_CACHED_CHARGE, &stock->flags) && + mem && mem_cgroup_same_or_subtree(root_mem, mem) + ) flush_work(&stock->work); } out: > > We have this loop in drain_all_stock(): > > for_each_online_cpu(cpu) { > struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu); > struct mem_cgroup *mem; > > mem = stock->cached; > if (!mem || !stock->nr_pages) > continue; > if (!mem_cgroup_same_or_subtree(root_mem, mem)) > continue; > if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) { > if (cpu == curcpu) > drain_local_stock(&stock->work); > else > schedule_work_on(cpu, &stock->work); > } > } > > The only thing that stabilizes stock->cached is the knowledge that > there are still pages accounted to the memcg. Yes you are right we have to set FLUSHING_CACHED_CHARGE before nr_pages check (and do the appropriate cleanup on the continue paths). This looks quite ugly, though. 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c index f4ec4e7..eca46141 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2179,17 +2179,23 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync) struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu); struct mem_cgroup *mem; + /* Try to lock the cache */ + if(test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) + continue; + mem = stock->cached; if (!mem || !stock->nr_pages) - continue; + goto unlock_cache; if (!mem_cgroup_same_or_subtree(root_mem, mem)) - continue; - if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) { - if (cpu == curcpu) - drain_local_stock(&stock->work); - else - schedule_work_on(cpu, &stock->work); - } + goto unlock_cache; + + if (cpu == curcpu) + drain_local_stock(&stock->work); + else + schedule_work_on(cpu, &stock->work); + continue; +unlock_cache: + clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags); ^^^^^ need a barrier? } if (!sync) > Without the mutex serializing this code, can't there be a concurrent > execution that leads to stock->cached being drained, becoming empty > and freed by someone else between the stock->nr_pages check and the > ancestor check, resulting in use after free? > > What makes stock->cached safe to dereference? We are using FLUSHING_CACHED_CHARGE as a lock for local draining. I guess it should be sufficient. mutex which was used previously caused that async draining was exclusive so a root_mem that has potentially many relevant caches has to back off because other mem wants to clear the cache on the same CPU. I will think about this tomorrow (with fresh eyes). I think we should be able to be without mutex. Anyway thanks for the really good report! 
> > [ 2313.442944] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
> [ 2313.443935] IP: [<ffffffff81083b70>] css_is_ancestor+0x20/0x70
> [ 2313.443935] PGD 4ae7a067 PUD 4adc4067 PMD 0
> [ 2313.443935] Oops: 0000 [#1] PREEMPT SMP
> [ 2313.443935] CPU 0
> [ 2313.443935] Pid: 19677, comm: rmdir Tainted: G        W   3.0.0-mm1-00188-gf38d32b #35 ECS MCP61M-M3/MCP61M-M3
> [ 2313.443935] RIP: 0010:[<ffffffff81083b70>]  [<ffffffff81083b70>] css_is_ancestor+0x20/0x70
> [ 2313.443935] RSP: 0018:ffff880077b09c88  EFLAGS: 00010202
> [ 2313.443935] RAX: ffff8800781bb310 RBX: 0000000000000000 RCX: 000000000000003e
> [ 2313.443935] RDX: 0000000000000000 RSI: ffff8800779f7c00 RDI: 0000000000000000
> [ 2313.443935] RBP: ffff880077b09c98 R08: ffffffff818a4e88 R09: 0000000000000000
> [ 2313.443935] R10: 0000000000000000 R11: dead000000100100 R12: ffff8800779f7c00
> [ 2313.443935] R13: ffff8800779f7c00 R14: 0000000000000000 R15: ffff88007bc0eb80
> [ 2313.443935] FS:  00007f5d689ec720(0000) GS:ffff88007bc00000(0000) knlGS:0000000000000000
> [ 2313.443935] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 2313.443935] CR2: 0000000000000018 CR3: 000000004ad57000 CR4: 00000000000006f0
> [ 2313.443935] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 2313.443935] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 2313.443935] Process rmdir (pid: 19677, threadinfo ffff880077b08000, task ffff8800781bb310)
> [ 2313.443935] Stack:
> [ 2313.443935]  ffffffff818a4e88 000000000000eb80 ffff880077b09ca8 ffffffff810feba3
> [ 2313.443935]  ffff880077b09d08 ffffffff810feccf ffff880077b09cf8 0000000000000001
> [ 2313.443935]  ffff88007bd0eb80 0000000000000001 ffff880077af2000 0000000000000000
> [ 2313.443935] Call Trace:
> [ 2313.443935]  [<ffffffff810feba3>] mem_cgroup_same_or_subtree+0x33/0x40
> [ 2313.443935]  [<ffffffff810feccf>] drain_all_stock+0x11f/0x170
> [ 2313.443935]  [<ffffffff81103211>] mem_cgroup_force_empty+0x231/0x6d0
> [ 2313.443935]  [<ffffffff81111872>] ? path_put+0x22/0x30
> [ 2313.443935]  [<ffffffff8111c925>] ? __d_lookup+0xb5/0x170
> [ 2313.443935]  [<ffffffff811036c4>] mem_cgroup_pre_destroy+0x14/0x20
> [ 2313.443935]  [<ffffffff81080559>] cgroup_rmdir+0xb9/0x500
> [ 2313.443935]  [<ffffffff81063990>] ? abort_exclusive_wait+0xb0/0xb0
> [ 2313.443935]  [<ffffffff81114d26>] vfs_rmdir+0x86/0xe0
> [ 2313.443935]  [<ffffffff811233d3>] ? mnt_want_write+0x43/0x80
> [ 2313.443935]  [<ffffffff81114e7b>] do_rmdir+0xfb/0x110
> [ 2313.443935]  [<ffffffff81114ea6>] sys_rmdir+0x16/0x20
> [ 2313.443935]  [<ffffffff8154d76b>] system_call_fastpath+0x16/0x1b
> [ 2313.443935] Code: b7 42 0a 5d c3 66 0f 1f 44 00 00 55 48 89 e5 48 83 ec 10 48 89 5d f0 4c 89 65 f8 66 66 66 66 90 48 89 fb 49 89 f4 e8 10 85 00 00
> [ 2313.443935]  8b 43 18 49 8b 54 24 18 48 85 d2 74 05 48 85 c0 75 15 31 db
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

^ permalink raw reply related	[flat|nested] 56+ messages in thread
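[Editorial sketch] The oops above comes from calling mem_cgroup_same_or_subtree (and through it css_is_ancestor) on stock->cached after a drain has already reset it to NULL. The ordering problem can be modelled in plain userspace C; the types and helper names below are simplified stand-ins for the kernel structures, not the actual kernel code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct memcg {
	struct memcg *parent;
};

struct stock {
	struct memcg *cached;	/* NULL once the stock has been drained */
	bool flushing;		/* stands in for FLUSHING_CACHED_CHARGE */
};

/* Walks the parent chain. The kernel's css_is_ancestor dereferences
 * its argument unconditionally, which is the NULL dereference above;
 * this model tolerates NULL so the bug shows up as a wrong answer
 * instead of a crash. */
static bool same_or_subtree(struct memcg *root, struct memcg *mem)
{
	for (; mem; mem = mem->parent)
		if (mem == root)
			return true;
	return false;
}

/* Buggy order: the hierarchy test runs first, so stock->cached is
 * dereferenced even when no drain is in flight and it may be NULL. */
static bool should_wait_buggy(struct memcg *root, struct stock *s)
{
	return same_or_subtree(root, s->cached) && s->flushing;
}

/* Fixed order: only look at ->cached after confirming a drain is in
 * flight, and check it for NULL before walking the hierarchy. */
static bool should_wait_fixed(struct memcg *root, struct stock *s)
{
	struct memcg *mem = s->cached;

	return s->flushing && mem && same_or_subtree(root, mem);
}
```

The snapshot into a local variable matters too: reading s->cached once avoids testing one value and then dereferencing a different, concurrently updated one.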
* Re: [PATCH 4/4] memcg: get rid of percpu_charge_mutex lock
  2011-08-08 21:47 ` Michal Hocko
@ 2011-08-08 23:19 ` Johannes Weiner
  -1 siblings, 0 replies; 56+ messages in thread
From: Johannes Weiner @ 2011-08-08 23:19 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Balbir Singh, KAMEZAWA Hiroyuki, Andrew Morton, linux-kernel

On Mon, Aug 08, 2011 at 11:47:04PM +0200, Michal Hocko wrote:
> On Mon 08-08-11 20:47:38, Johannes Weiner wrote:
> [...]
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -2071,7 +2071,6 @@ struct memcg_stock_pcp {
> > >  #define FLUSHING_CACHED_CHARGE	(0)
> > >  };
> > >  static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);
> > > -static DEFINE_MUTEX(percpu_charge_mutex);
> > >
> > >  /*
> > >   * Try to consume stocked charge on this cpu. If success, one page is consumed
> > > @@ -2178,7 +2177,8 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync)
> > >
> > >  	for_each_online_cpu(cpu) {
> > >  		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
> > > -		if (test_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
> > > +		if (mem_cgroup_same_or_subtree(root_mem, stock->cached) &&
> > > +				test_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
> > >  			flush_work(&stock->work);
> > >  	}
> > > out:
> >
> > This hunk triggers a crash for me, as the draining is already done and
> > stock->cached reset to NULL when dereferenced here.  Oops is attached.
>
> Thanks for catching this. We are racing synchronous drain from
> force_empty and async drain from reclaim, I guess.

It's at the end of a benchmark where several tasks delete the cgroups.
There is no reclaim going on anymore, it must be several sync drains
from force_empty of different memcgs.

> Sync. checked whether it should wait for the work and the cache got
> drained and set to NULL. First of all we must not dereference the
> cached mem without FLUSHING_CACHED_CHARGE bit test. We have to be
> sure that there is some draining on that cache. stock->cached is set
> to NULL before we clear the bit (I guess we need to add a barrier
> into drain_local_stock). So we should see mem either as NULL or
> still valid (I have to think some more about "still valid" part -
> maybe we will need rcu_read_lock).
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index f4ec4e7..626c916 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2197,8 +2197,10 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync)
>
>  	for_each_online_cpu(cpu) {
>  		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
> -		if (mem_cgroup_same_or_subtree(root_mem, stock->cached) &&
> -				test_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
> +		struct mem_cgroup *mem = stock->cached;
> +		if (test_bit(FLUSHING_CACHED_CHARGE, &stock->flags) &&
> +				mem && mem_cgroup_same_or_subtree(root_mem, mem)
> +		   )
>  			flush_work(&stock->work);
>  	}
> out:

This ordering makes sure that mem is a sensible pointer, but it still
does not pin the object, *mem, which could go away the femtosecond
after the test_bit succeeds.

> > We have this loop in drain_all_stock():
> >
> > 	for_each_online_cpu(cpu) {
> > 		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
> > 		struct mem_cgroup *mem;
> >
> > 		mem = stock->cached;
> > 		if (!mem || !stock->nr_pages)
> > 			continue;
> > 		if (!mem_cgroup_same_or_subtree(root_mem, mem))
> > 			continue;
> > 		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
> > 			if (cpu == curcpu)
> > 				drain_local_stock(&stock->work);
> > 			else
> > 				schedule_work_on(cpu, &stock->work);
> > 		}
> > 	}
> >
> > The only thing that stabilizes stock->cached is the knowledge that
> > there are still pages accounted to the memcg.
>
> Yes you are right we have to set FLUSHING_CACHED_CHARGE before nr_pages
> check (and do the appropriate cleanup on the continue paths). This looks
> quite ugly, though.
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index f4ec4e7..eca46141 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2179,17 +2179,23 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync)
>  		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
>  		struct mem_cgroup *mem;
>
> +		/* Try to lock the cache */
> +		if(test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
> +			continue;
> +
>  		mem = stock->cached;
>  		if (!mem || !stock->nr_pages)
> -			continue;
> +			goto unlock_cache;
>  		if (!mem_cgroup_same_or_subtree(root_mem, mem))
> -			continue;
> -		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
> -			if (cpu == curcpu)
> -				drain_local_stock(&stock->work);
> -			else
> -				schedule_work_on(cpu, &stock->work);
> -		}
> +			goto unlock_cache;

So one thread locks the cache, recognizes stock->cached is not in the
right hierarchy and unlocks again.  While the cache was locked, a
concurrent drainer with the right hierarchy skipped the stock because
it was locked.  That doesn't sound right.

But yes, we probably need exclusive access to the stock state.

> +
> +		if (cpu == curcpu)
> +			drain_local_stock(&stock->work);
> +		else
> +			schedule_work_on(cpu, &stock->work);
> +		continue;
> +unlock_cache:
> +		clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags);
>
> 		^^^^^
> 		need a barrier?
> 	}
>
> 	if (!sync)

> > Without the mutex serializing this code, can't there be a concurrent
> > execution that leads to stock->cached being drained, becoming empty
> > and freed by someone else between the stock->nr_pages check and the
> > ancestor check, resulting in use after free?
> >
> > What makes stock->cached safe to dereference?
>
> We are using FLUSHING_CACHED_CHARGE as a lock for local draining. I
> guess it should be sufficient.
>
> mutex which was used previously caused that async draining was exclusive
> so a root_mem that has potentially many relevant caches has to back off
> because other mem wants to clear the cache on the same CPU.

It's now replaced by what is essentially a per-stock bit-spinlock that
is always trylocked.

Would it make sense to promote it to a real spinlock?  Draining a
stock is pretty fast, there should be minimal lock hold times, but we
would still avoid that tiny race window where we would skip otherwise
correct stocks just because they are locked.

> I will think about this tomorrow (with fresh eyes). I think we should be
> able to be without mutex.

The problem is that we have a multi-op atomic section, so we can not
go lockless.  We can read the stock state just fine, and order
accesses to different members so that we get a coherent image.  But
there is still nothing that pins the charge to the memcg, and thus
nothing that stabilizes *stock->cached.

I agree that we can probably do better than a global lock, though.

^ permalink raw reply	[flat|nested] 56+ messages in thread
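[Editorial sketch] The "per-stock bit-spinlock that is always trylocked" described above maps onto the kernel's test_and_set_bit/clear_bit pair. A minimal userspace model of that pattern, using a C11 atomic_flag in place of the FLUSHING_CACHED_CHARGE bit (names and types are illustrative, not the kernel's):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* The FLUSHING_CACHED_CHARGE bit behaves like a bit-spinlock that is
 * only ever trylocked: a caller either takes it atomically or skips
 * the stock entirely. */
struct stock {
	atomic_flag flushing;
};

/* trylock: mirrors test_and_set_bit(FLUSHING_CACHED_CHARGE, ...).
 * Returns true if we took the bit, false if someone else holds it. */
static bool stock_trylock(struct stock *s)
{
	return !atomic_flag_test_and_set(&s->flushing);
}

/* unlock: mirrors clear_bit() at the end of drain_local_stock(). */
static void stock_unlock(struct stock *s)
{
	atomic_flag_clear(&s->flushing);
}
```

The race window Johannes describes follows directly from the trylock-only discipline: a drainer whose hierarchy does match the stock returns false from stock_trylock and skips, even though a blocking lock would only have had to wait out a very short critical section.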
* Re: [PATCH 4/4] memcg: get rid of percpu_charge_mutex lock
  2011-08-08 23:19 ` Johannes Weiner
@ 2011-08-09  7:26 ` Michal Hocko
  -1 siblings, 0 replies; 56+ messages in thread
From: Michal Hocko @ 2011-08-09 7:26 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-mm, Balbir Singh, KAMEZAWA Hiroyuki, Andrew Morton, linux-kernel

On Tue 09-08-11 01:19:12, Johannes Weiner wrote:
> On Mon, Aug 08, 2011 at 11:47:04PM +0200, Michal Hocko wrote:
> > On Mon 08-08-11 20:47:38, Johannes Weiner wrote:
> > [...]
> > > > --- a/mm/memcontrol.c
> > > > +++ b/mm/memcontrol.c
> > > > @@ -2071,7 +2071,6 @@ struct memcg_stock_pcp {
> > > >  #define FLUSHING_CACHED_CHARGE	(0)
> > > >  };
> > > >  static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);
> > > > -static DEFINE_MUTEX(percpu_charge_mutex);
> > > >
> > > >  /*
> > > >  * Try to consume stocked charge on this cpu. If success, one page is consumed
> > > > @@ -2178,7 +2177,8 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync)
> > > >
> > > >  	for_each_online_cpu(cpu) {
> > > >  		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
> > > > -		if (test_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
> > > > +		if (mem_cgroup_same_or_subtree(root_mem, stock->cached) &&
> > > > +				test_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
> > > >  			flush_work(&stock->work);
> > > >  	}
> > > > out:
> > >
> > > This hunk triggers a crash for me, as the draining is already done and
> > > stock->cached reset to NULL when dereferenced here.  Oops is attached.
> >
> > Thanks for catching this. We are racing synchronous drain from
> > force_empty and async drain from reclaim, I guess.
>
> It's at the end of a benchmark where several tasks delete the cgroups.
> There is no reclaim going on anymore, it must be several sync drains
> from force_empty of different memcgs.

I am afraid it is worse than that. The sync. drainer can kick itself
really trivially just by doing local draining. Then stock->cached is
guaranteed to be NULL when we reach this...

[...]
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index f4ec4e7..626c916 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -2197,8 +2197,10 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync)
> >
> >  	for_each_online_cpu(cpu) {
> >  		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
> > -		if (mem_cgroup_same_or_subtree(root_mem, stock->cached) &&
> > -				test_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
> > +		struct mem_cgroup *mem = stock->cached;
> > +		if (test_bit(FLUSHING_CACHED_CHARGE, &stock->flags) &&
> > +				mem && mem_cgroup_same_or_subtree(root_mem, mem)
> > +		   )
> >  			flush_work(&stock->work);
> >  	}
> > out:
>
> This ordering makes sure that mem is a sensible pointer, but it still
> does not pin the object, *mem, which could go away the femtosecond
> after the test_bit succeeds.

Yes, there are possible races:

CPU0                    CPU1                    CPU2
mem = stock->cached
                        stock->cached = NULL
                        clear_bit
                                                test_and_set_bit
test_bit()
<preempted>             ...
                                                mem_cgroup_destroy
use after free

The race is not very probable because we are doing quite a bunch of work
before we can deallocate mem. mem_cgroup_destroy is called after
synchronize_rcu so we can close the race by rcu_read_lock, right?

[...]
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index f4ec4e7..eca46141 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -2179,17 +2179,23 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync)
> >  		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
> >  		struct mem_cgroup *mem;
> >
> > +		/* Try to lock the cache */
> > +		if(test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
> > +			continue;
> > +
> >  		mem = stock->cached;
> >  		if (!mem || !stock->nr_pages)
> > -			continue;
> > +			goto unlock_cache;
> >  		if (!mem_cgroup_same_or_subtree(root_mem, mem))
> > -			continue;
> > -		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
> > -			if (cpu == curcpu)
> > -				drain_local_stock(&stock->work);
> > -			else
> > -				schedule_work_on(cpu, &stock->work);
> > -		}
> > +			goto unlock_cache;
>
> So one thread locks the cache, recognizes stock->cached is not in the
> right hierarchy and unlocks again.  While the cache was locked, a
> concurrent drainer with the right hierarchy skipped the stock because
> it was locked.  That doesn't sound right.

You are right. We have to make it less parallel.

> But yes, we probably need exclusive access to the stock state.
>
> > +
> > +		if (cpu == curcpu)
> > +			drain_local_stock(&stock->work);
> > +		else
> > +			schedule_work_on(cpu, &stock->work);
> > +		continue;
> > +unlock_cache:
> > +		clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags);
> >
> > 		^^^^^
> > 		need a barrier?
> > 	}
> >
> > 	if (!sync)

> > > Without the mutex serializing this code, can't there be a concurrent
> > > execution that leads to stock->cached being drained, becoming empty
> > > and freed by someone else between the stock->nr_pages check and the
> > > ancestor check, resulting in use after free?
> > >
> > > What makes stock->cached safe to dereference?
> >
> > We are using FLUSHING_CACHED_CHARGE as a lock for local draining. I
> > guess it should be sufficient.
> >
> > mutex which was used previously caused that async draining was exclusive
> > so a root_mem that has potentially many relevant caches has to back off
> > because other mem wants to clear the cache on the same CPU.
>
> It's now replaced by what is essentially a per-stock bit-spinlock that
> is always trylocked.

Yes.

> Would it make sense to promote it to a real spinlock?  Draining a
> stock is pretty fast, there should be minimal lock hold times, but we
> would still avoid that tiny race window where we would skip otherwise
> correct stocks just because they are locked.

Yes, a per-stock spinlock will be the easiest way, because if we tried
other games with atomics we would end up with a more complicated
refill_stock which can race with draining as well.

> > I will think about this tomorrow (with fresh eyes). I think we should be
> > able to be without mutex.
>
> The problem is that we have a multi-op atomic section, so we can not
> go lockless.  We can read the stock state just fine, and order
> accesses to different members so that we get a coherent image.  But
> there is still nothing that pins the charge to the memcg, and thus
> nothing that stabilizes *stock->cached.
>
> I agree that we can probably do better than a global lock, though.

Fully agreed. I will send a patch after I give it some testing.

Thanks!

--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic

^ permalink raw reply	[flat|nested] 56+ messages in thread
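[Editorial sketch] The direction agreed above — a real per-stock lock plus a flag recording that a drain is in flight — can be modelled in userspace C11. This is a hypothetical illustration of the design, not the kernel implementation: the spinlock is modelled with an atomic flag and a spin loop, and all names are invented for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct stock {
	atomic_flag lock;        /* stands in for a per-stock spinlock */
	atomic_bool under_drain; /* a drain of this stock is in flight */
	int nr_pages;
};

static bool stock_trylock(struct stock *s)
{
	return !atomic_flag_test_and_set(&s->lock);
}

static void stock_lock(struct stock *s)
{
	while (atomic_flag_test_and_set(&s->lock))
		;                /* spin: drains are short */
}

/* Returns true if the caller now owns the drain, false if it may skip
 * the stock because somebody else is already draining it. Unlike the
 * trylock-only bit, a caller only skips on a *real* concurrent drain;
 * if the lock is held for any other reason it just waits. */
static bool stock_begin_drain(struct stock *s)
{
	if (!stock_trylock(s)) {
		if (atomic_load(&s->under_drain))
			return false;	/* drain already in flight: skip */
		stock_lock(s);		/* short hold time: wait it out */
	}
	atomic_store(&s->under_drain, true);
	return true;
}

static void stock_end_drain(struct stock *s)
{
	s->nr_pages = 0;		/* stands in for drain_stock() */
	atomic_store(&s->under_drain, false);
	atomic_flag_clear(&s->lock);
}
```

Note the unlocked peek at under_drain is still racy in this sketch; whether that read needs a barrier is the same open question flagged in the TODO of the RFC patch posted later in the thread.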
* [PATCH RFC] memcg: fix drain_all_stock crash
  2011-08-09  7:26 ` Michal Hocko
@ 2011-08-09  9:31 ` Michal Hocko
  -1 siblings, 0 replies; 56+ messages in thread
From: Michal Hocko @ 2011-08-09 9:31 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-mm, Balbir Singh, KAMEZAWA Hiroyuki, Andrew Morton, linux-kernel

What do you think about the half-baked patch below? I didn't manage to test it yet, but I guess it should help. I hate the asymmetry of the drain_lock locking (it is acquired in a different place than it is released, which is not nice). I will think about a nicer way to do it. Maybe I should also split the rcu part into a separate patch. What do you think?

---
>From 26c2cdc55aa14ec4a54e9c8e2c8b9072c7cb8e28 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.cz>
Date: Tue, 9 Aug 2011 10:53:28 +0200
Subject: [PATCH] memcg: fix drain_all_stock crash

8521fc50 (memcg: get rid of percpu_charge_mutex lock) introduced a crash in sync mode when we are about to check whether we have to wait for the work: we call mem_cgroup_same_or_subtree without checking FLUSHING_CACHED_CHARGE first, so we can dereference an already cleaned cache (the simplest case is when we drain the local cache).
BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
IP: [<ffffffff81083b70>] css_is_ancestor+0x20/0x70
PGD 4ae7a067 PUD 4adc4067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
CPU 0
Pid: 19677, comm: rmdir Tainted: G        W   3.0.0-mm1-00188-gf38d32b #35 ECS MCP61M-M3/MCP61M-M3
RIP: 0010:[<ffffffff81083b70>]  [<ffffffff81083b70>] css_is_ancestor+0x20/0x70
RSP: 0018:ffff880077b09c88  EFLAGS: 00010202
RAX: ffff8800781bb310 RBX: 0000000000000000 RCX: 000000000000003e
RDX: 0000000000000000 RSI: ffff8800779f7c00 RDI: 0000000000000000
RBP: ffff880077b09c98 R08: ffffffff818a4e88 R09: 0000000000000000
R10: 0000000000000000 R11: dead000000100100 R12: ffff8800779f7c00
R13: ffff8800779f7c00 R14: 0000000000000000 R15: ffff88007bc0eb80
FS:  00007f5d689ec720(0000) GS:ffff88007bc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000018 CR3: 000000004ad57000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rmdir (pid: 19677, threadinfo ffff880077b08000, task ffff8800781bb310)
Stack:
 ffffffff818a4e88 000000000000eb80 ffff880077b09ca8 ffffffff810feba3
 ffff880077b09d08 ffffffff810feccf ffff880077b09cf8 0000000000000001
 ffff88007bd0eb80 0000000000000001 ffff880077af2000 0000000000000000
Call Trace:
 [<ffffffff810feba3>] mem_cgroup_same_or_subtree+0x33/0x40
 [<ffffffff810feccf>] drain_all_stock+0x11f/0x170
 [<ffffffff81103211>] mem_cgroup_force_empty+0x231/0x6d0
 [<ffffffff81111872>] ? path_put+0x22/0x30
 [<ffffffff8111c925>] ? __d_lookup+0xb5/0x170
 [<ffffffff811036c4>] mem_cgroup_pre_destroy+0x14/0x20
 [<ffffffff81080559>] cgroup_rmdir+0xb9/0x500
 [<ffffffff81063990>] ? abort_exclusive_wait+0xb0/0xb0
 [<ffffffff81114d26>] vfs_rmdir+0x86/0xe0
 [<ffffffff811233d3>] ? mnt_want_write+0x43/0x80
 [<ffffffff81114e7b>] do_rmdir+0xfb/0x110
 [<ffffffff81114ea6>] sys_rmdir+0x16/0x20
 [<ffffffff8154d76b>] system_call_fastpath+0x16/0x1b

Testing FLUSHING_CACHED_CHARGE before dereferencing is still not enough because we might still see mem == NULL, so we have to check it before dereferencing. We have to do all stock checking under the FLUSHING_CACHED_CHARGE bit lock, so it is much easier to use a spin_lock instead. Let's also add a flag (under_drain) signalling that draining is in progress so that concurrent callers do not have to wait on the lock pointlessly.

Finally, we do not make sure that the mem still exists. It could have been removed in the meantime:

CPU0                    CPU1                    CPU2
mem = stock->cached
                        stock->cached = NULL
                        clear_bit
                                                test_and_set_bit
test_bit()
<preempted>             ...
                                                mem_cgroup_destroy
use after free

`...' is actually quite a bunch of work to do, so the race is not very probable. The important thing, though, is that cgroup_subsys->destroy (mem_cgroup_destroy) is called after synchronize_rcu, so we can protect against it by taking rcu_read_lock when dereferencing the cached mem.

TODO:
- check if under_drain needs some memory barriers
- check the hotplug path (can we wait on a spinlock?)
- better changelog - do some testing Signed-off-by: Michal Hocko <mhocko@suse.cz> Reported-by: Johannes Weiner <jweiner@redhat.com> --- mm/memcontrol.c | 67 +++++++++++++++++++++++++++++++++++++++++------------- 1 files changed, 51 insertions(+), 16 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index f4ec4e7..e34f9fd 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2087,8 +2087,8 @@ struct memcg_stock_pcp { struct mem_cgroup *cached; /* this never be root cgroup */ unsigned int nr_pages; struct work_struct work; - unsigned long flags; -#define FLUSHING_CACHED_CHARGE (0) + spinlock_t drain_lock; /* protects from parallel draining */ + bool under_drain; }; static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock); @@ -2114,6 +2114,7 @@ static bool consume_stock(struct mem_cgroup *mem) /* * Returns stocks cached in percpu to res_counter and reset cached information. + * Do not call this directly - use drain_local_stock instead. */ static void drain_stock(struct memcg_stock_pcp *stock) { @@ -2133,12 +2134,16 @@ static void drain_stock(struct memcg_stock_pcp *stock) /* * This must be called under preempt disabled or must be called by * a thread which is pinned to local cpu. + * Parameter is not used. + * Assumes stock->drain_lock held. 
*/ static void drain_local_stock(struct work_struct *dummy) { struct memcg_stock_pcp *stock = &__get_cpu_var(memcg_stock); drain_stock(stock); - clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags); + + stock->under_drain = false; + spin_unlock(&stock->drain_lock); } /* @@ -2150,7 +2155,9 @@ static void refill_stock(struct mem_cgroup *mem, unsigned int nr_pages) struct memcg_stock_pcp *stock = &get_cpu_var(memcg_stock); if (stock->cached != mem) { /* reset if necessary */ - drain_stock(stock); + spin_lock(&stock->drain_lock); + stock->under_drain = true; + drain_local_stock(NULL); stock->cached = mem; } stock->nr_pages += nr_pages; @@ -2179,17 +2186,27 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync) struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu); struct mem_cgroup *mem; + /* + * make sure we are not waiting when somebody already drains + * the cache. + */ + if (!spin_trylock(&stock->drain_lock)) { + if (stock->under_drain) + continue; + spin_lock(&stock->drain_lock); + } mem = stock->cached; - if (!mem || !stock->nr_pages) + if (!mem || !stock->nr_pages || + !mem_cgroup_same_or_subtree(root_mem, mem)) { + spin_unlock(&stock->drain_lock); continue; - if (!mem_cgroup_same_or_subtree(root_mem, mem)) - continue; - if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) { - if (cpu == curcpu) - drain_local_stock(&stock->work); - else - schedule_work_on(cpu, &stock->work); } + + stock->under_drain = true; + if (cpu == curcpu) + drain_local_stock(&stock->work); + else + schedule_work_on(cpu, &stock->work); } if (!sync) @@ -2197,8 +2214,20 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync) for_each_online_cpu(cpu) { struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu); - if (mem_cgroup_same_or_subtree(root_mem, stock->cached) && - test_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) + struct mem_cgroup *mem; + bool wait_for_drain = false; + + /* + * we have to be careful about parallel group destroying + * 
(mem_cgroup_destroy) which is deferred after synchronize_rcu + */ + rcu_read_lock(); + mem = stock->cached; + wait_for_drain = stock->under_drain && + mem && mem_cgroup_same_or_subtree(root_mem, mem); + rcu_read_unlock(); + + if (wait_for_drain) flush_work(&stock->work); } out: @@ -2278,8 +2307,12 @@ static int __cpuinit memcg_cpu_hotplug_callback(struct notifier_block *nb, for_each_mem_cgroup_all(iter) mem_cgroup_drain_pcp_counter(iter, cpu); - stock = &per_cpu(memcg_stock, cpu); - drain_stock(stock); + if (!spin_trylock(&stock->drain_lock)) { + if (stock->under_drain) + return NOTIFY_OK; + spin_lock(&stock->drain_lock); + } + drain_local_stock(NULL); return NOTIFY_OK; } @@ -5068,6 +5101,8 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu); INIT_WORK(&stock->work, drain_local_stock); + stock->under_drain = false; + spin_lock_init(&stock->drain_lock); } hotcpu_notifier(memcg_cpu_hotplug_callback, 0); } else { -- 1.7.5.4 -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic ^ permalink raw reply related [flat|nested] 56+ messages in thread
* [PATCH RFC] memcg: fix drain_all_stock crash @ 2011-08-09 9:31 ` Michal Hocko 0 siblings, 0 replies; 56+ messages in thread From: Michal Hocko @ 2011-08-09 9:31 UTC (permalink / raw) To: Johannes Weiner Cc: linux-mm, Balbir Singh, KAMEZAWA Hiroyuki, Andrew Morton, linux-kernel What do you think about the half-baked patch below? I didn't manage to test it yet but I guess it should help. I hate the asymmetry of the drain_lock locking (it is acquired somewhere other than where it is released). I will think about a nicer way to do it. Maybe I should also split the rcu part into a separate patch. What do you think? --- ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH RFC] memcg: fix drain_all_stock crash 2011-08-09 9:31 ` Michal Hocko @ 2011-08-09 9:32 ` KAMEZAWA Hiroyuki -1 siblings, 0 replies; 56+ messages in thread From: KAMEZAWA Hiroyuki @ 2011-08-09 9:32 UTC (permalink / raw) To: Michal Hocko Cc: Johannes Weiner, linux-mm, Balbir Singh, Andrew Morton, linux-kernel On Tue, 9 Aug 2011 11:31:50 +0200 Michal Hocko <mhocko@suse.cz> wrote: > What do you think about the half backed patch bellow? I didn't manage to > test it yet but I guess it should help. I hate asymmetry of drain_lock > locking (it is acquired somewhere else than it is released which is > not). I will think about a nicer way how to do it. > Maybe I should also split the rcu part in a separate patch. > > What do you think? I'd like to revert 8521fc50 first and consider total design change rather than ad-hoc fix. Personally, I don't like to have spin-lock in per-cpu area. Thanks, -Kame > --- > From 26c2cdc55aa14ec4a54e9c8e2c8b9072c7cb8e28 Mon Sep 17 00:00:00 2001 > From: Michal Hocko <mhocko@suse.cz> > Date: Tue, 9 Aug 2011 10:53:28 +0200 > Subject: [PATCH] memcg: fix drain_all_stock crash > > 8521fc50 (memcg: get rid of percpu_charge_mutex lock) introduced a crash > in sync mode when we are about to check whether we have to wait for the > work because we are calling mem_cgroup_same_or_subtree without checking > FLUSHING_CACHED_CHARGE before so we can dereference already cleaned > cache (the simplest case would be when we drain the local cache). 
> > BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 > IP: [<ffffffff81083b70>] css_is_ancestor+0x20/0x70 > PGD 4ae7a067 PUD 4adc4067 PMD 0 > Oops: 0000 [#1] PREEMPT SMP > CPU 0 > Pid: 19677, comm: rmdir Tainted: G W 3.0.0-mm1-00188-gf38d32b #35 ECS MCP61M-M3/MCP61M-M3 > RIP: 0010:[<ffffffff81083b70>] [<ffffffff81083b70>] css_is_ancestor+0x20/0x70 > RSP: 0018:ffff880077b09c88 EFLAGS: 00010202 > RAX: ffff8800781bb310 RBX: 0000000000000000 RCX: 000000000000003e > RDX: 0000000000000000 RSI: ffff8800779f7c00 RDI: 0000000000000000 > RBP: ffff880077b09c98 R08: ffffffff818a4e88 R09: 0000000000000000 > R10: 0000000000000000 R11: dead000000100100 R12: ffff8800779f7c00 > R13: ffff8800779f7c00 R14: 0000000000000000 R15: ffff88007bc0eb80 > FS: 00007f5d689ec720(0000) GS:ffff88007bc00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > CR2: 0000000000000018 CR3: 000000004ad57000 CR4: 00000000000006f0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Process rmdir (pid: 19677, threadinfo ffff880077b08000, task ffff8800781bb310) > Stack: > ffffffff818a4e88 000000000000eb80 ffff880077b09ca8 ffffffff810feba3 > ffff880077b09d08 ffffffff810feccf ffff880077b09cf8 0000000000000001 > ffff88007bd0eb80 0000000000000001 ffff880077af2000 0000000000000000 > Call Trace: > [<ffffffff810feba3>] mem_cgroup_same_or_subtree+0x33/0x40 > [<ffffffff810feccf>] drain_all_stock+0x11f/0x170 > [<ffffffff81103211>] mem_cgroup_force_empty+0x231/0x6d0 > [<ffffffff81111872>] ? path_put+0x22/0x30 > [<ffffffff8111c925>] ? __d_lookup+0xb5/0x170 > [<ffffffff811036c4>] mem_cgroup_pre_destroy+0x14/0x20 > [<ffffffff81080559>] cgroup_rmdir+0xb9/0x500 > [<ffffffff81063990>] ? abort_exclusive_wait+0xb0/0xb0 > [<ffffffff81114d26>] vfs_rmdir+0x86/0xe0 > [<ffffffff811233d3>] ? 
mnt_want_write+0x43/0x80 > [<ffffffff81114e7b>] do_rmdir+0xfb/0x110 > [<ffffffff81114ea6>] sys_rmdir+0x16/0x20 > [<ffffffff8154d76b>] system_call_fastpath+0x16/0x1b > > Testing FLUSHING_CACHED_CHARGE before dereferencing is still not enough > because then we still might see mem == NULL so we have to check it > before dereferencing. > We have to do all stock checking under FLUSHING_CACHED_CHARGE bit lock > so it is much easier to use a spin_lock instead. Let's also add a flag > (under_drain) that draining in progress so that concurrent callers do > not have to wait on the lock pointlessly. > > Finally we do not make sure that the mem still exists. It could have > been removed in the meantime: > CPU0 CPU1 CPU2 > mem=stock->cached > stock->cached=NULL > clear_bit > test_and_set_bit > test_bit() ... > <preempted> mem_cgroup_destroy > use after free > > `...' is actually quite a bunch of work to do so the race is not very > probable. The important thing, though, is that cgroup_subsys->destroy > (mem_cgroup_destroy) is called after synchronize_rcu so we can protect > by calling rcu_read_lock when dereferencing cached mem. > > TODO: > - check if under_drain needs some memory barriers > - check the hotplug path (can we wait on spinlock?) 
> - better changelog > - do some testing > > Signed-off-by: Michal Hocko <mhocko@suse.cz> > Reported-by: Johannes Weiner <jweiner@redhat.com> > --- > mm/memcontrol.c | 67 +++++++++++++++++++++++++++++++++++++++++------------- > 1 files changed, 51 insertions(+), 16 deletions(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index f4ec4e7..e34f9fd 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -2087,8 +2087,8 @@ struct memcg_stock_pcp { > struct mem_cgroup *cached; /* this never be root cgroup */ > unsigned int nr_pages; > struct work_struct work; > - unsigned long flags; > -#define FLUSHING_CACHED_CHARGE (0) > + spinlock_t drain_lock; /* protects from parallel draining */ > + bool under_drain; > }; > static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock); > > @@ -2114,6 +2114,7 @@ static bool consume_stock(struct mem_cgroup *mem) > > /* > * Returns stocks cached in percpu to res_counter and reset cached information. > + * Do not call this directly - use drain_local_stock instead. > */ > static void drain_stock(struct memcg_stock_pcp *stock) > { > @@ -2133,12 +2134,16 @@ static void drain_stock(struct memcg_stock_pcp *stock) > /* > * This must be called under preempt disabled or must be called by > * a thread which is pinned to local cpu. > + * Parameter is not used. > + * Assumes stock->drain_lock held. 
> */ > static void drain_local_stock(struct work_struct *dummy) > { > struct memcg_stock_pcp *stock = &__get_cpu_var(memcg_stock); > drain_stock(stock); > - clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags); > + > + stock->under_drain = false; > + spin_unlock(&stock->drain_lock); > } > > /* > @@ -2150,7 +2155,9 @@ static void refill_stock(struct mem_cgroup *mem, unsigned int nr_pages) > struct memcg_stock_pcp *stock = &get_cpu_var(memcg_stock); > > if (stock->cached != mem) { /* reset if necessary */ > - drain_stock(stock); > + spin_lock(&stock->drain_lock); > + stock->under_drain = true; > + drain_local_stock(NULL); > stock->cached = mem; > } > stock->nr_pages += nr_pages; > @@ -2179,17 +2186,27 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync) > struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu); > struct mem_cgroup *mem; > > + /* > + * make sure we are not waiting when somebody already drains > + * the cache. > + */ > + if (!spin_trylock(&stock->drain_lock)) { > + if (stock->under_drain) > + continue; > + spin_lock(&stock->drain_lock); > + } > mem = stock->cached; > - if (!mem || !stock->nr_pages) > + if (!mem || !stock->nr_pages || > + !mem_cgroup_same_or_subtree(root_mem, mem)) { > + spin_unlock(&stock->drain_lock); > continue; > - if (!mem_cgroup_same_or_subtree(root_mem, mem)) > - continue; > - if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) { > - if (cpu == curcpu) > - drain_local_stock(&stock->work); > - else > - schedule_work_on(cpu, &stock->work); > } > + > + stock->under_drain = true; > + if (cpu == curcpu) > + drain_local_stock(&stock->work); > + else > + schedule_work_on(cpu, &stock->work); > } > > if (!sync) > @@ -2197,8 +2214,20 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync) > > for_each_online_cpu(cpu) { > struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu); > - if (mem_cgroup_same_or_subtree(root_mem, stock->cached) && > - test_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) > + 
struct mem_cgroup *mem; > + bool wait_for_drain = false; > + > + /* > + * we have to be careful about parallel group destroying > + * (mem_cgroup_destroy) which is derefered after sychronize_rcu > + */ > + rcu_read_lock(); > + mem = stock->cached; > + wait_for_drain = stock->under_drain && > + mem && mem_cgroup_same_or_subtree(root_mem, mem); > + rcu_read_unlock(); > + > + if (wait_for_drain) > flush_work(&stock->work); > } > out: > @@ -2278,8 +2307,12 @@ static int __cpuinit memcg_cpu_hotplug_callback(struct notifier_block *nb, > for_each_mem_cgroup_all(iter) > mem_cgroup_drain_pcp_counter(iter, cpu); > > - stock = &per_cpu(memcg_stock, cpu); > - drain_stock(stock); > + if (!spin_trylock(&stock->drain_lock)) { > + if (stock->under_drain) > + return NOTIFY_OK; > + spin_lock(&stock->drain_lock); > + } > + drain_local_stock(NULL); > return NOTIFY_OK; > } > > @@ -5068,6 +5101,8 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) > struct memcg_stock_pcp *stock = > &per_cpu(memcg_stock, cpu); > INIT_WORK(&stock->work, drain_local_stock); > + stock->under_drain = false; > + spin_lock_init(&stock->drain_lock); > } > hotcpu_notifier(memcg_cpu_hotplug_callback, 0); > } else { > -- > 1.7.5.4 > > > -- > Michal Hocko > SUSE Labs > SUSE LINUX s.r.o. > Lihovarska 1060/12 > 190 00 Praha 9 > Czech Republic > ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH RFC] memcg: fix drain_all_stock crash 2011-08-09 9:32 ` KAMEZAWA Hiroyuki @ 2011-08-09 9:45 ` Michal Hocko -1 siblings, 0 replies; 56+ messages in thread From: Michal Hocko @ 2011-08-09 9:45 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Johannes Weiner, linux-mm, Balbir Singh, Andrew Morton, linux-kernel On Tue 09-08-11 18:32:16, KAMEZAWA Hiroyuki wrote: > On Tue, 9 Aug 2011 11:31:50 +0200 > Michal Hocko <mhocko@suse.cz> wrote: > > > What do you think about the half backed patch bellow? I didn't manage to > > test it yet but I guess it should help. I hate asymmetry of drain_lock > > locking (it is acquired somewhere else than it is released which is > > not). I will think about a nicer way how to do it. > > Maybe I should also split the rcu part in a separate patch. > > > > What do you think? > > > I'd like to revert 8521fc50 first and consider total design change > rather than ad-hoc fix. Agreed. The revert should go into 3.0 stable as well. Although the global mutex is buggy, we have had that behavior for a long time without any reports. We should address it, but it can wait for 3.2. > Personally, I don't like to have spin-lock in per-cpu area. A spinlock is not that different from what we already have with the bit lock. > > > Thanks, > -Kame > > > --- > > From 26c2cdc55aa14ec4a54e9c8e2c8b9072c7cb8e28 Mon Sep 17 00:00:00 2001 > > From: Michal Hocko <mhocko@suse.cz> > > Date: Tue, 9 Aug 2011 10:53:28 +0200 > > Subject: [PATCH] memcg: fix drain_all_stock crash > > > > 8521fc50 (memcg: get rid of percpu_charge_mutex lock) introduced a crash > > in sync mode when we are about to check whether we have to wait for the > > work because we are calling mem_cgroup_same_or_subtree without checking > > FLUSHING_CACHED_CHARGE before so we can dereference already cleaned > > cache (the simplest case would be when we drain the local cache). 
> > > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 > > IP: [<ffffffff81083b70>] css_is_ancestor+0x20/0x70 > > PGD 4ae7a067 PUD 4adc4067 PMD 0 > > Oops: 0000 [#1] PREEMPT SMP > > CPU 0 > > Pid: 19677, comm: rmdir Tainted: G W 3.0.0-mm1-00188-gf38d32b #35 ECS MCP61M-M3/MCP61M-M3 > > RIP: 0010:[<ffffffff81083b70>] [<ffffffff81083b70>] css_is_ancestor+0x20/0x70 > > RSP: 0018:ffff880077b09c88 EFLAGS: 00010202 > > RAX: ffff8800781bb310 RBX: 0000000000000000 RCX: 000000000000003e > > RDX: 0000000000000000 RSI: ffff8800779f7c00 RDI: 0000000000000000 > > RBP: ffff880077b09c98 R08: ffffffff818a4e88 R09: 0000000000000000 > > R10: 0000000000000000 R11: dead000000100100 R12: ffff8800779f7c00 > > R13: ffff8800779f7c00 R14: 0000000000000000 R15: ffff88007bc0eb80 > > FS: 00007f5d689ec720(0000) GS:ffff88007bc00000(0000) knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > > CR2: 0000000000000018 CR3: 000000004ad57000 CR4: 00000000000006f0 > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > > Process rmdir (pid: 19677, threadinfo ffff880077b08000, task ffff8800781bb310) > > Stack: > > ffffffff818a4e88 000000000000eb80 ffff880077b09ca8 ffffffff810feba3 > > ffff880077b09d08 ffffffff810feccf ffff880077b09cf8 0000000000000001 > > ffff88007bd0eb80 0000000000000001 ffff880077af2000 0000000000000000 > > Call Trace: > > [<ffffffff810feba3>] mem_cgroup_same_or_subtree+0x33/0x40 > > [<ffffffff810feccf>] drain_all_stock+0x11f/0x170 > > [<ffffffff81103211>] mem_cgroup_force_empty+0x231/0x6d0 > > [<ffffffff81111872>] ? path_put+0x22/0x30 > > [<ffffffff8111c925>] ? __d_lookup+0xb5/0x170 > > [<ffffffff811036c4>] mem_cgroup_pre_destroy+0x14/0x20 > > [<ffffffff81080559>] cgroup_rmdir+0xb9/0x500 > > [<ffffffff81063990>] ? abort_exclusive_wait+0xb0/0xb0 > > [<ffffffff81114d26>] vfs_rmdir+0x86/0xe0 > > [<ffffffff811233d3>] ? 
mnt_want_write+0x43/0x80 > > [<ffffffff81114e7b>] do_rmdir+0xfb/0x110 > > [<ffffffff81114ea6>] sys_rmdir+0x16/0x20 > > [<ffffffff8154d76b>] system_call_fastpath+0x16/0x1b > > > > Testing FLUSHING_CACHED_CHARGE before dereferencing is still not enough > > because then we still might see mem == NULL so we have to check it > > before dereferencing. > > We have to do all stock checking under FLUSHING_CACHED_CHARGE bit lock > > so it is much easier to use a spin_lock instead. Let's also add a flag > > (under_drain) that draining in progress so that concurrent callers do > > not have to wait on the lock pointlessly. > > > > Finally we do not make sure that the mem still exists. It could have > > been removed in the meantime: > > CPU0 CPU1 CPU2 > > mem=stock->cached > > stock->cached=NULL > > clear_bit > > test_and_set_bit > > test_bit() ... > > <preempted> mem_cgroup_destroy > > use after free > > > > `...' is actually quite a bunch of work to do so the race is not very > > probable. The important thing, though, is that cgroup_subsys->destroy > > (mem_cgroup_destroy) is called after synchronize_rcu so we can protect > > by calling rcu_read_lock when dereferencing cached mem. > > > > TODO: > > - check if under_drain needs some memory barriers > > - check the hotplug path (can we wait on spinlock?) 
> > - better changelog > > - do some testing > > > > Signed-off-by: Michal Hocko <mhocko@suse.cz> > > Reported-by: Johannes Weiner <jweiner@redhat.com> > > --- > > mm/memcontrol.c | 67 +++++++++++++++++++++++++++++++++++++++++------------- > > 1 files changed, 51 insertions(+), 16 deletions(-) > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > index f4ec4e7..e34f9fd 100644 > > --- a/mm/memcontrol.c > > +++ b/mm/memcontrol.c > > @@ -2087,8 +2087,8 @@ struct memcg_stock_pcp { > > struct mem_cgroup *cached; /* this never be root cgroup */ > > unsigned int nr_pages; > > struct work_struct work; > > - unsigned long flags; > > -#define FLUSHING_CACHED_CHARGE (0) > > + spinlock_t drain_lock; /* protects from parallel draining */ > > + bool under_drain; > > }; > > static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock); > > > > @@ -2114,6 +2114,7 @@ static bool consume_stock(struct mem_cgroup *mem) > > > > /* > > * Returns stocks cached in percpu to res_counter and reset cached information. > > + * Do not call this directly - use drain_local_stock instead. > > */ > > static void drain_stock(struct memcg_stock_pcp *stock) > > { > > @@ -2133,12 +2134,16 @@ static void drain_stock(struct memcg_stock_pcp *stock) > > /* > > * This must be called under preempt disabled or must be called by > > * a thread which is pinned to local cpu. > > + * Parameter is not used. > > + * Assumes stock->drain_lock held. 
* Re: [PATCH RFC] memcg: fix drain_all_stock crash @ 2011-08-09 9:45 ` Michal Hocko 0 siblings, 0 replies; 56+ messages in thread From: Michal Hocko @ 2011-08-09 9:45 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Johannes Weiner, linux-mm, Balbir Singh, Andrew Morton, linux-kernel On Tue 09-08-11 18:32:16, KAMEZAWA Hiroyuki wrote: > On Tue, 9 Aug 2011 11:31:50 +0200 > Michal Hocko <mhocko@suse.cz> wrote: > > > What do you think about the half-baked patch below? I didn't manage to > > test it yet but I guess it should help. I hate the asymmetry of the drain_lock > > locking (it is acquired somewhere else than where it is released, which is > > not nice). I will think about a nicer way to do it. > > Maybe I should also split the rcu part into a separate patch. > > > > What do you think? > > > I'd like to revert 8521fc50 first and consider total design change > rather than ad-hoc fix. Agreed. The revert should go into 3.0 stable as well. Although the global mutex is buggy, we have had that behavior for a long time without any reports. We should address it but it can wait for 3.2. > Personally, I don't like to have spin-lock in per-cpu area. spinlock is not that different from what we already have with the bit lock. > > > Thanks, > -Kame > > > --- > > From 26c2cdc55aa14ec4a54e9c8e2c8b9072c7cb8e28 Mon Sep 17 00:00:00 2001 > > From: Michal Hocko <mhocko@suse.cz> > > Date: Tue, 9 Aug 2011 10:53:28 +0200 > > Subject: [PATCH] memcg: fix drain_all_stock crash > > > > 8521fc50 (memcg: get rid of percpu_charge_mutex lock) introduced a crash > > in sync mode when we are about to check whether we have to wait for the > > work, because we call mem_cgroup_same_or_subtree without checking > > FLUSHING_CACHED_CHARGE first, so we can dereference an already cleaned > > cache (the simplest case would be when we drain the local cache). 
> > > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 > > IP: [<ffffffff81083b70>] css_is_ancestor+0x20/0x70 > > PGD 4ae7a067 PUD 4adc4067 PMD 0 > > Oops: 0000 [#1] PREEMPT SMP > > CPU 0 > > Pid: 19677, comm: rmdir Tainted: G W 3.0.0-mm1-00188-gf38d32b #35 ECS MCP61M-M3/MCP61M-M3 > > RIP: 0010:[<ffffffff81083b70>] [<ffffffff81083b70>] css_is_ancestor+0x20/0x70 > > RSP: 0018:ffff880077b09c88 EFLAGS: 00010202 > > RAX: ffff8800781bb310 RBX: 0000000000000000 RCX: 000000000000003e > > RDX: 0000000000000000 RSI: ffff8800779f7c00 RDI: 0000000000000000 > > RBP: ffff880077b09c98 R08: ffffffff818a4e88 R09: 0000000000000000 > > R10: 0000000000000000 R11: dead000000100100 R12: ffff8800779f7c00 > > R13: ffff8800779f7c00 R14: 0000000000000000 R15: ffff88007bc0eb80 > > FS: 00007f5d689ec720(0000) GS:ffff88007bc00000(0000) knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > > CR2: 0000000000000018 CR3: 000000004ad57000 CR4: 00000000000006f0 > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > > Process rmdir (pid: 19677, threadinfo ffff880077b08000, task ffff8800781bb310) > > Stack: > > ffffffff818a4e88 000000000000eb80 ffff880077b09ca8 ffffffff810feba3 > > ffff880077b09d08 ffffffff810feccf ffff880077b09cf8 0000000000000001 > > ffff88007bd0eb80 0000000000000001 ffff880077af2000 0000000000000000 > > Call Trace: > > [<ffffffff810feba3>] mem_cgroup_same_or_subtree+0x33/0x40 > > [<ffffffff810feccf>] drain_all_stock+0x11f/0x170 > > [<ffffffff81103211>] mem_cgroup_force_empty+0x231/0x6d0 > > [<ffffffff81111872>] ? path_put+0x22/0x30 > > [<ffffffff8111c925>] ? __d_lookup+0xb5/0x170 > > [<ffffffff811036c4>] mem_cgroup_pre_destroy+0x14/0x20 > > [<ffffffff81080559>] cgroup_rmdir+0xb9/0x500 > > [<ffffffff81063990>] ? abort_exclusive_wait+0xb0/0xb0 > > [<ffffffff81114d26>] vfs_rmdir+0x86/0xe0 > > [<ffffffff811233d3>] ? 
mnt_want_write+0x43/0x80 > > [<ffffffff81114e7b>] do_rmdir+0xfb/0x110 > > [<ffffffff81114ea6>] sys_rmdir+0x16/0x20 > > [<ffffffff8154d76b>] system_call_fastpath+0x16/0x1b > > > > Testing FLUSHING_CACHED_CHARGE before dereferencing is still not enough, > > because we might still see mem == NULL, so we have to check that as well > > before dereferencing. > > We have to do all stock checking under the FLUSHING_CACHED_CHARGE bit lock, > > so it is much easier to use a spin_lock instead. Let's also add a flag > > (under_drain) signalling that a drain is in progress so that concurrent callers do > > not have to wait on the lock pointlessly. > > > > Finally, we currently do not make sure that the mem still exists. It could have > > been removed in the meantime: > > CPU0 CPU1 CPU2 > > mem=stock->cached > > stock->cached=NULL > > clear_bit > > test_and_set_bit > > test_bit() ... > > <preempted> mem_cgroup_destroy > > use after free > > > > `...' is actually quite a bunch of work, so the race is not very > > probable. The important thing, though, is that cgroup_subsys->destroy > > (mem_cgroup_destroy) is called after synchronize_rcu, so we can protect > > the dereference of the cached mem by taking rcu_read_lock. > > > > TODO: > > - check if under_drain needs some memory barriers > > - check the hotplug path (can we wait on spinlock?) 
> > - better changelog > > - do some testing > > > > Signed-off-by: Michal Hocko <mhocko@suse.cz> > > Reported-by: Johannes Weiner <jweiner@redhat.com> > > --- > > mm/memcontrol.c | 67 +++++++++++++++++++++++++++++++++++++++++------------- > > 1 files changed, 51 insertions(+), 16 deletions(-) > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > index f4ec4e7..e34f9fd 100644 > > --- a/mm/memcontrol.c > > +++ b/mm/memcontrol.c > > @@ -2087,8 +2087,8 @@ struct memcg_stock_pcp { > > struct mem_cgroup *cached; /* this never be root cgroup */ > > unsigned int nr_pages; > > struct work_struct work; > > - unsigned long flags; > > -#define FLUSHING_CACHED_CHARGE (0) > > + spinlock_t drain_lock; /* protects from parallel draining */ > > + bool under_drain; > > }; > > static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock); > > > > @@ -2114,6 +2114,7 @@ static bool consume_stock(struct mem_cgroup *mem) > > > > /* > > * Returns stocks cached in percpu to res_counter and reset cached information. > > + * Do not call this directly - use drain_local_stock instead. > > */ > > static void drain_stock(struct memcg_stock_pcp *stock) > > { > > @@ -2133,12 +2134,16 @@ static void drain_stock(struct memcg_stock_pcp *stock) > > /* > > * This must be called under preempt disabled or must be called by > > * a thread which is pinned to local cpu. > > + * Parameter is not used. > > + * Assumes stock->drain_lock held. 
> > */ > > static void drain_local_stock(struct work_struct *dummy) > > { > > struct memcg_stock_pcp *stock = &__get_cpu_var(memcg_stock); > > drain_stock(stock); > > - clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags); > > + > > + stock->under_drain = false; > > + spin_unlock(&stock->drain_lock); > > } > > > > /* > > @@ -2150,7 +2155,9 @@ static void refill_stock(struct mem_cgroup *mem, unsigned int nr_pages) > > struct memcg_stock_pcp *stock = &get_cpu_var(memcg_stock); > > > > if (stock->cached != mem) { /* reset if necessary */ > > - drain_stock(stock); > > + spin_lock(&stock->drain_lock); > > + stock->under_drain = true; > > + drain_local_stock(NULL); > > stock->cached = mem; > > } > > stock->nr_pages += nr_pages; > > @@ -2179,17 +2186,27 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync) > > struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu); > > struct mem_cgroup *mem; > > > > + /* > > + * make sure we are not waiting when somebody already drains > > + * the cache. 
> > + */ > > + if (!spin_trylock(&stock->drain_lock)) { > > + if (stock->under_drain) > > + continue; > > + spin_lock(&stock->drain_lock); > > + } > > mem = stock->cached; > > - if (!mem || !stock->nr_pages) > > + if (!mem || !stock->nr_pages || > > + !mem_cgroup_same_or_subtree(root_mem, mem)) { > > + spin_unlock(&stock->drain_lock); > > continue; > > - if (!mem_cgroup_same_or_subtree(root_mem, mem)) > > - continue; > > - if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) { > > - if (cpu == curcpu) > > - drain_local_stock(&stock->work); > > - else > > - schedule_work_on(cpu, &stock->work); > > } > > + > > + stock->under_drain = true; > > + if (cpu == curcpu) > > + drain_local_stock(&stock->work); > > + else > > + schedule_work_on(cpu, &stock->work); > > } > > > > if (!sync) > > @@ -2197,8 +2214,20 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync) > > > > for_each_online_cpu(cpu) { > > struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu); > > - if (mem_cgroup_same_or_subtree(root_mem, stock->cached) && > > - test_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) > > + struct mem_cgroup *mem; > > + bool wait_for_drain = false; > > + > > + /* > > + * we have to be careful about parallel group destroying > > + * (mem_cgroup_destroy) which is derefered after sychronize_rcu > > + */ > > + rcu_read_lock(); > > + mem = stock->cached; > > + wait_for_drain = stock->under_drain && > > + mem && mem_cgroup_same_or_subtree(root_mem, mem); > > + rcu_read_unlock(); > > + > > + if (wait_for_drain) > > flush_work(&stock->work); > > } > > out: > > @@ -2278,8 +2307,12 @@ static int __cpuinit memcg_cpu_hotplug_callback(struct notifier_block *nb, > > for_each_mem_cgroup_all(iter) > > mem_cgroup_drain_pcp_counter(iter, cpu); > > > > - stock = &per_cpu(memcg_stock, cpu); > > - drain_stock(stock); > > + if (!spin_trylock(&stock->drain_lock)) { > > + if (stock->under_drain) > > + return NOTIFY_OK; > > + spin_lock(&stock->drain_lock); > > + } > > + 
drain_local_stock(NULL); > > return NOTIFY_OK; > > } > > > > @@ -5068,6 +5101,8 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont) > > struct memcg_stock_pcp *stock = > > &per_cpu(memcg_stock, cpu); > > INIT_WORK(&stock->work, drain_local_stock); > > + stock->under_drain = false; > > + spin_lock_init(&stock->drain_lock); > > } > > hotcpu_notifier(memcg_cpu_hotplug_callback, 0); > > } else { > > -- > > 1.7.5.4 > > > > > > -- > > Michal Hocko > > SUSE Labs > > SUSE LINUX s.r.o. > > Lihovarska 1060/12 > > 190 00 Praha 9 > > Czech Republic > > > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH RFC] memcg: fix drain_all_stock crash 2011-08-09 9:45 ` Michal Hocko @ 2011-08-09 9:53 ` KAMEZAWA Hiroyuki -1 siblings, 0 replies; 56+ messages in thread From: KAMEZAWA Hiroyuki @ 2011-08-09 9:53 UTC (permalink / raw) To: Michal Hocko Cc: Johannes Weiner, linux-mm, Balbir Singh, Andrew Morton, linux-kernel On Tue, 9 Aug 2011 11:45:03 +0200 Michal Hocko <mhocko@suse.cz> wrote: > On Tue 09-08-11 18:32:16, KAMEZAWA Hiroyuki wrote: > > On Tue, 9 Aug 2011 11:31:50 +0200 > > Michal Hocko <mhocko@suse.cz> wrote: > > > > > What do you think about the half backed patch bellow? I didn't manage to > > > test it yet but I guess it should help. I hate asymmetry of drain_lock > > > locking (it is acquired somewhere else than it is released which is > > > not). I will think about a nicer way how to do it. > > > Maybe I should also split the rcu part in a separate patch. > > > > > > What do you think? > > > > > > I'd like to revert 8521fc50 first and consider total design change > > rather than ad-hoc fix. > > Agreed. Revert should go into 3.0 stable as well. Although the global > mutex is buggy we have that behavior for a long time without any reports. > We should address it but it can wait for 3.2. > What "buggy" means here ? "problematic" or "cause OOps ?" > > Personally, I don't like to have spin-lock in per-cpu area. > > spinlock is not that different from what we already have with the bit > lock. maybe. The best is lockless style...but pointer in percpu cache is problem.. Thanks, -Kame ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH RFC] memcg: fix drain_all_stock crash 2011-08-09 9:53 ` KAMEZAWA Hiroyuki @ 2011-08-09 10:09 ` Michal Hocko -1 siblings, 0 replies; 56+ messages in thread From: Michal Hocko @ 2011-08-09 10:09 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Johannes Weiner, linux-mm, Balbir Singh, Andrew Morton, linux-kernel On Tue 09-08-11 18:53:13, KAMEZAWA Hiroyuki wrote: > On Tue, 9 Aug 2011 11:45:03 +0200 > Michal Hocko <mhocko@suse.cz> wrote: > > > On Tue 09-08-11 18:32:16, KAMEZAWA Hiroyuki wrote: > > > On Tue, 9 Aug 2011 11:31:50 +0200 > > > Michal Hocko <mhocko@suse.cz> wrote: > > > > > > > What do you think about the half backed patch bellow? I didn't manage to > > > > test it yet but I guess it should help. I hate asymmetry of drain_lock > > > > locking (it is acquired somewhere else than it is released which is > > > > not). I will think about a nicer way how to do it. > > > > Maybe I should also split the rcu part in a separate patch. > > > > > > > > What do you think? > > > > > > > > > I'd like to revert 8521fc50 first and consider total design change > > > rather than ad-hoc fix. > > > > Agreed. Revert should go into 3.0 stable as well. Although the global > > mutex is buggy we have that behavior for a long time without any reports. > > We should address it but it can wait for 3.2. I will send the revert request to Linus. > What "buggy" means here ? "problematic" or "cause OOps ?" I have described that in an earlier email. Consider pathological case when CPU0 wants to async. drain a memcg which has a lot of cached charges while CPU1 is already draining so it holds the mutex. CPU0 backs off so it has to reclaim although we could prevent from it by getting rid of cached charges. This is not critical though. > > > > Personally, I don't like to have spin-lock in per-cpu area. > > > > spinlock is not that different from what we already have with the bit > > lock. > > maybe. The best is lockless style...but pointer in percpu cache is problem.. 
> > Thanks, > -Kame -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH RFC] memcg: fix drain_all_stock crash 2011-08-09 10:09 ` Michal Hocko @ 2011-08-09 10:07 ` KAMEZAWA Hiroyuki -1 siblings, 0 replies; 56+ messages in thread From: KAMEZAWA Hiroyuki @ 2011-08-09 10:07 UTC (permalink / raw) To: Michal Hocko Cc: Johannes Weiner, linux-mm, Balbir Singh, Andrew Morton, linux-kernel On Tue, 9 Aug 2011 12:09:44 +0200 Michal Hocko <mhocko@suse.cz> wrote: > On Tue 09-08-11 18:53:13, KAMEZAWA Hiroyuki wrote: > > On Tue, 9 Aug 2011 11:45:03 +0200 > > Michal Hocko <mhocko@suse.cz> wrote: > > > > > On Tue 09-08-11 18:32:16, KAMEZAWA Hiroyuki wrote: > > > > On Tue, 9 Aug 2011 11:31:50 +0200 > > > > Michal Hocko <mhocko@suse.cz> wrote: > > > > > > > > > What do you think about the half backed patch bellow? I didn't manage to > > > > > test it yet but I guess it should help. I hate asymmetry of drain_lock > > > > > locking (it is acquired somewhere else than it is released which is > > > > > not). I will think about a nicer way how to do it. > > > > > Maybe I should also split the rcu part in a separate patch. > > > > > > > > > > What do you think? > > > > > > > > > > > > I'd like to revert 8521fc50 first and consider total design change > > > > rather than ad-hoc fix. > > > > > > Agreed. Revert should go into 3.0 stable as well. Although the global > > > mutex is buggy we have that behavior for a long time without any reports. > > > We should address it but it can wait for 3.2. > > I will send the revert request to Linus. > > > What "buggy" means here ? "problematic" or "cause OOps ?" > > I have described that in an earlier email. Consider pathological case > when CPU0 wants to async. drain a memcg which has a lot of cached charges while > CPU1 is already draining so it holds the mutex. CPU0 backs off so it has > to reclaim although we could prevent from it by getting rid of cached > charges. This is not critical though. > That problem should be fixed by background reclaim. I'll do it after fixing numascan. 
(and dirty-ratio problem...) Thanks, -Kame ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH RFC] memcg: fix drain_all_stock crash 2011-08-09 10:07 ` KAMEZAWA Hiroyuki @ 2011-08-09 11:46 ` Michal Hocko -1 siblings, 0 replies; 56+ messages in thread From: Michal Hocko @ 2011-08-09 11:46 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Johannes Weiner, linux-mm, Balbir Singh, Andrew Morton, linux-kernel On Tue 09-08-11 19:07:25, KAMEZAWA Hiroyuki wrote: > On Tue, 9 Aug 2011 12:09:44 +0200 > Michal Hocko <mhocko@suse.cz> wrote: > > > On Tue 09-08-11 18:53:13, KAMEZAWA Hiroyuki wrote: > > > On Tue, 9 Aug 2011 11:45:03 +0200 > > > Michal Hocko <mhocko@suse.cz> wrote: > > > > > > > On Tue 09-08-11 18:32:16, KAMEZAWA Hiroyuki wrote: > > > > > On Tue, 9 Aug 2011 11:31:50 +0200 > > > > > Michal Hocko <mhocko@suse.cz> wrote: > > > > > > > > > > > What do you think about the half backed patch bellow? I didn't manage to > > > > > > test it yet but I guess it should help. I hate asymmetry of drain_lock > > > > > > locking (it is acquired somewhere else than it is released which is > > > > > > not). I will think about a nicer way how to do it. > > > > > > Maybe I should also split the rcu part in a separate patch. > > > > > > > > > > > > What do you think? > > > > > > > > > > > > > > > I'd like to revert 8521fc50 first and consider total design change > > > > > rather than ad-hoc fix. > > > > > > > > Agreed. Revert should go into 3.0 stable as well. Although the global > > > > mutex is buggy we have that behavior for a long time without any reports. > > > > We should address it but it can wait for 3.2. > > > > I will send the revert request to Linus. > > > > > What "buggy" means here ? "problematic" or "cause OOps ?" > > > > I have described that in an earlier email. Consider pathological case > > when CPU0 wants to async. drain a memcg which has a lot of cached charges while > > CPU1 is already draining so it holds the mutex. CPU0 backs off so it has > > to reclaim although we could prevent from it by getting rid of cached > > charges. 
This is not critical though. > > > > That problem should be fixed by background reclaim. How? Do you plan to rework locking or the charge caching completely? -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH RFC] memcg: fix drain_all_stock crash 2011-08-09 11:46 ` Michal Hocko @ 2011-08-09 23:54 ` KAMEZAWA Hiroyuki -1 siblings, 0 replies; 56+ messages in thread From: KAMEZAWA Hiroyuki @ 2011-08-09 23:54 UTC (permalink / raw) To: Michal Hocko Cc: Johannes Weiner, linux-mm, Balbir Singh, Andrew Morton, linux-kernel On Tue, 9 Aug 2011 13:46:42 +0200 Michal Hocko <mhocko@suse.cz> wrote: > On Tue 09-08-11 19:07:25, KAMEZAWA Hiroyuki wrote: > > On Tue, 9 Aug 2011 12:09:44 +0200 > > Michal Hocko <mhocko@suse.cz> wrote: > > > > > On Tue 09-08-11 18:53:13, KAMEZAWA Hiroyuki wrote: > > > > On Tue, 9 Aug 2011 11:45:03 +0200 > > > > Michal Hocko <mhocko@suse.cz> wrote: > > > > > > > > > On Tue 09-08-11 18:32:16, KAMEZAWA Hiroyuki wrote: > > > > > > On Tue, 9 Aug 2011 11:31:50 +0200 > > > > > > Michal Hocko <mhocko@suse.cz> wrote: > > > > > > > > > > > > > What do you think about the half backed patch bellow? I didn't manage to > > > > > > > test it yet but I guess it should help. I hate asymmetry of drain_lock > > > > > > > locking (it is acquired somewhere else than it is released which is > > > > > > > not). I will think about a nicer way how to do it. > > > > > > > Maybe I should also split the rcu part in a separate patch. > > > > > > > > > > > > > > What do you think? > > > > > > > > > > > > > > > > > > I'd like to revert 8521fc50 first and consider total design change > > > > > > rather than ad-hoc fix. > > > > > > > > > > Agreed. Revert should go into 3.0 stable as well. Although the global > > > > > mutex is buggy we have that behavior for a long time without any reports. > > > > > We should address it but it can wait for 3.2. > > > > > > I will send the revert request to Linus. > > > > > > > What "buggy" means here ? "problematic" or "cause OOps ?" > > > > > > I have described that in an earlier email. Consider pathological case > > > when CPU0 wants to async. 
drain a memcg which has a lot of cached charges while > > > CPU1 is already draining so it holds the mutex. CPU0 backs off so it has > > > to reclaim although we could prevent from it by getting rid of cached > > > charges. This is not critical though. > > > > > > > That problem should be fixed by background reclaim. > > How? Do you plan to rework locking or the charge caching completely? > From your description, the problem is not the lock itself but that a task may go into _unnecessary_ direct reclaim even if there are remaining charges on per-cpu stocks, which causes latency. In (all) my automatic background reclaim tests, no direct reclaim happens if background reclaim is enabled. And as I said before, we may be able to add a flag not to cache more. It would be set by some condition, e.g. when usage is near the limit. Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH RFC] memcg: fix drain_all_stock crash @ 2011-08-09 23:54 ` KAMEZAWA Hiroyuki 0 siblings, 0 replies; 56+ messages in thread From: KAMEZAWA Hiroyuki @ 2011-08-09 23:54 UTC (permalink / raw) To: Michal Hocko Cc: Johannes Weiner, linux-mm, Balbir Singh, Andrew Morton, linux-kernel On Tue, 9 Aug 2011 13:46:42 +0200 Michal Hocko <mhocko@suse.cz> wrote: > On Tue 09-08-11 19:07:25, KAMEZAWA Hiroyuki wrote: > > On Tue, 9 Aug 2011 12:09:44 +0200 > > Michal Hocko <mhocko@suse.cz> wrote: > > > > > On Tue 09-08-11 18:53:13, KAMEZAWA Hiroyuki wrote: > > > > On Tue, 9 Aug 2011 11:45:03 +0200 > > > > Michal Hocko <mhocko@suse.cz> wrote: > > > > > > > > > On Tue 09-08-11 18:32:16, KAMEZAWA Hiroyuki wrote: > > > > > > On Tue, 9 Aug 2011 11:31:50 +0200 > > > > > > Michal Hocko <mhocko@suse.cz> wrote: > > > > > > > > > > > > > What do you think about the half backed patch bellow? I didn't manage to > > > > > > > test it yet but I guess it should help. I hate asymmetry of drain_lock > > > > > > > locking (it is acquired somewhere else than it is released which is > > > > > > > not). I will think about a nicer way how to do it. > > > > > > > Maybe I should also split the rcu part in a separate patch. > > > > > > > > > > > > > > What do you think? > > > > > > > > > > > > > > > > > > I'd like to revert 8521fc50 first and consider total design change > > > > > > rather than ad-hoc fix. > > > > > > > > > > Agreed. Revert should go into 3.0 stable as well. Although the global > > > > > mutex is buggy we have that behavior for a long time without any reports. > > > > > We should address it but it can wait for 3.2. > > > > > > I will send the revert request to Linus. > > > > > > > What "buggy" means here ? "problematic" or "cause OOps ?" > > > > > > I have described that in an earlier email. Consider pathological case > > > when CPU0 wants to async. drain a memcg which has a lot of cached charges while > > > CPU1 is already draining so it holds the mutex. 
CPU0 backs off so it has > > > to reclaim although we could prevent from it by getting rid of cached > > > charges. This is not critical though. > > > > > > > That problem should be fixed by background reclaim. > > How? Do you plan to rework locking or the charge caching completely? From your description, the problem is not the lock itself but that a task may go into _unnecessary_ direct-reclaim even if there are remaining charges on per-cpu stocks, which causes latency. In (all) my automatic background reclaim tests, no direct reclaim happens if background reclaim is enabled. And as I said before, we may be able to add a flag not to cache more. It would be set by some condition, e.g. when usage is near the limit. Thanks, -Kame ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH RFC] memcg: fix drain_all_stock crash 2011-08-09 23:54 ` KAMEZAWA Hiroyuki @ 2011-08-10 7:29 ` Michal Hocko -1 siblings, 0 replies; 56+ messages in thread From: Michal Hocko @ 2011-08-10 7:29 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Johannes Weiner, linux-mm, Balbir Singh, Andrew Morton, linux-kernel On Wed 10-08-11 08:54:37, KAMEZAWA Hiroyuki wrote: > On Tue, 9 Aug 2011 13:46:42 +0200 > Michal Hocko <mhocko@suse.cz> wrote: > > > On Tue 09-08-11 19:07:25, KAMEZAWA Hiroyuki wrote: > > > On Tue, 9 Aug 2011 12:09:44 +0200 > > > Michal Hocko <mhocko@suse.cz> wrote: > > > > > > > On Tue 09-08-11 18:53:13, KAMEZAWA Hiroyuki wrote: > > > > > On Tue, 9 Aug 2011 11:45:03 +0200 > > > > > Michal Hocko <mhocko@suse.cz> wrote: > > > > > > > > > > > On Tue 09-08-11 18:32:16, KAMEZAWA Hiroyuki wrote: > > > > > > > On Tue, 9 Aug 2011 11:31:50 +0200 > > > > > > > Michal Hocko <mhocko@suse.cz> wrote: > > > > > > > > > > > > > > > What do you think about the half backed patch bellow? I didn't manage to > > > > > > > > test it yet but I guess it should help. I hate asymmetry of drain_lock > > > > > > > > locking (it is acquired somewhere else than it is released which is > > > > > > > > not). I will think about a nicer way how to do it. > > > > > > > > Maybe I should also split the rcu part in a separate patch. > > > > > > > > > > > > > > > > What do you think? > > > > > > > > > > > > > > > > > > > > > I'd like to revert 8521fc50 first and consider total design change > > > > > > > rather than ad-hoc fix. > > > > > > > > > > > > Agreed. Revert should go into 3.0 stable as well. Although the global > > > > > > mutex is buggy we have that behavior for a long time without any reports. > > > > > > We should address it but it can wait for 3.2. > > > > > > > > I will send the revert request to Linus. > > > > > > > > > What "buggy" means here ? "problematic" or "cause OOps ?" > > > > > > > > I have described that in an earlier email. 
Consider pathological case > > > > when CPU0 wants to async. drain a memcg which has a lot of cached charges while > > > > CPU1 is already draining so it holds the mutex. CPU0 backs off so it has > > > > to reclaim although we could prevent from it by getting rid of cached > > > > charges. This is not critical though. > > > > > > > > > > That problem should be fixed by background reclaim. > > > > How? Do you plan to rework locking or the charge caching completely? > > > > From your description, the problem is not the lock itself but a task > may go into _unnecessary_ direct-reclaim even if there are remaining > chages on per-cpu stocks, which cause latency. The problem is partly the lock because it prevents parallel async reclaimers. This is too restrictive. If we are going to rely on async draining we will have to solve the above problem. > > In (all) my automatic background reclaim tests, no direct reclaim happens > if background reclaim is enabled. Yes, I haven't seen this problem yet but I guess that there is a non-zero chance that there is a workload which triggers this. > And as I said before, we may be able to add a flag not to cache > more. It's set by some condition ....as usage is near to the limit. > > Thanks, > -Kame Thanks -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic ^ permalink raw reply [flat|nested] 56+ messages in thread
* [PATCH 0/4] memcg: cleanup per-cpu charge caches + fix unnecessary reclaim if there are still cached charges @ 2011-07-21 9:41 Michal Hocko 2011-07-21 7:38 ` [PATCH 1/4] memcg: do not try to drain per-cpu caches without pages Michal Hocko 0 siblings, 1 reply; 56+ messages in thread From: Michal Hocko @ 2011-07-21 9:41 UTC (permalink / raw) To: linux-mm; +Cc: Balbir Singh, Daisuke Nishimura, KAMEZAWA Hiroyuki, linux-kernel Hi, this patchset cleans up the per-cpu charge cache draining code and it fixes an issue where we could reclaim from a group even though there are caches with charges on other CPUs that can be used. Although the problem is far from being critical it can bite us on large machines with many CPUs. I am sorry that I am mixing those two things but the fix depends on the work done in the cleanup patches so I hope it won't be confusing. The reason is that I needed a sane implementation of the sync draining code and wanted to prevent code duplication. The first two patches should be quite straightforward. Checking stock->nr_pages is more general than excluding just the local CPU and having targeted sync draining also makes good sense to me. The third one might require some discussion. AFAIU it should be safe but others might see some issues. Anyway, I have no problem dropping it because the fix doesn't depend on it. I have put it before the fix just because I wanted to have all cleanups in front. Finally, the fourth patch is the already mentioned fix. I do not think I have ever seen any sane application (aka not an artificially created usecase) where we would trigger the behavior in such a way that the performance would hurt or something similar, but I have already seen a pointless reclaim while we had caches on other CPUs. As the number of CPUs grows I think the change makes quite good sense. The patchset is on top of the current Linus tree but it should apply on top of the current mmotm tree as well. Any thoughts, comments? 
Michal Hocko (4): memcg: do not try to drain per-cpu caches without pages memcg: unify sync and async per-cpu charge cache draining memcg: get rid of percpu_charge_mutex lock memcg: prevent from reclaiming if there are per-cpu cached charges mm/memcontrol.c | 73 +++++++++++++++++++++++++++++++------------------ 1 files changed, 41 insertions(+), 32 deletions(-) -- 1.7.5.4 ^ permalink raw reply [flat|nested] 56+ messages in thread
* [PATCH 1/4] memcg: do not try to drain per-cpu caches without pages 2011-07-21 9:41 [PATCH 0/4] memcg: cleanup per-cpu charge caches + fix unnecessary reclaim if there are still cached charges Michal Hocko @ 2011-07-21 7:38 ` Michal Hocko 2011-07-21 10:12 ` KAMEZAWA Hiroyuki 0 siblings, 1 reply; 56+ messages in thread From: Michal Hocko @ 2011-07-21 7:38 UTC (permalink / raw) To: linux-mm; +Cc: Balbir Singh, Daisuke Nishimura, KAMEZAWA Hiroyuki, linux-kernel drain_all_stock_async tries to optimize a work to be done on the work queue by excluding any work for the current CPU because it assumes that the context we are called from already tried to charge from that cache and it's failed so it must be empty already. While the assumption is correct we can do it by checking the current number of pages in the cache. This will also reduce a work on other CPUs with an empty stock. Signed-off-by: Michal Hocko <mhocko@suse.cz> --- mm/memcontrol.c | 14 ++------------ 1 files changed, 2 insertions(+), 12 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index f11f198..786bffb 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2140,7 +2140,7 @@ static void refill_stock(struct mem_cgroup *mem, unsigned int nr_pages) */ static void drain_all_stock_async(struct mem_cgroup *root_mem) { - int cpu, curcpu; + int cpu; /* * If someone calls draining, avoid adding more kworker runs. */ @@ -2148,22 +2148,12 @@ static void drain_all_stock_async(struct mem_cgroup *root_mem) return; /* Notify other cpus that system-wide "drain" is running */ get_online_cpus(); - /* - * Get a hint for avoiding draining charges on the current cpu, - * which must be exhausted by our charging. It is not required that - * this be a precise check, so we use raw_smp_processor_id() instead of - * getcpu()/putcpu(). 
- */ - curcpu = raw_smp_processor_id(); for_each_online_cpu(cpu) { struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu); struct mem_cgroup *mem; - if (cpu == curcpu) - continue; - mem = stock->cached; - if (!mem) + if (!mem || !stock->nr_pages) continue; if (mem != root_mem) { if (!root_mem->use_hierarchy) -- 1.7.5.4 ^ permalink raw reply related [flat|nested] 56+ messages in thread
* Re: [PATCH 1/4] memcg: do not try to drain per-cpu caches without pages 2011-07-21 7:38 ` [PATCH 1/4] memcg: do not try to drain per-cpu caches without pages Michal Hocko @ 2011-07-21 10:12 ` KAMEZAWA Hiroyuki 0 siblings, 0 replies; 56+ messages in thread From: KAMEZAWA Hiroyuki @ 2011-07-21 10:12 UTC (permalink / raw) To: Michal Hocko; +Cc: linux-mm, Balbir Singh, Daisuke Nishimura, linux-kernel On Thu, 21 Jul 2011 09:38:00 +0200 Michal Hocko <mhocko@suse.cz> wrote: > drain_all_stock_async tries to optimize a work to be done on the work > queue by excluding any work for the current CPU because it assumes that > the context we are called from already tried to charge from that cache > and it's failed so it must be empty already. > While the assumption is correct we can do it by checking the current > number of pages in the cache. This will also reduce a work on other CPUs > with an empty stock. > > Signed-off-by: Michal Hocko <mhocko@suse.cz> At the first look, when a charge against TransParentHugepage() goes into the reclaim routine, stock->nr_pages != 0 and this will call additional kworker. nr_pages check itself seems good. Thanks, -Kame > --- > mm/memcontrol.c | 14 ++------------ > 1 files changed, 2 insertions(+), 12 deletions(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index f11f198..786bffb 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -2140,7 +2140,7 @@ static void refill_stock(struct mem_cgroup *mem, unsigned int nr_pages) > */ > static void drain_all_stock_async(struct mem_cgroup *root_mem) > { > - int cpu, curcpu; > + int cpu; > /* > * If someone calls draining, avoid adding more kworker runs. > */ > @@ -2148,22 +2148,12 @@ static void drain_all_stock_async(struct mem_cgroup *root_mem) > return; > /* Notify other cpus that system-wide "drain" is running */ > get_online_cpus(); > - /* > - * Get a hint for avoiding draining charges on the current cpu, > - * which must be exhausted by our charging. 
It is not required that > - * this be a precise check, so we use raw_smp_processor_id() instead of > - * getcpu()/putcpu(). > - */ > - curcpu = raw_smp_processor_id(); > for_each_online_cpu(cpu) { > struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu); > struct mem_cgroup *mem; > > - if (cpu == curcpu) > - continue; > - > mem = stock->cached; > - if (!mem) > + if (!mem || !stock->nr_pages) > continue; > if (mem != root_mem) { > if (!root_mem->use_hierarchy) > -- > 1.7.5.4 > > > ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 1/4] memcg: do not try to drain per-cpu caches without pages 2011-07-21 10:12 ` KAMEZAWA Hiroyuki @ 2011-07-21 11:36 ` Michal Hocko -1 siblings, 0 replies; 56+ messages in thread From: Michal Hocko @ 2011-07-21 11:36 UTC (permalink / raw) To: KAMEZAWA Hiroyuki; +Cc: linux-mm, Balbir Singh, Daisuke Nishimura, linux-kernel On Thu 21-07-11 19:12:50, KAMEZAWA Hiroyuki wrote: > On Thu, 21 Jul 2011 09:38:00 +0200 > Michal Hocko <mhocko@suse.cz> wrote: > > > drain_all_stock_async tries to optimize a work to be done on the work > > queue by excluding any work for the current CPU because it assumes that > > the context we are called from already tried to charge from that cache > > and it's failed so it must be empty already. > > While the assumption is correct we can do it by checking the current > > number of pages in the cache. This will also reduce a work on other CPUs > > with an empty stock. > > > > Signed-off-by: Michal Hocko <mhocko@suse.cz> > > > At the first look, when a charge against TransParentHugepage() goes > into the reclaim routine, stock->nr_pages != 0 and this will > call additional kworker. True. We will drain a charge which could be used by other allocations in the meantime so we have a good chance to reclaim less. But how big a problem is that? I mean I can add a new parameter that would force checking the current cpu but it doesn't look nice. I cannot add that condition unconditionally because the code will be shared with the sync path in the next patch and that one needs to drain _all_ cpus. What would you suggest? -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 1/4] memcg: do not try to drain per-cpu caches without pages 2011-07-21 11:36 ` Michal Hocko @ 2011-07-21 23:44 ` KAMEZAWA Hiroyuki -1 siblings, 0 replies; 56+ messages in thread From: KAMEZAWA Hiroyuki @ 2011-07-21 23:44 UTC (permalink / raw) To: Michal Hocko; +Cc: linux-mm, Balbir Singh, Daisuke Nishimura, linux-kernel On Thu, 21 Jul 2011 13:36:06 +0200 Michal Hocko <mhocko@suse.cz> wrote: > On Thu 21-07-11 19:12:50, KAMEZAWA Hiroyuki wrote: > > On Thu, 21 Jul 2011 09:38:00 +0200 > > Michal Hocko <mhocko@suse.cz> wrote: > > > > > drain_all_stock_async tries to optimize a work to be done on the work > > > queue by excluding any work for the current CPU because it assumes that > > > the context we are called from already tried to charge from that cache > > > and it's failed so it must be empty already. > > > While the assumption is correct we can do it by checking the current > > > number of pages in the cache. This will also reduce a work on other CPUs > > > with an empty stock. > > > > > > Signed-off-by: Michal Hocko <mhocko@suse.cz> > > > > > > At the first look, when a charge against TransParentHugepage() goes > > into the reclaim routine, stock->nr_pages != 0 and this will > > call additional kworker. > > True. We will drain a charge which could be used by other allocations > in the meantime so we have a good chance to reclaim less. But how big > problem is that? > I mean I can add a new parameter that would force checking the current > cpu but it doesn't look nice. I cannot add that condition > unconditionally because the code will be shared with the sync path in > the next patch and that one needs to drain _all_ cpus. > > What would you suggest? By 2 methods - just check nr_pages. - drain "local stock" without calling schedule_work(). It's fast. Thanks, -Kame ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 1/4] memcg: do not try to drain per-cpu caches without pages 2011-07-21 23:44 ` KAMEZAWA Hiroyuki @ 2011-07-22 9:19 ` Michal Hocko -1 siblings, 0 replies; 56+ messages in thread From: Michal Hocko @ 2011-07-22 9:19 UTC (permalink / raw) To: KAMEZAWA Hiroyuki; +Cc: linux-mm, Balbir Singh, Daisuke Nishimura, linux-kernel On Fri 22-07-11 08:44:13, KAMEZAWA Hiroyuki wrote: > On Thu, 21 Jul 2011 13:36:06 +0200 > Michal Hocko <mhocko@suse.cz> wrote: > > > On Thu 21-07-11 19:12:50, KAMEZAWA Hiroyuki wrote: > > > On Thu, 21 Jul 2011 09:38:00 +0200 > > > Michal Hocko <mhocko@suse.cz> wrote: > > > > > > > drain_all_stock_async tries to optimize a work to be done on the work > > > > queue by excluding any work for the current CPU because it assumes that > > > > the context we are called from already tried to charge from that cache > > > > and it's failed so it must be empty already. > > > > While the assumption is correct we can do it by checking the current > > > > number of pages in the cache. This will also reduce a work on other CPUs > > > > with an empty stock. > > > > > > > > Signed-off-by: Michal Hocko <mhocko@suse.cz> > > > > > > > > > At the first look, when a charge against TransParentHugepage() goes > > > into the reclaim routine, stock->nr_pages != 0 and this will > > > call additional kworker. > > > > True. We will drain a charge which could be used by other allocations > > in the meantime so we have a good chance to reclaim less. But how big > > problem is that? > > I mean I can add a new parameter that would force checking the current > > cpu but it doesn't look nice. I cannot add that condition > > unconditionally because the code will be shared with the sync path in > > the next patch and that one needs to drain _all_ cpus. > > > > What would you suggest? > By 2 methods > > - just check nr_pages. Not sure I understand which nr_pages you mean. The one that comes from the charging path or stock->nr_pages? 
If you mean the first one then we do not have it in the reclaim path where we call drain_all_stock_async. > - drain "local stock" without calling schedule_work(). It's fast. but there is nothing to be drained locally in the paths where we call drain_all_stock_async... Or do you mean that drain_all_stock shouldn't use the work queue at all? > > Thanks, > -Kame -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 1/4] memcg: do not try to drain per-cpu caches without pages
  From: KAMEZAWA Hiroyuki @ 2011-07-22  9:28 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, Balbir Singh, Daisuke Nishimura, linux-kernel

On Fri, 22 Jul 2011 11:19:36 +0200 Michal Hocko <mhocko@suse.cz> wrote:
> On Fri 22-07-11 08:44:13, KAMEZAWA Hiroyuki wrote:
> > On Thu, 21 Jul 2011 13:36:06 +0200 Michal Hocko <mhocko@suse.cz> wrote:
> > > On Thu 21-07-11 19:12:50, KAMEZAWA Hiroyuki wrote:
> > > > On Thu, 21 Jul 2011 09:38:00 +0200 Michal Hocko <mhocko@suse.cz> wrote:
> > > > > drain_all_stock_async tries to optimize the work to be done on the work
> > > > > queue by excluding any work for the current CPU because it assumes that
> > > > > the context we are called from has already tried to charge from that cache
> > > > > and failed, so it must be empty already.
> > > > > While the assumption is correct we can do it by checking the current
> > > > > number of pages in the cache. This will also reduce the work on other CPUs
> > > > > with an empty stock.
> > > > >
> > > > > Signed-off-by: Michal Hocko <mhocko@suse.cz>
> > > >
> > > > At first look, when a charge against TransParentHugepage() goes
> > > > into the reclaim routine, stock->nr_pages != 0 and this will
> > > > call an additional kworker.
> > >
> > > True. We will drain a charge which could be used by other allocations
> > > in the meantime so we have a good chance to reclaim less. But how big
> > > a problem is that?
> > > I mean I can add a new parameter that would force checking the current
> > > cpu but it doesn't look nice. I cannot add that condition
> > > unconditionally because the code will be shared with the sync path in
> > > the next patch and that one needs to drain _all_ cpus.
> > >
> > > What would you suggest?
> >
> > By 2 methods:
> >
> > - just check nr_pages.
>
> Not sure I understand which nr_pages you mean. The one that comes from
> the charging path or stock->nr_pages?
> If you mean the first one then we do not have it in the reclaim path where
> we call drain_all_stock_async.

stock->nr_pages.

> > - drain "local stock" without calling schedule_work(). It's fast.
>
> but there is nothing to be drained locally in the paths where we call
> drain_all_stock_async... Or do you mean that drain_all_stock shouldn't
> use work queue at all?

I mean calling schedule_work against the local cpu is just a waste of time.
Then, drain it directly and move the local cpu's stock->nr_pages to res_counter.

Thanks,
-Kame
* Re: [PATCH 1/4] memcg: do not try to drain per-cpu caches without pages
  From: Michal Hocko @ 2011-07-22  9:58 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, Balbir Singh, Daisuke Nishimura, linux-kernel

On Fri 22-07-11 18:28:22, KAMEZAWA Hiroyuki wrote:
> On Fri, 22 Jul 2011 11:19:36 +0200 Michal Hocko <mhocko@suse.cz> wrote:
> > [...]
> > but there is nothing to be drained locally in the paths where we call
> > drain_all_stock_async... Or do you mean that drain_all_stock shouldn't
> > use work queue at all?
>
> I mean calling schedule_work against the local cpu is just a waste of time.
> Then, drain it directly and move the local cpu's stock->nr_pages to res_counter.

got it. Thanks for the clarification. Will repost the updated version.

> Thanks,
> -Kame

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
* Re: [PATCH 1/4] memcg: do not try to drain per-cpu caches without pages
  From: Michal Hocko @ 2011-07-22 10:23 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, Balbir Singh, Daisuke Nishimura, linux-kernel

On Fri 22-07-11 11:58:15, Michal Hocko wrote:
> On Fri 22-07-11 18:28:22, KAMEZAWA Hiroyuki wrote:
> > [...]
> > I mean calling schedule_work against the local cpu is just a waste of time.
> > Then, drain it directly and move the local cpu's stock->nr_pages to res_counter.
>
> got it. Thanks for the clarification. Will repost the updated version.

---
From 2f17df54db6661c39a05669d08a9e6257435b898 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.cz>
Date: Thu, 21 Jul 2011 09:38:00 +0200
Subject: [PATCH] memcg: do not try to drain per-cpu caches without pages

drain_all_stock_async tries to optimize the work to be done on the work
queue by excluding any work for the current CPU because it assumes that
the context we are called from has already tried to charge from that
cache and failed, so it must be empty already.
While the assumption is correct we can optimize it even more by checking
the current number of pages in the cache. This will also reduce the work
on other CPUs with an empty stock.
For the current CPU we can simply call drain_local_stock rather than
deferring it to the work queue.

[KAMEZAWA Hiroyuki - use drain_local_stock for current CPU optimization]
Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 mm/memcontrol.c |   13 +++++++------
 1 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f11f198..c012ffe 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2159,11 +2159,8 @@ static void drain_all_stock_async(struct mem_cgroup *root_mem)
 		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
 		struct mem_cgroup *mem;
 
-		if (cpu == curcpu)
-			continue;
-
 		mem = stock->cached;
-		if (!mem)
+		if (!mem || !stock->nr_pages)
 			continue;
 		if (mem != root_mem) {
 			if (!root_mem->use_hierarchy)
@@ -2172,8 +2169,12 @@ static void drain_all_stock_async(struct mem_cgroup *root_mem)
 			if (!css_is_ancestor(&mem->css, &root_mem->css))
 				continue;
 		}
-		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags))
-			schedule_work_on(cpu, &stock->work);
+		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
+			if (cpu == curcpu)
+				drain_local_stock(&stock->work);
+			else
+				schedule_work_on(cpu, &stock->work);
+		}
 	}
 	put_online_cpus();
 	mutex_unlock(&percpu_charge_mutex);
-- 
1.7.5.4

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
end of thread, other threads: [~2011-08-18 5:56 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-22 12:43 [PATCH 0/4 v2] memcg: cleanup per-cpu charge caches  Michal Hocko
2011-07-21  7:38 ` [PATCH 1/4] memcg: do not try to drain per-cpu caches without pages  Michal Hocko
2011-07-25  1:16 ` KAMEZAWA Hiroyuki
2011-08-17 19:49 ` [patch] memcg: pin execution to current cpu while draining stock  Johannes Weiner
2011-08-18  0:24 ` KAMEZAWA Hiroyuki
2011-08-18  5:56 ` Michal Hocko
2011-07-22 10:24 ` [PATCH 2/4] memcg: unify sync and async per-cpu charge cache draining  Michal Hocko
2011-07-22 10:27 ` [PATCH 3/4] memcg: add mem_cgroup_same_or_subtree helper  Michal Hocko
2011-07-22 11:20 ` [PATCH 4/4] memcg: get rid of percpu_charge_mutex lock  Michal Hocko
2011-07-25  1:18 ` KAMEZAWA Hiroyuki
2011-08-08 18:47 ` Johannes Weiner
2011-08-08 21:47 ` Michal Hocko
2011-08-08 23:19 ` Johannes Weiner
2011-08-09  7:26 ` Michal Hocko
2011-08-09  9:31 ` [PATCH RFC] memcg: fix drain_all_stock crash  Michal Hocko
2011-08-09  9:32 ` KAMEZAWA Hiroyuki
2011-08-09  9:45 ` Michal Hocko
2011-08-09  9:53 ` KAMEZAWA Hiroyuki
2011-08-09 10:09 ` Michal Hocko
2011-08-09 10:07 ` KAMEZAWA Hiroyuki
2011-08-09 11:46 ` Michal Hocko
2011-08-09 23:54 ` KAMEZAWA Hiroyuki
2011-08-10  7:29 ` Michal Hocko

-- strict thread matches above, loose matches on Subject: below --
2011-07-21  9:41 [PATCH 0/4] memcg: cleanup per-cpu charge caches + fix unnecessary reclaim if there are still cached charges  Michal Hocko
2011-07-21  7:38 ` [PATCH 1/4] memcg: do not try to drain per-cpu caches without pages  Michal Hocko
2011-07-21 10:12 ` KAMEZAWA Hiroyuki
2011-07-21 11:36 ` Michal Hocko
2011-07-21 23:44 ` KAMEZAWA Hiroyuki
2011-07-22  9:19 ` Michal Hocko
2011-07-22  9:28 ` KAMEZAWA Hiroyuki
2011-07-22  9:58 ` Michal Hocko
2011-07-22 10:23 ` Michal Hocko