From: Johannes Weiner <hannes@cmpxchg.org>
To: Michal Hocko <mhocko@suse.cz>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Ying Han <yinghan@google.com>, Tejun Heo <htejun@gmail.com>,
	Glauber Costa <glommer@parallels.com>,
	Li Zefan <lizefan@huawei.com>
Subject: Re: [PATCH v3 4/7] memcg: remove memcg from the reclaim iterators
Date: Fri, 8 Feb 2013 14:33:18 -0500	[thread overview]
Message-ID: <20130208193318.GA15951@cmpxchg.org> (raw)
In-Reply-To: <1357235661-29564-5-git-send-email-mhocko@suse.cz>

On Thu, Jan 03, 2013 at 06:54:18PM +0100, Michal Hocko wrote:
> Now that the per-node-zone-priority iterator caches memory cgroups
> rather than their css ids, we have to be careful to remove them from
> the iterator when they are on their way out, otherwise they might hang
> around for an unbounded amount of time (until global/targeted reclaim
> visits the zone at that priority, finds out the group is dead and lets
> it find its final rest).
> 
> This is solved by hooking into mem_cgroup_css_offline and checking all
> per-node-zone-priority iterators up the hierarchy to the root cgroup.
> If the current memcg is found in the respective iter->last_visited, it
> is replaced by the previous one in the same sub-hierarchy.
> 
> This guarantees that no group gets more reclaim pressure than
> necessary, and that the next iteration will continue without noticing
> that the removed group has disappeared.
> 
> Spotted-by: Ying Han <yinghan@google.com>
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> ---
>  mm/memcontrol.c |   89 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 89 insertions(+)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index e9f5c47..4f81abd 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6375,10 +6375,99 @@ free_out:
>  	return ERR_PTR(error);
>  }
>  
> +/*
> + * Helper to find memcg's previous group under the given root
> + * hierarchy.
> + */
> +struct mem_cgroup *__find_prev_memcg(struct mem_cgroup *root,
> +		struct mem_cgroup *memcg)
> +{
> +	struct cgroup *memcg_cgroup = memcg->css.cgroup;
> +	struct cgroup *root_cgroup = root->css.cgroup;
> +	struct cgroup *prev_cgroup = NULL;
> +	struct cgroup *iter;
> +
> +	cgroup_for_each_descendant_pre(iter, root_cgroup) {
> +		if (iter == memcg_cgroup)
> +			break;
> +		prev_cgroup = iter;
> +	}
> +
> +	return (prev_cgroup) ? mem_cgroup_from_cont(prev_cgroup) : NULL;
> +}
> +
> +/*
> + * Remove the given memcg under the given root from all per-node
> + * per-zone per-priority cached iterators.
> + */
> +static void mem_cgroup_uncache_reclaim_iters(struct mem_cgroup *root,
> +		struct mem_cgroup *memcg)
> +{
> +	int node;
> +
> +	for_each_node(node) {
> +		struct mem_cgroup_per_node *pn = root->info.nodeinfo[node];
> +		int zone;
> +
> +		for (zone = 0; zone < MAX_NR_ZONES; zone++) {
> +			struct mem_cgroup_per_zone *mz;
> +			int prio;
> +
> +			mz = &pn->zoneinfo[zone];
> +			for (prio = 0; prio < DEF_PRIORITY + 1; prio++) {
> +				struct mem_cgroup_reclaim_iter *iter;
> +
> +				/*
> +				 * Just drop the reference on the removed memcg
> +				 * cached last_visited. No need to lock iter as
> +				 * the memcg is on the way out and cannot be
> +				 * reclaimed.
> +				 */
> +				iter = &mz->reclaim_iter[prio];
> +				if (root == memcg) {
> +					if (iter->last_visited)
> +						css_put(&iter->last_visited->css);
> +					continue;
> +				}
> +
> +				rcu_read_lock();
> +				spin_lock(&iter->iter_lock);
> +				if (iter->last_visited == memcg) {
> +					iter->last_visited = __find_prev_memcg(
> +							root, memcg);
> +					css_put(&memcg->css);
> +				}
> +				spin_unlock(&iter->iter_lock);
> +				rcu_read_unlock();
> +			}
> +		}
> +	}
> +}
> +
> +/*
> + * Remove the given memcg from all cached reclaim iterators.
> + */
> +static void mem_cgroup_uncache_from_reclaim(struct mem_cgroup *memcg)
> +{
> +	struct mem_cgroup *parent = memcg;
> +
> +	do {
> +		mem_cgroup_uncache_reclaim_iters(parent, memcg);
> +	} while ((parent = parent_mem_cgroup(parent)));
> +
> +	/*
> +	 * If the root memcg is not hierarchical we have to check it
> +	 * explicitly.
> +	 */
> +	if (!root_mem_cgroup->use_hierarchy)
> +		mem_cgroup_uncache_reclaim_iters(root_mem_cgroup, memcg);
> +}

for each in hierarchy:
  for each node:
    for each zone:
      for each reclaim priority:

every time a cgroup is destroyed.  I don't think such a hammer is
justified in general, let alone for consolidating code a little.

Can we invalidate the position cache lazily?  Have a global "cgroup
destruction" counter and store a snapshot of that counter whenever we
put a cgroup pointer in the position cache.  We only use the cached
pointer if that counter has not changed in the meantime, so we know
that the cgroup still exists.
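
A minimal user-space sketch of that scheme (not actual kernel code; the
names cgroup_dead_count and pos_cache are made up for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical global counter, bumped once per cgroup destruction. */
static unsigned long cgroup_dead_count;

struct pos_cache {
	void *last_visited;		/* cached iteration position */
	unsigned long dead_count;	/* cgroup_dead_count snapshot */
};

/* Remember a position together with the current destruction count. */
static void pos_cache_store(struct pos_cache *c, void *pos)
{
	c->last_visited = pos;
	c->dead_count = cgroup_dead_count;
}

/*
 * Trust the cached position only if no cgroup has been destroyed
 * since it was stored; otherwise restart from the hierarchy root.
 */
static void *pos_cache_load(struct pos_cache *c)
{
	if (c->dead_count != cgroup_dead_count)
		return NULL;
	return c->last_visited;
}
```

Destruction then only increments the counter instead of walking every
node/zone/priority iterator up the hierarchy; the cost moves to a
single compare on the reader side.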

It is pretty imprecise and we invalidate the whole cache every time a
cgroup is destroyed, but I think that should be okay.  If not, better
ideas are welcome.


Thread overview: 78+ messages
2013-01-03 17:54 [PATCH v3 0/7] rework mem_cgroup iterator Michal Hocko
2013-01-03 17:54 ` [PATCH v3 1/7] memcg: synchronize per-zone iterator access by a spinlock Michal Hocko
2013-01-03 17:54 ` [PATCH v3 2/7] memcg: keep prev's css alive for the whole mem_cgroup_iter Michal Hocko
2013-01-03 17:54 ` [PATCH v3 3/7] memcg: rework mem_cgroup_iter to use cgroup iterators Michal Hocko
2013-01-03 17:54 ` [PATCH v3 4/7] memcg: remove memcg from the reclaim iterators Michal Hocko
2013-01-07  6:18   ` Kamezawa Hiroyuki
2013-02-08 19:33   ` Johannes Weiner [this message]
2013-02-11 15:16     ` Michal Hocko
2013-02-11 17:56       ` Johannes Weiner
2013-02-11 19:29         ` Michal Hocko
2013-02-11 19:58           ` Johannes Weiner
2013-02-11 21:27             ` Michal Hocko
2013-02-11 22:07               ` Michal Hocko
2013-02-11 22:39               ` Johannes Weiner
2013-02-12  9:54                 ` Michal Hocko
2013-02-12 15:10                   ` Johannes Weiner
2013-02-12 15:43                     ` Michal Hocko
2013-02-12 16:10                       ` Paul E. McKenney
2013-02-12 17:25                         ` Johannes Weiner
2013-02-12 18:31                           ` Paul E. McKenney
2013-02-12 19:53                             ` Johannes Weiner
2013-02-13  9:51                               ` Michal Hocko
2013-02-12 17:56                         ` Michal Hocko
2013-02-12 16:13                       ` Michal Hocko
2013-02-12 16:24                         ` Michal Hocko
2013-02-12 16:37                           ` Michal Hocko
2013-02-12 16:41                           ` Johannes Weiner
2013-02-12 17:12                             ` Michal Hocko
2013-02-12 17:37                               ` Johannes Weiner
2013-02-13  8:11                                 ` Glauber Costa
2013-02-13 10:38                                   ` Michal Hocko
2013-02-13 10:34                                 ` Michal Hocko
2013-02-13 12:56                                   ` Michal Hocko
2013-02-12 16:33                       ` Johannes Weiner
2013-01-03 17:54 ` [PATCH v3 5/7] memcg: simplify mem_cgroup_iter Michal Hocko
2013-01-03 17:54 ` [PATCH v3 6/7] memcg: further simplify mem_cgroup_iter Michal Hocko
2013-01-03 17:54 ` [PATCH v3 7/7] cgroup: remove css_get_next Michal Hocko
2013-01-04  3:42   ` Li Zefan
2013-01-23 12:52 ` [PATCH v3 0/7] rework mem_cgroup iterator Michal Hocko
