linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Tejun Heo <tj@kernel.org>
Cc: hannes@cmpxchg.org, akpm@linux-foundation.org,
	cgroups@vger.kernel.org, linux-mm@kvack.org,
	vdavydov@parallels.com, kernel-team@fb.com
Subject: Re: [PATCH v2 3/2] memcg: punt high overage reclaim to return-to-userland path
Date: Mon, 7 Sep 2015 11:23:46 +0200	[thread overview]
Message-ID: <20150907092346.GC6022@dhcp22.suse.cz> (raw)
In-Reply-To: <20150904210011.GH25329@mtj.duckdns.org>

On Fri 04-09-15 17:00:11, Tejun Heo wrote:
> Currently, try_charge() tries to reclaim memory synchronously when the
> high limit is breached; however, if the allocation doesn't have
> __GFP_WAIT, synchronous reclaim is skipped.  If a process performs
> only speculative allocations, it can blow way past the high limit.
> This is actually easily reproducible by simply doing "find /".
> slab/slub allocator tries speculative allocations first, so as long as
> there's memory which can be consumed without blocking, it can keep
> allocating memory regardless of the high limit.
> 
> This patch makes try_charge() always punt the over-high reclaim to the
> return-to-userland path.  If try_charge() detects that high limit is
> breached, it adds the overage to current->memcg_nr_pages_over_high and
> schedules execution of mem_cgroup_handle_over_high() which performs
> synchronous reclaim from the return-to-userland path.

This also means that a killed task will not reclaim before it dies. This
shouldn't be a big deal, though, because the task should uncharge its
memory which will most likely belong to the memcg. More on that below.

> As long as kernel doesn't have a run-away allocation spree, this
> should provide enough protection while making kmemcg behave more
> consistently.

I would also point out that this approach allows for a better reclaim
opportunities for GFP_NOFS charges which are quite common with kmem
enabled.

> v2: - Switched to reclaiming only the overage caused by current rather
>       than the difference between usage and high as suggested by
>       Michal.
>     - Don't record the memcg which went over high limit.  This makes
>       exit path handling unnecessary.  Dropped.

Hmm, this allows to leave a memcg in a high limit excess. I guess
you are right that this is not that likely to lose sleep over
it. Nevertheless, a nasty user could move away from within signal
handler context which runs before. This looks like a potential runaway
but the migration outside of the restricted hierarchy is a problem in
itself so I wouldn't consider this a problem.

>     - Drop mentions of avoiding high stack usage from description as
>       suggested by Vladimir.  max limit still triggers direct reclaim.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Vladimir Davydov <vdavydov@parallels.com>

Acked-by: Michal Hocko <mhocko@suse.com>

> +/*
> + * Scheduled by try_charge() to be executed from the userland return path
> + * and reclaims memory over the high limit.
> + */
> +void mem_cgroup_handle_over_high(void)
> +{
> +	unsigned int nr_pages = current->memcg_nr_pages_over_high;
> +	struct mem_cgroup *memcg, *pos;
> +
> +	if (likely(!nr_pages))
> +		return;

This is hooking into a hot path so I guess it would be better to make
this part inline and the rest can go via function call.

> +
> +	pos = memcg = get_mem_cgroup_from_mm(current->mm);
> +
> +	do {
> +		if (page_counter_read(&pos->memory) <= pos->high)
> +			continue;
> +		mem_cgroup_events(pos, MEMCG_HIGH, 1);

I was thinking about when to emit the event when I realized we haven't
specified the semantic anywhere. It sounds more logical to emit the
event when the limit is breached rather than when we reclaim for it.  On
the other hand we have been doing it only for reclaim and GFP_NOWAIT
where not accounted. So this mimics the previous behavior. Same for the
max limit when we skip the charge. So I guess this is OK as well.


> +		try_to_free_mem_cgroup_pages(pos, nr_pages, GFP_KERNEL, true);
> +	} while ((pos = parent_mem_cgroup(pos)));
> +
> +	css_put(&memcg->css);
> +	current->memcg_nr_pages_over_high = 0;
> +}
> +
>  static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
>  		      unsigned int nr_pages)
>  {
> @@ -2082,17 +2108,22 @@ done_restock:

JFYI you can get rid of labels in the patch format by
[diff "default"]
        xfuncname = "^[[:alpha:]$_].*[^:]$"

I found it really nice and easier to read.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2015-09-07  9:23 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-08-28 22:01 [PATCH 1/2] memcg: flatten task_struct->memcg_oom Tejun Heo
2015-08-28 22:02 ` [PATCH 2/2] memcg: always enable kmemcg on the default hierarchy Tejun Heo
2015-08-31 22:54   ` Andrew Morton
2015-09-04 21:00   ` [PATCH v2 3/2] memcg: punt high overage reclaim to return-to-userland path Tejun Heo
2015-09-07  9:23     ` Michal Hocko [this message]
2015-09-08 16:59       ` Tejun Heo
2015-09-07 11:38     ` Vladimir Davydov
2015-09-08 17:00       ` Tejun Heo
2015-09-15  9:20   ` [PATCH 2/2] memcg: always enable kmemcg on the default hierarchy Johannes Weiner
2015-09-01 15:25 ` [PATCH 1/2] memcg: flatten task_struct->memcg_oom Michal Hocko
2015-09-02 11:45 ` Vladimir Davydov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150907092346.GC6022@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=kernel-team@fb.com \
    --cc=linux-mm@kvack.org \
    --cc=tj@kernel.org \
    --cc=vdavydov@parallels.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).