archive mirror
 help / color / mirror / Atom feed
From: Johannes Weiner <>
To: Michal Hocko <>
Cc: Andrew Morton <>,
	Tejun Heo <>, Roman Gushchin <>,,,,
Subject: Re: [PATCH] mm: memcontrol: asynchronous reclaim for memory.high
Date: Wed, 19 Feb 2020 14:16:18 -0500	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <>

On Wed, Feb 19, 2020 at 07:37:31PM +0100, Michal Hocko wrote:
> On Wed 19-02-20 13:12:19, Johannes Weiner wrote:
> > We have received regression reports from users whose workloads moved
> > into containers and subsequently encountered new latencies. For some
> > users these were a nuisance, but for some it meant missing their SLA
> > response times. We tracked those delays down to cgroup limits, which
> > inject direct reclaim stalls into the workload where previously all
> > reclaim was handled my kswapd.
> I am curious why is this unexpected when the high limit is explicitly
> documented as a throttling mechanism.

Memory.high is supposed to curb aggressive growth using throttling
instead of OOM killing. However, if the workload has plenty of easily
reclaimable memory and just needs to recycle a couple of cache pages
to permit an allocation, there is no need to throttle the workload -
just as there wouldn't be any need to trigger the OOM killer.

So it's not unexpected, but it's unnecessarily heavy-handed: since
memory.high allows some flexibility around the target size, we can
move the routine reclaim activity (cache recycling) out of the main
execution stream of the workload, just like we do with kswapd. If that
cannot keep up, we can throttle and do direct reclaim.

It doesn't change the memory.high semantics, but it allows exploiting
the fact that we have SMP systems and can parallize the book keeping.

> > This patch adds asynchronous reclaim to the memory.high cgroup limit
> > while keeping direct reclaim as a fallback. In our testing, this
> > eliminated all direct reclaim from the affected workload.
> Who is accounted for all the work? Unless I am missing something this
> just gets hidden in the system activity and that might hurt the
> isolation. I do see how moving the work to a different context is
> desirable but this work has to be accounted properly when it is going to
> become a normal mode of operation (rather than a rare exception like the
> existing irq context handling).

Yes, the plan is to account it to the cgroup on whose behalf we're
doing the work.

The problem is that we have a general lack of usable CPU control right
now - see Rik's work on this:
For workloads that are contended on CPU, we cannot enable the CPU
controller because the scheduling latencies are too high. And for
workloads that aren't CPU contended, well, it doesn't really matter
where the reclaim cycles are accounted to.

Once we have the CPU controller up to speed, we can add annotations
like these to account stretches of execution to specific
cgroups. There just isn't much point to do it before we can actually
enable CPU control on the real workloads where it would matter.

[ This is generally work in process: for example, if you isolate
  workloads with memory.low, kswapd cpu time isn't accounted to the
  cgroup that causes it. Swap IO issued by kswapd isn't accounted to
  the group that is getting swapped. The memory consumed by struct
  cgroup itself, the percpu allocations for the vmstat arrays etc.,
  which is sizable, are not accounted to the cgroup that creates
  subgroups, and so forth. ]

  reply	other threads:[~2020-02-19 19:16 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-19 18:12 [PATCH] mm: memcontrol: asynchronous reclaim for memory.high Johannes Weiner
2020-02-19 18:37 ` Michal Hocko
2020-02-19 19:16   ` Johannes Weiner [this message]
2020-02-19 19:53     ` Michal Hocko
2020-02-19 21:17       ` Johannes Weiner
2020-02-20  9:46         ` Michal Hocko
2020-02-20 14:41           ` Johannes Weiner
2020-02-19 21:41       ` Daniel Jordan
2020-02-19 22:08         ` Johannes Weiner
2020-02-20 15:45           ` Daniel Jordan
2020-02-20 15:56             ` Tejun Heo
2020-02-20 18:23               ` Daniel Jordan
2020-02-20 18:45                 ` Tejun Heo
2020-02-20 19:55                   ` Daniel Jordan
2020-02-20 20:54                     ` Tejun Heo
2020-02-19 19:17   ` Chris Down
2020-02-19 19:31   ` Andrew Morton
2020-02-19 21:33     ` Johannes Weiner
2020-02-26 20:25 ` Shakeel Butt
2020-02-26 22:26   ` Johannes Weiner
2020-02-26 23:36     ` Shakeel Butt
2020-02-26 23:46       ` Johannes Weiner
2020-02-27  0:12     ` Yang Shi
2020-02-27  2:42       ` Shakeel Butt
2020-02-27  9:58       ` Michal Hocko
2020-02-27 12:50       ` Johannes Weiner
2020-02-26 23:59   ` Yang Shi
2020-02-27  2:36     ` Shakeel Butt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).