linux-mm.kvack.org archive mirror
* Re: [RFC 0/7] Introduce memory allocation speed throttle in memcg
       [not found] <cover.1622043596.git.yuleixzhang@tencent.com>
@ 2021-05-26 20:52 ` Shakeel Butt
  2021-05-31 12:11   ` yulei zhang
  0 siblings, 1 reply; 10+ messages in thread
From: Shakeel Butt @ 2021-05-26 20:52 UTC (permalink / raw)
  To: yulei zhang
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Christian Brauner, Cgroups,
	benbjiang, Wanpeng Li, Yulei Zhang, Linux MM, Michal Hocko,
	Roman Gushchin

Adding linux-mm and related folks.



On Wed, May 26, 2021 at 9:18 AM <yulei.kernel@gmail.com> wrote:
>
> From: Yulei Zhang <yuleixzhang@tencent.com>
>
> In this patch set we present the idea to suppress the memory allocation
> speed in memory cgroup, which aims to avoid direct reclaim caused by
> memory allocation burst while under memory pressure.
>

I am assuming here direct reclaim means global reclaim.

> As minimum watermark could be easily broken if certain tasks allocate
> massive amount of memory in a short period of time, in that case it will
> trigger the direct memory reclaim and cause unacceptable jitters for
> latency critical tasks, such as guaranteed pod task in K8s.
>
> With the memory allocation speed throttle (MST) mechanism we can lower the
> memory allocation speed in certain cgroups, usually those running low
> priority tasks, and so avoid direct memory reclaim in time.

Can you please explain why memory.high is not good enough for your
use-case? You can orchestrate the memory.high limits in such a way
that those certain cgroups hit their memory.high limit before causing
the global reclaim. You might need to dynamically adjust the limits
based on other workloads or unaccounted memory.
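
A minimal sketch of the kind of orchestration described above, in Python.
It assumes cgroup v2 and a hypothetical policy: the low-priority cgroup's
memory.high is set to its current usage plus whatever memory is available
beyond a caller-chosen reserve. The function names and the reserve policy
are illustrative only and do not come from this thread.

```python
from pathlib import Path


def pick_memory_high(current_usage: int, mem_available: int, reserve: int) -> int:
    """Return a memory.high value (in bytes) for a low-priority cgroup.

    Hypothetical policy: let the cgroup grow only into the memory that is
    available beyond `reserve`, so that it hits memory.high before the
    system gets close to its global watermarks.
    """
    headroom = max(0, mem_available - reserve)
    return current_usage + headroom


def apply_memory_high(cgroup_dir: Path, limit: int) -> None:
    """Write the limit to the cgroup's memory.high file (cgroup v2)."""
    (cgroup_dir / "memory.high").write_text(f"{limit}\n")
```

A userspace agent would have to re-run something like this periodically,
since both the usage and the available memory change over time.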

>
> Per-memcg interfaces are introduced under the memcg tree; they are not
> visible for the root memcg.
> - <cgroup_root>/<cgroup_name>/memory.alloc_bps
>  - 0 -> means memory speed throttle disabled
>  - non-zero -> value in bytes for memory allocation speed limits
>
> - <cgroup_root>/<cgroup_name>/memory.stat:mst_mem_spd_max
>   it records the max memory allocation speed of the memory cgroup in the
>   last period of time slice
>
> - <cgroup_root>/<cgroup_name>/memory.stat:mst_nr_throttled
>   it represents the number of times for allocation throttling
>
> Yulei Zhang (7):
>   mm: record total charge and max speed counter in memcg
>   mm: introduce alloc_bps to memcg for memory allocation speed throttle
>   mm: memory allocation speed throttle setup in hierarchy
>   mm: introduce slice analysis into memory speed throttle mechanism
>   mm: introduce memory allocation speed throttle
>   mm: record the numbers of memory allocation throttle
>   mm: introduce mst low and min watermark
>
>  include/linux/memcontrol.h   |  23 +++
>  include/linux/page_counter.h |   8 +
>  init/Kconfig                 |   8 +
>  mm/memcontrol.c              | 295 +++++++++++++++++++++++++++++++++++
>  mm/page_counter.c            |  39 +++++
>  5 files changed, 373 insertions(+)
>
> --
> 2.28.0
>



* Re: [RFC 0/7] Introduce memory allocation speed throttle in memcg
  2021-05-26 20:52 ` [RFC 0/7] Introduce memory allocation speed throttle in memcg Shakeel Butt
@ 2021-05-31 12:11   ` yulei zhang
  2021-05-31 18:20     ` Shakeel Butt
  2021-06-01 14:45     ` Chris Down
  0 siblings, 2 replies; 10+ messages in thread
From: yulei zhang @ 2021-05-31 12:11 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Christian Brauner, Cgroups,
	benbjiang, Wanpeng Li, Yulei Zhang, Linux MM, Michal Hocko,
	Roman Gushchin

On Thu, May 27, 2021 at 4:52 AM Shakeel Butt <shakeelb@google.com> wrote:
>
> Adding linux-mm and related folks.
>
>
>
> On Wed, May 26, 2021 at 9:18 AM <yulei.kernel@gmail.com> wrote:
> >
> > From: Yulei Zhang <yuleixzhang@tencent.com>
> >
> > In this patch set we present the idea to suppress the memory allocation
> > speed in memory cgroup, which aims to avoid direct reclaim caused by
> > memory allocation burst while under memory pressure.
> >
>
> I am assuming here direct reclaim means global reclaim.
>

Yep.

> > As minimum watermark could be easily broken if certain tasks allocate
> > massive amount of memory in a short period of time, in that case it will
> > trigger the direct memory reclaim and cause unacceptable jitters for
> > latency critical tasks, such as guaranteed pod task in K8s.
> >
> > With the memory allocation speed throttle (MST) mechanism we can lower the
> > memory allocation speed in certain cgroups, usually those running low
> > priority tasks, and so avoid direct memory reclaim in time.
>
> Can you please explain why memory.high is not good enough for your
> use-case? You can orchestrate the memory.high limits in such a way
> that those certain cgroups hit their memory.high limit before causing
> the global reclaim. You might need to dynamically adjust the limits
> based on other workloads or unaccounted memory.
>

Yep, dynamically adjusting the memory.high limits can ease the memory pressure
and postpone the global reclaim, but it can easily trigger OOM in the cgroups,
which may not be suitable in use cases where we want the services to stay
alive. Using a throttle to suppress the allocation may help keep those
activities running without impacting others.  Thanks.

> >
> > Per-memcg interfaces are introduced under the memcg tree; they are not
> > visible for the root memcg.
> > - <cgroup_root>/<cgroup_name>/memory.alloc_bps
> >  - 0 -> means memory speed throttle disabled
> >  - non-zero -> value in bytes for memory allocation speed limits
> >
> > - <cgroup_root>/<cgroup_name>/memory.stat:mst_mem_spd_max
> >   it records the max memory allocation speed of the memory cgroup in the
> >   last period of time slice
> >
> > - <cgroup_root>/<cgroup_name>/memory.stat:mst_nr_throttled
> >   it represents the number of times for allocation throttling
> >
> > Yulei Zhang (7):
> >   mm: record total charge and max speed counter in memcg
> >   mm: introduce alloc_bps to memcg for memory allocation speed throttle
> >   mm: memory allocation speed throttle setup in hierarchy
> >   mm: introduce slice analysis into memory speed throttle mechanism
> >   mm: introduce memory allocation speed throttle
> >   mm: record the numbers of memory allocation throttle
> >   mm: introduce mst low and min watermark
> >
> >  include/linux/memcontrol.h   |  23 +++
> >  include/linux/page_counter.h |   8 +
> >  init/Kconfig                 |   8 +
> >  mm/memcontrol.c              | 295 +++++++++++++++++++++++++++++++++++
> >  mm/page_counter.c            |  39 +++++
> >  5 files changed, 373 insertions(+)
> >
> > --
> > 2.28.0
> >



* Re: [RFC 0/7] Introduce memory allocation speed throttle in memcg
  2021-05-31 12:11   ` yulei zhang
@ 2021-05-31 18:20     ` Shakeel Butt
  2021-06-01 14:45     ` Chris Down
  1 sibling, 0 replies; 10+ messages in thread
From: Shakeel Butt @ 2021-05-31 18:20 UTC (permalink / raw)
  To: yulei zhang
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Christian Brauner, Cgroups,
	benbjiang, Wanpeng Li, Yulei Zhang, Linux MM, Michal Hocko,
	Roman Gushchin

On Mon, May 31, 2021 at 5:11 AM yulei zhang <yulei.kernel@gmail.com> wrote:
>
[...]
> > Can you please explain why memory.high is not good enough for your
> > use-case? You can orchestrate the memory.high limits in such a way
> > that those certain cgroups hit their memory.high limit before causing
> > the global reclaim. You might need to dynamically adjust the limits
> > based on other workloads or unaccounted memory.
> >
>
> Yep, dynamically adjusting the memory.high limits can ease the memory pressure
> and postpone the global reclaim, but it can easily trigger OOM in the cgroups,

Can you please elaborate a bit more on this? memory.high has a
strong throttling mechanism, so if you are observing memory.high being
ineffective then we need to fix that. Also, can you please explain a
bit more about the specifics of the workload which is able to escape
memory.high throttling, e.g. the normal number of processes/threads in
the workload?



* Re: [RFC 0/7] Introduce memory allocation speed throttle in memcg
  2021-05-31 12:11   ` yulei zhang
  2021-05-31 18:20     ` Shakeel Butt
@ 2021-06-01 14:45     ` Chris Down
  2021-06-02  9:11       ` yulei zhang
  1 sibling, 1 reply; 10+ messages in thread
From: Chris Down @ 2021-06-01 14:45 UTC (permalink / raw)
  To: yulei zhang
  Cc: Shakeel Butt, Tejun Heo, Zefan Li, Johannes Weiner,
	Christian Brauner, Cgroups, benbjiang, Wanpeng Li, Yulei Zhang,
	Linux MM, Michal Hocko, Roman Gushchin

yulei zhang writes:
>Yep, dynamically adjusting the memory.high limits can ease the memory pressure
>and postpone the global reclaim, but it can easily trigger OOM in the cgroups,

To go further on Shakeel's point, which I agree with, memory.high should 
_never_ result in memcg OOM. Even if the limit is breached dramatically, we 
don't OOM the cgroup. If you have a demonstration of memory.high resulting in 
cgroup-level OOM kills in recent kernels, then that needs to be provided. :-)



* Re: [RFC 0/7] Introduce memory allocation speed throttle in memcg
  2021-06-01 14:45     ` Chris Down
@ 2021-06-02  9:11       ` yulei zhang
  2021-06-02 15:39         ` Shakeel Butt
  0 siblings, 1 reply; 10+ messages in thread
From: yulei zhang @ 2021-06-02  9:11 UTC (permalink / raw)
  To: Chris Down
  Cc: Shakeel Butt, Tejun Heo, Zefan Li, Johannes Weiner,
	Christian Brauner, Cgroups, benbjiang, Wanpeng Li, Yulei Zhang,
	Linux MM, Michal Hocko, Roman Gushchin

On Tue, Jun 1, 2021 at 10:45 PM Chris Down <chris@chrisdown.name> wrote:
>
> yulei zhang writes:
> >Yep, dynamically adjusting the memory.high limits can ease the memory pressure
> >and postpone the global reclaim, but it can easily trigger OOM in the cgroups,
>
> To go further on Shakeel's point, which I agree with, memory.high should
> _never_ result in memcg OOM. Even if the limit is breached dramatically, we
> don't OOM the cgroup. If you have a demonstration of memory.high resulting in
> cgroup-level OOM kills in recent kernels, then that needs to be provided. :-)

You are right, I mistook it for max. Shakeel means the throttling during
context switch, which uses memory.high as the threshold to calculate the
sleep time. Currently it only applies to cgroup v2.  In this patchset we
explore another idea to throttle the memory usage, which relies on setting
an average allocation speed in the memcg. We hope to suppress the memory
usage of low priority cgroups when it reaches the system watermark while
still keeping the activities alive.
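
To make the idea of an average allocation speed concrete, here is a minimal
sketch in Python. It is not the patch set's implementation, which is not
shown in this thread; it only computes how long an allocating task would
have to sleep for its average rate to fall back to a configured
bytes-per-second limit. All names are hypothetical. Treating a limit of 0
as "throttle disabled" mirrors the memory.alloc_bps description in the
cover letter.

```python
def throttle_delay(charged_bytes: int, elapsed_s: float, alloc_bps: int) -> float:
    """Return the seconds to sleep so the average rate drops back to alloc_bps.

    charged_bytes: bytes charged to the cgroup in the current time slice
    elapsed_s:     seconds elapsed since the slice started
    alloc_bps:     allowed allocation speed in bytes/second (0 disables)
    """
    if alloc_bps == 0:
        return 0.0
    # Time the charged bytes would have taken at exactly the limit rate.
    required_s = charged_bytes / alloc_bps
    return max(0.0, required_s - elapsed_s)
```

For instance, a cgroup limited to 100 bytes/s that charged 200 bytes within
one second would need to pause for about one more second under this model.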



* Re: [RFC 0/7] Introduce memory allocation speed throttle in memcg
  2021-06-02  9:11       ` yulei zhang
@ 2021-06-02 15:39         ` Shakeel Butt
  2021-06-03 10:19           ` yulei zhang
  0 siblings, 1 reply; 10+ messages in thread
From: Shakeel Butt @ 2021-06-02 15:39 UTC (permalink / raw)
  To: yulei zhang
  Cc: Chris Down, Tejun Heo, Zefan Li, Johannes Weiner,
	Christian Brauner, Cgroups, benbjiang, Wanpeng Li, Yulei Zhang,
	Linux MM, Michal Hocko, Roman Gushchin

On Wed, Jun 2, 2021 at 2:11 AM yulei zhang <yulei.kernel@gmail.com> wrote:
>
> On Tue, Jun 1, 2021 at 10:45 PM Chris Down <chris@chrisdown.name> wrote:
> >
> > yulei zhang writes:
> > >Yep, dynamically adjusting the memory.high limits can ease the memory pressure
> > >and postpone the global reclaim, but it can easily trigger OOM in the cgroups,
> >
> > To go further on Shakeel's point, which I agree with, memory.high should
> > _never_ result in memcg OOM. Even if the limit is breached dramatically, we
> > don't OOM the cgroup. If you have a demonstration of memory.high resulting in
> > cgroup-level OOM kills in recent kernels, then that needs to be provided. :-)
>
> You are right, I mistook it for max. Shakeel means the throttling during
> context switch, which uses memory.high as the threshold to calculate the
> sleep time. Currently it only applies to cgroup v2.  In this patchset we
> explore another idea to throttle the memory usage, which relies on setting
> an average allocation speed in the memcg. We hope to suppress the memory
> usage of low priority cgroups when it reaches the system watermark while
> still keeping the activities alive.

I think you need to make the case: why should we add one more form of
throttling? Basically, why is memory.high not good enough for your use-case,
and why does the proposed solution work better? Though IMO it would be a hard
sell.



* Re: [RFC 0/7] Introduce memory allocation speed throttle in memcg
  2021-06-02 15:39         ` Shakeel Butt
@ 2021-06-03 10:19           ` yulei zhang
  2021-06-03 11:38             ` Chris Down
  0 siblings, 1 reply; 10+ messages in thread
From: yulei zhang @ 2021-06-03 10:19 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Chris Down, Tejun Heo, Zefan Li, Johannes Weiner,
	Christian Brauner, Cgroups, benbjiang, Wanpeng Li, Yulei Zhang,
	Linux MM, Michal Hocko, Roman Gushchin

On Wed, Jun 2, 2021 at 11:39 PM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Wed, Jun 2, 2021 at 2:11 AM yulei zhang <yulei.kernel@gmail.com> wrote:
> >
> > On Tue, Jun 1, 2021 at 10:45 PM Chris Down <chris@chrisdown.name> wrote:
> > >
> > > yulei zhang writes:
> > > >Yep, dynamically adjusting the memory.high limits can ease the memory pressure
> > > >and postpone the global reclaim, but it can easily trigger OOM in the cgroups,
> > >
> > > To go further on Shakeel's point, which I agree with, memory.high should
> > > _never_ result in memcg OOM. Even if the limit is breached dramatically, we
> > > don't OOM the cgroup. If you have a demonstration of memory.high resulting in
> > > cgroup-level OOM kills in recent kernels, then that needs to be provided. :-)
> >
> > You are right, I mistook it for max. Shakeel means the throttling during
> > context switch, which uses memory.high as the threshold to calculate the
> > sleep time. Currently it only applies to cgroup v2.  In this patchset we
> > explore another idea to throttle the memory usage, which relies on setting
> > an average allocation speed in the memcg. We hope to suppress the memory
> > usage of low priority cgroups when it reaches the system watermark while
> > still keeping the activities alive.
>
> I think you need to make the case: why should we add one more form of
> throttling? Basically, why is memory.high not good enough for your use-case,
> and why does the proposed solution work better? Though IMO it would be a hard
> sell.

Thanks. IMHO, there are differences between these two kinds of throttling.
memory.high is a per-memcg throttle that aims to limit the memory usage of
the tasks in the cgroup. The purpose of the memory allocation speed throttle
(MST) is to avoid memory bursts in a cgroup that would trigger global
reclaim and affect timing-sensitive workloads in other cgroups.
For example, say we have two pods with memory overcommit enabled: one runs
online tasks and the other runs offline tasks. If we restrict the memory
usage of the offline pod with memory.high, it loses the benefit of memory
overcommit when the other workloads are idle. On the other hand, if we don't
limit its memory usage, it can easily break the system watermark when there
is a sudden burst of memory operations. If we enable MST in this case, we
can avoid direct reclaim and still leverage the overcommit.



* Re: [RFC 0/7] Introduce memory allocation speed throttle in memcg
  2021-06-03 10:19           ` yulei zhang
@ 2021-06-03 11:38             ` Chris Down
  2021-06-04 10:15               ` yulei zhang
  0 siblings, 1 reply; 10+ messages in thread
From: Chris Down @ 2021-06-03 11:38 UTC (permalink / raw)
  To: yulei zhang
  Cc: Shakeel Butt, Tejun Heo, Zefan Li, Johannes Weiner,
	Christian Brauner, Cgroups, benbjiang, Wanpeng Li, Yulei Zhang,
	Linux MM, Michal Hocko, Roman Gushchin

yulei zhang writes:
>Thanks. IMHO, there are differences between these two kinds of throttling.
>memory.high is a per-memcg throttle that aims to limit the memory usage of
>the tasks in the cgroup. The purpose of the memory allocation speed throttle
>(MST) is to avoid memory bursts in a cgroup that would trigger global
>reclaim and affect timing-sensitive workloads in other cgroups.
>For example, say we have two pods with memory overcommit enabled: one runs
>online tasks and the other runs offline tasks. If we restrict the memory
>usage of the offline pod with memory.high, it loses the benefit of memory
>overcommit when the other workloads are idle. On the other hand, if we don't
>limit its memory usage, it can easily break the system watermark when there
>is a sudden burst of memory operations. If we enable MST in this case, we
>can avoid direct reclaim and still leverage the overcommit.

Having a speed throttle is a very primitive knob: it's hard to know what the 
correct values are for a user. That's one of the reasons why we've moved away 
from that kind of tunable for blkio.

Ultimately, if you want work-conserving behaviour, why not use memory.low?
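
As a rough illustration of that suggestion, the Python sketch below writes a
memory.low protection value for one cgroup. memory.low is the cgroup v2
interface file; the helper name, the cgroup name "online" and the size used
in the usage note are assumptions made for this example.

```python
from pathlib import Path


def set_memory_low(cgroup_root: Path, name: str, low_bytes: int) -> Path:
    """Write a memory.low protection value for one cgroup.

    Returns the path of the file that was written.
    """
    target = cgroup_root / name / "memory.low"
    target.write_text(f"{low_bytes}\n")
    return target
```

Something like set_memory_low(Path("/sys/fs/cgroup"), "online", 8 << 30)
would ask the kernel to protect about 8 GiB of the latency-sensitive cgroup
from reclaim, while an unprotected batch cgroup could still use idle memory
and would be reclaimed from first under pressure.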



* Re: [RFC 0/7] Introduce memory allocation speed throttle in memcg
  2021-06-03 11:38             ` Chris Down
@ 2021-06-04 10:15               ` yulei zhang
  2021-06-04 11:51                 ` Chris Down
  0 siblings, 1 reply; 10+ messages in thread
From: yulei zhang @ 2021-06-04 10:15 UTC (permalink / raw)
  To: Chris Down
  Cc: Shakeel Butt, Tejun Heo, Zefan Li, Johannes Weiner,
	Christian Brauner, Cgroups, benbjiang, Wanpeng Li, Yulei Zhang,
	Linux MM, Michal Hocko, Roman Gushchin

On Thu, Jun 3, 2021 at 7:38 PM Chris Down <chris@chrisdown.name> wrote:
>
> yulei zhang writes:
> >Thanks. IMHO, there are differences between these two kinds of throttling.
> >memory.high is a per-memcg throttle that aims to limit the memory usage of
> >the tasks in the cgroup. The purpose of the memory allocation speed throttle
> >(MST) is to avoid memory bursts in a cgroup that would trigger global
> >reclaim and affect timing-sensitive workloads in other cgroups.
> >For example, say we have two pods with memory overcommit enabled: one runs
> >online tasks and the other runs offline tasks. If we restrict the memory
> >usage of the offline pod with memory.high, it loses the benefit of memory
> >overcommit when the other workloads are idle. On the other hand, if we don't
> >limit its memory usage, it can easily break the system watermark when there
> >is a sudden burst of memory operations. If we enable MST in this case, we
> >can avoid direct reclaim and still leverage the overcommit.
>
> Having a speed throttle is a very primitive knob: it's hard to know what the
> correct values are for a user. That's one of the reasons why we've moved away
> from that kind of tunable for blkio.
>
> Ultimately, if you want work-conserving behaviour, why not use memory.low?

Thanks. But currently low and high are cgroup v2 settings. Do you think we
should extend the same mechanism to cgroup v1?



* Re: [RFC 0/7] Introduce memory allocation speed throttle in memcg
  2021-06-04 10:15               ` yulei zhang
@ 2021-06-04 11:51                 ` Chris Down
  0 siblings, 0 replies; 10+ messages in thread
From: Chris Down @ 2021-06-04 11:51 UTC (permalink / raw)
  To: yulei zhang
  Cc: Shakeel Butt, Tejun Heo, Zefan Li, Johannes Weiner,
	Christian Brauner, Cgroups, benbjiang, Wanpeng Li, Yulei Zhang,
	Linux MM, Michal Hocko, Roman Gushchin

yulei zhang writes:
>> Having a speed throttle is a very primitive knob: it's hard to know what the
>> correct values are for a user. That's one of the reasons why we've moved away
>> from that kind of tunable for blkio.
>>
>> Ultimately, if you want work-conserving behaviour, why not use memory.low?
>
>Thanks. But currently low and high are cgroup v2 settings. Do you think we
>should extend the same mechanism to cgroup v1?

The cgroup v1 interface is frozen and in pure maintenance mode -- we're not 
adding new features there and haven't done so for some time.




Thread overview: 10+ messages
     [not found] <cover.1622043596.git.yuleixzhang@tencent.com>
2021-05-26 20:52 ` [RFC 0/7] Introduce memory allocation speed throttle in memcg Shakeel Butt
2021-05-31 12:11   ` yulei zhang
2021-05-31 18:20     ` Shakeel Butt
2021-06-01 14:45     ` Chris Down
2021-06-02  9:11       ` yulei zhang
2021-06-02 15:39         ` Shakeel Butt
2021-06-03 10:19           ` yulei zhang
2021-06-03 11:38             ` Chris Down
2021-06-04 10:15               ` yulei zhang
2021-06-04 11:51                 ` Chris Down
