linux-kernel.vger.kernel.org archive mirror
* [PATCH] mm/memcontrol: Add the drop_cache interface for cgroup v2
@ 2020-09-21  8:02 zangchunxin
  2020-09-21  8:12 ` Michal Hocko
  2020-09-21 15:55 ` Shakeel Butt
  0 siblings, 2 replies; 21+ messages in thread
From: zangchunxin @ 2020-09-21  8:02 UTC (permalink / raw)
  To: hannes, mhocko, vdavydov.dev, akpm, tj, lizefan, corbet
  Cc: ast, daniel, kafai, songliubraving, yhs, andriin, john.fastabend,
	kpsingh, cgroups, linux-doc, linux-mm, linux-kernel, netdev, bpf,
	Chunxin Zang

From: Chunxin Zang <zangchunxin@bytedance.com>

In cgroup v1, we have the 'force_empty' interface. This is very
useful for userspace to actively release memory. But cgroup v2
does not have it.

This patch reuses cgroup v1's function, but gives the interface a
new name, because I think 'drop_cache' is easier to understand :)

Signed-off-by: Chunxin Zang <zangchunxin@bytedance.com>
---
 Documentation/admin-guide/cgroup-v2.rst | 11 +++++++++++
 mm/memcontrol.c                         |  5 +++++
 2 files changed, 16 insertions(+)
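
Example usage from userspace (the cgroup path below is only an
example; any write triggers the reclaim):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* Placeholder path; the file only exists with this patch applied. */
	const char *path = "/sys/fs/cgroup/my-cgroup/memory.drop_cache";
	int fd = open(path, O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* Any write triggers the same full reclaim as cgroup v1's
	 * memory.force_empty. */
	if (write(fd, "1", 1) < 0)
		perror("write");
	close(fd);
	return 0;
}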

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index ce3e05e41724..fbff959c8116 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1181,6 +1181,17 @@ PAGE_SIZE multiple when read back.
 	high limit is used and monitored properly, this limit's
 	utility is limited to providing the final safety net.
 
+  memory.drop_cache
+    A write-only single value file which exists on non-root
+    cgroups.
+
+    Provide a mechanism for users to actively trigger memory
+    reclaim. Memory is reclaimed from the cgroup, freeing as
+    many pages as possible.
+
+    It will break the low boundary, because it retries reclaim
+    until memory usage drops to a certain level.
+
   memory.oom.group
 	A read-write single value file which exists on non-root
 	cgroups.  The default value is "0".
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 0b38b6ad547d..98646484efff 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6226,6 +6226,11 @@ static struct cftype memory_files[] = {
 		.write = memory_max_write,
 	},
 	{
+		.name = "drop_cache",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.write = mem_cgroup_force_empty_write,
+	},
+	{
 		.name = "events",
 		.flags = CFTYPE_NOT_ON_ROOT,
 		.file_offset = offsetof(struct mem_cgroup, events_file),
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm/memcontrol: Add the drop_cache interface for cgroup v2
  2020-09-21  8:02 [PATCH] mm/memcontrol: Add the drop_cache interface for cgroup v2 zangchunxin
@ 2020-09-21  8:12 ` Michal Hocko
  2020-09-21 10:42   ` Chris Down
  2020-09-21 10:55   ` Yafang Shao
  2020-09-21 15:55 ` Shakeel Butt
  1 sibling, 2 replies; 21+ messages in thread
From: Michal Hocko @ 2020-09-21  8:12 UTC (permalink / raw)
  To: zangchunxin
  Cc: hannes, vdavydov.dev, akpm, tj, lizefan, corbet, ast, daniel,
	kafai, songliubraving, yhs, andriin, john.fastabend, kpsingh,
	cgroups, linux-doc, linux-mm, linux-kernel, netdev, bpf

On Mon 21-09-20 16:02:55, zangchunxin@bytedance.com wrote:
> From: Chunxin Zang <zangchunxin@bytedance.com>
> 
> In the cgroup v1, we have 'force_mepty' interface. This is very
> useful for userspace to actively release memory. But the cgroup
> v2 does not.
> 
> This patch reuse cgroup v1's function, but have a new name for
> the interface. Because I think 'drop_cache' may be is easier to
> understand :)

This should really explain a usecase. Global drop_caches is a terrible
interface and it has caused many problems in the past. People have
learned to use it as a remedy for any problem they might see, causing
other problems without realizing it. This is the reason why we even
log each attempt to drop caches.
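
For reference, the global interface in question boils down to
something like this (needs root, and as said above every attempt is
logged):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/proc/sys/vm/drop_caches", O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	sync();	/* only clean pages are dropped, so flush dirty data first */
	/* "1" = page cache, "2" = slab objects, "3" = both */
	if (write(fd, "3", 1) < 0)
		perror("write");
	close(fd);
	return 0;
}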

I would rather not repeat the same mistake on the memcg level unless
there is a very strong reason for it.

> Signed-off-by: Chunxin Zang <zangchunxin@bytedance.com>
> ---
>  Documentation/admin-guide/cgroup-v2.rst | 11 +++++++++++
>  mm/memcontrol.c                         |  5 +++++
>  2 files changed, 16 insertions(+)
> 
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index ce3e05e41724..fbff959c8116 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1181,6 +1181,17 @@ PAGE_SIZE multiple when read back.
>  	high limit is used and monitored properly, this limit's
>  	utility is limited to providing the final safety net.
>  
> +  memory.drop_cache
> +    A write-only single value file which exists on non-root
> +    cgroups.
> +
> +    Provide a mechanism for users to actively trigger memory
> +    reclaim. The cgroup will be reclaimed and as many pages
> +    reclaimed as possible.
> +
> +    It will broke low boundary. Because it tries to reclaim the
> +    memory many times, until the memory drops to a certain level.
> +
>    memory.oom.group
>  	A read-write single value file which exists on non-root
>  	cgroups.  The default value is "0".
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 0b38b6ad547d..98646484efff 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6226,6 +6226,11 @@ static struct cftype memory_files[] = {
>  		.write = memory_max_write,
>  	},
>  	{
> +		.name = "drop_cache",
> +		.flags = CFTYPE_NOT_ON_ROOT,
> +		.write = mem_cgroup_force_empty_write,
> +	},
> +	{
>  		.name = "events",
>  		.flags = CFTYPE_NOT_ON_ROOT,
>  		.file_offset = offsetof(struct mem_cgroup, events_file),
> -- 
> 2.11.0

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm/memcontrol: Add the drop_cache interface for cgroup v2
  2020-09-21  8:12 ` Michal Hocko
@ 2020-09-21 10:42   ` Chris Down
  2020-09-21 10:55   ` Yafang Shao
  1 sibling, 0 replies; 21+ messages in thread
From: Chris Down @ 2020-09-21 10:42 UTC (permalink / raw)
  To: Michal Hocko
  Cc: zangchunxin, hannes, vdavydov.dev, akpm, tj, lizefan, corbet,
	ast, daniel, kafai, songliubraving, yhs, andriin, john.fastabend,
	kpsingh, cgroups, linux-doc, linux-mm, linux-kernel, netdev, bpf

Michal Hocko writes:
>On Mon 21-09-20 16:02:55, zangchunxin@bytedance.com wrote:
>> From: Chunxin Zang <zangchunxin@bytedance.com>
>>
>> In the cgroup v1, we have 'force_mepty' interface. This is very
>> useful for userspace to actively release memory. But the cgroup
>> v2 does not.
>>
>> This patch reuse cgroup v1's function, but have a new name for
>> the interface. Because I think 'drop_cache' may be is easier to
>> understand :)
>
>This should really explain a usecase. Global drop_caches is a terrible
>interface and it has caused many problems in the past. People have
>learned to use it as a remedy to any problem they might see and cause
>other problems without realizing that. This is the reason why we even
>log each attempt to drop caches.
>
>I would rather not repeat the same mistake on the memcg level unless
>there is a very strong reason for it.

I agree with Michal. We already have ways to do best-effort memory release on 
cgroup v2, primarily with memory.high. Singling out a category of memory for 
reclaim has historically proved to be a fool's errand.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm/memcontrol: Add the drop_cache interface for cgroup v2
  2020-09-21  8:12 ` Michal Hocko
  2020-09-21 10:42   ` Chris Down
@ 2020-09-21 10:55   ` Yafang Shao
  2020-09-21 11:05     ` Michal Hocko
  1 sibling, 1 reply; 21+ messages in thread
From: Yafang Shao @ 2020-09-21 10:55 UTC (permalink / raw)
  To: Michal Hocko
  Cc: zangchunxin, Johannes Weiner, Vladimir Davydov, Andrew Morton,
	Tejun Heo, lizefan, Jonathan Corbet, Alexei Starovoitov,
	Daniel Borkmann, kafai, Song Liu, Yonghong Song, andriin,
	john.fastabend, kpsingh, Cgroups, linux-doc, Linux MM, LKML,
	netdev, bpf

On Mon, Sep 21, 2020 at 4:12 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 21-09-20 16:02:55, zangchunxin@bytedance.com wrote:
> > From: Chunxin Zang <zangchunxin@bytedance.com>
> >
> > In the cgroup v1, we have 'force_mepty' interface. This is very
> > useful for userspace to actively release memory. But the cgroup
> > v2 does not.
> >
> > This patch reuse cgroup v1's function, but have a new name for
> > the interface. Because I think 'drop_cache' may be is easier to
> > understand :)
>
> This should really explain a usecase. Global drop_caches is a terrible
> interface and it has caused many problems in the past. People have
> learned to use it as a remedy to any problem they might see and cause
> other problems without realizing that. This is the reason why we even
> log each attempt to drop caches.
>
> I would rather not repeat the same mistake on the memcg level unless
> there is a very strong reason for it.
>

I think we'd better add these comments above the function
mem_cgroup_force_empty() to explain why we don't want to expose this
interface in cgroup v2; otherwise people will keep sending this
proposal without any strong reason.


> > Signed-off-by: Chunxin Zang <zangchunxin@bytedance.com>
> > ---
> >  Documentation/admin-guide/cgroup-v2.rst | 11 +++++++++++
> >  mm/memcontrol.c                         |  5 +++++
> >  2 files changed, 16 insertions(+)
> >
> > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > index ce3e05e41724..fbff959c8116 100644
> > --- a/Documentation/admin-guide/cgroup-v2.rst
> > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > @@ -1181,6 +1181,17 @@ PAGE_SIZE multiple when read back.
> >       high limit is used and monitored properly, this limit's
> >       utility is limited to providing the final safety net.
> >
> > +  memory.drop_cache
> > +    A write-only single value file which exists on non-root
> > +    cgroups.
> > +
> > +    Provide a mechanism for users to actively trigger memory
> > +    reclaim. The cgroup will be reclaimed and as many pages
> > +    reclaimed as possible.
> > +
> > +    It will broke low boundary. Because it tries to reclaim the
> > +    memory many times, until the memory drops to a certain level.
> > +
> >    memory.oom.group
> >       A read-write single value file which exists on non-root
> >       cgroups.  The default value is "0".
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 0b38b6ad547d..98646484efff 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -6226,6 +6226,11 @@ static struct cftype memory_files[] = {
> >               .write = memory_max_write,
> >       },
> >       {
> > +             .name = "drop_cache",
> > +             .flags = CFTYPE_NOT_ON_ROOT,
> > +             .write = mem_cgroup_force_empty_write,
> > +     },
> > +     {
> >               .name = "events",
> >               .flags = CFTYPE_NOT_ON_ROOT,
> >               .file_offset = offsetof(struct mem_cgroup, events_file),
> > --
> > 2.11.0
>
> --
> Michal Hocko
> SUSE Labs
>


-- 
Thanks
Yafang

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm/memcontrol: Add the drop_cache interface for cgroup v2
  2020-09-21 10:55   ` Yafang Shao
@ 2020-09-21 11:05     ` Michal Hocko
  2020-09-21 11:23       ` Yafang Shao
  2020-09-22  9:43       ` [External] " Chunxin Zang
  0 siblings, 2 replies; 21+ messages in thread
From: Michal Hocko @ 2020-09-21 11:05 UTC (permalink / raw)
  To: Yafang Shao
  Cc: zangchunxin, Johannes Weiner, Vladimir Davydov, Andrew Morton,
	Tejun Heo, lizefan, Jonathan Corbet, Alexei Starovoitov,
	Daniel Borkmann, kafai, Song Liu, Yonghong Song, andriin,
	john.fastabend, kpsingh, Cgroups, linux-doc, Linux MM, LKML,
	netdev, bpf

On Mon 21-09-20 18:55:40, Yafang Shao wrote:
> On Mon, Sep 21, 2020 at 4:12 PM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Mon 21-09-20 16:02:55, zangchunxin@bytedance.com wrote:
> > > From: Chunxin Zang <zangchunxin@bytedance.com>
> > >
> > > In the cgroup v1, we have 'force_mepty' interface. This is very
> > > useful for userspace to actively release memory. But the cgroup
> > > v2 does not.
> > >
> > > This patch reuse cgroup v1's function, but have a new name for
> > > the interface. Because I think 'drop_cache' may be is easier to
> > > understand :)
> >
> > This should really explain a usecase. Global drop_caches is a terrible
> > interface and it has caused many problems in the past. People have
> > learned to use it as a remedy to any problem they might see and cause
> > other problems without realizing that. This is the reason why we even
> > log each attempt to drop caches.
> >
> > I would rather not repeat the same mistake on the memcg level unless
> > there is a very strong reason for it.
> >
> 
> I think we'd better add these comments above the function
> mem_cgroup_force_empty() to explain why we don't want to expose this
> interface in cgroup2, otherwise people will continue to send this
> proposal without any strong reason.

I do not mind people sending this proposal.  "V1 used to have an
interface, we need it in v2 as well" is not really viable without
providing more reasoning on the specific usecase.

_Any_ patch should have a proper justification. This is nothing really
new to the process and I am wondering why this is coming as a surprise.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm/memcontrol: Add the drop_cache interface for cgroup v2
  2020-09-21 11:05     ` Michal Hocko
@ 2020-09-21 11:23       ` Yafang Shao
  2020-09-21 11:36         ` Michal Hocko
  2020-09-22  9:43       ` [External] " Chunxin Zang
  1 sibling, 1 reply; 21+ messages in thread
From: Yafang Shao @ 2020-09-21 11:23 UTC (permalink / raw)
  To: Michal Hocko
  Cc: zangchunxin, Johannes Weiner, Vladimir Davydov, Andrew Morton,
	Tejun Heo, lizefan, Jonathan Corbet, Alexei Starovoitov,
	Daniel Borkmann, kafai, Song Liu, Yonghong Song, andriin,
	john.fastabend, kpsingh, Cgroups, linux-doc, Linux MM, LKML,
	netdev, bpf

On Mon, Sep 21, 2020 at 7:05 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 21-09-20 18:55:40, Yafang Shao wrote:
> > On Mon, Sep 21, 2020 at 4:12 PM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Mon 21-09-20 16:02:55, zangchunxin@bytedance.com wrote:
> > > > From: Chunxin Zang <zangchunxin@bytedance.com>
> > > >
> > > > In the cgroup v1, we have 'force_mepty' interface. This is very
> > > > useful for userspace to actively release memory. But the cgroup
> > > > v2 does not.
> > > >
> > > > This patch reuse cgroup v1's function, but have a new name for
> > > > the interface. Because I think 'drop_cache' may be is easier to
> > > > understand :)
> > >
> > > This should really explain a usecase. Global drop_caches is a terrible
> > > interface and it has caused many problems in the past. People have
> > > learned to use it as a remedy to any problem they might see and cause
> > > other problems without realizing that. This is the reason why we even
> > > log each attempt to drop caches.
> > >
> > > I would rather not repeat the same mistake on the memcg level unless
> > > there is a very strong reason for it.
> > >
> >
> > I think we'd better add these comments above the function
> > mem_cgroup_force_empty() to explain why we don't want to expose this
> > interface in cgroup2, otherwise people will continue to send this
> > proposal without any strong reason.
>
> I do not mind people sending this proposal.  "V1 used to have an
> interface, we need it in v2 as well" is not really viable without
> providing more reasoning on the specific usecase.
>
> _Any_ patch should have a proper justification. This is nothing really
> new to the process and I am wondering why this is coming as a surprise.
>

Container users always want to drop caches in a specific container,
because they are used to using drop_caches to fix memory pressure
issues. Although drop_caches can cause some unexpected issues, it can
also fix some issues, so container users want to use it in containers
as well.
If this feature is not implemented in cgroup, they will ask why, and
there is no explanation for it in the kernel.

Regarding memory.high, it is not perfect either, because you have to
set it to 0 to drop the caches, and the processes in the container
then have to reclaim pages themselves when they hit memory.high,
whereas memory.force_empty doesn't make the other processes reclaim.

That doesn't mean I agree with adding this interface; I really mean
that if we reject a feature we'd better explain why.

-- 
Thanks
Yafang

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm/memcontrol: Add the drop_cache interface for cgroup v2
  2020-09-21 11:23       ` Yafang Shao
@ 2020-09-21 11:36         ` Michal Hocko
  2020-09-22  4:20           ` Yafang Shao
  0 siblings, 1 reply; 21+ messages in thread
From: Michal Hocko @ 2020-09-21 11:36 UTC (permalink / raw)
  To: Yafang Shao
  Cc: zangchunxin, Johannes Weiner, Vladimir Davydov, Andrew Morton,
	Tejun Heo, lizefan, Jonathan Corbet, Alexei Starovoitov,
	Daniel Borkmann, kafai, Song Liu, Yonghong Song, andriin,
	john.fastabend, kpsingh, Cgroups, linux-doc, Linux MM, LKML,
	netdev, bpf

On Mon 21-09-20 19:23:01, Yafang Shao wrote:
> On Mon, Sep 21, 2020 at 7:05 PM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Mon 21-09-20 18:55:40, Yafang Shao wrote:
> > > On Mon, Sep 21, 2020 at 4:12 PM Michal Hocko <mhocko@suse.com> wrote:
> > > >
> > > > On Mon 21-09-20 16:02:55, zangchunxin@bytedance.com wrote:
> > > > > From: Chunxin Zang <zangchunxin@bytedance.com>
> > > > >
> > > > > In the cgroup v1, we have 'force_mepty' interface. This is very
> > > > > useful for userspace to actively release memory. But the cgroup
> > > > > v2 does not.
> > > > >
> > > > > This patch reuse cgroup v1's function, but have a new name for
> > > > > the interface. Because I think 'drop_cache' may be is easier to
> > > > > understand :)
> > > >
> > > > This should really explain a usecase. Global drop_caches is a terrible
> > > > interface and it has caused many problems in the past. People have
> > > > learned to use it as a remedy to any problem they might see and cause
> > > > other problems without realizing that. This is the reason why we even
> > > > log each attempt to drop caches.
> > > >
> > > > I would rather not repeat the same mistake on the memcg level unless
> > > > there is a very strong reason for it.
> > > >
> > >
> > > I think we'd better add these comments above the function
> > > mem_cgroup_force_empty() to explain why we don't want to expose this
> > > interface in cgroup2, otherwise people will continue to send this
> > > proposal without any strong reason.
> >
> > I do not mind people sending this proposal.  "V1 used to have an
> > interface, we need it in v2 as well" is not really viable without
> > providing more reasoning on the specific usecase.
> >
> > _Any_ patch should have a proper justification. This is nothing really
> > new to the process and I am wondering why this is coming as a surprise.
> >
> 
> Container users always want to drop cache in a specific container,
> because they used to use drop_caches to fix memory pressure issues.

This is exactly the kind of problem we have seen in the past. There
should be zero reason to address potential reclaim problems by dropping
page cache on the floor. There is a huge cargo cult around this
procedure and I have seen numerous reports where people complained
about performance afterwards, just to learn that the dropped page cache
was one of the reasons for it.

> Although drop_caches can cause some unexpected issues, it could also
> fix some issues.

"Some issues" is way too general. We really want to learn about those
issues and address them properly.

> So container users want to use it in containers as
> well.
> If this feature is not implemented in cgroup, they will ask you why
> but there is no explanation in the kernel.

There is no usecase that would really require it so far.

> Regarding the memory.high, it is not perfect as well, because you have
> to set it to 0 to drop_caches, and the processes in the containers
> have to reclaim pages as well because they reach the memory.high, but
> memory.force_empty won't make other processes to reclaim.

Since 536d3bf261a2 ("mm: memcontrol: avoid workload stalls when lowering
memory.high") the limit is set after the reclaim, so the race window in
which somebody could be pushed into high limit reclaim is reduced. But I
do agree this is just a workaround.

> That doesn't mean I agree to add this interface, while I really mean
> that if we discard one feature we'd better explain why.

We need to understand why somebody wants an interface, because once it
is added it will have to be maintained forever.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm/memcontrol: Add the drop_cache interface for cgroup v2
  2020-09-21  8:02 [PATCH] mm/memcontrol: Add the drop_cache interface for cgroup v2 zangchunxin
  2020-09-21  8:12 ` Michal Hocko
@ 2020-09-21 15:55 ` Shakeel Butt
  1 sibling, 0 replies; 21+ messages in thread
From: Shakeel Butt @ 2020-09-21 15:55 UTC (permalink / raw)
  To: zangchunxin
  Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
	Tejun Heo, Li Zefan, Jonathan Corbet, Alexei Starovoitov,
	Daniel Borkmann, kafai, Song Liu, yhs, andriin, john.fastabend,
	kpsingh, Cgroups, linux-doc, Linux MM, LKML, netdev, bpf

On Mon, Sep 21, 2020 at 1:05 AM <zangchunxin@bytedance.com> wrote:
>
> From: Chunxin Zang <zangchunxin@bytedance.com>
>
> In the cgroup v1, we have 'force_mepty' interface. This is very
> useful for userspace to actively release memory. But the cgroup
> v2 does not.
>
> This patch reuse cgroup v1's function, but have a new name for
> the interface. Because I think 'drop_cache' may be is easier to
> understand :)
>
> Signed-off-by: Chunxin Zang <zangchunxin@bytedance.com>
> ---
>  Documentation/admin-guide/cgroup-v2.rst | 11 +++++++++++
>  mm/memcontrol.c                         |  5 +++++
>  2 files changed, 16 insertions(+)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index ce3e05e41724..fbff959c8116 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1181,6 +1181,17 @@ PAGE_SIZE multiple when read back.
>         high limit is used and monitored properly, this limit's
>         utility is limited to providing the final safety net.
>
> +  memory.drop_cache
> +    A write-only single value file which exists on non-root
> +    cgroups.
> +
> +    Provide a mechanism for users to actively trigger memory
> +    reclaim. The cgroup will be reclaimed and as many pages
> +    reclaimed as possible.
> +
> +    It will broke low boundary. Because it tries to reclaim the
> +    memory many times, until the memory drops to a certain level.
> +

drop_cache is not really force_empty(). What is your use-case? Maybe
memory.reclaim [1] fits your use-case; it is already in Andrew's tree.

[1] https://lkml.kernel.org/r/20200909215752.1725525-1-shakeelb@google.com
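
Roughly (untested, path and amount made up), the proposed
memory.reclaim would be used like:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	/* Placeholder path; the amount to reclaim is given in bytes. */
	const char *path = "/sys/fs/cgroup/my-cgroup/memory.reclaim";
	const char *amount = "1073741824";	/* ask for ~1G */
	int fd = open(path, O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* The write may return an error if the requested amount could
	 * not be reclaimed. */
	if (write(fd, amount, strlen(amount)) < 0)
		perror("write");
	close(fd);
	return 0;
}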

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm/memcontrol: Add the drop_cache interface for cgroup v2
  2020-09-21 11:36         ` Michal Hocko
@ 2020-09-22  4:20           ` Yafang Shao
  2020-09-22  7:27             ` Michal Hocko
  0 siblings, 1 reply; 21+ messages in thread
From: Yafang Shao @ 2020-09-22  4:20 UTC (permalink / raw)
  To: Michal Hocko
  Cc: zangchunxin, Johannes Weiner, Vladimir Davydov, Andrew Morton,
	Tejun Heo, lizefan, Jonathan Corbet, Alexei Starovoitov,
	Daniel Borkmann, kafai, Song Liu, Yonghong Song, andriin,
	john.fastabend, kpsingh, Cgroups, linux-doc, Linux MM, LKML,
	netdev, bpf

On Mon, Sep 21, 2020 at 7:36 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 21-09-20 19:23:01, Yafang Shao wrote:
> > On Mon, Sep 21, 2020 at 7:05 PM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Mon 21-09-20 18:55:40, Yafang Shao wrote:
> > > > On Mon, Sep 21, 2020 at 4:12 PM Michal Hocko <mhocko@suse.com> wrote:
> > > > >
> > > > > On Mon 21-09-20 16:02:55, zangchunxin@bytedance.com wrote:
> > > > > > From: Chunxin Zang <zangchunxin@bytedance.com>
> > > > > >
> > > > > > In the cgroup v1, we have 'force_mepty' interface. This is very
> > > > > > useful for userspace to actively release memory. But the cgroup
> > > > > > v2 does not.
> > > > > >
> > > > > > This patch reuse cgroup v1's function, but have a new name for
> > > > > > the interface. Because I think 'drop_cache' may be is easier to
> > > > > > understand :)
> > > > >
> > > > > This should really explain a usecase. Global drop_caches is a terrible
> > > > > interface and it has caused many problems in the past. People have
> > > > > learned to use it as a remedy to any problem they might see and cause
> > > > > other problems without realizing that. This is the reason why we even
> > > > > log each attempt to drop caches.
> > > > >
> > > > > I would rather not repeat the same mistake on the memcg level unless
> > > > > there is a very strong reason for it.
> > > > >
> > > >
> > > > I think we'd better add these comments above the function
> > > > mem_cgroup_force_empty() to explain why we don't want to expose this
> > > > interface in cgroup2, otherwise people will continue to send this
> > > > proposal without any strong reason.
> > >
> > > I do not mind people sending this proposal.  "V1 used to have an
> > > interface, we need it in v2 as well" is not really viable without
> > > providing more reasoning on the specific usecase.
> > >
> > > _Any_ patch should have a proper justification. This is nothing really
> > > new to the process and I am wondering why this is coming as a surprise.
> > >
> >
> > Container users always want to drop cache in a specific container,
> > because they used to use drop_caches to fix memory pressure issues.
>
> This is exactly the kind of problems we have seen in the past. There
> should be zero reason to addre potential reclaim problems by dropping
> page cache on the floor. There is a huge cargo cult about this
> procedure and I have seen numerous reports when people complained about
> performance afterwards just to learn that the dropped page cache was one
> of the resons for that.
>
> > Although drop_caches can cause some unexpected issues, it could also
> > fix some issues.
>
> "Some issues" is way too general. We really want to learn about those
> issues and address them properly.
>

One use case in our production environment is that some of our tasks
become very latency sensitive from 7am to 10am, so before these tasks
become active we use drop_caches to drop the page caches generated by
other tasks at night, to avoid these tasks triggering direct reclaim.
The best way to do it is to fix the latency in direct reclaim, but that
will take great effort, while drop_caches gives us an easier way to
achieve the same goal.
IOW, drop_caches gives users an option to achieve their goal before
they find a better solution.

> > So container users want to use it in containers as
> > well.
> > If this feature is not implemented in cgroup, they will ask you why
> > but there is no explanation in the kernel.
>
> There is no usecase that would really require it so far.
>
> > Regarding the memory.high, it is not perfect as well, because you have
> > to set it to 0 to drop_caches, and the processes in the containers
> > have to reclaim pages as well because they reach the memory.high, but
> > memory.force_empty won't make other processes to reclaim.
>
> Since 536d3bf261a2 ("mm: memcontrol: avoid workload stalls when lowering
> memory.high") the limit is set after the reclaim so the race window when
> somebody would be pushed to high limit reclaim is reduced. But I do
> agree this is just a workaround.
>
> > That doesn't mean I agree to add this interface, while I really mean
> > that if we discard one feature we'd better explain why.
>
> We need to understand why somebody wants an interface because once it is
> added it will have to be maintained for ever.
> --
> Michal Hocko
> SUSE Labs



-- 
Thanks
Yafang

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm/memcontrol: Add the drop_cache interface for cgroup v2
  2020-09-22  4:20           ` Yafang Shao
@ 2020-09-22  7:27             ` Michal Hocko
  2020-09-22  8:06               ` Yafang Shao
  0 siblings, 1 reply; 21+ messages in thread
From: Michal Hocko @ 2020-09-22  7:27 UTC (permalink / raw)
  To: Yafang Shao
  Cc: zangchunxin, Johannes Weiner, Vladimir Davydov, Andrew Morton,
	Tejun Heo, lizefan, Jonathan Corbet, Alexei Starovoitov,
	Daniel Borkmann, kafai, Song Liu, Yonghong Song, andriin,
	john.fastabend, kpsingh, Cgroups, linux-doc, Linux MM, LKML,
	netdev, bpf

On Tue 22-09-20 12:20:52, Yafang Shao wrote:
> On Mon, Sep 21, 2020 at 7:36 PM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Mon 21-09-20 19:23:01, Yafang Shao wrote:
> > > On Mon, Sep 21, 2020 at 7:05 PM Michal Hocko <mhocko@suse.com> wrote:
> > > >
> > > > On Mon 21-09-20 18:55:40, Yafang Shao wrote:
> > > > > On Mon, Sep 21, 2020 at 4:12 PM Michal Hocko <mhocko@suse.com> wrote:
> > > > > >
> > > > > > On Mon 21-09-20 16:02:55, zangchunxin@bytedance.com wrote:
> > > > > > > From: Chunxin Zang <zangchunxin@bytedance.com>
> > > > > > >
> > > > > > > In the cgroup v1, we have 'force_mepty' interface. This is very
> > > > > > > useful for userspace to actively release memory. But the cgroup
> > > > > > > v2 does not.
> > > > > > >
> > > > > > > This patch reuse cgroup v1's function, but have a new name for
> > > > > > > the interface. Because I think 'drop_cache' may be is easier to
> > > > > > > understand :)
> > > > > >
> > > > > > This should really explain a usecase. Global drop_caches is a terrible
> > > > > > interface and it has caused many problems in the past. People have
> > > > > > learned to use it as a remedy to any problem they might see and cause
> > > > > > other problems without realizing that. This is the reason why we even
> > > > > > log each attempt to drop caches.
> > > > > >
> > > > > > I would rather not repeat the same mistake on the memcg level unless
> > > > > > there is a very strong reason for it.
> > > > > >
> > > > >
> > > > > I think we'd better add these comments above the function
> > > > > mem_cgroup_force_empty() to explain why we don't want to expose this
> > > > > interface in cgroup2, otherwise people will continue to send this
> > > > > proposal without any strong reason.
> > > >
> > > > I do not mind people sending this proposal.  "V1 used to have an
> > > > interface, we need it in v2 as well" is not really viable without
> > > > providing more reasoning on the specific usecase.
> > > >
> > > > _Any_ patch should have a proper justification. This is nothing really
> > > > new to the process and I am wondering why this is coming as a surprise.
> > > >
> > >
> > > Container users always want to drop cache in a specific container,
> > > because they used to use drop_caches to fix memory pressure issues.
> >
> > This is exactly the kind of problems we have seen in the past. There
> > should be zero reason to addre potential reclaim problems by dropping
> > page cache on the floor. There is a huge cargo cult about this
> > procedure and I have seen numerous reports when people complained about
> > performance afterwards just to learn that the dropped page cache was one
> > of the resons for that.
> >
> > > Although drop_caches can cause some unexpected issues, it could also
> > > fix some issues.
> >
> > "Some issues" is way too general. We really want to learn about those
> > issues and address them properly.
> >
> 
> One use case in our production environment is that some of our tasks
> become very latency sensitive from 7am to 10am, so before these tasks
> become active we will use drop_caches to drop page caches generated by
> other tasks at night to avoid these tasks triggering direct reclaim.
>
> The best way to do it is to fix the latency in direct reclaim, but it
> will take great effort.

What is the latency triggered by the memory reclaim? It should be mostly
a clean page cache right as drop_caches only drops clean pages. Or is
this more about [id]cache? Do you have any profiles where is the time
spent?

> while drop_caches give us an easier way to achieve the same goal.

It papers over real problems and that is my earlier comment about.

> IOW, drop_caches give the users an option to achieve their goal before
> they find a better solution.

You can achieve the same with a different configuration already. You can
isolate your page-cache-hungry overnight workload (likely a backup) into
its own memcg. You can either use an aggressive high limit during the
run or simply reduce the high/max limit after the work is done.

If you cannot isolate that easily, then you can lower the high limit
temporarily before your peak workload.
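
Something like the following (untested, paths and limit made up) would
do the isolation part:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

static int write_str(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -1;
	if (write(fd, val, strlen(val)) < 0) {
		close(fd);
		return -1;
	}
	return close(fd);
}

int main(int argc, char **argv)
{
	if (argc < 2) {
		fprintf(stderr, "usage: %s <pid-of-backup-job>\n", argv[0]);
		return 1;
	}
	/* Give the overnight job its own memcg... */
	mkdir("/sys/fs/cgroup/backup", 0755);
	/* ...cap it so it cannot fill memory with page cache... */
	write_str("/sys/fs/cgroup/backup/memory.high", "2G");
	/* ...and move the job into it. */
	write_str("/sys/fs/cgroup/backup/cgroup.procs", argv[1]);
	return 0;
}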

Really throwing all the page cache away just to prepare for a peak
workload sounds like a bad idea to me. You are potentially throwing
away data that you will have to spend time refaulting again.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm/memcontrol: Add the drop_cache interface for cgroup v2
  2020-09-22  7:27             ` Michal Hocko
@ 2020-09-22  8:06               ` Yafang Shao
  2020-09-22 10:01                 ` Michal Hocko
  0 siblings, 1 reply; 21+ messages in thread
From: Yafang Shao @ 2020-09-22  8:06 UTC (permalink / raw)
  To: Michal Hocko
  Cc: zangchunxin, Johannes Weiner, Vladimir Davydov, Andrew Morton,
	Tejun Heo, lizefan, Jonathan Corbet, Alexei Starovoitov,
	Daniel Borkmann, kafai, Song Liu, Yonghong Song, andriin,
	john.fastabend, kpsingh, Cgroups, linux-doc, Linux MM, LKML,
	netdev, bpf

On Tue, Sep 22, 2020 at 3:27 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Tue 22-09-20 12:20:52, Yafang Shao wrote:
> > On Mon, Sep 21, 2020 at 7:36 PM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Mon 21-09-20 19:23:01, Yafang Shao wrote:
> > > > On Mon, Sep 21, 2020 at 7:05 PM Michal Hocko <mhocko@suse.com> wrote:
> > > > >
> > > > > On Mon 21-09-20 18:55:40, Yafang Shao wrote:
> > > > > > On Mon, Sep 21, 2020 at 4:12 PM Michal Hocko <mhocko@suse.com> wrote:
> > > > > > >
> > > > > > > On Mon 21-09-20 16:02:55, zangchunxin@bytedance.com wrote:
> > > > > > > > From: Chunxin Zang <zangchunxin@bytedance.com>
> > > > > > > >
> > > > > > > > In the cgroup v1, we have 'force_mepty' interface. This is very
> > > > > > > > useful for userspace to actively release memory. But the cgroup
> > > > > > > > v2 does not.
> > > > > > > >
> > > > > > > > This patch reuse cgroup v1's function, but have a new name for
> > > > > > > > the interface. Because I think 'drop_cache' may be is easier to
> > > > > > > > understand :)
> > > > > > >
> > > > > > > This should really explain a usecase. Global drop_caches is a terrible
> > > > > > > interface and it has caused many problems in the past. People have
> > > > > > > learned to use it as a remedy to any problem they might see and cause
> > > > > > > other problems without realizing that. This is the reason why we even
> > > > > > > log each attempt to drop caches.
> > > > > > >
> > > > > > > I would rather not repeat the same mistake on the memcg level unless
> > > > > > > there is a very strong reason for it.
> > > > > > >
> > > > > >
> > > > > > I think we'd better add these comments above the function
> > > > > > mem_cgroup_force_empty() to explain why we don't want to expose this
> > > > > > interface in cgroup2, otherwise people will continue to send this
> > > > > > proposal without any strong reason.
> > > > >
> > > > > I do not mind people sending this proposal.  "V1 used to have an
> > > > > interface, we need it in v2 as well" is not really viable without
> > > > > providing more reasoning on the specific usecase.
> > > > >
> > > > > _Any_ patch should have a proper justification. This is nothing really
> > > > > new to the process and I am wondering why this is coming as a surprise.
> > > > >
> > > >
> > > > Container users always want to drop cache in a specific container,
> > > > because they used to use drop_caches to fix memory pressure issues.
> > >
> > > This is exactly the kind of problems we have seen in the past. There
> > > should be zero reason to addre potential reclaim problems by dropping
> > > page cache on the floor. There is a huge cargo cult about this
> > > procedure and I have seen numerous reports when people complained about
> > > performance afterwards just to learn that the dropped page cache was one
> > > of the resons for that.
> > >
> > > > Although drop_caches can cause some unexpected issues, it could also
> > > > fix some issues.
> > >
> > > "Some issues" is way too general. We really want to learn about those
> > > issues and address them properly.
> > >
> >
> > One use case in our production environment is that some of our tasks
> > become very latency sensitive from 7am to 10am, so before these tasks
> > become active we will use drop_caches to drop page caches generated by
> > other tasks at night to avoid these tasks triggering direct reclaim.
> >
> > The best way to do it is to fix the latency in direct reclaim, but it
> > will take great effort.
>
> What is the latency triggered by the memory reclaim? It should be mostly
> a clean page cache right as drop_caches only drops clean pages. Or is
> this more about [id]cache? Do you have any profiles where is the time
> spent?
>

Yes, we have analyzed the issues in the direct reclaim, but that is
not the point.
The point is that each case may take us several days to analyze, while
the user can't wait, so they will use drop_caches to work around it
until we find the solution.

> > while drop_caches give us an easier way to achieve the same goal.
>
> It papers over real problems and that is my earlier comment about.
>
> > IOW, drop_caches give the users an option to achieve their goal before
> > they find a better solution.
>
> You can achieve the same by a different configuration already. You can
> isolate your page cache hungry overnight (likely a backup) workload into
> its own memcg. You can either use an aggressive high limit during the
> run or simply reduce the high/max limit after the work is done.
>
> If you cannot isolate that easily then you can lower high limit
> temporarily before your peak workload.
>

Right, there are many better solutions if you have _enough_ time.
But as I described above, the user needs a fix ASAP, before a better
solution is found.

> Really throwing all the page cache away just to prepare for a peak
> workload sounds like a bad idea to me. You are potentially throwing
> data that you have to spend time to refault again.

If some tasks work at night but are idle during the day, while others
work during the day but are idle at night, it is not bad to drop all
caches when you switch workloads.
Dropping these caches may not always help, but the caches really are
useless for the next workload.

-- 
Thanks
Yafang

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [External] Re: [PATCH] mm/memcontrol: Add the drop_cache interface for cgroup v2
  2020-09-21 11:05     ` Michal Hocko
  2020-09-21 11:23       ` Yafang Shao
@ 2020-09-22  9:43       ` Chunxin Zang
  2020-09-22  9:51         ` Chris Down
  1 sibling, 1 reply; 21+ messages in thread
From: Chunxin Zang @ 2020-09-22  9:43 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Yafang Shao, Johannes Weiner, Vladimir Davydov, Andrew Morton,
	Tejun Heo, lizefan, Jonathan Corbet, Alexei Starovoitov,
	Daniel Borkmann, kafai, Song Liu, Yonghong Song, andriin,
	john.fastabend, kpsingh, Cgroups, linux-doc, Linux MM, LKML,
	netdev, bpf

On Mon, Sep 21, 2020 at 7:05 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 21-09-20 18:55:40, Yafang Shao wrote:
> > On Mon, Sep 21, 2020 at 4:12 PM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Mon 21-09-20 16:02:55, zangchunxin@bytedance.com wrote:
> > > > From: Chunxin Zang <zangchunxin@bytedance.com>
> > > >
> > > > In the cgroup v1, we have 'force_mepty' interface. This is very
> > > > useful for userspace to actively release memory. But the cgroup
> > > > v2 does not.
> > > >
> > > > This patch reuse cgroup v1's function, but have a new name for
> > > > the interface. Because I think 'drop_cache' may be is easier to
> > > > understand :)
> > >
> > > This should really explain a usecase. Global drop_caches is a terrible
> > > interface and it has caused many problems in the past. People have
> > > learned to use it as a remedy to any problem they might see and cause
> > > other problems without realizing that. This is the reason why we even
> > > log each attempt to drop caches.
> > >
> > > I would rather not repeat the same mistake on the memcg level unless
> > > there is a very strong reason for it.
> > >
> >
> > I think we'd better add these comments above the function
> > mem_cgroup_force_empty() to explain why we don't want to expose this
> > interface in cgroup2, otherwise people will continue to send this
> > proposal without any strong reason.
>
> I do not mind people sending this proposal.  "V1 used to have an
> interface, we need it in v2 as well" is not really viable without
> providing more reasoning on the specific usecase.
>
> _Any_ patch should have a proper justification. This is nothing really
> new to the process and I am wondering why this is coming as a surprise.
>

I'm so sorry for that.
My usecase is that there are two types of services on one server, with
different priorities. Type_A has the highest priority; we need to
ensure its scheduling latency, I/O latency and memory availability.
Type_B has the lowest priority; we expect it not to affect Type_A when
it runs.
So Type_A can use memory without any limit, while Type_B may use
memory only when memory is absolutely sufficient. But we cannot
estimate how much memory Type_B should use, because everything is
dynamic, so we can't set Type_B's memory.high.

So we want to release Type_B's memory when global memory is
insufficient, in order to ensure the quality of service of Type_A.
In the past, we used the 'force_empty' interface of cgroup v1.

> --
> Michal Hocko
> SUSE Labs

Best wishes
Chunxin

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [External] Re: [PATCH] mm/memcontrol: Add the drop_cache interface for cgroup v2
  2020-09-22  9:43       ` [External] " Chunxin Zang
@ 2020-09-22  9:51         ` Chris Down
  2020-09-22 10:24           ` Chunxin Zang
  0 siblings, 1 reply; 21+ messages in thread
From: Chris Down @ 2020-09-22  9:51 UTC (permalink / raw)
  To: Chunxin Zang
  Cc: Michal Hocko, Yafang Shao, Johannes Weiner, Vladimir Davydov,
	Andrew Morton, Tejun Heo, lizefan, Jonathan Corbet,
	Alexei Starovoitov, Daniel Borkmann, kafai, Song Liu,
	Yonghong Song, andriin, john.fastabend, kpsingh, Cgroups,
	linux-doc, Linux MM, LKML, netdev, bpf

Chunxin Zang writes:
>My usecase is that there are two types of services in one server. They
>have difference
>priorities. Type_A has the highest priority, we need to ensure it's
>schedule latency、I/O
>latency、memory enough. Type_B has the lowest priority, we expect it
>will not affect
>Type_A when executed.
>So Type_A could use memory without any limit. Type_B could use memory
>only when the
>memory is absolutely sufficient. But we cannot estimate how much
>memory Type_B should
>use. Because everything is dynamic. So we can't set Type_B's memory.high.
>
>So we want to release the memory of Type_B when global memory is
>insufficient in order
>to ensure the quality of service of Type_A . In the past, we used the
>'force_empty' interface
>of cgroup v1.

This sounds like a perfect use case for memory.low on Type_A, and it's pretty 
much exactly what we invented it for. What's the problem with that?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm/memcontrol: Add the drop_cache interface for cgroup v2
  2020-09-22  8:06               ` Yafang Shao
@ 2020-09-22 10:01                 ` Michal Hocko
  0 siblings, 0 replies; 21+ messages in thread
From: Michal Hocko @ 2020-09-22 10:01 UTC (permalink / raw)
  To: Yafang Shao
  Cc: zangchunxin, Johannes Weiner, Vladimir Davydov, Andrew Morton,
	Tejun Heo, lizefan, Jonathan Corbet, Alexei Starovoitov,
	Daniel Borkmann, kafai, Song Liu, Yonghong Song, andriin,
	john.fastabend, kpsingh, Cgroups, linux-doc, Linux MM, LKML,
	netdev, bpf

On Tue 22-09-20 16:06:31, Yafang Shao wrote:
> On Tue, Sep 22, 2020 at 3:27 PM Michal Hocko <mhocko@suse.com> wrote:
[...]
> > What is the latency triggered by the memory reclaim? It should be mostly
> > a clean page cache right as drop_caches only drops clean pages. Or is
> > this more about [id]cache? Do you have any profiles where is the time
> > spent?
> >
> 
> Yes, we have analyzed the issues in the direct reclaim, but that is
> not the point.

Are those fixed?

> The point is that each case may take us several days to analyze, while
> the user can't wait, so they will use drop_caches to workaround it
> until we find the solution.

As I've said there are several options to achieve an immediate action.
Careful resource domains configuration will certainly help with that.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [External] Re: [PATCH] mm/memcontrol: Add the drop_cache interface for cgroup v2
  2020-09-22  9:51         ` Chris Down
@ 2020-09-22 10:24           ` Chunxin Zang
  2020-09-22 10:42             ` Chris Down
  0 siblings, 1 reply; 21+ messages in thread
From: Chunxin Zang @ 2020-09-22 10:24 UTC (permalink / raw)
  To: Chris Down
  Cc: Michal Hocko, Yafang Shao, Johannes Weiner, Vladimir Davydov,
	Andrew Morton, Tejun Heo, lizefan, Jonathan Corbet,
	Alexei Starovoitov, Daniel Borkmann, kafai, Song Liu,
	Yonghong Song, andriin, john.fastabend, kpsingh, Cgroups,
	linux-doc, Linux MM, LKML, netdev, bpf

On Tue, Sep 22, 2020 at 5:51 PM Chris Down <chris@chrisdown.name> wrote:
>
> Chunxin Zang writes:
> >My usecase is that there are two types of services in one server. They
> >have difference
> >priorities. Type_A has the highest priority, we need to ensure it's
> >schedule latency、I/O
> >latency、memory enough. Type_B has the lowest priority, we expect it
> >will not affect
> >Type_A when executed.
> >So Type_A could use memory without any limit. Type_B could use memory
> >only when the
> >memory is absolutely sufficient. But we cannot estimate how much
> >memory Type_B should
> >use. Because everything is dynamic. So we can't set Type_B's memory.high.
> >
> >So we want to release the memory of Type_B when global memory is
> >insufficient in order
> >to ensure the quality of service of Type_A . In the past, we used the
> >'force_empty' interface
> >of cgroup v1.
>
> This sounds like a perfect use case for memory.low on Type_A, and it's pretty
> much exactly what we invented it for. What's the problem with that?

But we cannot estimate how much memory Type_A needs at a minimum.
For example:
total memory: 100G
At the beginning, Type_A is in an idle state and only uses 10G of
memory. The load is very low. We want to run Type_B to avoid wasting
machine resources.
After Type_B has run for a while, it uses 80G of memory.
At this point Type_A becomes busy and needs more memory.

Usually we reclaim Type_B's memory first for Type_A. If that is not
enough, we kill Type_B.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [External] Re: [PATCH] mm/memcontrol: Add the drop_cache interface for cgroup v2
  2020-09-22 10:24           ` Chunxin Zang
@ 2020-09-22 10:42             ` Chris Down
  2020-09-22 12:37               ` Chunxin Zang
  0 siblings, 1 reply; 21+ messages in thread
From: Chris Down @ 2020-09-22 10:42 UTC (permalink / raw)
  To: Chunxin Zang
  Cc: Michal Hocko, Yafang Shao, Johannes Weiner, Vladimir Davydov,
	Andrew Morton, Tejun Heo, lizefan, Jonathan Corbet,
	Alexei Starovoitov, Daniel Borkmann, kafai, Song Liu,
	Yonghong Song, andriin, john.fastabend, kpsingh, Cgroups,
	linux-doc, Linux MM, LKML, netdev, bpf

Chunxin Zang writes:
>On Tue, Sep 22, 2020 at 5:51 PM Chris Down <chris@chrisdown.name> wrote:
>>
>> Chunxin Zang writes:
>> >My usecase is that there are two types of services in one server. They
>> >have difference
>> >priorities. Type_A has the highest priority, we need to ensure it's
>> >schedule latency、I/O
>> >latency、memory enough. Type_B has the lowest priority, we expect it
>> >will not affect
>> >Type_A when executed.
>> >So Type_A could use memory without any limit. Type_B could use memory
>> >only when the
>> >memory is absolutely sufficient. But we cannot estimate how much
>> >memory Type_B should
>> >use. Because everything is dynamic. So we can't set Type_B's memory.high.
>> >
>> >So we want to release the memory of Type_B when global memory is
>> >insufficient in order
>> >to ensure the quality of service of Type_A . In the past, we used the
>> >'force_empty' interface
>> >of cgroup v1.
>>
>> This sounds like a perfect use case for memory.low on Type_A, and it's pretty
>> much exactly what we invented it for. What's the problem with that?
>
>But we cannot estimate how much memory Type_A uses at least.

memory.low allows ballparking, you don't have to know exactly how much it uses.  
Any amount of protection biases reclaim away from that cgroup.

>For example:
>total memory: 100G
>At the beginning, Type_A was in an idle state, and it only used 10G of memory.
>The load is very low. We want to run Type_B to avoid wasting machine resources.
>When Type_B runs for a while, it used 80G of memory.
>At this time Type_A is busy, it needs more memory.

Ok, so set memory.low for Type_A close to your maximum expected value.
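
Something like this (untested; path and value are ballpark
placeholders):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	/* Ballpark of Type_A's expected peak working set; it does not
	 * need to be exact, any protection biases reclaim toward the
	 * other cgroups. */
	const char *val = "90G";
	int fd = open("/sys/fs/cgroup/type_a/memory.low", O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (write(fd, val, strlen(val)) < 0)
		perror("write");
	close(fd);
	return 0;
}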

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [External] Re: [PATCH] mm/memcontrol: Add the drop_cache interface for cgroup v2
  2020-09-22 10:42             ` Chris Down
@ 2020-09-22 12:37               ` Chunxin Zang
  2020-09-22 12:43                 ` Chris Down
  2020-09-22 19:57                 ` Shakeel Butt
  0 siblings, 2 replies; 21+ messages in thread
From: Chunxin Zang @ 2020-09-22 12:37 UTC (permalink / raw)
  To: Chris Down
  Cc: Michal Hocko, Yafang Shao, Johannes Weiner, Vladimir Davydov,
	Andrew Morton, Tejun Heo, lizefan, Jonathan Corbet,
	Alexei Starovoitov, Daniel Borkmann, kafai, Song Liu,
	Yonghong Song, andriin, john.fastabend, kpsingh, Cgroups,
	linux-doc, Linux MM, LKML, netdev, bpf

On Tue, Sep 22, 2020 at 6:42 PM Chris Down <chris@chrisdown.name> wrote:
>
> Chunxin Zang writes:
> >On Tue, Sep 22, 2020 at 5:51 PM Chris Down <chris@chrisdown.name> wrote:
> >>
> >> Chunxin Zang writes:
> >> >My usecase is that there are two types of services in one server. They
> >> >have difference
> >> >priorities. Type_A has the highest priority, we need to ensure it's
> >> >schedule latency、I/O
> >> >latency、memory enough. Type_B has the lowest priority, we expect it
> >> >will not affect
> >> >Type_A when executed.
> >> >So Type_A could use memory without any limit. Type_B could use memory
> >> >only when the
> >> >memory is absolutely sufficient. But we cannot estimate how much
> >> >memory Type_B should
> >> >use. Because everything is dynamic. So we can't set Type_B's memory.high.
> >> >
> >> >So we want to release the memory of Type_B when global memory is
> >> >insufficient in order
> >> >to ensure the quality of service of Type_A . In the past, we used the
> >> >'force_empty' interface
> >> >of cgroup v1.
> >>
> >> This sounds like a perfect use case for memory.low on Type_A, and it's pretty
> >> much exactly what we invented it for. What's the problem with that?
> >
> >But we cannot estimate how much memory Type_A uses at least.
>
> memory.low allows ballparking, you don't have to know exactly how much it uses.
> Any amount of protection biases reclaim away from that cgroup.
>
> >For example:
> >total memory: 100G
> >At the beginning, Type_A was in an idle state, and it only used 10G of memory.
> >The load is very low. We want to run Type_B to avoid wasting machine resources.
> >When Type_B runs for a while, it used 80G of memory.
> >At this time Type_A is busy, it needs more memory.
>
> Ok, so set memory.low for Type_A close to your maximum expected value.

Please forgive me for not being able to understand why setting
memory.low for Type_A can solve the problem.
In my scenario, Type_A is the most important, so I would set its
memory.low to 100G.
But 'memory.low' only takes effect passively when the kernel is
reclaiming memory. That means Type_B's memory is only reclaimed once
Type_A is already in the allocation slow path, which hurts Type_A's
performance.
We want to reclaim Type_B's memory in advance, before A is expected to
become busy.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [External] Re: [PATCH] mm/memcontrol: Add the drop_cache interface for cgroup v2
  2020-09-22 12:37               ` Chunxin Zang
@ 2020-09-22 12:43                 ` Chris Down
  2020-09-23  2:35                   ` Chunxin Zang
  2020-09-22 19:57                 ` Shakeel Butt
  1 sibling, 1 reply; 21+ messages in thread
From: Chris Down @ 2020-09-22 12:43 UTC (permalink / raw)
  To: Chunxin Zang
  Cc: Michal Hocko, Yafang Shao, Johannes Weiner, Vladimir Davydov,
	Andrew Morton, Tejun Heo, lizefan, Jonathan Corbet,
	Alexei Starovoitov, Daniel Borkmann, kafai, Song Liu,
	Yonghong Song, andriin, john.fastabend, kpsingh, Cgroups,
	linux-doc, Linux MM, LKML, netdev, bpf

Chunxin Zang writes:
>Please forgive me for not being able to understand why setting
>memory.low for Type_A can solve the problem.
>In my scene, Type_A is the most important, so I will set 100G to memory.low.
>But 'memory.low' only takes effect passively when the kernel is
>reclaiming memory. It means that reclaim Type_B's memory only when
>Type_A  in alloc memory slow path. This will affect Type_A's
>performance.
>We want to reclaim Type_B's memory in advance when A is expected to be busy.

That's what kswapd reclaim is for, so this distinction is meaningless without 
measurements :-)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [External] Re: [PATCH] mm/memcontrol: Add the drop_cache interface for cgroup v2
  2020-09-22 12:37               ` Chunxin Zang
  2020-09-22 12:43                 ` Chris Down
@ 2020-09-22 19:57                 ` Shakeel Butt
  2020-09-23  2:40                   ` Chunxin Zang
  1 sibling, 1 reply; 21+ messages in thread
From: Shakeel Butt @ 2020-09-22 19:57 UTC (permalink / raw)
  To: Chunxin Zang
  Cc: Chris Down, Michal Hocko, Yafang Shao, Johannes Weiner,
	Vladimir Davydov, Andrew Morton, Tejun Heo, Li Zefan,
	Jonathan Corbet, Alexei Starovoitov, Daniel Borkmann, kafai,
	Song Liu, Yonghong Song, andriin, john.fastabend, kpsingh,
	Cgroups, linux-doc, Linux MM, LKML, netdev, bpf

On Tue, Sep 22, 2020 at 5:37 AM Chunxin Zang <zangchunxin@bytedance.com> wrote:
>
> On Tue, Sep 22, 2020 at 6:42 PM Chris Down <chris@chrisdown.name> wrote:
> >
> > Chunxin Zang writes:
> > >On Tue, Sep 22, 2020 at 5:51 PM Chris Down <chris@chrisdown.name> wrote:
> > >>
> > >> Chunxin Zang writes:
> > >> >My usecase is that there are two types of services in one server. They
> > >> >have difference
> > >> >priorities. Type_A has the highest priority, we need to ensure it's
> > >> >schedule latency、I/O
> > >> >latency、memory enough. Type_B has the lowest priority, we expect it
> > >> >will not affect
> > >> >Type_A when executed.
> > >> >So Type_A could use memory without any limit. Type_B could use memory
> > >> >only when the
> > >> >memory is absolutely sufficient. But we cannot estimate how much
> > >> >memory Type_B should
> > >> >use. Because everything is dynamic. So we can't set Type_B's memory.high.
> > >> >
> > >> >So we want to release the memory of Type_B when global memory is
> > >> >insufficient in order
> > >> >to ensure the quality of service of Type_A . In the past, we used the
> > >> >'force_empty' interface
> > >> >of cgroup v1.
> > >>
> > >> This sounds like a perfect use case for memory.low on Type_A, and it's pretty
> > >> much exactly what we invented it for. What's the problem with that?
> > >
> > >But we cannot estimate how much memory Type_A uses at least.
> >
> > memory.low allows ballparking, you don't have to know exactly how much it uses.
> > Any amount of protection biases reclaim away from that cgroup.
> >
> > >For example:
> > >total memory: 100G
> > >At the beginning, Type_A was in an idle state, and it only used 10G of memory.
> > >The load is very low. We want to run Type_B to avoid wasting machine resources.
> > >When Type_B runs for a while, it used 80G of memory.
> > >At this time Type_A is busy, it needs more memory.
> >
> > Ok, so set memory.low for Type_A close to your maximum expected value.
>
> Please forgive me for not being able to understand why setting
> memory.low for Type_A can solve the problem.
> In my scenario, Type_A is the most important, so I will set memory.low to 100G.
> But 'memory.low' only takes effect passively when the kernel is
> reclaiming memory. It means Type_B's memory is reclaimed only when
> Type_A is in the memory allocation slow path. This will affect Type_A's
> performance.
> We want to reclaim Type_B's memory in advance, when A is expected to be busy.
>

How will you know when to reclaim from B? Are you polling /proc/meminfo?

From what I understand, you want to proactively reclaim from B, so
that A does not go into global reclaim and in the worst case kill B,
right?

BTW you can use memory.high to reclaim from B by setting it lower than
B's memory.current and resetting it to 'max' once the reclaim is done.
Since 'B' is not high priority (I am assuming it is not a latency-sensitive
workload), B temporarily hitting memory.high should not be an issue.
Also, I am assuming you don't care much about the exact amount of memory
reclaimed from B, so I think memory.high can fulfil your use-case.
However, if in the future you decide to proactively reclaim from all the
jobs based on their priority (i.e. more aggressive reclaim from B and a
little reclaim from A), then memory.high is not a good interface.
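
Roughly something like the untested sketch below; the cgroup path and the
target value are only examples. As far as I know, lowering memory.high
already attempts the reclaim as part of handling the write itself:

# Untested sketch: squeeze B down via memory.high, then lift the limit again.
# The cgroup path and the target are examples only.
B = "/sys/fs/cgroup/B"

def write_file(path, value):
    with open(path, "w") as f:
        f.write(value)

def shrink_b(target_bytes):
    with open(B + "/memory.current") as f:
        current = int(f.read())
    if current <= target_bytes:
        return
    # Lowering memory.high below memory.current makes the kernel try to
    # reclaim B towards the new limit while handling this write.
    write_file(B + "/memory.high", str(target_bytes))
    # Lift the limit again so B is not throttled afterwards.
    write_file(B + "/memory.high", "max")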

Shakeel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [External] Re: [PATCH] mm/memcontrol: Add the drop_cache interface for cgroup v2
  2020-09-22 12:43                 ` Chris Down
@ 2020-09-23  2:35                   ` Chunxin Zang
  0 siblings, 0 replies; 21+ messages in thread
From: Chunxin Zang @ 2020-09-23  2:35 UTC (permalink / raw)
  To: Chris Down
  Cc: Michal Hocko, Yafang Shao, Johannes Weiner, Vladimir Davydov,
	Andrew Morton, Tejun Heo, lizefan, Jonathan Corbet,
	Alexei Starovoitov, Daniel Borkmann, kafai, Song Liu,
	Yonghong Song, andriin, john.fastabend, kpsingh, Cgroups,
	linux-doc, Linux MM, LKML, netdev, bpf

On Tue, Sep 22, 2020 at 8:43 PM Chris Down <chris@chrisdown.name> wrote:
>
> Chunxin Zang writes:
> >Please forgive me for not being able to understand why setting
> >memory.low for Type_A can solve the problem.
> >In my scenario, Type_A is the most important, so I will set memory.low to 100G.
> >But 'memory.low' only takes effect passively when the kernel is
> >reclaiming memory. It means Type_B's memory is reclaimed only when
> >Type_A is in the memory allocation slow path. This will affect Type_A's
> >performance.
> >We want to reclaim Type_B's memory in advance, when A is expected to be busy.
>
> That's what kswapd reclaim is for, so this distinction is meaningless without
> measurements :-)

Thanks for these suggestions, I will give it a try.

Best wishes
Chunxin

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [External] Re: [PATCH] mm/memcontrol: Add the drop_cache interface for cgroup v2
  2020-09-22 19:57                 ` Shakeel Butt
@ 2020-09-23  2:40                   ` Chunxin Zang
  0 siblings, 0 replies; 21+ messages in thread
From: Chunxin Zang @ 2020-09-23  2:40 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Chris Down, Michal Hocko, Yafang Shao, Johannes Weiner,
	Vladimir Davydov, Andrew Morton, Tejun Heo, Li Zefan,
	Jonathan Corbet, Alexei Starovoitov, Daniel Borkmann, kafai,
	Song Liu, Yonghong Song, andriin, john.fastabend, kpsingh,
	Cgroups, linux-doc, Linux MM, LKML, netdev, bpf

On Wed, Sep 23, 2020 at 3:57 AM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Tue, Sep 22, 2020 at 5:37 AM Chunxin Zang <zangchunxin@bytedance.com> wrote:
> >
> > On Tue, Sep 22, 2020 at 6:42 PM Chris Down <chris@chrisdown.name> wrote:
> > >
> > > Chunxin Zang writes:
> > > >On Tue, Sep 22, 2020 at 5:51 PM Chris Down <chris@chrisdown.name> wrote:
> > > >>
> > > >> Chunxin Zang writes:
> > > >> >My usecase is that there are two types of services on one server. They
> > > >> >have different priorities. Type_A has the highest priority; we need to
> > > >> >ensure its scheduling latency, I/O latency, and memory availability.
> > > >> >Type_B has the lowest priority; we expect it not to affect Type_A when
> > > >> >it runs.
> > > >> >So Type_A can use memory without any limit, while Type_B can use memory
> > > >> >only when memory is absolutely sufficient. But we cannot estimate how
> > > >> >much memory Type_B should use, because everything is dynamic. So we
> > > >> >can't set Type_B's memory.high.
> > > >> >
> > > >> >So we want to release Type_B's memory when global memory is
> > > >> >insufficient, in order to ensure Type_A's quality of service. In the
> > > >> >past, we used the 'force_empty' interface of cgroup v1.
> > > >>
> > > >> This sounds like a perfect use case for memory.low on Type_A, and it's pretty
> > > >> much exactly what we invented it for. What's the problem with that?
> > > >
> > > >But we cannot estimate how much memory Type_A needs at a minimum.
> > >
> > > memory.low allows ballparking, you don't have to know exactly how much it uses.
> > > Any amount of protection biases reclaim away from that cgroup.
> > >
> > > >For example:
> > > >total memory: 100G
> > > >At the beginning, Type_A is idle and uses only 10G of memory.
> > > >The load is very low. We want to run Type_B to avoid wasting machine resources.
> > > >After Type_B has run for a while, it is using 80G of memory.
> > > >At this point Type_A becomes busy and needs more memory.
> > >
> > > Ok, so set memory.low for Type_A close to your maximum expected value.
> >
> > Please forgive me for not being able to understand why setting
> > memory.low for Type_A can solve the problem.
> > In my scenario, Type_A is the most important, so I will set memory.low to 100G.
> > But 'memory.low' only takes effect passively when the kernel is
> > reclaiming memory. It means Type_B's memory is reclaimed only when
> > Type_A is in the memory allocation slow path. This will affect Type_A's
> > performance.
> > We want to reclaim Type_B's memory in advance, when A is expected to be busy.
> >
>
> How will you know when to reclaim from B? Are you polling /proc/meminfo?
>

We monitor global memory usage through a daemon. If memory usage reaches
80% or 90%, it reclaims B's memory.
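
Something like this untested sketch is what the daemon does; the 80%
threshold, the poll interval and the cgroup path are illustrative, and the
reclaim step uses the cgroup v1 'force_empty' file we relied on so far:

import time

FORCE_EMPTY = "/sys/fs/cgroup/memory/B/memory.force_empty"   # example path

def used_fraction():
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":", 1)
            info[key] = int(rest.split()[0])   # values are in kB
    return 1.0 - info["MemAvailable"] / info["MemTotal"]

while True:
    if used_fraction() >= 0.80:
        # Any write to cgroup v1's memory.force_empty asks the kernel to
        # reclaim as much of B's memory as it can.
        with open(FORCE_EMPTY, "w") as f:
            f.write("0")
    time.sleep(5)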

> From what I understand, you want to proactively reclaim from B, so
> that A does not go into global reclaim and in the worst case kill B,
> right?

Yes, it is.

>
> BTW you can use memory.high to reclaim from B by setting it lower than
> B's memory.current and resetting it to 'max' once the reclaim is done.
> Since 'B' is not high priority (I am assuming it is not a latency-sensitive
> workload), B temporarily hitting memory.high should not be an issue.
> Also, I am assuming you don't care much about the exact amount of memory
> reclaimed from B, so I think memory.high can fulfil your use-case.
> However, if in the future you decide to proactively reclaim from all the
> jobs based on their priority (i.e. more aggressive reclaim from B and a
> little reclaim from A), then memory.high is not a good interface.
>
> Shakeel

Thanks for these suggestions, I will give it a try.

Best wishes
Chunxin

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2020-09-23  2:41 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-21  8:02 [PATCH] mm/memcontrol: Add the drop_cache interface for cgroup v2 zangchunxin
2020-09-21  8:12 ` Michal Hocko
2020-09-21 10:42   ` Chris Down
2020-09-21 10:55   ` Yafang Shao
2020-09-21 11:05     ` Michal Hocko
2020-09-21 11:23       ` Yafang Shao
2020-09-21 11:36         ` Michal Hocko
2020-09-22  4:20           ` Yafang Shao
2020-09-22  7:27             ` Michal Hocko
2020-09-22  8:06               ` Yafang Shao
2020-09-22 10:01                 ` Michal Hocko
2020-09-22  9:43       ` [External] " Chunxin Zang
2020-09-22  9:51         ` Chris Down
2020-09-22 10:24           ` Chunxin Zang
2020-09-22 10:42             ` Chris Down
2020-09-22 12:37               ` Chunxin Zang
2020-09-22 12:43                 ` Chris Down
2020-09-23  2:35                   ` Chunxin Zang
2020-09-22 19:57                 ` Shakeel Butt
2020-09-23  2:40                   ` Chunxin Zang
2020-09-21 15:55 ` Shakeel Butt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).