linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm, memcg: reset low limit during memcg offlining
@ 2017-07-25 11:40 Roman Gushchin
  2017-07-25 11:58 ` Michal Hocko
  2017-07-25 12:05 ` Vladimir Davydov
  0 siblings, 2 replies; 14+ messages in thread
From: Roman Gushchin @ 2017-07-25 11:40 UTC (permalink / raw)
  To: linux-mm
  Cc: Roman Gushchin, Tejun Heo, Johannes Weiner, Michal Hocko,
	Vladimir Davydov, kernel-team, cgroups, linux-kernel

A removed memory cgroup with a defined low limit and some belonging
pagecache has very low chances to be freed.

If a cgroup has been removed, there is likely no memory pressure inside
the cgroup, and the pagecache is protected from the external pressure
by the defined low limit. The cgroup will be freed only after
the reclaim of all belonging pages. And it will not happen until
there are any reclaimable memory in the system. That means,
there is a good chance, that a cold pagecache will reside
in the memory for an undefined amount of time, wasting
system resources.

Fix this issue by zeroing memcg->low during memcg offlining.

Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: kernel-team@fb.com
Cc: cgroups@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/memcontrol.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index aed11b2d0251..2aa204b8f9fd 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4300,6 +4300,8 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 	}
 	spin_unlock(&memcg->event_list_lock);
 
+	memcg->low = 0;
+
 	memcg_offline_kmem(memcg);
 	wb_memcg_offline(memcg);
 
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm, memcg: reset low limit during memcg offlining
  2017-07-25 11:40 [PATCH] mm, memcg: reset low limit during memcg offlining Roman Gushchin
@ 2017-07-25 11:58 ` Michal Hocko
  2017-07-25 12:06   ` Roman Gushchin
  2017-07-25 12:05 ` Vladimir Davydov
  1 sibling, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2017-07-25 11:58 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-mm, Tejun Heo, Johannes Weiner, Vladimir Davydov,
	kernel-team, cgroups, linux-kernel

On Tue 25-07-17 12:40:47, Roman Gushchin wrote:
> A removed memory cgroup with a defined low limit and some belonging
> pagecache has very low chances to be freed.
> 
> If a cgroup has been removed, there is likely no memory pressure inside
> the cgroup, and the pagecache is protected from the external pressure
> by the defined low limit. The cgroup will be freed only after
> the reclaim of all belonging pages. And it will not happen until
> there are any reclaimable memory in the system. That means,
> there is a good chance, that a cold pagecache will reside
> in the memory for an undefined amount of time, wasting
> system resources.
> 
> Fix this issue by zeroing memcg->low during memcg offlining.

Very well spotted! This goes all the way down to low limit inclusion
AFAICS. I would be even tempted to mark it for stable because hiding
some memory from reclaim basically indefinitely is not good. We might
have been just lucky nobody has noticed that yet.

Fixes: 241994ed8649 ("mm: memcontrol: default hierarchy interface for memory")

> Signed-off-by: Roman Gushchin <guro@fb.com>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
> Cc: kernel-team@fb.com
> Cc: cgroups@vger.kernel.org
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org

Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

> ---
>  mm/memcontrol.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index aed11b2d0251..2aa204b8f9fd 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4300,6 +4300,8 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
>  	}
>  	spin_unlock(&memcg->event_list_lock);
>  
> +	memcg->low = 0;
> +
>  	memcg_offline_kmem(memcg);
>  	wb_memcg_offline(memcg);
>  
> -- 
> 2.13.3

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm, memcg: reset low limit during memcg offlining
  2017-07-25 11:40 [PATCH] mm, memcg: reset low limit during memcg offlining Roman Gushchin
  2017-07-25 11:58 ` Michal Hocko
@ 2017-07-25 12:05 ` Vladimir Davydov
  2017-07-25 12:31   ` Roman Gushchin
  1 sibling, 1 reply; 14+ messages in thread
From: Vladimir Davydov @ 2017-07-25 12:05 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-mm, Tejun Heo, Johannes Weiner, Michal Hocko, kernel-team,
	cgroups, linux-kernel

On Tue, Jul 25, 2017 at 12:40:47PM +0100, Roman Gushchin wrote:
> A removed memory cgroup with a defined low limit and some belonging
> pagecache has very low chances to be freed.
> 
> If a cgroup has been removed, there is likely no memory pressure inside
> the cgroup, and the pagecache is protected from the external pressure
> by the defined low limit. The cgroup will be freed only after
> the reclaim of all belonging pages. And it will not happen until
> there are any reclaimable memory in the system. That means,
> there is a good chance, that a cold pagecache will reside
> in the memory for an undefined amount of time, wasting
> system resources.
> 
> Fix this issue by zeroing memcg->low during memcg offlining.
> 
> Signed-off-by: Roman Gushchin <guro@fb.com>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
> Cc: kernel-team@fb.com
> Cc: cgroups@vger.kernel.org
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  mm/memcontrol.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index aed11b2d0251..2aa204b8f9fd 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4300,6 +4300,8 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
>  	}
>  	spin_unlock(&memcg->event_list_lock);
>  
> +	memcg->low = 0;
> +
>  	memcg_offline_kmem(memcg);
>  	wb_memcg_offline(memcg);
>  

We already have that - see mem_cgroup_css_reset().

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm, memcg: reset low limit during memcg offlining
  2017-07-25 11:58 ` Michal Hocko
@ 2017-07-25 12:06   ` Roman Gushchin
  0 siblings, 0 replies; 14+ messages in thread
From: Roman Gushchin @ 2017-07-25 12:06 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Tejun Heo, Johannes Weiner, Vladimir Davydov,
	kernel-team, cgroups, linux-kernel

On Tue, Jul 25, 2017 at 01:58:08PM +0200, Michal Hocko wrote:
> On Tue 25-07-17 12:40:47, Roman Gushchin wrote:
> > A removed memory cgroup with a defined low limit and some belonging
> > pagecache has very low chances to be freed.
> > 
> > If a cgroup has been removed, there is likely no memory pressure inside
> > the cgroup, and the pagecache is protected from the external pressure
> > by the defined low limit. The cgroup will be freed only after
> > the reclaim of all belonging pages. And it will not happen until
> > there are any reclaimable memory in the system. That means,
> > there is a good chance, that a cold pagecache will reside
> > in the memory for an undefined amount of time, wasting
> > system resources.
> > 
> > Fix this issue by zeroing memcg->low during memcg offlining.
> 
> Very well spotted! This goes all the way down to low limit inclusion
> AFAICS. I would be even tempted to mark it for stable because hiding
> some memory from reclaim basically indefinitely is not good. We might
> have been just lucky nobody has noticed that yet.

I believe it's because there are not so many actual low limit users,
and those who do, are using some offstream patches to mitigate this issue.

Thanks!

Roman

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm, memcg: reset low limit during memcg offlining
  2017-07-25 12:05 ` Vladimir Davydov
@ 2017-07-25 12:31   ` Roman Gushchin
  2017-07-25 12:44     ` Michal Hocko
  2017-07-26  8:30     ` Vladimir Davydov
  0 siblings, 2 replies; 14+ messages in thread
From: Roman Gushchin @ 2017-07-25 12:31 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: linux-mm, Tejun Heo, Johannes Weiner, Michal Hocko, kernel-team,
	cgroups, linux-kernel

On Tue, Jul 25, 2017 at 03:05:37PM +0300, Vladimir Davydov wrote:
> On Tue, Jul 25, 2017 at 12:40:47PM +0100, Roman Gushchin wrote:
> > A removed memory cgroup with a defined low limit and some belonging
> > pagecache has very low chances to be freed.
> > 
> > If a cgroup has been removed, there is likely no memory pressure inside
> > the cgroup, and the pagecache is protected from the external pressure
> > by the defined low limit. The cgroup will be freed only after
> > the reclaim of all belonging pages. And it will not happen until
> > there are any reclaimable memory in the system. That means,
> > there is a good chance, that a cold pagecache will reside
> > in the memory for an undefined amount of time, wasting
> > system resources.
> > 
> > Fix this issue by zeroing memcg->low during memcg offlining.
> > 
> > Signed-off-by: Roman Gushchin <guro@fb.com>
> > Cc: Tejun Heo <tj@kernel.org>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Cc: Michal Hocko <mhocko@kernel.org>
> > Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
> > Cc: kernel-team@fb.com
> > Cc: cgroups@vger.kernel.org
> > Cc: linux-mm@kvack.org
> > Cc: linux-kernel@vger.kernel.org
> > ---
> >  mm/memcontrol.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index aed11b2d0251..2aa204b8f9fd 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -4300,6 +4300,8 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
> >  	}
> >  	spin_unlock(&memcg->event_list_lock);
> >  
> > +	memcg->low = 0;
> > +
> >  	memcg_offline_kmem(memcg);
> >  	wb_memcg_offline(memcg);
> >  
> 
> We already have that - see mem_cgroup_css_reset().

Hm, I see...

But are you sure, that calling mem_cgroup_css_reset() from offlining path
is always a good idea?

As I understand, css_reset() callback is intended to _completely_ disable all
limits, as if there were no cgroup at all. And it's main purpose to be called
when controllers are detached from the hierarhy.

Offlining is different: some limits make perfect sence after offlining
(e.g. we want to limit the writeback speed), and other might be tweaked
(e.g. we can set soft limit to prioritize reclaiming of abandoned cgroups).

So, I'd prefer to move this code to the offlining callback,
and not to call css_reset.

But, anyway, thanks for pointing at the mem_cgroup_css_reset().

Roman

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm, memcg: reset low limit during memcg offlining
  2017-07-25 12:31   ` Roman Gushchin
@ 2017-07-25 12:44     ` Michal Hocko
  2017-07-26  8:30     ` Vladimir Davydov
  1 sibling, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2017-07-25 12:44 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Vladimir Davydov, linux-mm, Tejun Heo, Johannes Weiner,
	kernel-team, cgroups, linux-kernel

On Tue 25-07-17 13:31:13, Roman Gushchin wrote:
> On Tue, Jul 25, 2017 at 03:05:37PM +0300, Vladimir Davydov wrote:
> > On Tue, Jul 25, 2017 at 12:40:47PM +0100, Roman Gushchin wrote:
> > > A removed memory cgroup with a defined low limit and some belonging
> > > pagecache has very low chances to be freed.
> > > 
> > > If a cgroup has been removed, there is likely no memory pressure inside
> > > the cgroup, and the pagecache is protected from the external pressure
> > > by the defined low limit. The cgroup will be freed only after
> > > the reclaim of all belonging pages. And it will not happen until
> > > there are any reclaimable memory in the system. That means,
> > > there is a good chance, that a cold pagecache will reside
> > > in the memory for an undefined amount of time, wasting
> > > system resources.
> > > 
> > > Fix this issue by zeroing memcg->low during memcg offlining.
> > > 
> > > Signed-off-by: Roman Gushchin <guro@fb.com>
> > > Cc: Tejun Heo <tj@kernel.org>
> > > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > > Cc: Michal Hocko <mhocko@kernel.org>
> > > Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
> > > Cc: kernel-team@fb.com
> > > Cc: cgroups@vger.kernel.org
> > > Cc: linux-mm@kvack.org
> > > Cc: linux-kernel@vger.kernel.org
> > > ---
> > >  mm/memcontrol.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index aed11b2d0251..2aa204b8f9fd 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -4300,6 +4300,8 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
> > >  	}
> > >  	spin_unlock(&memcg->event_list_lock);
> > >  
> > > +	memcg->low = 0;
> > > +
> > >  	memcg_offline_kmem(memcg);
> > >  	wb_memcg_offline(memcg);
> > >  
> > 
> > We already have that - see mem_cgroup_css_reset().
> 
> Hm, I see...
> 
> But are you sure, that calling mem_cgroup_css_reset() from offlining path
> is always a good idea?

Well, originally I wanted to suggest the same but then I asked the very
same question and couldn't answer it myself. memcg_offline_kmem feels
much more generic.

> As I understand, css_reset() callback is intended to _completely_ disable all
> limits, as if there were no cgroup at all. And it's main purpose to be called
> when controllers are detached from the hierarhy.

yes, that is my understanding as well.
 
> Offlining is different: some limits make perfect sence after offlining
> (e.g. we want to limit the writeback speed), and other might be tweaked
> (e.g. we can set soft limit to prioritize reclaiming of abandoned cgroups).

and the writeback path was exactly the one that triggered my
suspicious...
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm, memcg: reset low limit during memcg offlining
  2017-07-25 12:31   ` Roman Gushchin
  2017-07-25 12:44     ` Michal Hocko
@ 2017-07-26  8:30     ` Vladimir Davydov
  2017-07-26 12:06       ` Tejun Heo
  2017-07-27 13:04       ` [PATCH 1/2] mm, memcg: reset memory.low " Roman Gushchin
  1 sibling, 2 replies; 14+ messages in thread
From: Vladimir Davydov @ 2017-07-26  8:30 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-mm, Tejun Heo, Johannes Weiner, Michal Hocko, kernel-team,
	cgroups, linux-kernel

On Tue, Jul 25, 2017 at 01:31:13PM +0100, Roman Gushchin wrote:
> On Tue, Jul 25, 2017 at 03:05:37PM +0300, Vladimir Davydov wrote:
> > On Tue, Jul 25, 2017 at 12:40:47PM +0100, Roman Gushchin wrote:
> > > A removed memory cgroup with a defined low limit and some belonging
> > > pagecache has very low chances to be freed.
> > > 
> > > If a cgroup has been removed, there is likely no memory pressure inside
> > > the cgroup, and the pagecache is protected from the external pressure
> > > by the defined low limit. The cgroup will be freed only after
> > > the reclaim of all belonging pages. And it will not happen until
> > > there are any reclaimable memory in the system. That means,
> > > there is a good chance, that a cold pagecache will reside
> > > in the memory for an undefined amount of time, wasting
> > > system resources.
> > > 
> > > Fix this issue by zeroing memcg->low during memcg offlining.
> > > 
> > > Signed-off-by: Roman Gushchin <guro@fb.com>
> > > Cc: Tejun Heo <tj@kernel.org>
> > > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > > Cc: Michal Hocko <mhocko@kernel.org>
> > > Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
> > > Cc: kernel-team@fb.com
> > > Cc: cgroups@vger.kernel.org
> > > Cc: linux-mm@kvack.org
> > > Cc: linux-kernel@vger.kernel.org
> > > ---
> > >  mm/memcontrol.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index aed11b2d0251..2aa204b8f9fd 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -4300,6 +4300,8 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
> > >  	}
> > >  	spin_unlock(&memcg->event_list_lock);
> > >  
> > > +	memcg->low = 0;
> > > +
> > >  	memcg_offline_kmem(memcg);
> > >  	wb_memcg_offline(memcg);
> > >  
> > 
> > We already have that - see mem_cgroup_css_reset().
> 
> Hm, I see...
> 
> But are you sure, that calling mem_cgroup_css_reset() from offlining path
> is always a good idea?
> 
> As I understand, css_reset() callback is intended to _completely_ disable all
> limits, as if there were no cgroup at all.

But that's exactly what cgroup offline is: deletion of a cgroup as if it
never existed. The fact that we leave the zombie dangling until all
pages charged to the cgroup are gone is an implementation detail. IIRC
we would "reparent" those charges and delete the mem_cgroup right away
if it were not inherently racy.

> And it's main purpose to be called
> when controllers are detached from the hierarhy.
> 
> Offlining is different: some limits make perfect sence after offlining
> (e.g. we want to limit the writeback speed), and other might be tweaked
> (e.g. we can set soft limit to prioritize reclaiming of abandoned cgroups).

The user can't tweak limits of an offline cgroup, because the cgroup
directory no longer exist. So IMHO resetting all limits is reasonable.
If you want to keep the cgroup limits effective, you shouldn't have
deleted it in the first place, I suppose.

You might also want to check out this:

  http://www.spinics.net/lists/linux-mm/msg102995.html

> 
> So, I'd prefer to move this code to the offlining callback,
> and not to call css_reset.
> 
> But, anyway, thanks for pointing at the mem_cgroup_css_reset().

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm, memcg: reset low limit during memcg offlining
  2017-07-26  8:30     ` Vladimir Davydov
@ 2017-07-26 12:06       ` Tejun Heo
  2017-07-27 13:04       ` [PATCH 1/2] mm, memcg: reset memory.low " Roman Gushchin
  1 sibling, 0 replies; 14+ messages in thread
From: Tejun Heo @ 2017-07-26 12:06 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: Roman Gushchin, linux-mm, Johannes Weiner, Michal Hocko,
	kernel-team, cgroups, linux-kernel

Hello, Vladimir.

On Wed, Jul 26, 2017 at 11:30:17AM +0300, Vladimir Davydov wrote:
> > As I understand, css_reset() callback is intended to _completely_ disable all
> > limits, as if there were no cgroup at all.
> 
> But that's exactly what cgroup offline is: deletion of a cgroup as if it
> never existed. The fact that we leave the zombie dangling until all
> pages charged to the cgroup are gone is an implementation detail. IIRC
> we would "reparent" those charges and delete the mem_cgroup right away
> if it were not inherently racy.

That may be true for memcg but not in general.  Think about writeback
IOs servicing dirty pages of a removed cgroup.  Removing a cgroup
shouldn't grant it more resources than when it was alive and changing
the membership to the parent will break that.  For memcg, they seem
the same just because no new major consumption can be generated after
removal.

> The user can't tweak limits of an offline cgroup, because the cgroup
> directory no longer exist. So IMHO resetting all limits is reasonable.
> If you want to keep the cgroup limits effective, you shouldn't have
> deleted it in the first place, I suppose.

I don't think that's the direction we wanna go.  Granting more
resources on removal is surprising.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/2] mm, memcg: reset memory.low during memcg offlining
  2017-07-26  8:30     ` Vladimir Davydov
  2017-07-26 12:06       ` Tejun Heo
@ 2017-07-27 13:04       ` Roman Gushchin
  2017-07-27 13:04         ` [PATCH 2/2] cgroup: revert fa06235b8eb0 ("cgroup: reset css on destruction") Roman Gushchin
                           ` (2 more replies)
  1 sibling, 3 replies; 14+ messages in thread
From: Roman Gushchin @ 2017-07-27 13:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: Roman Gushchin, Vladimir Davydov, Tejun Heo, Johannes Weiner,
	Michal Hocko, kernel-team, cgroups, linux-mm

A removed memory cgroup with a defined memory.low and some belonging
pagecache has very low chances to be freed.

If a cgroup has been removed, there is likely no memory pressure inside
the cgroup, and the pagecache is protected from the external pressure
by the defined low limit. The cgroup will be freed only after
the reclaim of all belonging pages. And it will not happen until
there are any reclaimable memory in the system. That means,
there is a good chance, that a cold pagecache will reside
in the memory for an undefined amount of time, wasting
system resources.

This problem was fixed earlier by commit fa06235b8eb0
("cgroup: reset css on destruction"), but it's not a best way
to do it, as we can't really reset all limits/counters during
cgroup offlining.

Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: kernel-team@fb.com
Cc: cgroups@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/memcontrol.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d61133e6af99..7b24210596ea 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4300,6 +4300,8 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 	}
 	spin_unlock(&memcg->event_list_lock);
 
+	memcg->low = 0;
+
 	memcg_offline_kmem(memcg);
 	wb_memcg_offline(memcg);
 
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/2] cgroup: revert fa06235b8eb0 ("cgroup: reset css on destruction")
  2017-07-27 13:04       ` [PATCH 1/2] mm, memcg: reset memory.low " Roman Gushchin
@ 2017-07-27 13:04         ` Roman Gushchin
  2017-07-27 13:52           ` Tejun Heo
  2017-07-27 14:36           ` Johannes Weiner
  2017-07-27 14:35         ` [PATCH 1/2] mm, memcg: reset memory.low during memcg offlining Johannes Weiner
  2017-07-27 14:47         ` Michal Hocko
  2 siblings, 2 replies; 14+ messages in thread
From: Roman Gushchin @ 2017-07-27 13:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: Roman Gushchin, Vladimir Davydov, Tejun Heo, Johannes Weiner,
	Michal Hocko, kernel-team, cgroups, linux-mm

Commit fa06235b8eb0 ("cgroup: reset css on destruction") caused
css_reset callback to be called from the offlining path. Although
it solves the problem mentioned in the commit description
("For instance, memory cgroup needs to reset memory.low, otherwise
pages charged to a dead cgroup might never get reclaimed."),
generally speaking, it's not correct.

An offline cgroup can still be a resource domain, and we shouldn't
grant it more resources than it had before deletion.

For instance, if an offline memory cgroup has dirty pages, we should
still imply i/o limits during writeback.

The css_reset callback is designed to return the cgroup state
into the original state, that means reset all limits and counters.
It's spomething different from the offlining, and we shouldn't use
it from the offlining path. Instead, we should adjust necessary
settings from the per-controller css_offline callbacks (e.g. reset
memory.low).

Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: kernel-team@fb.com
Cc: cgroups@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 kernel/cgroup/cgroup.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 29c36c075249..4e93482e066c 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -4499,9 +4499,6 @@ static void offline_css(struct cgroup_subsys_state *css)
 	if (!(css->flags & CSS_ONLINE))
 		return;
 
-	if (ss->css_reset)
-		ss->css_reset(css);
-
 	if (ss->css_offline)
 		ss->css_offline(css);
 
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] cgroup: revert fa06235b8eb0 ("cgroup: reset css on destruction")
  2017-07-27 13:04         ` [PATCH 2/2] cgroup: revert fa06235b8eb0 ("cgroup: reset css on destruction") Roman Gushchin
@ 2017-07-27 13:52           ` Tejun Heo
  2017-07-27 14:36           ` Johannes Weiner
  1 sibling, 0 replies; 14+ messages in thread
From: Tejun Heo @ 2017-07-27 13:52 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-kernel, Vladimir Davydov, Johannes Weiner, Michal Hocko,
	kernel-team, cgroups, linux-mm

On Thu, Jul 27, 2017 at 02:04:28PM +0100, Roman Gushchin wrote:
> Commit fa06235b8eb0 ("cgroup: reset css on destruction") caused
> css_reset callback to be called from the offlining path. Although
> it solves the problem mentioned in the commit description
> ("For instance, memory cgroup needs to reset memory.low, otherwise
> pages charged to a dead cgroup might never get reclaimed."),
> generally speaking, it's not correct.
> 
> An offline cgroup can still be a resource domain, and we shouldn't
> grant it more resources than it had before deletion.
> 
> For instance, if an offline memory cgroup has dirty pages, we should
> still imply i/o limits during writeback.
> 
> The css_reset callback is designed to return the cgroup state
> into the original state, that means reset all limits and counters.
> It's spomething different from the offlining, and we shouldn't use
> it from the offlining path. Instead, we should adjust necessary
> settings from the per-controller css_offline callbacks (e.g. reset
> memory.low).
> 
> Signed-off-by: Roman Gushchin <guro@fb.com>
> Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: kernel-team@fb.com
> Cc: cgroups@vger.kernel.org
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org

Acked-by: Tejun Heo <tj@kernel.org>

Please feel free to route with the previous patch through -mm.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] mm, memcg: reset memory.low during memcg offlining
  2017-07-27 13:04       ` [PATCH 1/2] mm, memcg: reset memory.low " Roman Gushchin
  2017-07-27 13:04         ` [PATCH 2/2] cgroup: revert fa06235b8eb0 ("cgroup: reset css on destruction") Roman Gushchin
@ 2017-07-27 14:35         ` Johannes Weiner
  2017-07-27 14:47         ` Michal Hocko
  2 siblings, 0 replies; 14+ messages in thread
From: Johannes Weiner @ 2017-07-27 14:35 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-kernel, Vladimir Davydov, Tejun Heo, Michal Hocko,
	kernel-team, cgroups, linux-mm, Andrew Morton

CC Andrew - can you route these through the -mm tree please?

On Thu, Jul 27, 2017 at 02:04:27PM +0100, Roman Gushchin wrote:
> A removed memory cgroup with a defined memory.low and some belonging
> pagecache has very low chances to be freed.
> 
> If a cgroup has been removed, there is likely no memory pressure inside
> the cgroup, and the pagecache is protected from the external pressure
> by the defined low limit. The cgroup will be freed only after
> the reclaim of all belonging pages. And it will not happen until
> there are any reclaimable memory in the system. That means,
> there is a good chance, that a cold pagecache will reside
> in the memory for an undefined amount of time, wasting
> system resources.
> 
> This problem was fixed earlier by commit fa06235b8eb0
> ("cgroup: reset css on destruction"), but it's not a best way
> to do it, as we can't really reset all limits/counters during
> cgroup offlining.
> 
> Signed-off-by: Roman Gushchin <guro@fb.com>
> Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: kernel-team@fb.com
> Cc: cgroups@vger.kernel.org
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] cgroup: revert fa06235b8eb0 ("cgroup: reset css on destruction")
  2017-07-27 13:04         ` [PATCH 2/2] cgroup: revert fa06235b8eb0 ("cgroup: reset css on destruction") Roman Gushchin
  2017-07-27 13:52           ` Tejun Heo
@ 2017-07-27 14:36           ` Johannes Weiner
  1 sibling, 0 replies; 14+ messages in thread
From: Johannes Weiner @ 2017-07-27 14:36 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-kernel, Vladimir Davydov, Tejun Heo, Michal Hocko,
	kernel-team, cgroups, linux-mm, Andrew Morton

On Thu, Jul 27, 2017 at 02:04:28PM +0100, Roman Gushchin wrote:
> Commit fa06235b8eb0 ("cgroup: reset css on destruction") caused
> css_reset callback to be called from the offlining path. Although
> it solves the problem mentioned in the commit description
> ("For instance, memory cgroup needs to reset memory.low, otherwise
> pages charged to a dead cgroup might never get reclaimed."),
> generally speaking, it's not correct.
> 
> An offline cgroup can still be a resource domain, and we shouldn't
> grant it more resources than it had before deletion.
> 
> For instance, if an offline memory cgroup has dirty pages, we should
> still imply i/o limits during writeback.
> 
> The css_reset callback is designed to return the cgroup state
> into the original state, that means reset all limits and counters.
> It's spomething different from the offlining, and we shouldn't use
> it from the offlining path. Instead, we should adjust necessary
> settings from the per-controller css_offline callbacks (e.g. reset
> memory.low).
> 
> Signed-off-by: Roman Gushchin <guro@fb.com>
> Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: kernel-team@fb.com
> Cc: cgroups@vger.kernel.org
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] mm, memcg: reset memory.low during memcg offlining
  2017-07-27 13:04       ` [PATCH 1/2] mm, memcg: reset memory.low " Roman Gushchin
  2017-07-27 13:04         ` [PATCH 2/2] cgroup: revert fa06235b8eb0 ("cgroup: reset css on destruction") Roman Gushchin
  2017-07-27 14:35         ` [PATCH 1/2] mm, memcg: reset memory.low during memcg offlining Johannes Weiner
@ 2017-07-27 14:47         ` Michal Hocko
  2 siblings, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2017-07-27 14:47 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-kernel, Vladimir Davydov, Tejun Heo, Johannes Weiner,
	kernel-team, cgroups, linux-mm

On Thu 27-07-17 14:04:27, Roman Gushchin wrote:
> A removed memory cgroup with a defined memory.low and some belonging
> pagecache has very low chances to be freed.
> 
> If a cgroup has been removed, there is likely no memory pressure inside
> the cgroup, and the pagecache is protected from the external pressure
> by the defined low limit. The cgroup will be freed only after
> the reclaim of all belonging pages. And it will not happen until
> there are any reclaimable memory in the system. That means,
> there is a good chance, that a cold pagecache will reside
> in the memory for an undefined amount of time, wasting
> system resources.
> 
> This problem was fixed earlier by commit fa06235b8eb0
> ("cgroup: reset css on destruction"), but it's not a best way
> to do it, as we can't really reset all limits/counters during
> cgroup offlining.
> 
> Signed-off-by: Roman Gushchin <guro@fb.com>
> Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: kernel-team@fb.com
> Cc: cgroups@vger.kernel.org
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org

my ack for this patch still holds.
Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/memcontrol.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index d61133e6af99..7b24210596ea 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4300,6 +4300,8 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
>  	}
>  	spin_unlock(&memcg->event_list_lock);
>  
> +	memcg->low = 0;
> +
>  	memcg_offline_kmem(memcg);
>  	wb_memcg_offline(memcg);
>  
> -- 
> 2.13.3

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2017-07-27 14:48 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-25 11:40 [PATCH] mm, memcg: reset low limit during memcg offlining Roman Gushchin
2017-07-25 11:58 ` Michal Hocko
2017-07-25 12:06   ` Roman Gushchin
2017-07-25 12:05 ` Vladimir Davydov
2017-07-25 12:31   ` Roman Gushchin
2017-07-25 12:44     ` Michal Hocko
2017-07-26  8:30     ` Vladimir Davydov
2017-07-26 12:06       ` Tejun Heo
2017-07-27 13:04       ` [PATCH 1/2] mm, memcg: reset memory.low " Roman Gushchin
2017-07-27 13:04         ` [PATCH 2/2] cgroup: revert fa06235b8eb0 ("cgroup: reset css on destruction") Roman Gushchin
2017-07-27 13:52           ` Tejun Heo
2017-07-27 14:36           ` Johannes Weiner
2017-07-27 14:35         ` [PATCH 1/2] mm, memcg: reset memory.low during memcg offlining Johannes Weiner
2017-07-27 14:47         ` Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).