[PATCH] mm: memcontrol: switch to rcu protection in drain_all

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] mm: memcontrol: switch to rcu protection in drain_all_stock()
@ 2019-08-01 23:35 Roman Gushchin
  2019-08-02  8:04 ` Michal Hocko
  2019-08-02 16:55 ` Roman Gushchin
  0 siblings, 2 replies; 7+ messages in thread
From: Roman Gushchin @ 2019-08-01 23:35 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Michal Hocko, Johannes Weiner, linux-kernel, kernel-team,
	Roman Gushchin, Michal Hocko

Commit 72f0184c8a00 ("mm, memcg: remove hotplug locking from try_charge")
introduced css_tryget()/css_put() calls in drain_all_stock(),
which are supposed to protect the target memory cgroup from being
released during the mem_cgroup_is_descendant() call.

However, it's not completely safe. In theory, memcg can go away
between reading stock->cached pointer and calling css_tryget().

So, let's read the stock->cached pointer and evaluate the memory
cgroup inside a rcu read section, and get rid of
css_tryget()/css_put() calls.

Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@suse.com>
---
 mm/memcontrol.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5c7b9facb0eb..d856b64426b7 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2235,21 +2235,22 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
 	for_each_online_cpu(cpu) {
 		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
 		struct mem_cgroup *memcg;
+		bool flush = false;
 
+		rcu_read_lock();
 		memcg = stock->cached;
-		if (!memcg || !stock->nr_pages || !css_tryget(&memcg->css))
-			continue;
-		if (!mem_cgroup_is_descendant(memcg, root_memcg)) {
-			css_put(&memcg->css);
-			continue;
-		}
-		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
+		if (memcg && stock->nr_pages &&
+		    mem_cgroup_is_descendant(memcg, root_memcg))
+			flush = true;
+		rcu_read_unlock();
+
+		if (flush &&
+		    !test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
 			if (cpu == curcpu)
 				drain_local_stock(&stock->work);
 			else
 				schedule_work_on(cpu, &stock->work);
 		}
-		css_put(&memcg->css);
 	}
 	put_cpu();
 	mutex_unlock(&percpu_charge_mutex);
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm: memcontrol: switch to rcu protection in drain_all_stock()
  2019-08-01 23:35 [PATCH] mm: memcontrol: switch to rcu protection in drain_all_stock() Roman Gushchin
@ 2019-08-02  8:04 ` Michal Hocko
  2019-08-02  8:59   ` Michal Hocko
  2019-08-02 16:55 ` Roman Gushchin
  1 sibling, 1 reply; 7+ messages in thread
From: Michal Hocko @ 2019-08-02  8:04 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Andrew Morton, linux-mm, Johannes Weiner, linux-kernel, kernel-team

On Thu 01-08-19 16:35:13, Roman Gushchin wrote:
> Commit 72f0184c8a00 ("mm, memcg: remove hotplug locking from try_charge")
> introduced css_tryget()/css_put() calls in drain_all_stock(),
> which are supposed to protect the target memory cgroup from being
> released during the mem_cgroup_is_descendant() call.
> 
> However, it's not completely safe. In theory, memcg can go away
> between reading stock->cached pointer and calling css_tryget().

I have to remember how is this whole thing supposed to work, it's been
some time since I've looked into that.

> So, let's read the stock->cached pointer and evaluate the memory
> cgroup inside a rcu read section, and get rid of
> css_tryget()/css_put() calls.

Could you be more specific why does RCU help here?

> Signed-off-by: Roman Gushchin <guro@fb.com>
> Cc: Michal Hocko <mhocko@suse.com>
> ---
>  mm/memcontrol.c | 17 +++++++++--------
>  1 file changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 5c7b9facb0eb..d856b64426b7 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2235,21 +2235,22 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
>  	for_each_online_cpu(cpu) {
>  		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
>  		struct mem_cgroup *memcg;
> +		bool flush = false;
>  
> +		rcu_read_lock();
>  		memcg = stock->cached;
> -		if (!memcg || !stock->nr_pages || !css_tryget(&memcg->css))
> -			continue;
> -		if (!mem_cgroup_is_descendant(memcg, root_memcg)) {
> -			css_put(&memcg->css);
> -			continue;
> -		}
> -		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
> +		if (memcg && stock->nr_pages &&
> +		    mem_cgroup_is_descendant(memcg, root_memcg))
> +			flush = true;
> +		rcu_read_unlock();
> +
> +		if (flush &&
> +		    !test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
>  			if (cpu == curcpu)
>  				drain_local_stock(&stock->work);
>  			else
>  				schedule_work_on(cpu, &stock->work);
>  		}
> -		css_put(&memcg->css);
>  	}
>  	put_cpu();
>  	mutex_unlock(&percpu_charge_mutex);
> -- 
> 2.21.0

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm: memcontrol: switch to rcu protection in drain_all_stock()
  2019-08-02  8:04 ` Michal Hocko
@ 2019-08-02  8:59   ` Michal Hocko
  2019-08-02 17:00     ` Roman Gushchin
  0 siblings, 1 reply; 7+ messages in thread
From: Michal Hocko @ 2019-08-02  8:59 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Andrew Morton, linux-mm, Johannes Weiner, linux-kernel, kernel-team

On Fri 02-08-19 10:04:22, Michal Hocko wrote:
> On Thu 01-08-19 16:35:13, Roman Gushchin wrote:
> > Commit 72f0184c8a00 ("mm, memcg: remove hotplug locking from try_charge")
> > introduced css_tryget()/css_put() calls in drain_all_stock(),
> > which are supposed to protect the target memory cgroup from being
> > released during the mem_cgroup_is_descendant() call.
> > 
> > However, it's not completely safe. In theory, memcg can go away
> > between reading stock->cached pointer and calling css_tryget().
> 
> I have to remember how is this whole thing supposed to work, it's been
> some time since I've looked into that.

OK, I guess I remember now and I do not see how the race is possible.
Stock cache is keeping its memcg alive because it elevates the reference
counting for each cached charge. And that should keep the whole chain up
to the root (of draining) alive, no? Or do I miss something, could you
generate a sequence of events that would lead to use-after-free?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm: memcontrol: switch to rcu protection in drain_all_stock()
  2019-08-01 23:35 [PATCH] mm: memcontrol: switch to rcu protection in drain_all_stock() Roman Gushchin
  2019-08-02  8:04 ` Michal Hocko
@ 2019-08-02 16:55 ` Roman Gushchin
  1 sibling, 0 replies; 7+ messages in thread
From: Roman Gushchin @ 2019-08-02 16:55 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Andrew Morton, linux-mm, Michal Hocko, Johannes Weiner,
	linux-kernel, Kernel Team, Michal Hocko

On Fri, Aug 02, 2019 at 11:33:33AM +0800, Hillf Danton wrote:
> 
> On Thu, 1 Aug 2019 16:35:13 -0700 Roman Gushchin wrote:
> > 
> > Commit 72f0184c8a00 ("mm, memcg: remove hotplug locking from try_charge")
> > introduced css_tryget()/css_put() calls in drain_all_stock(),
> > which are supposed to protect the target memory cgroup from being
> > released during the mem_cgroup_is_descendant() call.
> > 
> > However, it's not completely safe. In theory, memcg can go away
> > between reading stock->cached pointer and calling css_tryget().
> 
> Good catch!
> > 
> > So, let's read the stock->cached pointer and evaluate the memory
> > cgroup inside a rcu read section, and get rid of
> > css_tryget()/css_put() calls.
> 
> You need to either adjust the boundry of the rcu-protected section, or
> retain the call pairs, as the memcg cache is dereferenced again in
> drain_stock().

Not really. drain_stock() is always accessing the local percpu stock, and
stock->cached memcg pointer is protected by references of stocked pages.
Pages are stocked and drained only locally, so they can't go away.
So if (stock->nr_pages > 0), the memcg has at least stock->nr_pages references.

Also, because stocks on other cpus are drained via scheduled work,
neither rcu_read_lock(), not css_tryget()/css_put() protects it.

That's exactly the reason why I think this code is worth changing: it
looks confusing. It looks like css_tryget()/css_put() protect stock
draining, however it's not true.

Thanks!

> > 
> > Signed-off-by: Roman Gushchin <guro@fb.com>
> > Cc: Michal Hocko <mhocko@suse.com>
> > ---
> >  mm/memcontrol.c | 17 +++++++++--------
> >  1 file changed, 9 insertions(+), 8 deletions(-)
> > 
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 5c7b9facb0eb..d856b64426b7 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -2235,21 +2235,22 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
> >  	for_each_online_cpu(cpu) {
> >  		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
> >  		struct mem_cgroup *memcg;
> > +		bool flush = false;
> >  
> > +		rcu_read_lock();
> >  		memcg = stock->cached;
> > -		if (!memcg || !stock->nr_pages || !css_tryget(&memcg->css))
> > -			continue;
> > -		if (!mem_cgroup_is_descendant(memcg, root_memcg)) {
> > -			css_put(&memcg->css);
> > -			continue;
> > -		}
> > -		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
> > +		if (memcg && stock->nr_pages &&
> > +		    mem_cgroup_is_descendant(memcg, root_memcg))
> > +			flush = true;
> > +		rcu_read_unlock();
> > +
> > +		if (flush &&
> > +		    !test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
> >  			if (cpu == curcpu)
> >  				drain_local_stock(&stock->work);
> >  			else
> >  				schedule_work_on(cpu, &stock->work);
> >  		}
> > -		css_put(&memcg->css);
> >  	}
> >  	put_cpu();
> >  	mutex_unlock(&percpu_charge_mutex);
> > -- 
> > 2.21.0
> > 
> 
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm: memcontrol: switch to rcu protection in drain_all_stock()
  2019-08-02  8:59   ` Michal Hocko
@ 2019-08-02 17:00     ` Roman Gushchin
  2019-08-02 17:14       ` Michal Hocko
  0 siblings, 1 reply; 7+ messages in thread
From: Roman Gushchin @ 2019-08-02 17:00 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, linux-mm, Johannes Weiner, linux-kernel, Kernel Team

On Fri, Aug 02, 2019 at 10:59:47AM +0200, Michal Hocko wrote:
> On Fri 02-08-19 10:04:22, Michal Hocko wrote:
> > On Thu 01-08-19 16:35:13, Roman Gushchin wrote:
> > > Commit 72f0184c8a00 ("mm, memcg: remove hotplug locking from try_charge")
> > > introduced css_tryget()/css_put() calls in drain_all_stock(),
> > > which are supposed to protect the target memory cgroup from being
> > > released during the mem_cgroup_is_descendant() call.
> > > 
> > > However, it's not completely safe. In theory, memcg can go away
> > > between reading stock->cached pointer and calling css_tryget().
> > 
> > I have to remember how is this whole thing supposed to work, it's been
> > some time since I've looked into that.
> 
> OK, I guess I remember now and I do not see how the race is possible.
> Stock cache is keeping its memcg alive because it elevates the reference
> counting for each cached charge. And that should keep the whole chain up
> to the root (of draining) alive, no? Or do I miss something, could you
> generate a sequence of events that would lead to use-after-free?

Right, but it's true when you reading a local percpu stock.
But here we read a remote stock->cached pointer, which can be cleared
by a remote concurrent drain_local_stock() execution.

In theory, it could be the last reference, and the memcg can be destroyed
remotely, so we end up trying to call css_tryget() over freed memory.

The race is theoretical, but as I wrote in the thread, I think
that it's still worth fixing, because the current code looks confusing
(and this confirms my feelings).

Thanks!

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm: memcontrol: switch to rcu protection in drain_all_stock()
  2019-08-02 17:00     ` Roman Gushchin
@ 2019-08-02 17:14       ` Michal Hocko
  2019-08-02 17:36         ` Roman Gushchin
  0 siblings, 1 reply; 7+ messages in thread
From: Michal Hocko @ 2019-08-02 17:14 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Andrew Morton, linux-mm, Johannes Weiner, linux-kernel, Kernel Team

On Fri 02-08-19 17:00:34, Roman Gushchin wrote:
> On Fri, Aug 02, 2019 at 10:59:47AM +0200, Michal Hocko wrote:
> > On Fri 02-08-19 10:04:22, Michal Hocko wrote:
> > > On Thu 01-08-19 16:35:13, Roman Gushchin wrote:
> > > > Commit 72f0184c8a00 ("mm, memcg: remove hotplug locking from try_charge")
> > > > introduced css_tryget()/css_put() calls in drain_all_stock(),
> > > > which are supposed to protect the target memory cgroup from being
> > > > released during the mem_cgroup_is_descendant() call.
> > > > 
> > > > However, it's not completely safe. In theory, memcg can go away
> > > > between reading stock->cached pointer and calling css_tryget().
> > > 
> > > I have to remember how is this whole thing supposed to work, it's been
> > > some time since I've looked into that.
> > 
> > OK, I guess I remember now and I do not see how the race is possible.
> > Stock cache is keeping its memcg alive because it elevates the reference
> > counting for each cached charge. And that should keep the whole chain up
> > to the root (of draining) alive, no? Or do I miss something, could you
> > generate a sequence of events that would lead to use-after-free?
> 
> Right, but it's true when you reading a local percpu stock.
> But here we read a remote stock->cached pointer, which can be cleared
> by a remote concurrent drain_local_stock() execution.

OK, I can see how refill_stock can race with drain_all_stock. I am not
sure I see drain_local_stock race because that should be triggered only
from drain_all_stock and only one cpu is allowed to do that. Maybe we
might have scheduled a work from the previous run?

In any case, please document the race in the changelog please. This code
is indeed tricky and a comment would help as well.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm: memcontrol: switch to rcu protection in drain_all_stock()
  2019-08-02 17:14       ` Michal Hocko
@ 2019-08-02 17:36         ` Roman Gushchin
  0 siblings, 0 replies; 7+ messages in thread
From: Roman Gushchin @ 2019-08-02 17:36 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, linux-mm, Johannes Weiner, linux-kernel, Kernel Team

On Fri, Aug 02, 2019 at 07:14:51PM +0200, Michal Hocko wrote:
> On Fri 02-08-19 17:00:34, Roman Gushchin wrote:
> > On Fri, Aug 02, 2019 at 10:59:47AM +0200, Michal Hocko wrote:
> > > On Fri 02-08-19 10:04:22, Michal Hocko wrote:
> > > > On Thu 01-08-19 16:35:13, Roman Gushchin wrote:
> > > > > Commit 72f0184c8a00 ("mm, memcg: remove hotplug locking from try_charge")
> > > > > introduced css_tryget()/css_put() calls in drain_all_stock(),
> > > > > which are supposed to protect the target memory cgroup from being
> > > > > released during the mem_cgroup_is_descendant() call.
> > > > > 
> > > > > However, it's not completely safe. In theory, memcg can go away
> > > > > between reading stock->cached pointer and calling css_tryget().
> > > > 
> > > > I have to remember how is this whole thing supposed to work, it's been
> > > > some time since I've looked into that.
> > > 
> > > OK, I guess I remember now and I do not see how the race is possible.
> > > Stock cache is keeping its memcg alive because it elevates the reference
> > > counting for each cached charge. And that should keep the whole chain up
> > > to the root (of draining) alive, no? Or do I miss something, could you
> > > generate a sequence of events that would lead to use-after-free?
> > 
> > Right, but it's true when you reading a local percpu stock.
> > But here we read a remote stock->cached pointer, which can be cleared
> > by a remote concurrent drain_local_stock() execution.
> 
> OK, I can see how refill_stock can race with drain_all_stock. I am not
> sure I see drain_local_stock race because that should be triggered only
> from drain_all_stock and only one cpu is allowed to do that. Maybe we
> might have scheduled a work from the previous run?

Exactly. Previously executed drain_all_stock() -> schedule_work ->
drain_local_stock() on a remote cpu races with checking memcg pointer
from drain_all_stock.

> 
> In any case, please document the race in the changelog please. This code
> is indeed tricky and a comment would help as well.

Sure, will send out v2 soon.

Thanks!

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-08-02 17:36 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-01 23:35 [PATCH] mm: memcontrol: switch to rcu protection in drain_all_stock() Roman Gushchin
2019-08-02  8:04 ` Michal Hocko
2019-08-02  8:59   ` Michal Hocko
2019-08-02 17:00     ` Roman Gushchin
2019-08-02 17:14       ` Michal Hocko
2019-08-02 17:36         ` Roman Gushchin
2019-08-02 16:55 ` Roman Gushchin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).