All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.cz>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Ying Han <yinghan@google.com>, Tejun Heo <htejun@gmail.com>,
	Glauber Costa <glommer@parallels.com>,
	Li Zefan <lizefan@huawei.com>
Subject: Re: [PATCH v3 4/7] memcg: remove memcg from the reclaim iterators
Date: Tue, 12 Feb 2013 12:25:26 -0500	[thread overview]
Message-ID: <20130212172526.GC25235@cmpxchg.org> (raw)
In-Reply-To: <20130212161051.GQ2666@linux.vnet.ibm.com>

On Tue, Feb 12, 2013 at 08:10:51AM -0800, Paul E. McKenney wrote:
> On Tue, Feb 12, 2013 at 04:43:30PM +0100, Michal Hocko wrote:
> > On Tue 12-02-13 10:10:02, Johannes Weiner wrote:
> > > On Tue, Feb 12, 2013 at 10:54:19AM +0100, Michal Hocko wrote:
> > > > On Mon 11-02-13 17:39:43, Johannes Weiner wrote:
> > > > > On Mon, Feb 11, 2013 at 10:27:56PM +0100, Michal Hocko wrote:
> > > > > > On Mon 11-02-13 14:58:24, Johannes Weiner wrote:
> > > > > > > That way, if the dead count gives the go-ahead, you KNOW that the
> > > > > > > position cache is valid, because it has been updated first.
> > > > > > 
> > > > > > OK, you are right. We can live without css_tryget because dead_count is
> > > > > > either OK which means that css would be alive at least this rcu period
> > > > > > (and RCU walk would be safe as well) or it is incremented which means
> > > > > > that we have started css_offline already and then css is dead already.
> > > > > > So css_tryget can be dropped.
> > > > > 
> > > > > Not quite :)
> > > > > 
> > > > > The dead_count check is for completed destructions,
> > > > 
> > > > Not quite :P. dead_count is incremented in css_offline callback which is
> > > > called before the cgroup core releases its last reference and unlinks
> > > > the group from the siblinks. css_tryget would already fail at this stage
> > > > because CSS_DEACT_BIAS is in place at that time but this doesn't break
> > > > RCU walk. So I think we are safe even without css_get.
> > > 
> > > But you drop the RCU lock before you return.
> > >
> > > dead_count IS incremented for every destruction, but it's not reliable
> > > for concurrent ones, is what I meant.  Again, if there is a dead_count
> > > mismatch, your pointer might be dangling, easy case.  However, even if
> > > there is no mismatch, you could still race with a destruction that has
> > > marked the object dead, and then frees it once you drop the RCU lock,
> > > so you need try_get() to check if the object is dead, or you could
> > > return a pointer to freed or soon to be freed memory.
> > 
> > Wait a moment. But what prevents from the following race?
> > 
> > rcu_read_lock()
> > 						mem_cgroup_css_offline(memcg)
> > 						root->dead_count++
> > iter->last_dead_count = root->dead_count
> > iter->last_visited = memcg
> > 						// final
> > 						css_put(memcg);
> > // last_visited is still valid
> > rcu_read_unlock()
> > [...]
> > // next iteration
> > rcu_read_lock()
> > iter->last_dead_count == root->dead_count
> > // KABOOM
> > 
> > The race window between dead_count++ and css_put is quite big but that
> > is not important because that css_put can happen anytime before we start
> > the next iteration and take rcu_read_lock.
> 
> The usual approach is to make sure that there is a grace period (either
> synchronize_rcu() or call_rcu()) between the time that the data is
> made inaccessible to readers (this would be mem_cgroup_css_offline()?)
> and the time it is freed (css_put(), correct?).

Absolutely!  And there is a synchronize_rcu() in between those two
operations.

However, we want to keep a weak reference to the cgroup after we drop
the rcu read-side lock, so rcu alone is not enough for us to guarantee
object life time.  We still have to carefully detect any concurrent
offlinings in order to validate the weak reference next time around.

WARNING: multiple messages have this Message-ID (diff)
From: Johannes Weiner <hannes@cmpxchg.org>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.cz>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Ying Han <yinghan@google.com>, Tejun Heo <htejun@gmail.com>,
	Glauber Costa <glommer@parallels.com>,
	Li Zefan <lizefan@huawei.com>
Subject: Re: [PATCH v3 4/7] memcg: remove memcg from the reclaim iterators
Date: Tue, 12 Feb 2013 12:25:26 -0500	[thread overview]
Message-ID: <20130212172526.GC25235@cmpxchg.org> (raw)
In-Reply-To: <20130212161051.GQ2666@linux.vnet.ibm.com>

On Tue, Feb 12, 2013 at 08:10:51AM -0800, Paul E. McKenney wrote:
> On Tue, Feb 12, 2013 at 04:43:30PM +0100, Michal Hocko wrote:
> > On Tue 12-02-13 10:10:02, Johannes Weiner wrote:
> > > On Tue, Feb 12, 2013 at 10:54:19AM +0100, Michal Hocko wrote:
> > > > On Mon 11-02-13 17:39:43, Johannes Weiner wrote:
> > > > > On Mon, Feb 11, 2013 at 10:27:56PM +0100, Michal Hocko wrote:
> > > > > > On Mon 11-02-13 14:58:24, Johannes Weiner wrote:
> > > > > > > That way, if the dead count gives the go-ahead, you KNOW that the
> > > > > > > position cache is valid, because it has been updated first.
> > > > > > 
> > > > > > OK, you are right. We can live without css_tryget because dead_count is
> > > > > > either OK which means that css would be alive at least this rcu period
> > > > > > (and RCU walk would be safe as well) or it is incremented which means
> > > > > > that we have started css_offline already and then css is dead already.
> > > > > > So css_tryget can be dropped.
> > > > > 
> > > > > Not quite :)
> > > > > 
> > > > > The dead_count check is for completed destructions,
> > > > 
> > > > Not quite :P. dead_count is incremented in css_offline callback which is
> > > > called before the cgroup core releases its last reference and unlinks
> > > > the group from the siblinks. css_tryget would already fail at this stage
> > > > because CSS_DEACT_BIAS is in place at that time but this doesn't break
> > > > RCU walk. So I think we are safe even without css_get.
> > > 
> > > But you drop the RCU lock before you return.
> > >
> > > dead_count IS incremented for every destruction, but it's not reliable
> > > for concurrent ones, is what I meant.  Again, if there is a dead_count
> > > mismatch, your pointer might be dangling, easy case.  However, even if
> > > there is no mismatch, you could still race with a destruction that has
> > > marked the object dead, and then frees it once you drop the RCU lock,
> > > so you need try_get() to check if the object is dead, or you could
> > > return a pointer to freed or soon to be freed memory.
> > 
> > Wait a moment. But what prevents from the following race?
> > 
> > rcu_read_lock()
> > 						mem_cgroup_css_offline(memcg)
> > 						root->dead_count++
> > iter->last_dead_count = root->dead_count
> > iter->last_visited = memcg
> > 						// final
> > 						css_put(memcg);
> > // last_visited is still valid
> > rcu_read_unlock()
> > [...]
> > // next iteration
> > rcu_read_lock()
> > iter->last_dead_count == root->dead_count
> > // KABOOM
> > 
> > The race window between dead_count++ and css_put is quite big but that
> > is not important because that css_put can happen anytime before we start
> > the next iteration and take rcu_read_lock.
> 
> The usual approach is to make sure that there is a grace period (either
> synchronize_rcu() or call_rcu()) between the time that the data is
> made inaccessible to readers (this would be mem_cgroup_css_offline()?)
> and the time it is freed (css_put(), correct?).

Absolutely!  And there is a synchronize_rcu() in between those two
operations.

However, we want to keep a weak reference to the cgroup after we drop
the rcu read-side lock, so rcu alone is not enough for us to guarantee
object life time.  We still have to carefully detect any concurrent
offlinings in order to validate the weak reference next time around.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2013-02-12 17:25 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-01-03 17:54 [PATCH v3 0/7] rework mem_cgroup iterator Michal Hocko
2013-01-03 17:54 ` Michal Hocko
2013-01-03 17:54 ` [PATCH v3 1/7] memcg: synchronize per-zone iterator access by a spinlock Michal Hocko
2013-01-03 17:54   ` Michal Hocko
2013-01-03 17:54 ` [PATCH v3 2/7] memcg: keep prev's css alive for the whole mem_cgroup_iter Michal Hocko
2013-01-03 17:54   ` Michal Hocko
2013-01-03 17:54 ` [PATCH v3 3/7] memcg: rework mem_cgroup_iter to use cgroup iterators Michal Hocko
2013-01-03 17:54   ` Michal Hocko
2013-01-03 17:54 ` [PATCH v3 4/7] memcg: remove memcg from the reclaim iterators Michal Hocko
2013-01-03 17:54   ` Michal Hocko
2013-01-07  6:18   ` Kamezawa Hiroyuki
2013-01-07  6:18     ` Kamezawa Hiroyuki
2013-02-08 19:33   ` Johannes Weiner
2013-02-08 19:33     ` Johannes Weiner
2013-02-11 15:16     ` Michal Hocko
2013-02-11 15:16       ` Michal Hocko
2013-02-11 17:56       ` Johannes Weiner
2013-02-11 17:56         ` Johannes Weiner
2013-02-11 19:29         ` Michal Hocko
2013-02-11 19:29           ` Michal Hocko
2013-02-11 19:58           ` Johannes Weiner
2013-02-11 19:58             ` Johannes Weiner
2013-02-11 21:27             ` Michal Hocko
2013-02-11 21:27               ` Michal Hocko
2013-02-11 22:07               ` Michal Hocko
2013-02-11 22:07                 ` Michal Hocko
2013-02-11 22:39               ` Johannes Weiner
2013-02-11 22:39                 ` Johannes Weiner
2013-02-12  9:54                 ` Michal Hocko
2013-02-12  9:54                   ` Michal Hocko
2013-02-12 15:10                   ` Johannes Weiner
2013-02-12 15:10                     ` Johannes Weiner
2013-02-12 15:43                     ` Michal Hocko
2013-02-12 15:43                       ` Michal Hocko
2013-02-12 16:10                       ` Paul E. McKenney
2013-02-12 16:10                         ` Paul E. McKenney
2013-02-12 17:25                         ` Johannes Weiner [this message]
2013-02-12 17:25                           ` Johannes Weiner
2013-02-12 18:31                           ` Paul E. McKenney
2013-02-12 18:31                             ` Paul E. McKenney
2013-02-12 19:53                             ` Johannes Weiner
2013-02-12 19:53                               ` Johannes Weiner
2013-02-13  9:51                               ` Michal Hocko
2013-02-13  9:51                                 ` Michal Hocko
2013-02-12 17:56                         ` Michal Hocko
2013-02-12 17:56                           ` Michal Hocko
2013-02-12 16:13                       ` Michal Hocko
2013-02-12 16:13                         ` Michal Hocko
2013-02-12 16:24                         ` Michal Hocko
2013-02-12 16:24                           ` Michal Hocko
2013-02-12 16:37                           ` Michal Hocko
2013-02-12 16:37                             ` Michal Hocko
2013-02-12 16:41                           ` Johannes Weiner
2013-02-12 16:41                             ` Johannes Weiner
2013-02-12 17:12                             ` Michal Hocko
2013-02-12 17:12                               ` Michal Hocko
2013-02-12 17:37                               ` Johannes Weiner
2013-02-12 17:37                                 ` Johannes Weiner
2013-02-13  8:11                                 ` Glauber Costa
2013-02-13  8:11                                   ` Glauber Costa
2013-02-13 10:38                                   ` Michal Hocko
2013-02-13 10:38                                     ` Michal Hocko
2013-02-13 10:34                                 ` Michal Hocko
2013-02-13 10:34                                   ` Michal Hocko
2013-02-13 12:56                                   ` Michal Hocko
2013-02-13 12:56                                     ` Michal Hocko
2013-02-12 16:33                       ` Johannes Weiner
2013-02-12 16:33                         ` Johannes Weiner
2013-01-03 17:54 ` [PATCH v3 5/7] memcg: simplify mem_cgroup_iter Michal Hocko
2013-01-03 17:54   ` Michal Hocko
2013-01-03 17:54 ` [PATCH v3 6/7] memcg: further " Michal Hocko
2013-01-03 17:54   ` Michal Hocko
2013-01-03 17:54 ` [PATCH v3 7/7] cgroup: remove css_get_next Michal Hocko
2013-01-03 17:54   ` Michal Hocko
2013-01-04  3:42   ` Li Zefan
2013-01-04  3:42     ` Li Zefan
2013-01-23 12:52 ` [PATCH v3 0/7] rework mem_cgroup iterator Michal Hocko
2013-01-23 12:52   ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130212172526.GC25235@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=glommer@parallels.com \
    --cc=htejun@gmail.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lizefan@huawei.com \
    --cc=mhocko@suse.cz \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=yinghan@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.