From: Michal Hocko <mhocko@suse.cz>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Ying Han <yinghan@google.com>, Tejun Heo <htejun@gmail.com>,
	Glauber Costa <glommer@parallels.com>,
	Li Zefan <lizefan@huawei.com>
Subject: Re: [PATCH v3 4/7] memcg: remove memcg from the reclaim iterators
Date: Tue, 12 Feb 2013 16:43:30 +0100
Message-ID: <20130212154330.GG4863@dhcp22.suse.cz>
In-Reply-To: <20130212151002.GD15951@cmpxchg.org>

On Tue 12-02-13 10:10:02, Johannes Weiner wrote:
> On Tue, Feb 12, 2013 at 10:54:19AM +0100, Michal Hocko wrote:
> > On Mon 11-02-13 17:39:43, Johannes Weiner wrote:
> > > On Mon, Feb 11, 2013 at 10:27:56PM +0100, Michal Hocko wrote:
> > > > On Mon 11-02-13 14:58:24, Johannes Weiner wrote:
> > > > > That way, if the dead count gives the go-ahead, you KNOW that the
> > > > > position cache is valid, because it has been updated first.
> > > > 
> > > > OK, you are right. We can live without css_tryget because dead_count is
> > > > either OK which means that css would be alive at least this rcu period
> > > > (and RCU walk would be safe as well) or it is incremented which means
> > > > that we have started css_offline already and then css is dead already.
> > > > So css_tryget can be dropped.
> > > 
> > > Not quite :)
> > > 
> > > The dead_count check is for completed destructions,
> > 
> > Not quite :P. dead_count is incremented in css_offline callback which is
> > called before the cgroup core releases its last reference and unlinks
> > the group from the siblings. css_tryget would already fail at this stage
> > because CSS_DEACT_BIAS is in place at that time but this doesn't break
> > RCU walk. So I think we are safe even without css_get.
> 
> But you drop the RCU lock before you return.
>
> dead_count IS incremented for every destruction, but it's not reliable
> for concurrent ones, is what I meant.  Again, if there is a dead_count
> mismatch, your pointer might be dangling, easy case.  However, even if
> there is no mismatch, you could still race with a destruction that has
> marked the object dead, and then frees it once you drop the RCU lock,
> so you need try_get() to check if the object is dead, or you could
> return a pointer to freed or soon to be freed memory.

Wait a moment. But what prevents the following race?

rcu_read_lock()
						mem_cgroup_css_offline(memcg)
						root->dead_count++
iter->last_dead_count = root->dead_count
iter->last_visited = memcg
						// final
						css_put(memcg);
// last_visited is still valid
rcu_read_unlock()
[...]
// next iteration
rcu_read_lock()
iter->last_dead_count == root->dead_count
// KABOOM

The window between dead_count++ and the final css_put is quite big, but
its size is not what matters: that css_put can happen any time before we
start the next iteration and take rcu_read_lock again, and the
dead_count comparison will still pass.
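
For illustration, here is a minimal userspace sketch that replays the
interleaving above sequentially. It is not kernel code: the *_model types
and model_* helpers are hypothetical stand-ins, and a "freed" flag stands
in for the memory being released after the final css_put. Only
dead_count, last_dead_count and last_visited are taken from the
discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memcg; "freed" models the final css_put. */
struct memcg_model {
	bool freed;
};

/* Hypothetical stand-in for the hierarchy root. */
struct root_model {
	unsigned long dead_count;
};

/* Hypothetical stand-in for the per-zone reclaim iterator. */
struct iter_model {
	unsigned long last_dead_count;
	struct memcg_model *last_visited;
};

/* Models the css_offline step: it only bumps the counter. */
static void model_css_offline(struct root_model *root)
{
	root->dead_count++;
}

/* Models the iterator caching its position inside the RCU section. */
static void model_cache_position(struct iter_model *iter,
				 const struct root_model *root,
				 struct memcg_model *memcg)
{
	iter->last_dead_count = root->dead_count;
	iter->last_visited = memcg;
}

/* Models the final css_put: after this the object must not be used. */
static void model_final_put(struct memcg_model *memcg)
{
	memcg->freed = true;
}

/* The check under discussion: counter comparison only, no tryget. */
static bool model_cache_looks_valid(const struct iter_model *iter,
				    const struct root_model *root)
{
	return iter->last_visited != NULL &&
	       iter->last_dead_count == root->dead_count;
}

/* Replays the race; returns true if a freed object passes the check. */
static bool replay_race(void)
{
	struct memcg_model memcg = { .freed = false };
	struct root_model root = { .dead_count = 0 };
	struct iter_model iter = { .last_dead_count = 0, .last_visited = NULL };

	/* First iteration: offline runs before the position is cached. */
	model_css_offline(&root);
	model_cache_position(&iter, &root, &memcg);
	assert(!memcg.freed);	/* last_visited is still valid here */

	/* Between iterations: the final reference is dropped. */
	model_final_put(&memcg);

	/* Next iteration: the counters match although the object is gone. */
	return model_cache_looks_valid(&iter, &root) &&
	       iter.last_visited->freed;
}
```

In this model replay_race() returns true: the counter comparison alone
accepts the cached position even though the object has already been
released.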
-- 
Michal Hocko
SUSE Labs
