From: Michal Hocko <mhocko@suse.cz> To: Johannes Weiner <hannes@cmpxchg.org> Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>, Ying Han <yinghan@google.com>, Tejun Heo <htejun@gmail.com>, Glauber Costa <glommer@parallels.com>, Li Zefan <lizefan@huawei.com> Subject: Re: [PATCH v3 4/7] memcg: remove memcg from the reclaim iterators Date: Tue, 12 Feb 2013 10:54:19 +0100 [thread overview] Message-ID: <20130212095419.GB4863@dhcp22.suse.cz> (raw) In-Reply-To: <20130211223943.GC15951@cmpxchg.org> On Mon 11-02-13 17:39:43, Johannes Weiner wrote: > On Mon, Feb 11, 2013 at 10:27:56PM +0100, Michal Hocko wrote: > > On Mon 11-02-13 14:58:24, Johannes Weiner wrote: > > > On Mon, Feb 11, 2013 at 08:29:29PM +0100, Michal Hocko wrote: > > > > On Mon 11-02-13 12:56:19, Johannes Weiner wrote: > > > > > On Mon, Feb 11, 2013 at 04:16:49PM +0100, Michal Hocko wrote: > > > > > > Maybe we could keep the counter per memcg but that would mean that we > > > > > > would need to go up the hierarchy as well. We wouldn't have to go over > > > > > > node-zone-priority cleanup so it would be much more lightweight. > > > > > > > > > > > > I am not sure this is necessarily better than explicit cleanup because > > > > > > it brings yet another kind of generation number to the game but I guess > > > > > > I can live with it if people really thing the relaxed way is much > > > > > > better. > > > > > > What do you think about the patch below (untested yet)? > > > > > > > > > > Better, but I think you can get rid of both locks: > > > > > > > > What is the other lock you have in mind. > > > > > > The iter lock itself. I mean, multiple reclaimers can still race but > > > there won't be any corruption (if you make iter->dead_count a long, > > > setting it happens atomically, we nly need the memcg->dead_count to be > > > an atomic because of the inc) and the worst that could happen is that > > > a reclaim starts at the wrong point in hierarchy, right? > > > > The lack of synchronization basically means that 2 parallel reclaimers > > can reclaim every group exactly once (ideally) or up to each group > > twice in the worst case. > > So the exclusion was quite comfortable. > > It's quite unlikely, though. Don't forget that they actually reclaim > in between, I just can't see them line up perfectly and race to the > iterator at the same time repeatedly. It's more likely to happen at > the higher priority levels where less reclaim happens, and then it's > not a big deal anyway. With lower priority levels, when the glitches > would be more problematic, they also become even less likely. Fair enough, I will drop that patch in the next version. > > > But as you said in the changelog that introduced the lock, it's never > > > actually been a practical problem. > > > > That is true but those bugs would be subtle though so I wouldn't be > > opposed to prevent from them before we get burnt. But if you think that > > we should keep the previous semantic I can drop that patch. > > I just think that the problem is unlikely and not that big of a deal. > > > > You just need to put the wmb back in place, so that we never see the > > > dead_count give the green light while the cached position is stale, or > > > we'll tryget random memory. > > > > > > > > mem_cgroup_iter: > > > > > rcu_read_lock() > > > > > if atomic_read(&root->dead_count) == iter->dead_count: > > > > > smp_rmb() > > > > > if tryget(iter->position): > > > > > position = iter->position > > > > > memcg = find_next(postion) > > > > > css_put(position) > > > > > iter->position = memcg > > > > > smp_wmb() /* Write position cache BEFORE marking it uptodate */ > > > > > iter->dead_count = atomic_read(&root->dead_count) > > > > > rcu_read_unlock() > > > > > > > > Updated patch bellow: > > > > > > Cool, thanks. I hope you don't find it too ugly anymore :-) > > > > It's getting trick and you know how people love when you have to play > > and rely on atomics with memory barriers... > > My bumper sticker reads "I don't believe in mutual exclusion" (the > kernel hacker's version of smile for the red light camera). Ohh, those easy riders. > I mean, you were the one complaining about the lock... > > > > That way, if the dead count gives the go-ahead, you KNOW that the > > > position cache is valid, because it has been updated first. > > > > OK, you are right. We can live without css_tryget because dead_count is > > either OK which means that css would be alive at least this rcu period > > (and RCU walk would be safe as well) or it is incremented which means > > that we have started css_offline already and then css is dead already. > > So css_tryget can be dropped. > > Not quite :) > > The dead_count check is for completed destructions, Not quite :P. dead_count is incremented in css_offline callback which is called before the cgroup core releases its last reference and unlinks the group from the siblinks. css_tryget would already fail at this stage because CSS_DEACT_BIAS is in place at that time but this doesn't break RCU walk. So I think we are safe even without css_get. Or am I missing something? [...] -- Michal Hocko SUSE Labs
WARNING: multiple messages have this Message-ID (diff)
From: Michal Hocko <mhocko@suse.cz> To: Johannes Weiner <hannes@cmpxchg.org> Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>, Ying Han <yinghan@google.com>, Tejun Heo <htejun@gmail.com>, Glauber Costa <glommer@parallels.com>, Li Zefan <lizefan@huawei.com> Subject: Re: [PATCH v3 4/7] memcg: remove memcg from the reclaim iterators Date: Tue, 12 Feb 2013 10:54:19 +0100 [thread overview] Message-ID: <20130212095419.GB4863@dhcp22.suse.cz> (raw) In-Reply-To: <20130211223943.GC15951@cmpxchg.org> On Mon 11-02-13 17:39:43, Johannes Weiner wrote: > On Mon, Feb 11, 2013 at 10:27:56PM +0100, Michal Hocko wrote: > > On Mon 11-02-13 14:58:24, Johannes Weiner wrote: > > > On Mon, Feb 11, 2013 at 08:29:29PM +0100, Michal Hocko wrote: > > > > On Mon 11-02-13 12:56:19, Johannes Weiner wrote: > > > > > On Mon, Feb 11, 2013 at 04:16:49PM +0100, Michal Hocko wrote: > > > > > > Maybe we could keep the counter per memcg but that would mean that we > > > > > > would need to go up the hierarchy as well. We wouldn't have to go over > > > > > > node-zone-priority cleanup so it would be much more lightweight. > > > > > > > > > > > > I am not sure this is necessarily better than explicit cleanup because > > > > > > it brings yet another kind of generation number to the game but I guess > > > > > > I can live with it if people really thing the relaxed way is much > > > > > > better. > > > > > > What do you think about the patch below (untested yet)? > > > > > > > > > > Better, but I think you can get rid of both locks: > > > > > > > > What is the other lock you have in mind. > > > > > > The iter lock itself. I mean, multiple reclaimers can still race but > > > there won't be any corruption (if you make iter->dead_count a long, > > > setting it happens atomically, we nly need the memcg->dead_count to be > > > an atomic because of the inc) and the worst that could happen is that > > > a reclaim starts at the wrong point in hierarchy, right? > > > > The lack of synchronization basically means that 2 parallel reclaimers > > can reclaim every group exactly once (ideally) or up to each group > > twice in the worst case. > > So the exclusion was quite comfortable. > > It's quite unlikely, though. Don't forget that they actually reclaim > in between, I just can't see them line up perfectly and race to the > iterator at the same time repeatedly. It's more likely to happen at > the higher priority levels where less reclaim happens, and then it's > not a big deal anyway. With lower priority levels, when the glitches > would be more problematic, they also become even less likely. Fair enough, I will drop that patch in the next version. > > > But as you said in the changelog that introduced the lock, it's never > > > actually been a practical problem. > > > > That is true but those bugs would be subtle though so I wouldn't be > > opposed to prevent from them before we get burnt. But if you think that > > we should keep the previous semantic I can drop that patch. > > I just think that the problem is unlikely and not that big of a deal. > > > > You just need to put the wmb back in place, so that we never see the > > > dead_count give the green light while the cached position is stale, or > > > we'll tryget random memory. > > > > > > > > mem_cgroup_iter: > > > > > rcu_read_lock() > > > > > if atomic_read(&root->dead_count) == iter->dead_count: > > > > > smp_rmb() > > > > > if tryget(iter->position): > > > > > position = iter->position > > > > > memcg = find_next(postion) > > > > > css_put(position) > > > > > iter->position = memcg > > > > > smp_wmb() /* Write position cache BEFORE marking it uptodate */ > > > > > iter->dead_count = atomic_read(&root->dead_count) > > > > > rcu_read_unlock() > > > > > > > > Updated patch bellow: > > > > > > Cool, thanks. I hope you don't find it too ugly anymore :-) > > > > It's getting trick and you know how people love when you have to play > > and rely on atomics with memory barriers... > > My bumper sticker reads "I don't believe in mutual exclusion" (the > kernel hacker's version of smile for the red light camera). Ohh, those easy riders. > I mean, you were the one complaining about the lock... > > > > That way, if the dead count gives the go-ahead, you KNOW that the > > > position cache is valid, because it has been updated first. > > > > OK, you are right. We can live without css_tryget because dead_count is > > either OK which means that css would be alive at least this rcu period > > (and RCU walk would be safe as well) or it is incremented which means > > that we have started css_offline already and then css is dead already. > > So css_tryget can be dropped. > > Not quite :) > > The dead_count check is for completed destructions, Not quite :P. dead_count is incremented in css_offline callback which is called before the cgroup core releases its last reference and unlinks the group from the siblinks. css_tryget would already fail at this stage because CSS_DEACT_BIAS is in place at that time but this doesn't break RCU walk. So I think we are safe even without css_get. Or am I missing something? [...] -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2013-02-12 9:54 UTC|newest] Thread overview: 78+ messages / expand[flat|nested] mbox.gz Atom feed top 2013-01-03 17:54 [PATCH v3 0/7] rework mem_cgroup iterator Michal Hocko 2013-01-03 17:54 ` Michal Hocko 2013-01-03 17:54 ` [PATCH v3 1/7] memcg: synchronize per-zone iterator access by a spinlock Michal Hocko 2013-01-03 17:54 ` Michal Hocko 2013-01-03 17:54 ` [PATCH v3 2/7] memcg: keep prev's css alive for the whole mem_cgroup_iter Michal Hocko 2013-01-03 17:54 ` Michal Hocko 2013-01-03 17:54 ` [PATCH v3 3/7] memcg: rework mem_cgroup_iter to use cgroup iterators Michal Hocko 2013-01-03 17:54 ` Michal Hocko 2013-01-03 17:54 ` [PATCH v3 4/7] memcg: remove memcg from the reclaim iterators Michal Hocko 2013-01-03 17:54 ` Michal Hocko 2013-01-07 6:18 ` Kamezawa Hiroyuki 2013-01-07 6:18 ` Kamezawa Hiroyuki 2013-02-08 19:33 ` Johannes Weiner 2013-02-08 19:33 ` Johannes Weiner 2013-02-11 15:16 ` Michal Hocko 2013-02-11 15:16 ` Michal Hocko 2013-02-11 17:56 ` Johannes Weiner 2013-02-11 17:56 ` Johannes Weiner 2013-02-11 19:29 ` Michal Hocko 2013-02-11 19:29 ` Michal Hocko 2013-02-11 19:58 ` Johannes Weiner 2013-02-11 19:58 ` Johannes Weiner 2013-02-11 21:27 ` Michal Hocko 2013-02-11 21:27 ` Michal Hocko 2013-02-11 22:07 ` Michal Hocko 2013-02-11 22:07 ` Michal Hocko 2013-02-11 22:39 ` Johannes Weiner 2013-02-11 22:39 ` Johannes Weiner 2013-02-12 9:54 ` Michal Hocko [this message] 2013-02-12 9:54 ` Michal Hocko 2013-02-12 15:10 ` Johannes Weiner 2013-02-12 15:10 ` Johannes Weiner 2013-02-12 15:43 ` Michal Hocko 2013-02-12 15:43 ` Michal Hocko 2013-02-12 16:10 ` Paul E. McKenney 2013-02-12 16:10 ` Paul E. McKenney 2013-02-12 17:25 ` Johannes Weiner 2013-02-12 17:25 ` Johannes Weiner 2013-02-12 18:31 ` Paul E. McKenney 2013-02-12 18:31 ` Paul E. McKenney 2013-02-12 19:53 ` Johannes Weiner 2013-02-12 19:53 ` Johannes Weiner 2013-02-13 9:51 ` Michal Hocko 2013-02-13 9:51 ` Michal Hocko 2013-02-12 17:56 ` Michal Hocko 2013-02-12 17:56 ` Michal Hocko 2013-02-12 16:13 ` Michal Hocko 2013-02-12 16:13 ` Michal Hocko 2013-02-12 16:24 ` Michal Hocko 2013-02-12 16:24 ` Michal Hocko 2013-02-12 16:37 ` Michal Hocko 2013-02-12 16:37 ` Michal Hocko 2013-02-12 16:41 ` Johannes Weiner 2013-02-12 16:41 ` Johannes Weiner 2013-02-12 17:12 ` Michal Hocko 2013-02-12 17:12 ` Michal Hocko 2013-02-12 17:37 ` Johannes Weiner 2013-02-12 17:37 ` Johannes Weiner 2013-02-13 8:11 ` Glauber Costa 2013-02-13 8:11 ` Glauber Costa 2013-02-13 10:38 ` Michal Hocko 2013-02-13 10:38 ` Michal Hocko 2013-02-13 10:34 ` Michal Hocko 2013-02-13 10:34 ` Michal Hocko 2013-02-13 12:56 ` Michal Hocko 2013-02-13 12:56 ` Michal Hocko 2013-02-12 16:33 ` Johannes Weiner 2013-02-12 16:33 ` Johannes Weiner 2013-01-03 17:54 ` [PATCH v3 5/7] memcg: simplify mem_cgroup_iter Michal Hocko 2013-01-03 17:54 ` Michal Hocko 2013-01-03 17:54 ` [PATCH v3 6/7] memcg: further " Michal Hocko 2013-01-03 17:54 ` Michal Hocko 2013-01-03 17:54 ` [PATCH v3 7/7] cgroup: remove css_get_next Michal Hocko 2013-01-03 17:54 ` Michal Hocko 2013-01-04 3:42 ` Li Zefan 2013-01-04 3:42 ` Li Zefan 2013-01-23 12:52 ` [PATCH v3 0/7] rework mem_cgroup iterator Michal Hocko 2013-01-23 12:52 ` Michal Hocko
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20130212095419.GB4863@dhcp22.suse.cz \ --to=mhocko@suse.cz \ --cc=glommer@parallels.com \ --cc=hannes@cmpxchg.org \ --cc=htejun@gmail.com \ --cc=kamezawa.hiroyu@jp.fujitsu.com \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-mm@kvack.org \ --cc=lizefan@huawei.com \ --cc=yinghan@google.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.