From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751597AbcGRLyl (ORCPT ); Mon, 18 Jul 2016 07:54:41 -0400 Received: from bombadil.infradead.org ([198.137.202.9]:49314 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751396AbcGRLyh (ORCPT ); Mon, 18 Jul 2016 07:54:37 -0400 Date: Mon, 18 Jul 2016 13:54:27 +0200 From: Peter Zijlstra To: Oleg Nesterov Cc: "Paul E. McKenney" , mingo@kernel.org, linux-kernel@vger.kernel.org, tj@kernel.org, john.stultz@linaro.org, dimitrysh@google.com, romlem@google.com, ccross@google.com, tkjos@google.com Subject: Re: [PATCH] rcu_sync: simplify the state machine, introduce __rcu_sync_enter() Message-ID: <20160718115427.GB6862@twins.programming.kicks-ass.net> References: <20160714183022.336211504@infradead.org> <20160714184351.GA18388@redhat.com> <20160714192018.GM30154@twins.programming.kicks-ass.net> <20160715132709.GA27644@redhat.com> <20160715133928.GH7094@linux.vnet.ibm.com> <20160715134523.GB28589@redhat.com> <20160715153840.GI7094@linux.vnet.ibm.com> <20160715164938.GA4294@redhat.com> <20160715180140.GM7094@linux.vnet.ibm.com> <20160716171007.GA30000@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160716171007.GA30000@redhat.com> User-Agent: Mutt/1.5.23.1 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sat, Jul 16, 2016 at 07:10:07PM +0200, Oleg Nesterov wrote: > Peter, Paul, could you review? Do you see any hole? > > Why. Firstly, note that the state machine was greatly simplified, and > rsp->cb_state has gone, we have a single "state" variable, gp_state. > Note also the ->sync() op has gone (and actually ->wait() too, see the > "TODO" comment). Yes, that seems like a nice simplification. > GP_IDLE - owned by __rcu_sync_enter() which can only move this state to > > GP_ENTER - owned by rcu-callback which moves it to > > GP_PASSED - owned by the last rcu_sync_exit() which moves it to > > GP_EXIT - owned by rcu-callback which moves it back to GP_IDLE. > > Yes, this is a bit simplified, we also have GP_REPLAY, but hopefully > clear. > > And, there is another possible transition, GP_ENTER -> GP_IDLE, because > not it is possible to call __rcu_sync_enter() and rcu_sync_exit() in any > state (except obviously they should be balanced), and they do not block. > > The only blocking call is __rcu_sync_wait() which actually waits for GP. > Obviously should only be called if gp_count != 0, iow after __rcu_sync_enter. So I'm a complete moron today, to aid the failing brain I draw pictures. I think I ended up with this: .----> GP_IDLE <--------------. | | | | | __rcu_sync_enter() | / rcu_sync_exit() | v | | GP_ENTER --------------' | | | | | v | GP_PASSED <---------. | | | | | rcu_sync_exit() | / __rcu_sync_enter() | v | `----- GP_EXIT ------------' ^ | __rcu_sync_enter() + rcu_sync_exit() v GP_RETRY > enum { GP_IDLE = 0, GP_ENTER, GP_PASSED, GP_EXIT, GP_REPLAY }; > > static void rcu_sync_func(struct rcu_head *rcu); > > static void rcu_sync_call(struct rcu_sync *rsp) > { > // TODO: THIS IS SUBOPTIMAL. We want to call it directly > // if rcu_blocking_is_gp() == T, but it has might_sleep(). Not sure I get that comment.. > gp_ops[rsp->gp_type].call(&rsp->cb_head, rcu_sync_func); > } > > static void rcu_sync_func(struct rcu_head *rcu) > { > struct rcu_sync *rsp = container_of(rcu, struct rcu_sync, cb_head); > unsigned long flags; > Right, those are 'stable' states and must be advanced through explicit calls to __rcu_sync_enter()/rcu_sync_exit() respectively. > BUG_ON(rsp->gp_state == GP_IDLE); > BUG_ON(rsp->gp_state == GP_PASSED); > > spin_lock_irqsave(&rsp->rss_lock, flags); > if (rsp->gp_count) { > /* > * We're at least a GP after the first __rcu_sync_enter(). > */ > rsp->gp_state = GP_PASSED; So we can end up here in two ways afaict. The simple way: someone called __rcu_sync_enter(), we go IDLE -> ENTER with a raised count. Once the GP passes, we get here, observe the raised count and advance to PASSED. The more involved way: we were EXIT and someone calls __rcu_sync_enter() to raise the count again. The callback from rcu_sync_exit() was still pending and once we get here we observe the raised count and voila. Now, since state != IDLE, I suppose this is valid, but it does hurt my brain. Else, !count: > } else if (rsp->gp_state == GP_REPLAY) { Fairly straight forward: during EXIT someone did __rcu_sync_enter + rcu_sync_exit() and we need to wait longer still. > /* > * A new rcu_sync_exit() has happened; requeue the callback > * to catch a later GP. > */ > rsp->gp_state = GP_EXIT; > rcu_sync_call(rsp); > } else { Again two ways here, simple: we were EXIT, GP passed, we let em rip. But its also possible, as you noted, that someone called rcu_sync_exit() during ENTER and we ended up with !count which gets us back to IDLE. > /* > * We're at least a GP after the last rcu_sync_exit(); > * eveybody will now have observed the write side critical > * section. Let 'em rip!. > * > * OR. ->gp_state can be still GP_ENTER if __rcu_sync_wait() > * wasn't called after __rcu_sync_enter(), abort. > */ > rsp->gp_state = GP_IDLE; > } > spin_unlock_irqrestore(&rsp->rss_lock, flags); > } Agreed on the rest. So I think its solid, but you failed to mention one state transition, which seems ok in any case.