From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1758699AbZBLQSV (ORCPT );
	Thu, 12 Feb 2009 11:18:21 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1753099AbZBLQSJ (ORCPT );
	Thu, 12 Feb 2009 11:18:09 -0500
Received: from e4.ny.us.ibm.com ([32.97.182.144]:50445 "EHLO e4.ny.us.ibm.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751681AbZBLQSI (ORCPT );
	Thu, 12 Feb 2009 11:18:08 -0500
Date: Thu, 12 Feb 2009 08:18:05 -0800
From: "Paul E. McKenney"
To: Mathieu Desnoyers
Cc: ltt-dev@lists.casi.polymtl.ca, linux-kernel@vger.kernel.org
Subject: Re: [ltt-dev] [RFC git tree] Userspace RCU (urcu) for Linux (repost)
Message-ID: <20090212161805.GB6759@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <20090211153246.GA6694@linux.vnet.ibm.com>
	<20090211185203.GA29852@Krystal>
	<20090211200903.GG6694@linux.vnet.ibm.com>
	<20090211214258.GA32407@Krystal>
	<20090212003549.GU6694@linux.vnet.ibm.com>
	<20090212023308.GA21157@linux.vnet.ibm.com>
	<20090212023707.GA21193@linux.vnet.ibm.com>
	<20090212041044.GA12612@Krystal>
	<20090212050909.GB8317@linux.vnet.ibm.com>
	<20090212054707.GA15577@Krystal>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="a8Wt8u1KmwUX3Y2C"
Content-Disposition: inline
In-Reply-To: <20090212054707.GA15577@Krystal>
User-Agent: Mutt/1.5.15+20070412 (2007-04-11)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

--a8Wt8u1KmwUX3Y2C
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Thu, Feb 12, 2009 at 12:47:07AM -0500, Mathieu Desnoyers wrote:
> * Paul E. McKenney (paulmck@linux.vnet.ibm.com) wrote:
> > On Wed, Feb 11, 2009 at 11:10:44PM -0500, Mathieu Desnoyers wrote:
> > > * Paul E. McKenney (paulmck@linux.vnet.ibm.com) wrote:
> > > > On Wed, Feb 11, 2009 at 06:33:08PM -0800, Paul E. McKenney wrote:
> > > > > On Wed, Feb 11, 2009 at 04:35:49PM -0800, Paul E. McKenney wrote:
> > > > > > On Wed, Feb 11, 2009 at 04:42:58PM -0500, Mathieu Desnoyers wrote:
> > > > > > > * Paul E. McKenney (paulmck@linux.vnet.ibm.com) wrote:
> > > > > > > > [ . . . ]
> > > >
> > > > > > > > (BTW, I do not trust my model yet, as it currently cannot detect the
> > > > > > > > failure case I pointed out earlier. :-/ Here and I thought that the
> > > > > > > > point of such models was to detect additional failure cases!!!)
> > > > > > > >
> > > > > > >
> > > > > > > Yes, I'll have to dig deeper into it.
> > > > > >
> > > > > > Well, as I said, I attached the current model and the error trail.
> > > > >
> > > > > And I had bugs in my model that allowed the rcu_read_lock() model
> > > > > to nest indefinitely, which overflowed into the top bit, messing
> > > > > things up. :-/
> > > > >
> > > > > Attached is a fixed model. This model validates correctly (woo-hoo!).
> > > > > Even better, gives the expected error if you comment out line 180 and
> > > > > uncomment line 213, this latter corresponding to the error case I called
> > > > > out a few days ago.
> > > > >
> > > > > I will play with removing models of mb...
> > > >
> > > > And commenting out the models of mb between the counter flips and the
> > > > test for readers still passes validation, as expected, and as shown in
> > > > the attached Promela code.
> > > >
> > >
> > > Hrm, in the email I sent you about the memory barrier, I said that it
> > > would not make the algorithm incorrect, but that it would cause
> > > situations where it would be impossible for the writer to do any
> > > progress as long as there are readers active. I think we would have to
> > > enhance the model or at least express this through some LTL statement to
> > > validate this specific behavior.
> >
> > But if the writer fails to make progress, then the counter remains at a
> > given value, which causes readers to drain, which allows the writer to
> > eventually make progress again. Right?
> >
>
> Not necessarily.
> If we don't have the proper memory barriers, we can
> have the writer waiting on, say, parity 0 *before* it has performed the
> parity switch. Therefore, even newly coming readers will add up to
> parity 0.

But the write that changes the parity will eventually make it out.
OK, so your argument is that we at least need a compiler barrier?

Regardless, please see attached for a modified version of the Promela
model that fully models omitting the memory barrier that my
rcu_nest32.[hc] implementation omits. (It is possible to partially
model removal of other memory barriers via #if 0, but to fully model
would need to enumerate the permutations as shown on lines 231-257.)

> In your model, this is not detected, because eventually all readers will
> execute, and only then the writer will be able to update the data. But
> in reality, if we run a very busy 4096-core machine where there is
> always at least one reader active, the writer will be stuck forever,
> and that's really bad.

Assuming that the reordering is done by the CPU, the write will
eventually get out -- it is stuck in (say) the store buffer, and the
cache line will eventually arrive, and then the value will eventually
be seen by the readers.

We might need a -compiler- barrier, but then again, I am not sure that
we are talking about the same memory barrier -- again, please see
attached lines 231-257 to see which one I eliminated.

Also, the original model I sent out has a minor bug that prevents it
from fully modeling the nested-read-side case. The patch below fixes
this.

Signed-off-by: Paul E. McKenney
---

 urcu.spin | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/formal-model/urcu.spin b/formal-model/urcu.spin
index e5bfff3..611464b 100644
--- a/formal-model/urcu.spin
+++ b/formal-model/urcu.spin
@@ -124,9 +124,13 @@ proctype urcu_reader()
 				break;
 			:: tmp < 4 && reader_progress[tmp] != 0 ->
 				tmp = tmp + 1;
-			:: tmp >= 4 ->
+			:: tmp >= 4 &&
+			   reader_progress[0] == reader_progress[3] ->
 				done = 1;
 				break;
+			:: tmp >= 4 &&
+			   reader_progress[0] != reader_progress[3] ->
+				break;
 			od;
 			do
 			:: tmp < 4 && reader_progress[tmp] == 0 ->

--a8Wt8u1KmwUX3Y2C
Content-Type: text/plain; charset=us-ascii
Content-Description: urcu_mbmin.spin
Content-Disposition: attachment; filename="urcu_mbmin.spin"

/*
 * urcu_mbmin.spin: Promela code to validate urcu. See commit number
 *	3a9e6e9df706b8d39af94d2f027210e2e7d4106e of Mathieu Desnoyers's
 *	git archive at git://lttng.org/userspace-rcu.git, but with
 *	memory barriers removed.
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
 *
 * Copyright (c) 2009 Paul E. McKenney, IBM Corporation.
 */

/* Promela validation variables. */

bit removed = 0; /* Has RCU removal happened, e.g., list_del_rcu()? */
bit free = 0; /* Has RCU reclamation happened, e.g., kfree()? */
bit need_mb = 0; /* =1 says need reader mb, =0 for reader response. */
byte reader_progress[4];
		/* Count of read-side statement executions. */

/* urcu definitions and variables, taken straight from the algorithm. */

#define RCU_GP_CTR_BIT (1 << 7)
#define RCU_GP_CTR_NEST_MASK (RCU_GP_CTR_BIT - 1)

byte urcu_gp_ctr = 1;
byte urcu_active_readers = 0;

/* Model the RCU read-side critical section. */

proctype urcu_reader()
{
	bit done = 0;
	bit mbok;
	byte tmp;
	byte tmp_removed;
	byte tmp_free;

	/* Absorb any early requests for memory barriers. */
	do
	:: need_mb == 1 ->
		need_mb = 0;
	:: 1 -> skip;
	:: 1 -> break;
	od;

	/*
	 * Each pass through this loop executes one read-side statement
	 * from the following code fragment:
	 *
	 *	rcu_read_lock(); [0a]
	 *	rcu_read_lock(); [0b]
	 *	p = rcu_dereference(global_p); [1]
	 *	x = p->data; [2]
	 *	rcu_read_unlock(); [3b]
	 *	rcu_read_unlock(); [3a]
	 *
	 * Because we are modeling a weak-memory machine, these statements
	 * can be seen in any order, the only restriction being that
	 * rcu_read_unlock() cannot precede the corresponding rcu_read_lock().
	 * The placement of the inner rcu_read_lock() and rcu_read_unlock()
	 * is non-deterministic; the above is but one possible placement.
	 * Interestingly enough, this model validates all possible placements
	 * of the inner rcu_read_lock() and rcu_read_unlock() statements,
	 * with the only constraint being that the rcu_read_lock() must
	 * precede the rcu_read_unlock().
	 *
	 * We also respond to memory-barrier requests, but only if our
	 * execution happens to be ordered. If the current state is
	 * misordered, we ignore memory-barrier requests.
	 */
	do
	:: 1 ->
		if
		:: reader_progress[0] < 2 -> /* [0a and 0b] */
			tmp = urcu_active_readers;
			if
			:: (tmp & RCU_GP_CTR_NEST_MASK) == 0 ->
				tmp = urcu_gp_ctr;
				do
				:: (reader_progress[1] +
				    reader_progress[2] +
				    reader_progress[3] == 0) && need_mb == 1 ->
					need_mb = 0;
				:: 1 -> skip;
				:: 1 -> break;
				od;
				urcu_active_readers = tmp;
			:: else ->
				urcu_active_readers = tmp + 1;
			fi;
			reader_progress[0] = reader_progress[0] + 1;
		:: reader_progress[1] == 0 -> /* [1] */
			tmp_removed = removed;
			reader_progress[1] = 1;
		:: reader_progress[2] == 0 -> /* [2] */
			tmp_free = free;
			reader_progress[2] = 1;
		:: ((reader_progress[0] > reader_progress[3]) &&
		    (reader_progress[3] < 2)) -> /* [3a and 3b] */
			tmp = urcu_active_readers - 1;
			urcu_active_readers = tmp;
			reader_progress[3] = reader_progress[3] + 1;
		:: else -> break;
		fi;

		/* Process memory-barrier requests, if it is safe to do so. */
		atomic {
			mbok = 0;
			tmp = 0;
			do
			:: tmp < 4 && reader_progress[tmp] == 0 ->
				tmp = tmp + 1;
				break;
			:: tmp < 4 && reader_progress[tmp] != 0 ->
				tmp = tmp + 1;
			:: tmp >= 4 &&
			   reader_progress[0] == reader_progress[3] ->
				done = 1;
				break;
			:: tmp >= 4 &&
			   reader_progress[0] != reader_progress[3] ->
				break;
			od;
			do
			:: tmp < 4 && reader_progress[tmp] == 0 ->
				tmp = tmp + 1;
			:: tmp < 4 && reader_progress[tmp] != 0 ->
				break;
			:: tmp >= 4 ->
				mbok = 1;
				break;
			od
		}

		if
		:: mbok == 1 ->
			/* We get here if mb processing is safe. */
			do
			:: need_mb == 1 ->
				need_mb = 0;
			:: 1 -> skip;
			:: 1 -> break;
			od;
		:: else -> skip;
		fi;

		/*
		 * Check to see if we have modeled the entire RCU read-side
		 * critical section, and leave if so.
		 */
		if
		:: done == 1 -> break;
		:: else -> skip;
		fi
	od;
	assert((tmp_free == 0) || (tmp_removed == 1));

	/* Process any late-arriving memory-barrier requests. */
	do
	:: need_mb == 1 ->
		need_mb = 0;
	:: 1 -> skip;
	:: 1 -> break;
	od;
}

/* Model the RCU update process. */

proctype urcu_updater()
{
	byte tmp;

	/* prior synchronize_rcu(), second counter flip. */
	need_mb = 1; /* mb() A */
	do
	:: need_mb == 1 -> skip;
	:: need_mb == 0 -> break;
	od;
	urcu_gp_ctr = urcu_gp_ctr + RCU_GP_CTR_BIT;
	need_mb = 1; /* mb() B */
	do
	:: need_mb == 1 -> skip;
	:: need_mb == 0 -> break;
	od;
	do
	:: 1 ->
		if
		:: (urcu_active_readers & RCU_GP_CTR_NEST_MASK) != 0 &&
		   (urcu_active_readers & ~RCU_GP_CTR_NEST_MASK) !=
		   (urcu_gp_ctr & ~RCU_GP_CTR_NEST_MASK) ->
			skip;
		:: else -> break;
		fi
	od;
	need_mb = 1; /* mb() C absolutely required by analogy with G */
	do
	:: need_mb == 1 -> skip;
	:: need_mb == 0 -> break;
	od;

	/* Removal statement, e.g., list_del_rcu(). */
	removed = 1;

	/* current synchronize_rcu(), first counter flip. */
	need_mb = 1; /* mb() D suggested */
	do
	:: need_mb == 1 -> skip;
	:: need_mb == 0 -> break;
	od;
	urcu_gp_ctr = urcu_gp_ctr + RCU_GP_CTR_BIT;
	need_mb = 1; /* mb() E required if D not present */
	do
	:: need_mb == 1 -> skip;
	:: need_mb == 0 -> break;
	od;

	/* current synchronize_rcu(), first-flip check plus second flip. */
	if
	:: 1 ->
		do
		:: 1 ->
			if
			:: (urcu_active_readers & RCU_GP_CTR_NEST_MASK) != 0 &&
			   (urcu_active_readers & ~RCU_GP_CTR_NEST_MASK) !=
			   (urcu_gp_ctr & ~RCU_GP_CTR_NEST_MASK) ->
				skip;
			:: else -> break;
			fi;
		od;
		urcu_gp_ctr = urcu_gp_ctr + RCU_GP_CTR_BIT;
	:: 1 ->
		tmp = urcu_gp_ctr;
		urcu_gp_ctr = urcu_gp_ctr + RCU_GP_CTR_BIT;
		do
		:: 1 ->
			if
			:: (urcu_active_readers & RCU_GP_CTR_NEST_MASK) != 0 &&
			   (urcu_active_readers & ~RCU_GP_CTR_NEST_MASK) !=
			   (tmp & ~RCU_GP_CTR_NEST_MASK) ->
				skip;
			:: else -> break;
			fi;
		od;
	fi;

	/* current synchronize_rcu(), second counter flip check. */
	need_mb = 1; /* mb() F not required */
	do
	:: need_mb == 1 -> skip;
	:: need_mb == 0 -> break;
	od;
	do
	:: 1 ->
		if
		:: (urcu_active_readers & RCU_GP_CTR_NEST_MASK) != 0 &&
		   (urcu_active_readers & ~RCU_GP_CTR_NEST_MASK) !=
		   (urcu_gp_ctr & ~RCU_GP_CTR_NEST_MASK) ->
			skip;
		:: else -> break;
		fi;
	od;
	need_mb = 1; /* mb() G absolutely required */
	do
	:: need_mb == 1 -> skip;
	:: need_mb == 0 -> break;
	od;

	/* free-up step, e.g., kfree(). */
	free = 1;
}

/*
 * Initialize the array, spawn a reader and an updater. Because readers
 * are independent of each other, only one reader is needed.
 */

init {
	atomic {
		reader_progress[0] = 0;
		reader_progress[1] = 0;
		reader_progress[2] = 0;
		reader_progress[3] = 0;
		run urcu_reader();
		run urcu_updater();
	}
}

--a8Wt8u1KmwUX3Y2C
Content-Type: application/x-sh
Content-Description: urcu_mbmin.sh
Content-Disposition: attachment; filename="urcu_mbmin.sh"
Content-Transfer-Encoding: quoted-printable

spin -a urcu_mbmin.spin =0Acc -DSAFETY -o pan pan.c=0A./pan=0A

--a8Wt8u1KmwUX3Y2C--
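
The counter arithmetic that the Promela model exercises can also be
sketched in plain C. This is only a minimal single-threaded sketch,
assuming 8-bit counters as in the model's "byte" variables. The
sketch_* function names are hypothetical and are not taken from the
urcu tree, and memory ordering, which is the actual subject of this
thread, is deliberately not represented.

```c
#include <assert.h>

/* Constants copied from the Promela model. */
#define RCU_GP_CTR_BIT       (1u << 7)
#define RCU_GP_CTR_NEST_MASK (RCU_GP_CTR_BIT - 1)

/* 8-bit counters, matching the model's "byte" variables. */
static unsigned char urcu_gp_ctr = 1;
static unsigned char urcu_active_readers = 0;

/* Hypothetical helper: rcu_read_lock() as modeled, without barriers. */
static void sketch_read_lock(void)
{
	unsigned char tmp = urcu_active_readers;

	if ((tmp & RCU_GP_CTR_NEST_MASK) == 0)
		/* Outermost lock: snapshot the phase bit, nest count 1. */
		urcu_active_readers = urcu_gp_ctr;
	else
		/* Nested lock: bump the nest count (must stay below 128). */
		urcu_active_readers = (unsigned char)(tmp + 1);
}

/* Hypothetical helper: rcu_read_unlock() as modeled. */
static void sketch_read_unlock(void)
{
	urcu_active_readers = (unsigned char)(urcu_active_readers - 1);
}

/* Hypothetical helper: the updater's counter flip (toggles bit 7). */
static void sketch_flip(void)
{
	urcu_gp_ctr = (unsigned char)(urcu_gp_ctr + RCU_GP_CTR_BIT);
}

/* Nonzero when the updater's polling loop would keep spinning. */
static int sketch_updater_must_wait(void)
{
	return (urcu_active_readers & RCU_GP_CTR_NEST_MASK) != 0 &&
	       (urcu_active_readers & ~RCU_GP_CTR_NEST_MASK) !=
	       (urcu_gp_ctr & ~RCU_GP_CTR_NEST_MASK);
}

/* Walk one nested reader across a counter flip. */
static void sketch_selfcheck(void)
{
	assert(!sketch_updater_must_wait());	/* no reader active */
	sketch_read_lock();			/* outer lock, old phase */
	sketch_read_lock();			/* nested lock */
	assert(!sketch_updater_must_wait());	/* reader is in current phase */
	sketch_flip();
	assert(sketch_updater_must_wait());	/* pre-flip reader blocks */
	sketch_read_unlock();
	assert(sketch_updater_must_wait());	/* outer section still open */
	sketch_read_unlock();
	assert(!sketch_updater_must_wait());	/* reader has drained */
	sketch_read_lock();			/* new reader, new phase */
	assert(!sketch_updater_must_wait());
	sketch_read_unlock();
}
```

Calling sketch_selfcheck() from a main() and compiling with any C
compiler is enough to run it. Only a reader that took its outermost
lock before the flip holds the updater back; a reader arriving after
the flip snapshots the new phase bit and does not.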