Re: [PATCH 2/5] rhashtable: don't hold lock on first table throughout insertion.

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: NeilBrown <neilb@suse.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>,
	Thomas Graf <tgraf@suug.ch>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/5] rhashtable: don't hold lock on first table throughout insertion.
Date: Tue, 24 Jul 2018 15:58:25 -0700	[thread overview]
Message-ID: <20180724225825.GE12945@linux.vnet.ibm.com> (raw)
In-Reply-To: <87r2jtpqm4.fsf@notabene.neil.brown.name>

On Tue, Jul 24, 2018 at 07:52:03AM +1000, NeilBrown wrote:
> On Mon, Jul 23 2018, Paul E. McKenney wrote:
> 
> > On Mon, Jul 23, 2018 at 09:13:43AM +1000, NeilBrown wrote:
> >> On Sun, Jul 22 2018, Paul E. McKenney wrote:
> >> >
> >> > One issue is that the ->func pointer can legitimately be NULL while on
> >> > RCU's callback lists.  This happens when someone invokes kfree_rcu()
> >> > with the rcu_head structure at the beginning of the enclosing structure.
> >> > I could add an offset to avoid this, or perhaps the kmalloc() folks
> >> > could be persuaded Rao Shoaib's patch moving kfree_rcu() handling to
> >> > the slab allocators, so that RCU only ever sees function pointers in
> >> > the ->func field.
> >> >
> >> > Either way, this should be hidden behind an API to allow adjustments
> >> > to be made if needed.  Maybe something like is_after_call_rcu()?
> >> > This would (for example) allow debug-object checks to be used to catch
> >> > check-after-free bugs.
> >> >
> >> > Would something of that sort work for you?
> >> 
> >> Yes, if you could provide an is_after_call_rcu() API, that would
> >> perfectly suit my use-case.
> >
> > After beating my head against the object-debug code a bit, I have to ask
> > if it would be OK for you if the is_after_call_rcu() API also takes the
> > function that was passed to RCU.
> 
> Sure.  It feels a bit clumsy, but I can see it could be easier to make
> robust.
> So yes: I'm fine with pass the same function and rcu_head to both
> call_rcu() and is_after_call_rcu().  Actually, when I say it like that,
> it seems less clumsy :-)

How about like this?  (It needs refinements, like lockdep, but should
get the gist.)

							Thanx, Paul

------------------------------------------------------------------------

commit 5aa0ebf4799b8bddbbd0124db1c008526e99fc7c
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Tue Jul 24 15:28:09 2018 -0700

    rcu: Provide functions for determining if call_rcu() has been invoked
    
    This commit adds is_after_call_rcu() and is_after_call_rcu_init()
    functions to help RCU users detect when another CPU has passed
    the specified rcu_head structure and function to call_rcu().
    The is_after_call_rcu_init() should be invoked before making the
    structure visible to RCU readers, and then the is_after_call_rcu() may
    be invoked from within an RCU read-side critical section on an rcu_head
    structure that was obtained during a traversal of the data structure
    in question.  The is_after_call_rcu() function will return true if the
    rcu_head structure has already been passed (with the specified function)
    to call_rcu(), otherwise it will return false.
    
    If is_after_call_rcu_init() has not been invoked on the rcu_head
    structure or if the rcu_head (AKA callback) has already been invoked,
    then is_after_call_rcu() will do WARN_ON_ONCE().
    
    Reported-by: NeilBrown <neilb@suse.com>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index e4f821165d0b..82e5a91539b5 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -857,6 +857,45 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
 #endif /* #else #ifdef CONFIG_ARCH_WEAK_RELEASE_ACQUIRE */
 
 
+/* Has the specified rcu_head structure been handed to call_rcu()? */
+
+/*
+ * is_after_call_rcu_init - Initialize rcu_head for is_after_call_rcu()
+ * @rhp: The rcu_head structure to initialize.
+ *
+ * If you intend to invoke is_after_call_rcu() to test whether a given
+ * rcu_head structure has already been passed to call_rcu(), then you must
+ * also invoke this is_after_call_rcu_init() function on it just after
+ * allocating that structure.  Calls to this function must not race with
+ * calls to call_rcu(), is_after_call_rcu(), or callback invocation.
+ */
+static inline void is_after_call_rcu_init(struct rcu_head *rhp)
+{
+	rhp->func = (rcu_callback_t)~0L;
+}
+
+/*
+ * is_after_call_rcu - Has this rcu_head been passed to call_rcu()?
+ * @rhp: The rcu_head structure to test.
+ * @func: The function passed to call_rcu() along with @rhp.
+ *
+ * Returns @true if the @rhp has been passed to call_rcu() with @func, and
+ * @false otherwise.  Emits a warning in any other case, including the
+ * case where @rhp has already been invoked after a grace period.
+ * Calls to this function must not race with callback invocation.  One
+ * way to avoid such races is to enclose the call to is_after_call_rcu()
+ * in an RCU read-side critical section that includes a read-side fetch
+ * of the pointer to the structure containing @rhp.
+ */
+static inline bool is_after_call_rcu(struct rcu_head *rhp, rcu_callback_t f)
+{
+	if (READ_ONCE(rhp->func) == f)
+		return true;
+	WARN_ON_ONCE(READ_ONCE(rhp->func) != (rcu_callback_t)~0L);
+	return false;
+}
+
+
 /* Transitional pre-consolidation compatibility definitions. */
 
 static inline void synchronize_rcu_bh(void)
diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 5dec94509a7e..4c56c1d98fb3 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -224,6 +224,7 @@ void kfree(const void *);
  */
 static inline bool __rcu_reclaim(const char *rn, struct rcu_head *head)
 {
+	rcu_callback_t f;
 	unsigned long offset = (unsigned long)head->func;
 
 	rcu_lock_acquire(&rcu_callback_map);
@@ -234,7 +235,9 @@ static inline bool __rcu_reclaim(const char *rn, struct rcu_head *head)
 		return true;
 	} else {
 		RCU_TRACE(trace_rcu_invoke_callback(rn, head);)
-		head->func(head);
+		f = head->func;
+		WRITE_ONCE(head->func, (rcu_callback_t)0L);
+		f(head);
 		rcu_lock_release(&rcu_callback_map);
 		return false;
 	}