From: "Paul E. McKenney" <paulmck@kernel.org>
To: Alan Huang <mmpgouride@gmail.com>
Cc: frederic@kernel.org, quic_neeraju@quicinc.com,
	joel@joelfernandes.org, josh@joshtriplett.org,
	boqun.feng@gmail.com, corbet@lwn.net, rcu@vger.kernel.org,
	linux-doc@vger.kernel.org
Subject: Re: [PATCH v2] docs/RCU: Bring back smp_wmb()
Date: Fri, 14 Jul 2023 16:23:06 -0700	[thread overview]
Message-ID: <9eaf506f-cc14-4da6-9efc-057c0c3e56b0@paulmck-laptop> (raw)
In-Reply-To: <20230711150906.466434-1-mmpgouride@gmail.com>

On Tue, Jul 11, 2023 at 03:09:06PM +0000, Alan Huang wrote:
> The objects are allocated with SLAB_TYPESAFE_BY_RCU, and there is an
> "n->next = first" within hlist_add_head_rcu(), before the
> rcu_assign_pointer(), which modifies obj->obj_node.next. There may be
> readers holding a reference to obj in lockless_lookup(), and when the
> updater modifies ->next, those readers can see the change immediately
> because of SLAB_TYPESAFE_BY_RCU.
> 
> There are two memory orderings required in the insertion algorithm:
> we need to make sure obj->key is updated before both obj->obj_node.next
> and obj->refcnt, and atomic_set_release() is not enough to provide
> the required memory barriers.
> 
> Signed-off-by: Alan Huang <mmpgouride@gmail.com>

This is an interesting one!!!

Now I am having a hard time believing that the smp_rmb() suffices.

> ---
> Changelog:
>   v1 -> v2: Use _ONCE to protect obj->key.
> 
>  Documentation/RCU/rculist_nulls.rst | 21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/RCU/rculist_nulls.rst b/Documentation/RCU/rculist_nulls.rst
> index 21e40fcc08de..2a9f5a63d334 100644
> --- a/Documentation/RCU/rculist_nulls.rst
> +++ b/Documentation/RCU/rculist_nulls.rst
> @@ -47,7 +47,7 @@ objects, which is having below type.
>      * reuse these object before the RCU grace period, we
>      * must check key after getting the reference on object
>      */
> -    if (obj->key != key) { // not the object we expected
> +    if (READ_ONCE(obj->key) != key) { // not the object we expected
>        put_ref(obj);
>        rcu_read_unlock();
>        goto begin;
> @@ -64,10 +64,10 @@ but a version with an additional memory barrier (smp_rmb())
>    {
>      struct hlist_node *node, *next;
>      for (pos = rcu_dereference((head)->first);
> -         pos && ({ next = pos->next; smp_rmb(); prefetch(next); 1; }) &&
> +         pos && ({ next = READ_ONCE(pos->next); smp_rmb(); prefetch(next); 1; }) &&

Suppose that lockless_lookup() is delayed just before fetching pos->next,
and that there were 17 more nodes to search in the list.

Then consider the following sequence of events:

o	The updater deletes this same node and kmem_cache_free()s it.

o	Another updater kmem_cache_alloc()s that same memory and
	inserts it into an empty hash chain with a different key.

o	Then lockless_lookup() fetches pos->next and sees a NULL pointer,
	thus failing to search the remaining 17 nodes in the list,
	one of which had the desired key value.

o	The lookup algorithm resumes and sees the NULL return from
	lockless_lookup(), and ends up with a NULL obj.

	And this happens even with the strongest possible ordering
	everywhere.

OK, yes, it is late on Friday.  So what am I missing here?

Independent of that, does hlist_add_head_rcu() need to replace its
"n->next = first" with "WRITE_ONCE(n->next, first)"?

						Thanx, Paul

>           ({ obj = hlist_entry(pos, typeof(*obj), obj_node); 1; });
>           pos = rcu_dereference(next))
> -      if (obj->key == key)
> +      if (READ_ONCE(obj->key) == key)
>          return obj;
>      return NULL;
>    }
> @@ -111,8 +111,13 @@ detect the fact that it missed following items in original chain.
>     */
>    obj = kmem_cache_alloc(...);
>    lock_chain(); // typically a spin_lock()
> -  obj->key = key;
> -  atomic_set_release(&obj->refcnt, 1); // key before refcnt
> +  WRITE_ONCE(obj->key, key);
> +  /*
> +   * We need to make sure obj->key is updated before obj->obj_node.next
> +   * and obj->refcnt.
> +   */
> +  smp_wmb();
> +  atomic_set(&obj->refcnt, 1);
>    hlist_add_head_rcu(&obj->obj_node, list);
>    unlock_chain(); // typically a spin_unlock()
>  
> @@ -165,12 +170,12 @@ Note that using hlist_nulls means the type of 'obj_node' field of
>    begin:
>    rcu_read_lock();
>    hlist_nulls_for_each_entry_rcu(obj, node, head, obj_node) {
> -    if (obj->key == key) {
> +    if (READ_ONCE(obj->key) == key) {
>        if (!try_get_ref(obj)) { // might fail for free objects
>  	rcu_read_unlock();
>          goto begin;
>        }
> -      if (obj->key != key) { // not the object we expected
> +      if (READ_ONCE(obj->key) != key) { // not the object we expected
>          put_ref(obj);
>  	rcu_read_unlock();
>          goto begin;
> @@ -206,7 +211,7 @@ hlist_add_head_rcu().
>     */
>    obj = kmem_cache_alloc(cachep);
>    lock_chain(); // typically a spin_lock()
> -  obj->key = key;
> +  WRITE_ONCE(obj->key, key);
>    atomic_set_release(&obj->refcnt, 1); // key before refcnt
>    /*
>     * insert obj in RCU way (readers might be traversing chain)
> -- 
> 2.34.1
> 
