netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] A few rhashtables cleanups
@ 2018-04-23 22:29 NeilBrown
  2018-04-23 22:29 ` [PATCH 4/4] rhashtable: improve rhashtable_walk stability when stop/start used NeilBrown
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: NeilBrown @ 2018-04-23 22:29 UTC (permalink / raw)
  To: Thomas Graf, Herbert Xu, David Miller; +Cc: netdev, linux-kernel

2 patches fixes documentation
1 fixes a bit in rhashtable_walk_start()
1 improves rhashtable_walk stability.

All reviewed and Acked.

Thanks,
NeilBrown


---

NeilBrown (4):
      rhashtable: remove outdated comments about grow_decision etc
      rhashtable: Revise incorrect comment on r{hl,hash}table_walk_enter()
      rhashtable: reset iter when rhashtable_walk_start sees new table
      rhashtable: improve rhashtable_walk stability when stop/start used.


 include/linux/rhashtable.h |   38 +++++++++++++++------------------
 lib/rhashtable.c           |   51 ++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 63 insertions(+), 26 deletions(-)

--
Signature

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/4] rhashtable: remove outdated comments about grow_decision etc
  2018-04-23 22:29 [PATCH 0/4] A few rhashtables cleanups NeilBrown
                   ` (2 preceding siblings ...)
  2018-04-23 22:29 ` [PATCH 2/4] rhashtable: Revise incorrect comment on r{hl, hash}table_walk_enter() NeilBrown
@ 2018-04-23 22:29 ` NeilBrown
  2018-04-24 17:38 ` [PATCH 0/4] A few rhashtables cleanups David Miller
  4 siblings, 0 replies; 9+ messages in thread
From: NeilBrown @ 2018-04-23 22:29 UTC (permalink / raw)
  To: Thomas Graf, Herbert Xu, David Miller; +Cc: netdev, linux-kernel

grow_decision and shink_decision no longer exist, so remove
the remaining references to them.

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 include/linux/rhashtable.h |   33 ++++++++++++++-------------------
 1 file changed, 14 insertions(+), 19 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 1f8ad121eb43..87d443a5b11d 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -836,9 +836,8 @@ static inline void *__rhashtable_insert_fast(
  *
  * It is safe to call this function from atomic context.
  *
- * Will trigger an automatic deferred table resizing if the size grows
- * beyond the watermark indicated by grow_decision() which can be passed
- * to rhashtable_init().
+ * Will trigger an automatic deferred table resizing if residency in the
+ * table grows beyond 70%.
  */
 static inline int rhashtable_insert_fast(
 	struct rhashtable *ht, struct rhash_head *obj,
@@ -866,9 +865,8 @@ static inline int rhashtable_insert_fast(
  *
  * It is safe to call this function from atomic context.
  *
- * Will trigger an automatic deferred table resizing if the size grows
- * beyond the watermark indicated by grow_decision() which can be passed
- * to rhashtable_init().
+ * Will trigger an automatic deferred table resizing if residency in the
+ * table grows beyond 70%.
  */
 static inline int rhltable_insert_key(
 	struct rhltable *hlt, const void *key, struct rhlist_head *list,
@@ -890,9 +888,8 @@ static inline int rhltable_insert_key(
  *
  * It is safe to call this function from atomic context.
  *
- * Will trigger an automatic deferred table resizing if the size grows
- * beyond the watermark indicated by grow_decision() which can be passed
- * to rhashtable_init().
+ * Will trigger an automatic deferred table resizing if residency in the
+ * table grows beyond 70%.
  */
 static inline int rhltable_insert(
 	struct rhltable *hlt, struct rhlist_head *list,
@@ -922,9 +919,8 @@ static inline int rhltable_insert(
  *
  * It is safe to call this function from atomic context.
  *
- * Will trigger an automatic deferred table resizing if the size grows
- * beyond the watermark indicated by grow_decision() which can be passed
- * to rhashtable_init().
+ * Will trigger an automatic deferred table resizing if residency in the
+ * table grows beyond 70%.
  */
 static inline int rhashtable_lookup_insert_fast(
 	struct rhashtable *ht, struct rhash_head *obj,
@@ -981,9 +977,8 @@ static inline void *rhashtable_lookup_get_insert_fast(
  *
  * Lookups may occur in parallel with hashtable mutations and resizing.
  *
- * Will trigger an automatic deferred table resizing if the size grows
- * beyond the watermark indicated by grow_decision() which can be passed
- * to rhashtable_init().
+ * Will trigger an automatic deferred table resizing if residency in the
+ * table grows beyond 70%.
  *
  * Returns zero on success.
  */
@@ -1134,8 +1129,8 @@ static inline int __rhashtable_remove_fast(
  * walk the bucket chain upon removal. The removal operation is thus
  * considerable slow if the hash table is not correctly sized.
  *
- * Will automatically shrink the table via rhashtable_expand() if the
- * shrink_decision function specified at rhashtable_init() returns true.
+ * Will automatically shrink the table if permitted when residency drops
+ * below 30%.
  *
  * Returns zero on success, -ENOENT if the entry could not be found.
  */
@@ -1156,8 +1151,8 @@ static inline int rhashtable_remove_fast(
  * walk the bucket chain upon removal. The removal operation is thus
  * considerable slow if the hash table is not correctly sized.
  *
- * Will automatically shrink the table via rhashtable_expand() if the
- * shrink_decision function specified at rhashtable_init() returns true.
+ * Will automatically shrink the table if permitted when residency drops
+ * below 30%
  *
  * Returns zero on success, -ENOENT if the entry could not be found.
  */

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/4] rhashtable: Revise incorrect comment on r{hl, hash}table_walk_enter()
  2018-04-23 22:29 [PATCH 0/4] A few rhashtables cleanups NeilBrown
  2018-04-23 22:29 ` [PATCH 4/4] rhashtable: improve rhashtable_walk stability when stop/start used NeilBrown
  2018-04-23 22:29 ` [PATCH 3/4] rhashtable: reset iter when rhashtable_walk_start sees new table NeilBrown
@ 2018-04-23 22:29 ` NeilBrown
  2018-04-23 22:29 ` [PATCH 1/4] rhashtable: remove outdated comments about grow_decision etc NeilBrown
  2018-04-24 17:38 ` [PATCH 0/4] A few rhashtables cleanups David Miller
  4 siblings, 0 replies; 9+ messages in thread
From: NeilBrown @ 2018-04-23 22:29 UTC (permalink / raw)
  To: Thomas Graf, Herbert Xu, David Miller; +Cc: netdev, linux-kernel

Neither rhashtable_walk_enter() or rhltable_walk_enter() sleep, though
they do take a spinlock without irq protection.
So revise the comments to accurately state the contexts in which
these functions can be called.

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 include/linux/rhashtable.h |    5 +++--
 lib/rhashtable.c           |    5 +++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 87d443a5b11d..4e1f535c2034 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -1268,8 +1268,9 @@ static inline int rhashtable_walk_init(struct rhashtable *ht,
  * For a completely stable walk you should construct your own data
  * structure outside the hash table.
  *
- * This function may sleep so you must not call it from interrupt
- * context or with spin locks held.
+ * This function may be called from any process context, including
+ * non-preemptable context, but cannot be called from softirq or
+ * hardirq context.
  *
  * You must call rhashtable_walk_exit after this function returns.
  */
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 2b2b79974b61..6d490f51174e 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -668,8 +668,9 @@ EXPORT_SYMBOL_GPL(rhashtable_insert_slow);
  * For a completely stable walk you should construct your own data
  * structure outside the hash table.
  *
- * This function may sleep so you must not call it from interrupt
- * context or with spin locks held.
+ * This function may be called from any process context, including
+ * non-preemptable context, but cannot be called from softirq or
+ * hardirq context.
  *
  * You must call rhashtable_walk_exit after this function returns.
  */

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 3/4] rhashtable: reset iter when rhashtable_walk_start sees new table
  2018-04-23 22:29 [PATCH 0/4] A few rhashtables cleanups NeilBrown
  2018-04-23 22:29 ` [PATCH 4/4] rhashtable: improve rhashtable_walk stability when stop/start used NeilBrown
@ 2018-04-23 22:29 ` NeilBrown
  2018-04-23 22:29 ` [PATCH 2/4] rhashtable: Revise incorrect comment on r{hl, hash}table_walk_enter() NeilBrown
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: NeilBrown @ 2018-04-23 22:29 UTC (permalink / raw)
  To: Thomas Graf, Herbert Xu, David Miller; +Cc: netdev, linux-kernel

The documentation claims that when rhashtable_walk_start_check()
detects a resize event, it will rewind back to the beginning
of the table.  This is not true.  We need to set ->slot and
->skip to be zero for it to be true.

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 lib/rhashtable.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 6d490f51174e..81edf1ab38ab 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -737,6 +737,8 @@ int rhashtable_walk_start_check(struct rhashtable_iter *iter)
 
 	if (!iter->walker.tbl && !iter->end_of_table) {
 		iter->walker.tbl = rht_dereference_rcu(ht->tbl, ht);
+		iter->slot = 0;
+		iter->skip = 0;
 		return -EAGAIN;
 	}
 

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 4/4] rhashtable: improve rhashtable_walk stability when stop/start used.
  2018-04-23 22:29 [PATCH 0/4] A few rhashtables cleanups NeilBrown
@ 2018-04-23 22:29 ` NeilBrown
  2018-07-05 11:20   ` [4/4] " Paolo Abeni
  2018-04-23 22:29 ` [PATCH 3/4] rhashtable: reset iter when rhashtable_walk_start sees new table NeilBrown
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 9+ messages in thread
From: NeilBrown @ 2018-04-23 22:29 UTC (permalink / raw)
  To: Thomas Graf, Herbert Xu, David Miller; +Cc: netdev, linux-kernel

When a walk of an rhashtable is interrupted with rhastable_walk_stop()
and then rhashtable_walk_start(), the location to restart from is based
on a 'skip' count in the current hash chain, and this can be incorrect
if insertions or deletions have happened.  This does not happen when
the walk is not stopped and started as iter->p is a placeholder which
is safe to use while holding the RCU read lock.

In rhashtable_walk_start() we can revalidate that 'p' is still in the
same hash chain.  If it isn't then the current method is still used.

With this patch, if a rhashtable walker ensures that the current
object remains in the table over a stop/start period (possibly by
elevating the reference count if that is sufficient), it can be sure
that a walk will not miss objects that were in the hashtable for the
whole time of the walk.

rhashtable_walk_start() may not find the object even though it is
still in the hashtable if a rehash has moved it to a new table.  In
this case it will (eventually) get -EAGAIN and will need to proceed
through the whole table again to be sure to see everything at least
once.

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 lib/rhashtable.c |   44 +++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 41 insertions(+), 3 deletions(-)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 81edf1ab38ab..9427b5766134 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -727,6 +727,7 @@ int rhashtable_walk_start_check(struct rhashtable_iter *iter)
 	__acquires(RCU)
 {
 	struct rhashtable *ht = iter->ht;
+	bool rhlist = ht->rhlist;
 
 	rcu_read_lock();
 
@@ -735,13 +736,52 @@ int rhashtable_walk_start_check(struct rhashtable_iter *iter)
 		list_del(&iter->walker.list);
 	spin_unlock(&ht->lock);
 
-	if (!iter->walker.tbl && !iter->end_of_table) {
+	if (iter->end_of_table)
+		return 0;
+	if (!iter->walker.tbl) {
 		iter->walker.tbl = rht_dereference_rcu(ht->tbl, ht);
 		iter->slot = 0;
 		iter->skip = 0;
 		return -EAGAIN;
 	}
 
+	if (iter->p && !rhlist) {
+		/*
+		 * We need to validate that 'p' is still in the table, and
+		 * if so, update 'skip'
+		 */
+		struct rhash_head *p;
+		int skip = 0;
+		rht_for_each_rcu(p, iter->walker.tbl, iter->slot) {
+			skip++;
+			if (p == iter->p) {
+				iter->skip = skip;
+				goto found;
+			}
+		}
+		iter->p = NULL;
+	} else if (iter->p && rhlist) {
+		/* Need to validate that 'list' is still in the table, and
+		 * if so, update 'skip' and 'p'.
+		 */
+		struct rhash_head *p;
+		struct rhlist_head *list;
+		int skip = 0;
+		rht_for_each_rcu(p, iter->walker.tbl, iter->slot) {
+			for (list = container_of(p, struct rhlist_head, rhead);
+			     list;
+			     list = rcu_dereference(list->next)) {
+				skip++;
+				if (list == iter->list) {
+					iter->p = p;
+					skip = skip;
+					goto found;
+				}
+			}
+		}
+		iter->p = NULL;
+	}
+found:
 	return 0;
 }
 EXPORT_SYMBOL_GPL(rhashtable_walk_start_check);
@@ -917,8 +957,6 @@ void rhashtable_walk_stop(struct rhashtable_iter *iter)
 		iter->walker.tbl = NULL;
 	spin_unlock(&ht->lock);
 
-	iter->p = NULL;
-
 out:
 	rcu_read_unlock();
 }

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/4] A few rhashtables cleanups
  2018-04-23 22:29 [PATCH 0/4] A few rhashtables cleanups NeilBrown
                   ` (3 preceding siblings ...)
  2018-04-23 22:29 ` [PATCH 1/4] rhashtable: remove outdated comments about grow_decision etc NeilBrown
@ 2018-04-24 17:38 ` David Miller
  4 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2018-04-24 17:38 UTC (permalink / raw)
  To: neilb; +Cc: tgraf, herbert, netdev, linux-kernel

From: NeilBrown <neilb@suse.com>
Date: Tue, 24 Apr 2018 08:29:13 +1000

> 2 patches fixes documentation
> 1 fixes a bit in rhashtable_walk_start()
> 1 improves rhashtable_walk stability.
> 
> All reviewed and Acked.

Series applied to net-next, thanks Neil.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [4/4] rhashtable: improve rhashtable_walk stability when stop/start used.
  2018-04-23 22:29 ` [PATCH 4/4] rhashtable: improve rhashtable_walk stability when stop/start used NeilBrown
@ 2018-07-05 11:20   ` Paolo Abeni
  2018-07-05 22:56     ` NeilBrown
  2018-07-07 22:11     ` NeilBrown
  0 siblings, 2 replies; 9+ messages in thread
From: Paolo Abeni @ 2018-07-05 11:20 UTC (permalink / raw)
  To: NeilBrown, Thomas Graf, Herbert Xu, David Miller; +Cc: netdev, linux-kernel

On Tue, 2018-04-24 at 08:29 +1000, NeilBrown wrote:
> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> index 81edf1ab38ab..9427b5766134 100644
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -727,6 +727,7 @@ int rhashtable_walk_start_check(struct rhashtable_iter *iter)
>  	__acquires(RCU)
>  {
>  	struct rhashtable *ht = iter->ht;
> +	bool rhlist = ht->rhlist;
>  
>  	rcu_read_lock();
>  
> @@ -735,13 +736,52 @@ int rhashtable_walk_start_check(struct rhashtable_iter *iter)
>  		list_del(&iter->walker.list);
>  	spin_unlock(&ht->lock);
>  
> -	if (!iter->walker.tbl && !iter->end_of_table) {
> +	if (iter->end_of_table)
> +		return 0;
> +	if (!iter->walker.tbl) {
>  		iter->walker.tbl = rht_dereference_rcu(ht->tbl, ht);
>  		iter->slot = 0;
>  		iter->skip = 0;
>  		return -EAGAIN;
>  	}
>  
> +	if (iter->p && !rhlist) {
> +		/*
> +		 * We need to validate that 'p' is still in the table, and
> +		 * if so, update 'skip'
> +		 */
> +		struct rhash_head *p;
> +		int skip = 0;
> +		rht_for_each_rcu(p, iter->walker.tbl, iter->slot) {
> +			skip++;
> +			if (p == iter->p) {
> +				iter->skip = skip;
> +				goto found;
> +			}
> +		}
> +		iter->p = NULL;
> +	} else if (iter->p && rhlist) {
> +		/* Need to validate that 'list' is still in the table, and
> +		 * if so, update 'skip' and 'p'.
> +		 */
> +		struct rhash_head *p;
> +		struct rhlist_head *list;
> +		int skip = 0;
> +		rht_for_each_rcu(p, iter->walker.tbl, iter->slot) {
> +			for (list = container_of(p, struct rhlist_head, rhead);
> +			     list;
> +			     list = rcu_dereference(list->next)) {
> +				skip++;
> +				if (list == iter->list) {
> +					iter->p = p;
> +					skip = skip;
> +					goto found;
> +				}
> +			}
> +		}
> +		iter->p = NULL;
> +	}
> +found:
>  	return 0;
>  }
>  EXPORT_SYMBOL_GPL(rhashtable_walk_start_check);

While testing new code that uses the rhashtable walker, I'm obeserving
an use after free, that is apparently caused by the above:

[  146.834815] ==================================================================
[  146.842933] BUG: KASAN: use-after-free in inet_frag_worker+0x9f/0x210
[  146.850120] Read of size 4 at addr ffff881b6b9342d8 by task kworker/13:1/177
[  146.857984] 
[  146.859645] CPU: 13 PID: 177 Comm: kworker/13:1 Not tainted 4.18.0-rc3.mirror_unclone_6_frag_dbg+ #1974
[  146.870128] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016
[  146.878478] Workqueue: events inet_frag_worker
[  146.883433] Call Trace:
[  146.886162]  dump_stack+0x90/0xe3
[  146.889861]  print_address_description+0x6a/0x2a0
[  146.895109]  kasan_report+0x176/0x2d0
[  146.899193]  ? inet_frag_worker+0x9f/0x210
[  146.903762]  inet_frag_worker+0x9f/0x210
[  146.908142]  process_one_work+0x24f/0x6e0
[  146.912614]  ? process_one_work+0x1a6/0x6e0
[  146.917285]  worker_thread+0x4e/0x3d0
[  146.921373]  kthread+0x106/0x140
[  146.924970]  ? process_one_work+0x6e0/0x6e0
[  146.929637]  ? kthread_bind+0x10/0x10
[  146.933723]  ret_from_fork+0x3a/0x50
[  146.937717] 
[  146.939377] Allocated by task 177:
[  146.943170]  kasan_kmalloc+0x86/0xb0
[  146.947158]  __kmalloc_node+0x181/0x430
[  146.951438]  kvmalloc_node+0x4f/0x70
[  146.955427]  alloc_bucket_spinlocks+0x34/0xa0
[  146.960286]  bucket_table_alloc.isra.13+0x61/0x180
[  146.965630]  rhashtable_rehash_alloc+0x26/0x80
[  146.970585]  rht_deferred_worker+0x394/0x810
[  146.975348]  process_one_work+0x24f/0x6e0
[  146.979819]  worker_thread+0x4e/0x3d0
[  146.983902]  kthread+0x106/0x140
[  146.987502]  ret_from_fork+0x3a/0x50
[  146.991487] 
[  146.993146] Freed by task 90:
[  146.996455]  __kasan_slab_free+0x11d/0x180
[  147.001025]  kfree+0x113/0x350
[  147.004431]  bucket_table_free+0x1c/0x70
[  147.008806]  rcu_process_callbacks+0x6c8/0x880
[  147.013762]  __do_softirq+0xd5/0x505
[  147.017747] 
[  147.019407] The buggy address belongs to the object at ffff881b6b934200
[  147.019407]  which belongs to the cache kmalloc-8192 of size 8192
[  147.033574] The buggy address is located 216 bytes inside of
[  147.033574]  8192-byte region [ffff881b6b934200, ffff881b6b936200)
[  147.046773] The buggy address belongs to the page:
[  147.052116] page:ffffea006dae4c00 count:1 mapcount:0 mapping:ffff880107c0e400 index:0x0 compound_mapcount: 0
[  147.063086] flags: 0x17ffffc0008100(slab|head)
[  147.068043] raw: 0017ffffc0008100 dead000000000100 dead000000000200 ffff880107c0e400
[  147.076684] raw: 0000000000000000 0000000000030003 00000001ffffffff 0000000000000000
[  147.085324] page dumped because: kasan: bad access detected
[  147.091540] 
[  147.093210] Memory state around the buggy address:
[  147.098553]  ffff881b6b934180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  147.106613]  ffff881b6b934200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  147.114670] >ffff881b6b934280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  147.122730]                                                     ^
[  147.129527]  ffff881b6b934300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  147.137587]  ffff881b6b934380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  147.145646] ==================================================================

Reverting the above change avoid the issue. 

I *think* that reusing iter->p after a
rcu_read_lock()/rcu_read_unlock() is unsafe:
if the table has been reashed we can still and reuse 'p', but the
related grace period could be already expired.

I can't think of any better fix than a revert. Other opinions welcome!

Paolo

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [4/4] rhashtable: improve rhashtable_walk stability when stop/start used.
  2018-07-05 11:20   ` [4/4] " Paolo Abeni
@ 2018-07-05 22:56     ` NeilBrown
  2018-07-07 22:11     ` NeilBrown
  1 sibling, 0 replies; 9+ messages in thread
From: NeilBrown @ 2018-07-05 22:56 UTC (permalink / raw)
  To: Paolo Abeni, Thomas Graf, Herbert Xu, David Miller; +Cc: netdev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 6361 bytes --]

On Thu, Jul 05 2018, Paolo Abeni wrote:

> On Tue, 2018-04-24 at 08:29 +1000, NeilBrown wrote:
>> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
>> index 81edf1ab38ab..9427b5766134 100644
>> --- a/lib/rhashtable.c
>> +++ b/lib/rhashtable.c
>> @@ -727,6 +727,7 @@ int rhashtable_walk_start_check(struct rhashtable_iter *iter)
>>  	__acquires(RCU)
>>  {
>>  	struct rhashtable *ht = iter->ht;
>> +	bool rhlist = ht->rhlist;
>>  
>>  	rcu_read_lock();
>>  
>> @@ -735,13 +736,52 @@ int rhashtable_walk_start_check(struct rhashtable_iter *iter)
>>  		list_del(&iter->walker.list);
>>  	spin_unlock(&ht->lock);
>>  
>> -	if (!iter->walker.tbl && !iter->end_of_table) {
>> +	if (iter->end_of_table)
>> +		return 0;
>> +	if (!iter->walker.tbl) {
>>  		iter->walker.tbl = rht_dereference_rcu(ht->tbl, ht);
>>  		iter->slot = 0;
>>  		iter->skip = 0;
>>  		return -EAGAIN;
>>  	}
>>  
>> +	if (iter->p && !rhlist) {
>> +		/*
>> +		 * We need to validate that 'p' is still in the table, and
>> +		 * if so, update 'skip'
>> +		 */
>> +		struct rhash_head *p;
>> +		int skip = 0;
>> +		rht_for_each_rcu(p, iter->walker.tbl, iter->slot) {
>> +			skip++;
>> +			if (p == iter->p) {
>> +				iter->skip = skip;
>> +				goto found;
>> +			}
>> +		}
>> +		iter->p = NULL;
>> +	} else if (iter->p && rhlist) {
>> +		/* Need to validate that 'list' is still in the table, and
>> +		 * if so, update 'skip' and 'p'.
>> +		 */
>> +		struct rhash_head *p;
>> +		struct rhlist_head *list;
>> +		int skip = 0;
>> +		rht_for_each_rcu(p, iter->walker.tbl, iter->slot) {
>> +			for (list = container_of(p, struct rhlist_head, rhead);
>> +			     list;
>> +			     list = rcu_dereference(list->next)) {
>> +				skip++;
>> +				if (list == iter->list) {
>> +					iter->p = p;
>> +					skip = skip;
>> +					goto found;
>> +				}
>> +			}
>> +		}
>> +		iter->p = NULL;
>> +	}
>> +found:
>>  	return 0;
>>  }
>>  EXPORT_SYMBOL_GPL(rhashtable_walk_start_check);
>
> While testing new code that uses the rhashtable walker, I'm obeserving
> an use after free, that is apparently caused by the above:
>
> [  146.834815] ==================================================================
> [  146.842933] BUG: KASAN: use-after-free in inet_frag_worker+0x9f/0x210
> [  146.850120] Read of size 4 at addr ffff881b6b9342d8 by task kworker/13:1/177
> [  146.857984] 
> [  146.859645] CPU: 13 PID: 177 Comm: kworker/13:1 Not tainted 4.18.0-rc3.mirror_unclone_6_frag_dbg+ #1974
> [  146.870128] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016
> [  146.878478] Workqueue: events inet_frag_worker
> [  146.883433] Call Trace:
> [  146.886162]  dump_stack+0x90/0xe3
> [  146.889861]  print_address_description+0x6a/0x2a0
> [  146.895109]  kasan_report+0x176/0x2d0
> [  146.899193]  ? inet_frag_worker+0x9f/0x210
> [  146.903762]  inet_frag_worker+0x9f/0x210
> [  146.908142]  process_one_work+0x24f/0x6e0
> [  146.912614]  ? process_one_work+0x1a6/0x6e0
> [  146.917285]  worker_thread+0x4e/0x3d0
> [  146.921373]  kthread+0x106/0x140
> [  146.924970]  ? process_one_work+0x6e0/0x6e0
> [  146.929637]  ? kthread_bind+0x10/0x10
> [  146.933723]  ret_from_fork+0x3a/0x50
> [  146.937717] 
> [  146.939377] Allocated by task 177:
> [  146.943170]  kasan_kmalloc+0x86/0xb0
> [  146.947158]  __kmalloc_node+0x181/0x430
> [  146.951438]  kvmalloc_node+0x4f/0x70
> [  146.955427]  alloc_bucket_spinlocks+0x34/0xa0
> [  146.960286]  bucket_table_alloc.isra.13+0x61/0x180
> [  146.965630]  rhashtable_rehash_alloc+0x26/0x80
> [  146.970585]  rht_deferred_worker+0x394/0x810
> [  146.975348]  process_one_work+0x24f/0x6e0
> [  146.979819]  worker_thread+0x4e/0x3d0
> [  146.983902]  kthread+0x106/0x140
> [  146.987502]  ret_from_fork+0x3a/0x50
> [  146.991487] 
> [  146.993146] Freed by task 90:
> [  146.996455]  __kasan_slab_free+0x11d/0x180
> [  147.001025]  kfree+0x113/0x350
> [  147.004431]  bucket_table_free+0x1c/0x70
> [  147.008806]  rcu_process_callbacks+0x6c8/0x880
> [  147.013762]  __do_softirq+0xd5/0x505
> [  147.017747] 
> [  147.019407] The buggy address belongs to the object at ffff881b6b934200
> [  147.019407]  which belongs to the cache kmalloc-8192 of size 8192
> [  147.033574] The buggy address is located 216 bytes inside of
> [  147.033574]  8192-byte region [ffff881b6b934200, ffff881b6b936200)
> [  147.046773] The buggy address belongs to the page:
> [  147.052116] page:ffffea006dae4c00 count:1 mapcount:0 mapping:ffff880107c0e400 index:0x0 compound_mapcount: 0
> [  147.063086] flags: 0x17ffffc0008100(slab|head)
> [  147.068043] raw: 0017ffffc0008100 dead000000000100 dead000000000200 ffff880107c0e400
> [  147.076684] raw: 0000000000000000 0000000000030003 00000001ffffffff 0000000000000000
> [  147.085324] page dumped because: kasan: bad access detected
> [  147.091540] 
> [  147.093210] Memory state around the buggy address:
> [  147.098553]  ffff881b6b934180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [  147.106613]  ffff881b6b934200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [  147.114670] >ffff881b6b934280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [  147.122730]                                                     ^
> [  147.129527]  ffff881b6b934300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [  147.137587]  ffff881b6b934380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [  147.145646] ==================================================================
>
> Reverting the above change avoid the issue. 
>
> I *think* that reusing iter->p after a
> rcu_read_lock()/rcu_read_unlock() is unsafe:
> if the table has been reashed we can still and reuse 'p', but the
> related grace period could be already expired.

Thanks for the report.
It would be unsafe to dereference iter->p, but the code doesn't.
At least, it doesn't dereference it until it has searched through the
table and confirmed that the pointer is still in the table.

Could you please use scripts/faddr2line to identify exactly where the
error is occurring?
e.g
   ./scripts/faddr2line vmlinux.o inet_frag_worker+0x9f/0x210

(any .o which contains inet_frag_worker should work).
Thanks,
NeilBrown

>
> I can't think of any better fix than a revert. Other opinions welcome!
>
> Paolo

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [4/4] rhashtable: improve rhashtable_walk stability when stop/start used.
  2018-07-05 11:20   ` [4/4] " Paolo Abeni
  2018-07-05 22:56     ` NeilBrown
@ 2018-07-07 22:11     ` NeilBrown
  1 sibling, 0 replies; 9+ messages in thread
From: NeilBrown @ 2018-07-07 22:11 UTC (permalink / raw)
  To: Paolo Abeni, Thomas Graf, Herbert Xu, David Miller; +Cc: netdev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2012 bytes --]

On Thu, Jul 05 2018, Paolo Abeni wrote:

>
> While testing new code that uses the rhashtable walker, I'm obeserving
> an use after free, that is apparently caused by the above:
>
> [  146.834815] ==================================================================
> [  146.842933] BUG: KASAN: use-after-free in inet_frag_worker+0x9f/0x210
                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^

Hi,
 did you get a chance to run ./scripts/faddr2line on this address and
find out where it is crashing?  I had a look in the code you posted and
couldn't see anything obvious.


> [  146.850120] Read of size 4 at addr ffff881b6b9342d8 by task kworker/13:1/177
> [  146.857984] 
> [  146.859645] CPU: 13 PID: 177 Comm: kworker/13:1 Not tainted 4.18.0-rc3.mirror_unclone_6_frag_dbg+ #1974
> [  146.870128] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016
> [  146.878478] Workqueue: events inet_frag_worker
> [  146.883433] Call Trace:
> [  146.886162]  dump_stack+0x90/0xe3
> [  146.889861]  print_address_description+0x6a/0x2a0
> [  146.895109]  kasan_report+0x176/0x2d0
> [  146.899193]  ? inet_frag_worker+0x9f/0x210
> [  146.903762]  inet_frag_worker+0x9f/0x210
> [  146.908142]  process_one_work+0x24f/0x6e0
> [  146.912614]  ? process_one_work+0x1a6/0x6e0
> [  146.917285]  worker_thread+0x4e/0x3d0
> [  146.921373]  kthread+0x106/0x140
> [  146.924970]  ? process_one_work+0x6e0/0x6e0
> [  146.929637]  ? kthread_bind+0x10/0x10
> [  146.933723]  ret_from_fork+0x3a/0x50
> [  146.937717] 
> [  146.939377] Allocated by task 177:
> [  146.943170]  kasan_kmalloc+0x86/0xb0
> [  146.947158]  __kmalloc_node+0x181/0x430
> [  146.951438]  kvmalloc_node+0x4f/0x70
> [  146.955427]  alloc_bucket_spinlocks+0x34/0xa0

This seems to suggest that it is one of the bucket spinlocks that is
being incorrectly referenced, but they aren't referenced in the failing
code at all.  I cannot imagine how to interpret that.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2018-07-07 22:11 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-23 22:29 [PATCH 0/4] A few rhashtables cleanups NeilBrown
2018-04-23 22:29 ` [PATCH 4/4] rhashtable: improve rhashtable_walk stability when stop/start used NeilBrown
2018-07-05 11:20   ` [4/4] " Paolo Abeni
2018-07-05 22:56     ` NeilBrown
2018-07-07 22:11     ` NeilBrown
2018-04-23 22:29 ` [PATCH 3/4] rhashtable: reset iter when rhashtable_walk_start sees new table NeilBrown
2018-04-23 22:29 ` [PATCH 2/4] rhashtable: Revise incorrect comment on r{hl, hash}table_walk_enter() NeilBrown
2018-04-23 22:29 ` [PATCH 1/4] rhashtable: remove outdated comments about grow_decision etc NeilBrown
2018-04-24 17:38 ` [PATCH 0/4] A few rhashtables cleanups David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).