All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] rhashtable: Allow local locks to be used and tested
@ 2014-11-13 10:10 Herbert Xu
  2014-11-13 10:11 ` [PATCH 1/4] netlink: Move mutex_is_held under PROVE_LOCKING Herbert Xu
                   ` (5 more replies)
  0 siblings, 6 replies; 21+ messages in thread
From: Herbert Xu @ 2014-11-13 10:10 UTC (permalink / raw)
  To: netdev; +Cc: Thomas Graf

Hi:

This series moves mutex_is_held entirely under PROVE_LOCKING so
there is zero foot print when we're not debugging.  More importantly
it adds a parrent argument to mutex_is_held so that we can test
local locks rather than global ones (e.g., per-namespace locks).

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/4] netlink: Move mutex_is_held under PROVE_LOCKING
  2014-11-13 10:10 [PATCH 0/4] rhashtable: Allow local locks to be used and tested Herbert Xu
@ 2014-11-13 10:11 ` Herbert Xu
  2014-11-13 10:11 ` [PATCH 2/4] netfilter: " Herbert Xu
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 21+ messages in thread
From: Herbert Xu @ 2014-11-13 10:11 UTC (permalink / raw)
  To: netdev, Thomas Graf

The rhashtable function mutex_is_held is only used when PROVE_LOCKING
is enabled.  This patch modifies netlink so that we can rhashtable.h
itself can later make mutex_is_held optional depending on PROVE_LOCKING.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/netlink/af_netlink.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 580b794..53b8ea7 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -114,14 +114,14 @@ static atomic_t nl_table_users = ATOMIC_INIT(0);
 DEFINE_MUTEX(nl_sk_hash_lock);
 EXPORT_SYMBOL_GPL(nl_sk_hash_lock);
 
+#ifdef CONFIG_PROVE_LOCKING
 static int lockdep_nl_sk_hash_is_held(void)
 {
-#ifdef CONFIG_LOCKDEP
 	if (debug_locks)
 		return lockdep_is_held(&nl_sk_hash_lock) || lockdep_is_held(&nl_table_lock);
-#endif
 	return 1;
 }
+#endif
 
 static ATOMIC_NOTIFIER_HEAD(netlink_chain);
 
@@ -3133,7 +3133,9 @@ static int __init netlink_proto_init(void)
 		.max_shift = 16, /* 64K */
 		.grow_decision = rht_grow_above_75,
 		.shrink_decision = rht_shrink_below_30,
+#ifdef CONFIG_PROVE_LOCKING
 		.mutex_is_held = lockdep_nl_sk_hash_is_held,
+#endif
 	};
 
 	if (err != 0)

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/4] netfilter: Move mutex_is_held under PROVE_LOCKING
  2014-11-13 10:10 [PATCH 0/4] rhashtable: Allow local locks to be used and tested Herbert Xu
  2014-11-13 10:11 ` [PATCH 1/4] netlink: Move mutex_is_held under PROVE_LOCKING Herbert Xu
@ 2014-11-13 10:11 ` Herbert Xu
  2014-11-13 10:11 ` [PATCH 3/4] rhashtable: " Herbert Xu
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 21+ messages in thread
From: Herbert Xu @ 2014-11-13 10:11 UTC (permalink / raw)
  To: netdev, Thomas Graf

The rhashtable function mutex_is_held is only used when PROVE_LOCKING
is enabled.  This patch modifies netfilter so that we can rhashtable.h
itself can later make mutex_is_held optional depending on PROVE_LOCKING.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/netfilter/nft_hash.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/netfilter/nft_hash.c b/net/netfilter/nft_hash.c
index 8892b7b..b86305c 100644
--- a/net/netfilter/nft_hash.c
+++ b/net/netfilter/nft_hash.c
@@ -153,10 +153,12 @@ static unsigned int nft_hash_privsize(const struct nlattr * const nla[])
 	return sizeof(struct rhashtable);
 }
 
+#ifdef CONFIG_PROVE_LOCKING
 static int lockdep_nfnl_lock_is_held(void)
 {
 	return lockdep_nfnl_is_held(NFNL_SUBSYS_NFTABLES);
 }
+#endif
 
 static int nft_hash_init(const struct nft_set *set,
 			 const struct nft_set_desc *desc,
@@ -171,7 +173,9 @@ static int nft_hash_init(const struct nft_set *set,
 		.hashfn = jhash,
 		.grow_decision = rht_grow_above_75,
 		.shrink_decision = rht_shrink_below_30,
+#ifdef CONFIG_PROVE_LOCKING
 		.mutex_is_held = lockdep_nfnl_lock_is_held,
+#endif
 	};
 
 	return rhashtable_init(priv, &params);

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 3/4] rhashtable: Move mutex_is_held under PROVE_LOCKING
  2014-11-13 10:10 [PATCH 0/4] rhashtable: Allow local locks to be used and tested Herbert Xu
  2014-11-13 10:11 ` [PATCH 1/4] netlink: Move mutex_is_held under PROVE_LOCKING Herbert Xu
  2014-11-13 10:11 ` [PATCH 2/4] netfilter: " Herbert Xu
@ 2014-11-13 10:11 ` Herbert Xu
  2014-11-13 10:11 ` [PATCH 4/4] rhashtable: Add parent argument to mutex_is_held Herbert Xu
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 21+ messages in thread
From: Herbert Xu @ 2014-11-13 10:11 UTC (permalink / raw)
  To: netdev, Thomas Graf

The rhashtable function mutex_is_held is only used when PROVE_LOCKING
is enabled.  This patch makes the mutex_is_held field in rhashtable
optional depending on PROVE_LOCKING.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 include/linux/rhashtable.h |    2 ++
 lib/rhashtable.c           |    8 ++++++++
 2 files changed, 10 insertions(+)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index fb298e9d..96ce8ce 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -65,7 +65,9 @@ struct rhashtable_params {
 						 size_t new_size);
 	bool			(*shrink_decision)(const struct rhashtable *ht,
 						   size_t new_size);
+#ifdef CONFIG_PROVE_LOCKING
 	int			(*mutex_is_held)(void);
+#endif
 };
 
 /**
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 624a0b7..548bb29 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -532,7 +532,9 @@ static size_t rounded_hashtable_size(struct rhashtable_params *params)
  *	.key_offset = offsetof(struct test_obj, key),
  *	.key_len = sizeof(int),
  *	.hashfn = arch_fast_hash,
+ * #ifdef CONFIG_PROVE_LOCKING
  *	.mutex_is_held = &my_mutex_is_held,
+ * #endif
  * };
  *
  * Configuration Example 2: Variable length keys
@@ -552,7 +554,9 @@ static size_t rounded_hashtable_size(struct rhashtable_params *params)
  *	.head_offset = offsetof(struct test_obj, node),
  *	.hashfn = arch_fast_hash,
  *	.obj_hashfn = my_hash_fn,
+ * #ifdef CONFIG_PROVE_LOCKING
  *	.mutex_is_held = &my_mutex_is_held,
+ * #endif
  * };
  */
 int rhashtable_init(struct rhashtable *ht, struct rhashtable_params *params)
@@ -613,10 +617,12 @@ EXPORT_SYMBOL_GPL(rhashtable_destroy);
 #define TEST_PTR	((void *) 0xdeadbeef)
 #define TEST_NEXPANDS	4
 
+#ifdef CONFIG_PROVE_LOCKING
 static int test_mutex_is_held(void)
 {
 	return 1;
 }
+#endif
 
 struct test_obj {
 	void			*ptr;
@@ -767,7 +773,9 @@ static int __init test_rht_init(void)
 		.key_offset = offsetof(struct test_obj, value),
 		.key_len = sizeof(int),
 		.hashfn = arch_fast_hash,
+#ifdef CONFIG_PROVE_LOCKING
 		.mutex_is_held = &test_mutex_is_held,
+#endif
 		.grow_decision = rht_grow_above_75,
 		.shrink_decision = rht_shrink_below_30,
 	};

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 4/4] rhashtable: Add parent argument to mutex_is_held
  2014-11-13 10:10 [PATCH 0/4] rhashtable: Allow local locks to be used and tested Herbert Xu
                   ` (2 preceding siblings ...)
  2014-11-13 10:11 ` [PATCH 3/4] rhashtable: " Herbert Xu
@ 2014-11-13 10:11 ` Herbert Xu
  2014-11-13 10:37   ` Thomas Graf
  2014-11-13 10:41 ` [PATCH 0/4] rhashtable: Allow local locks to be used and tested Thomas Graf
  2014-11-13 20:13 ` David Miller
  5 siblings, 1 reply; 21+ messages in thread
From: Herbert Xu @ 2014-11-13 10:11 UTC (permalink / raw)
  To: netdev, Thomas Graf

Currently mutex_is_held can only test locks in the that are global
since it takes no arguments.  This prevents rhashtable from being
used in places where locks are lock, e.g., per-namespace locks.

This patch adds a parent field to mutex_is_held and rhashtable_params
so that local locks can be used (and tested).

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 include/linux/rhashtable.h |    3 ++-
 lib/rhashtable.c           |    4 ++--
 net/netfilter/nft_hash.c   |    2 +-
 net/netlink/af_netlink.c   |    2 +-
 4 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 96ce8ce..473e26b 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -66,7 +66,8 @@ struct rhashtable_params {
 	bool			(*shrink_decision)(const struct rhashtable *ht,
 						   size_t new_size);
 #ifdef CONFIG_PROVE_LOCKING
-	int			(*mutex_is_held)(void);
+	int			(*mutex_is_held)(void *parent);
+	void			*parent;
 #endif
 };
 
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 548bb29..6acc4ce 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -32,7 +32,7 @@
 #ifdef CONFIG_PROVE_LOCKING
 int lockdep_rht_mutex_is_held(const struct rhashtable *ht)
 {
-	return ht->p.mutex_is_held();
+	return ht->p.mutex_is_held(ht->p.parent);
 }
 EXPORT_SYMBOL_GPL(lockdep_rht_mutex_is_held);
 #endif
@@ -618,7 +618,7 @@ EXPORT_SYMBOL_GPL(rhashtable_destroy);
 #define TEST_NEXPANDS	4
 
 #ifdef CONFIG_PROVE_LOCKING
-static int test_mutex_is_held(void)
+static int test_mutex_is_held(void *parent)
 {
 	return 1;
 }
diff --git a/net/netfilter/nft_hash.c b/net/netfilter/nft_hash.c
index b86305c..3f75aaa 100644
--- a/net/netfilter/nft_hash.c
+++ b/net/netfilter/nft_hash.c
@@ -154,7 +154,7 @@ static unsigned int nft_hash_privsize(const struct nlattr * const nla[])
 }
 
 #ifdef CONFIG_PROVE_LOCKING
-static int lockdep_nfnl_lock_is_held(void)
+static int lockdep_nfnl_lock_is_held(void *parent)
 {
 	return lockdep_nfnl_is_held(NFNL_SUBSYS_NFTABLES);
 }
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 53b8ea7..9e0628c 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -115,7 +115,7 @@ DEFINE_MUTEX(nl_sk_hash_lock);
 EXPORT_SYMBOL_GPL(nl_sk_hash_lock);
 
 #ifdef CONFIG_PROVE_LOCKING
-static int lockdep_nl_sk_hash_is_held(void)
+static int lockdep_nl_sk_hash_is_held(void *parent)
 {
 	if (debug_locks)
 		return lockdep_is_held(&nl_sk_hash_lock) || lockdep_is_held(&nl_table_lock);

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] rhashtable: Add parent argument to mutex_is_held
  2014-11-13 10:11 ` [PATCH 4/4] rhashtable: Add parent argument to mutex_is_held Herbert Xu
@ 2014-11-13 10:37   ` Thomas Graf
  2014-11-13 10:38     ` Herbert Xu
  0 siblings, 1 reply; 21+ messages in thread
From: Thomas Graf @ 2014-11-13 10:37 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev

On 11/13/14 at 06:11pm, Herbert Xu wrote:
> Currently mutex_is_held can only test locks in the that are global
> since it takes no arguments.  This prevents rhashtable from being
> used in places where locks are lock, e.g., per-namespace locks.
> 
> This patch adds a parent field to mutex_is_held and rhashtable_params
> so that local locks can be used (and tested).
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Could you fix the documentation of rhashtable_init() as well?

[...]
 * struct rhashtable_params params = {
 *      .head_offset = offsetof(struct test_obj, node),
 *      .key_offset = offsetof(struct test_obj, key),
 *      .key_len = sizeof(int),
 *      .hashfn = arch_fast_hash,
 *      .mutex_is_held = &my_mutex_is_held,
 * };
[...]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] rhashtable: Add parent argument to mutex_is_held
  2014-11-13 10:37   ` Thomas Graf
@ 2014-11-13 10:38     ` Herbert Xu
  2014-11-13 10:41       ` Thomas Graf
  0 siblings, 1 reply; 21+ messages in thread
From: Herbert Xu @ 2014-11-13 10:38 UTC (permalink / raw)
  To: Thomas Graf; +Cc: netdev

On Thu, Nov 13, 2014 at 10:37:23AM +0000, Thomas Graf wrote:
> On 11/13/14 at 06:11pm, Herbert Xu wrote:
> > Currently mutex_is_held can only test locks in the that are global
> > since it takes no arguments.  This prevents rhashtable from being
> > used in places where locks are lock, e.g., per-namespace locks.
> > 
> > This patch adds a parent field to mutex_is_held and rhashtable_params
> > so that local locks can be used (and tested).
> > 
> > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> Could you fix the documentation of rhashtable_init() as well?
> 
> [...]
>  * struct rhashtable_params params = {
>  *      .head_offset = offsetof(struct test_obj, node),
>  *      .key_offset = offsetof(struct test_obj, key),
>  *      .key_len = sizeof(int),
>  *      .hashfn = arch_fast_hash,
>  *      .mutex_is_held = &my_mutex_is_held,
>  * };
> [...]

Sorry I missed that.  Will do.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] rhashtable: Add parent argument to mutex_is_held
  2014-11-13 10:38     ` Herbert Xu
@ 2014-11-13 10:41       ` Thomas Graf
  2014-11-13 10:43         ` Herbert Xu
  0 siblings, 1 reply; 21+ messages in thread
From: Thomas Graf @ 2014-11-13 10:41 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev

On 11/13/14 at 06:38pm, Herbert Xu wrote:
> On Thu, Nov 13, 2014 at 10:37:23AM +0000, Thomas Graf wrote:
> > On 11/13/14 at 06:11pm, Herbert Xu wrote:
> > > Currently mutex_is_held can only test locks in the that are global
> > > since it takes no arguments.  This prevents rhashtable from being
> > > used in places where locks are lock, e.g., per-namespace locks.
> > > 
> > > This patch adds a parent field to mutex_is_held and rhashtable_params
> > > so that local locks can be used (and tested).
> > > 
> > > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> > 
> > Could you fix the documentation of rhashtable_init() as well?
> > 
> > [...]
> >  * struct rhashtable_params params = {
> >  *      .head_offset = offsetof(struct test_obj, node),
> >  *      .key_offset = offsetof(struct test_obj, key),
> >  *      .key_len = sizeof(int),
> >  *      .hashfn = arch_fast_hash,
> >  *      .mutex_is_held = &my_mutex_is_held,
> >  * };
> > [...]
> 
> Sorry I missed that.  Will do.

Never mind. You did fix it. I looked at the wrong patch.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/4] rhashtable: Allow local locks to be used and tested
  2014-11-13 10:10 [PATCH 0/4] rhashtable: Allow local locks to be used and tested Herbert Xu
                   ` (3 preceding siblings ...)
  2014-11-13 10:11 ` [PATCH 4/4] rhashtable: Add parent argument to mutex_is_held Herbert Xu
@ 2014-11-13 10:41 ` Thomas Graf
  2014-11-13 20:13 ` David Miller
  5 siblings, 0 replies; 21+ messages in thread
From: Thomas Graf @ 2014-11-13 10:41 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev

On 11/13/14 at 06:10pm, Herbert Xu wrote:
> Hi:
> 
> This series moves mutex_is_held entirely under PROVE_LOCKING so
> there is zero foot print when we're not debugging.  More importantly
> it adds a parrent argument to mutex_is_held so that we can test
> local locks rather than global ones (e.g., per-namespace locks).

LGTM

Acked-by: Thomas Graf <tgraf@suug.ch>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] rhashtable: Add parent argument to mutex_is_held
  2014-11-13 10:41       ` Thomas Graf
@ 2014-11-13 10:43         ` Herbert Xu
  2014-11-15  3:25           ` Herbert Xu
  0 siblings, 1 reply; 21+ messages in thread
From: Herbert Xu @ 2014-11-13 10:43 UTC (permalink / raw)
  To: Thomas Graf; +Cc: netdev

On Thu, Nov 13, 2014 at 10:41:24AM +0000, Thomas Graf wrote:
>
> Never mind. You did fix it. I looked at the wrong patch.

OK.  Now comes the fun part of shoehorning the xfrm_policy bydst
hash into rhashtable :)
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/4] rhashtable: Allow local locks to be used and tested
  2014-11-13 10:10 [PATCH 0/4] rhashtable: Allow local locks to be used and tested Herbert Xu
                   ` (4 preceding siblings ...)
  2014-11-13 10:41 ` [PATCH 0/4] rhashtable: Allow local locks to be used and tested Thomas Graf
@ 2014-11-13 20:13 ` David Miller
  5 siblings, 0 replies; 21+ messages in thread
From: David Miller @ 2014-11-13 20:13 UTC (permalink / raw)
  To: herbert; +Cc: netdev, tgraf

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 13 Nov 2014 18:10:25 +0800

> This series moves mutex_is_held entirely under PROVE_LOCKING so
> there is zero foot print when we're not debugging.  More importantly
> it adds a parrent argument to mutex_is_held so that we can test
> local locks rather than global ones (e.g., per-namespace locks).

Series applied, thanks Herbert.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] rhashtable: Add parent argument to mutex_is_held
  2014-11-13 10:43         ` Herbert Xu
@ 2014-11-15  3:25           ` Herbert Xu
  2014-11-15 11:16             ` Thomas Graf
  0 siblings, 1 reply; 21+ messages in thread
From: Herbert Xu @ 2014-11-15  3:25 UTC (permalink / raw)
  To: Thomas Graf; +Cc: netdev

On Thu, Nov 13, 2014 at 06:43:43PM +0800, Herbert Xu wrote:
> On Thu, Nov 13, 2014 at 10:41:24AM +0000, Thomas Graf wrote:
> >
> > Never mind. You did fix it. I looked at the wrong patch.
> 
> OK.  Now comes the fun part of shoehorning the xfrm_policy bydst
> hash into rhashtable :)

Thomas, it appears that rhashtable as it stands cannot handle
changing the random seed because of the way it constructs the
new hash table without degrading into a linked list.  Is there
something I'm missing?

FWIW my hashtable in net/bridge/br_multicast.c handles rehashing
correctly.  Any objections to me converting rhashtable to use my
scheme instead?

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] rhashtable: Add parent argument to mutex_is_held
  2014-11-15  3:25           ` Herbert Xu
@ 2014-11-15 11:16             ` Thomas Graf
  2014-11-15 11:23               ` Herbert Xu
  2014-11-15 19:12               ` Josh Triplett
  0 siblings, 2 replies; 21+ messages in thread
From: Thomas Graf @ 2014-11-15 11:16 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev, eric.dumazet, paulmck, josh

On 11/15/14 at 11:25am, Herbert Xu wrote:
> On Thu, Nov 13, 2014 at 06:43:43PM +0800, Herbert Xu wrote:
> > On Thu, Nov 13, 2014 at 10:41:24AM +0000, Thomas Graf wrote:
> > >
> > > Never mind. You did fix it. I looked at the wrong patch.
> > 
> > OK.  Now comes the fun part of shoehorning the xfrm_policy bydst
> > hash into rhashtable :)
> 
> Thomas, it appears that rhashtable as it stands cannot handle
> changing the random seed because of the way it constructs the
> new hash table without degrading into a linked list.  Is there
> something I'm missing?
> 
> FWIW my hashtable in net/bridge/br_multicast.c handles rehashing
> correctly.  Any objections to me converting rhashtable to use my
> scheme instead?

Can you elaborate a bit?

The point of rhashtable is to not require two sets of linked list
pointers as done by MDB or OVS flow tables to work around the
increased cache footprint of that approach. The difference of the
two algos is dicussed in this paper [0].

The disadvantage of rhashtable is that, AFAIK, the hash function
cannot change while resizing as it would break the mutual linked
lists.

rhashtable will eventually be converted to per bucket locks with
deferred resizing to allow concurrent insertions and removal to
multiple buckets and to allow for mutations from atomic context
which is needed to use it for TCP hashtables. This work is in
progress:
http://lwn.net/Articles/611951/

Eric mentioned he is working on the rcu'ification of sockets which
is what is missing to make it work.

[0] https://www.usenix.org/legacy/event/atc11/tech/final_files/Triplett.pdf

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] rhashtable: Add parent argument to mutex_is_held
  2014-11-15 11:16             ` Thomas Graf
@ 2014-11-15 11:23               ` Herbert Xu
  2014-11-15 15:51                 ` Herbert Xu
  2014-11-15 19:12               ` Josh Triplett
  1 sibling, 1 reply; 21+ messages in thread
From: Herbert Xu @ 2014-11-15 11:23 UTC (permalink / raw)
  To: Thomas Graf; +Cc: netdev, eric.dumazet, paulmck, josh, David S. Miller

On Sat, Nov 15, 2014 at 11:16:26AM +0000, Thomas Graf wrote:
>
> > Thomas, it appears that rhashtable as it stands cannot handle
> > changing the random seed because of the way it constructs the
> > new hash table without degrading into a linked list.  Is there
> > something I'm missing?
> > 
> > FWIW my hashtable in net/bridge/br_multicast.c handles rehashing
> > correctly.  Any objections to me converting rhashtable to use my
> > scheme instead?
> 
> Can you elaborate a bit?
> 
> The point of rhashtable is to not require two sets of linked list
> pointers as done by MDB or OVS flow tables to work around the
> increased cache footprint of that approach. The difference of the
> two algos is dicussed in this paper [0].
> 
> The disadvantage of rhashtable is that, AFAIK, the hash function
> cannot change while resizing as it would break the mutual linked
> lists.

Well it doesn't break so much as degenerate into a linked list (I'm
talking about the concept rather than what the current code does).

This is I think a show-stopper because for anything that can be
influenced by remote parties we have to reseed and therefore
be able to cope with hashes changing on the fly.  Since most
hash tables in the network stack can be influenced by remote
entities (including the xfrm policy bydst hash that I am currently
working on), this is something rhashtable must be able to support
if it is to be used throughout the network stack.

Unless there is a better solution then I think keeping two lists
will have to do.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] rhashtable: Add parent argument to mutex_is_held
  2014-11-15 11:23               ` Herbert Xu
@ 2014-11-15 15:51                 ` Herbert Xu
  2014-11-17  6:20                   ` Thomas Graf
  0 siblings, 1 reply; 21+ messages in thread
From: Herbert Xu @ 2014-11-15 15:51 UTC (permalink / raw)
  To: Thomas Graf; +Cc: netdev, eric.dumazet, paulmck, josh, David S. Miller

On Sat, Nov 15, 2014 at 07:23:13PM +0800, Herbert Xu wrote:
> 
> This is I think a show-stopper because for anything that can be
> influenced by remote parties we have to reseed and therefore
> be able to cope with hashes changing on the fly.  Since most
> hash tables in the network stack can be influenced by remote
> entities (including the xfrm policy bydst hash that I am currently
> working on), this is something rhashtable must be able to support
> if it is to be used throughout the network stack.

So I noticed that you got rid of the rehash when you converted
netlink over to rthashtable.  Was this aspect of the conversion
discussed anywhere? In particular, how do you protect against
a malicious user that's trying to attack the netlink hash table?

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] rhashtable: Add parent argument to mutex_is_held
  2014-11-15 11:16             ` Thomas Graf
  2014-11-15 11:23               ` Herbert Xu
@ 2014-11-15 19:12               ` Josh Triplett
  2014-11-16  2:22                 ` Herbert Xu
  1 sibling, 1 reply; 21+ messages in thread
From: Josh Triplett @ 2014-11-15 19:12 UTC (permalink / raw)
  To: Thomas Graf; +Cc: Herbert Xu, netdev, eric.dumazet, paulmck

On Sat, Nov 15, 2014 at 11:16:26AM +0000, Thomas Graf wrote:
> On 11/15/14 at 11:25am, Herbert Xu wrote:
> > On Thu, Nov 13, 2014 at 06:43:43PM +0800, Herbert Xu wrote:
> > > On Thu, Nov 13, 2014 at 10:41:24AM +0000, Thomas Graf wrote:
> > > >
> > > > Never mind. You did fix it. I looked at the wrong patch.
> > > 
> > > OK.  Now comes the fun part of shoehorning the xfrm_policy bydst
> > > hash into rhashtable :)
> > 
> > Thomas, it appears that rhashtable as it stands cannot handle
> > changing the random seed because of the way it constructs the
> > new hash table without degrading into a linked list.  Is there
> > something I'm missing?
> > 
> > FWIW my hashtable in net/bridge/br_multicast.c handles rehashing
> > correctly.  Any objections to me converting rhashtable to use my
> > scheme instead?
> 
> Can you elaborate a bit?
> 
> The point of rhashtable is to not require two sets of linked list
> pointers as done by MDB or OVS flow tables to work around the
> increased cache footprint of that approach. The difference of the
> two algos is dicussed in this paper [0].
> 
> The disadvantage of rhashtable is that, AFAIK, the hash function
> cannot change while resizing as it would break the mutual linked
> lists.

You can handle hash function changes with rhashtable without needing a
second set of linked-list pointers, actually.  You'd have to add ~1
unlikely() conditional to the reader common case, and you'd make readers
*during* a hash algorithm change (which I'd hope happens as rarely as a
resize) somewhat less efficient, but you wouldn't use any more memory
than a resize currently does, and you wouldn't use the memory of extra
linked-list pointers in the common case.

Rather than the current approach of switching out the hash table pointer
and having readers only search the new table (which will have
valid-but-crosslinked buckets during a resize), instead keep two hash
table pointers (each with their own hash parameters) and a single
toggleable old/new indicator.  Readers check both hash table pointers,
and if valid, check both tables, first old then new (order important to
make sure they don't miss a node).  Then, the rehashing algorithm can
incrementally move nodes from the old buckets to the new buckets,
*without* disrupting concurrent readers.

Rough rehashing algorithm sketch:

- Set up the new empty table with the new set of hash parameters.
- synchronize_rcu().  Readers will now search both old and new tables.
- Peel nodes off the ends of the old hash table and add them to the new
  hash table (similar to the unzip step in the resize algorithm).
  Because readers search old-then-new, and writers make each node appear
  in the new table before making it disappear from the old, you don't
  need a synchronize_rcu() here, just an smp_wmb() after linking the
  node into the new table and before unlinking the node from the old
  table.  Also, because you remove nodes from the *ends* of old-table
  buckets, you don't have to worry about a reader's linked-list walk
  getting dragged over to the new table and missing nodes from the old
  one.
- Once all nodes have moved to the new table, mark the pointer to the
  old table as NULL, synchronize_rcu(), and free the old table.
- Toggle the old/new flag.

Ordering principles in this algorithm:
- You want readers to see the changes made by the rehasher in order.
- If the reader reads location A then B, and the rehasher writes
  location B then A, the rehasher just needs an smp_wmb() in between.
- If the reader reads location A then B, and the rehasher writes
  location A then B, the rehasher needs a synchronize_rcu() in between.


- Josh Triplett

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] rhashtable: Add parent argument to mutex_is_held
  2014-11-15 19:12               ` Josh Triplett
@ 2014-11-16  2:22                 ` Herbert Xu
  2014-11-16  2:37                   ` Josh Triplett
  0 siblings, 1 reply; 21+ messages in thread
From: Herbert Xu @ 2014-11-16  2:22 UTC (permalink / raw)
  To: Josh Triplett; +Cc: tgraf, netdev, eric.dumazet, paulmck

Josh Triplett <josh@joshtriplett.org> wrote:
>
> - Set up the new empty table with the new set of hash parameters.
> - synchronize_rcu().  Readers will now search both old and new tables.
> - Peel nodes off the ends of the old hash table and add them to the new

We currently use a singly linked list in rhashtable.  Peeling nodes
off the end would mean upgrading to a doubly linked list, which is
no different than keeping two lists in terms of cache footprint, no?

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] rhashtable: Add parent argument to mutex_is_held
  2014-11-16  2:22                 ` Herbert Xu
@ 2014-11-16  2:37                   ` Josh Triplett
  2014-11-17  4:46                     ` Herbert Xu
  0 siblings, 1 reply; 21+ messages in thread
From: Josh Triplett @ 2014-11-16  2:37 UTC (permalink / raw)
  To: Herbert Xu; +Cc: tgraf, netdev, eric.dumazet, paulmck

On November 15, 2014 6:22:27 PM PST, Herbert Xu <herbert@gondor.apana.org.au> wrote:
>Josh Triplett <josh@joshtriplett.org> wrote:
>>
>> - Set up the new empty table with the new set of hash parameters.
>> - synchronize_rcu().  Readers will now search both old and new
>tables.
>> - Peel nodes off the ends of the old hash table and add them to the
>new
>
>We currently use a singly linked list in rhashtable.  Peeling nodes
>off the end would mean upgrading to a doubly linked list, which is
>no different than keeping two lists in terms of cache footprint, no?

No, since each pass just handles one set of nodes from each bucket anyway, you can just do a bit more work in the rehasher instead.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] rhashtable: Add parent argument to mutex_is_held
  2014-11-16  2:37                   ` Josh Triplett
@ 2014-11-17  4:46                     ` Herbert Xu
  2014-11-17  6:16                       ` Thomas Graf
  0 siblings, 1 reply; 21+ messages in thread
From: Herbert Xu @ 2014-11-17  4:46 UTC (permalink / raw)
  To: Josh Triplett; +Cc: tgraf, netdev, eric.dumazet, paulmck

On Sat, Nov 15, 2014 at 06:37:26PM -0800, Josh Triplett wrote:
> On November 15, 2014 6:22:27 PM PST, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> >Josh Triplett <josh@joshtriplett.org> wrote:
> >>
> >> - Set up the new empty table with the new set of hash parameters.
> >> - synchronize_rcu().  Readers will now search both old and new
> >tables.
> >> - Peel nodes off the ends of the old hash table and add them to the
> >new
> >
> >We currently use a singly linked list in rhashtable.  Peeling nodes
> >off the end would mean upgrading to a doubly linked list, which is
> >no different than keeping two lists in terms of cache footprint, no?
> 
> No, since each pass just handles one set of nodes from each bucket anyway, you can just do a bit more work in the rehasher instead.

OK let me see if I could implement something like that in rhashtable.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] rhashtable: Add parent argument to mutex_is_held
  2014-11-17  4:46                     ` Herbert Xu
@ 2014-11-17  6:16                       ` Thomas Graf
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Graf @ 2014-11-17  6:16 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Josh Triplett, netdev, eric.dumazet, paulmck

On 11/17/14 at 12:46pm, Herbert Xu wrote:
> On Sat, Nov 15, 2014 at 06:37:26PM -0800, Josh Triplett wrote:
> > On November 15, 2014 6:22:27 PM PST, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> > >Josh Triplett <josh@joshtriplett.org> wrote:
> > >>
> > >> - Set up the new empty table with the new set of hash parameters.
> > >> - synchronize_rcu().  Readers will now search both old and new
> > >tables.
> > >> - Peel nodes off the ends of the old hash table and add them to the
> > >new
> > >
> > >We currently use a singly linked list in rhashtable.  Peeling nodes
> > >off the end would mean upgrading to a doubly linked list, which is
> > >no different than keeping two lists in terms of cache footprint, no?
> > 
> > No, since each pass just handles one set of nodes from each bucket anyway, you can just do a bit more work in the rehasher instead.
> 
> OK let me see if I could implement something like that in rhashtable.

This sounds great. Thanks Herbert and Josh!

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] rhashtable: Add parent argument to mutex_is_held
  2014-11-15 15:51                 ` Herbert Xu
@ 2014-11-17  6:20                   ` Thomas Graf
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Graf @ 2014-11-17  6:20 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev, eric.dumazet, paulmck, josh, David S. Miller

On 11/15/14 at 11:51pm, Herbert Xu wrote:
> So I noticed that you got rid of the rehash when you converted
> netlink over to rthashtable.  Was this aspect of the conversion
> discussed anywhere? In particular, how do you protect against
> a malicious user that's trying to attack the netlink hash table?

I looked at the trade off between lockless lookups and rehashing and
figured it makes sense to chose making lookups cheaper as sockets can
be constrained per user. However, given the followup-up discussion it
seems we can get both.

Thomas

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2014-11-17  6:20 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-11-13 10:10 [PATCH 0/4] rhashtable: Allow local locks to be used and tested Herbert Xu
2014-11-13 10:11 ` [PATCH 1/4] netlink: Move mutex_is_held under PROVE_LOCKING Herbert Xu
2014-11-13 10:11 ` [PATCH 2/4] netfilter: " Herbert Xu
2014-11-13 10:11 ` [PATCH 3/4] rhashtable: " Herbert Xu
2014-11-13 10:11 ` [PATCH 4/4] rhashtable: Add parent argument to mutex_is_held Herbert Xu
2014-11-13 10:37   ` Thomas Graf
2014-11-13 10:38     ` Herbert Xu
2014-11-13 10:41       ` Thomas Graf
2014-11-13 10:43         ` Herbert Xu
2014-11-15  3:25           ` Herbert Xu
2014-11-15 11:16             ` Thomas Graf
2014-11-15 11:23               ` Herbert Xu
2014-11-15 15:51                 ` Herbert Xu
2014-11-17  6:20                   ` Thomas Graf
2014-11-15 19:12               ` Josh Triplett
2014-11-16  2:22                 ` Herbert Xu
2014-11-16  2:37                   ` Josh Triplett
2014-11-17  4:46                     ` Herbert Xu
2014-11-17  6:16                       ` Thomas Graf
2014-11-13 10:41 ` [PATCH 0/4] rhashtable: Allow local locks to be used and tested Thomas Graf
2014-11-13 20:13 ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.