All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/3] netfilter: nft_hash: no need for rcu in the hash set destroy path
@ 2014-09-02  9:38 Pablo Neira Ayuso
  2014-09-02  9:38 ` [PATCH 2/3] netfilter: nft_rbtree: no need for spinlock from " Pablo Neira Ayuso
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Pablo Neira Ayuso @ 2014-09-02  9:38 UTC (permalink / raw)
  To: netfilter-devel; +Cc: kaber, tgraf

The sets are released from the rcu callback, after the rule is removed
from the chain list, which implies that nfnetlink cannot update the
hashes (thus, no resizing may occur) and no packets are walking on the
set anymore.

This resolves a lockdep splat in the nft_hash_destroy() path since the
nfnl mutex is not held there.

===============================
[ INFO: suspicious RCU usage. ]
3.16.0-rc2+ #168 Not tainted
-------------------------------
net/netfilter/nft_hash.c:362 suspicious rcu_dereference_protected() usage!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 1
1 lock held by ksoftirqd/0/3:
 #0:  (rcu_callback){......}, at: [<ffffffff81096393>] rcu_process_callbacks+0x27e/0x4c7

stack backtrace:
CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 3.16.0-rc2+ #168
Hardware name: LENOVO 23259H1/23259H1, BIOS G2ET32WW (1.12 ) 05/30/2012
 0000000000000001 ffff88011769bb98 ffffffff8142c922 0000000000000006
 ffff880117694090 ffff88011769bbc8 ffffffff8107c3ff ffff8800cba52400
 ffff8800c476bea8 ffff8800c476bea8 ffff8800cba52400 ffff88011769bc08
Call Trace:
 [<ffffffff8142c922>] dump_stack+0x4e/0x68
 [<ffffffff8107c3ff>] lockdep_rcu_suspicious+0xfa/0x103
 [<ffffffffa079931e>] nft_hash_destroy+0x50/0x137 [nft_hash]
 [<ffffffffa078cd57>] nft_set_destroy+0x11/0x2a [nf_tables]

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_hash.c |   12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/nft_hash.c b/net/netfilter/nft_hash.c
index 28fb8f3..8892b7b 100644
--- a/net/netfilter/nft_hash.c
+++ b/net/netfilter/nft_hash.c
@@ -180,15 +180,17 @@ static int nft_hash_init(const struct nft_set *set,
 static void nft_hash_destroy(const struct nft_set *set)
 {
 	const struct rhashtable *priv = nft_set_priv(set);
-	const struct bucket_table *tbl;
+	const struct bucket_table *tbl = priv->tbl;
 	struct nft_hash_elem *he, *next;
 	unsigned int i;
 
-	tbl = rht_dereference(priv->tbl, priv);
-	for (i = 0; i < tbl->size; i++)
-		rht_for_each_entry_safe(he, next, tbl->buckets[i], priv, node)
+	for (i = 0; i < tbl->size; i++) {
+		for (he = rht_entry(tbl->buckets[i], struct nft_hash_elem, node);
+		     he != NULL; he = next) {
+			next = rht_entry(he->node.next, struct nft_hash_elem, node);
 			nft_hash_elem_destroy(set, he);
-
+		}
+	}
 	rhashtable_destroy(priv);
 }
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/3] netfilter: nft_rbtree: no need for spinlock from set destroy path
  2014-09-02  9:38 [PATCH 1/3] netfilter: nft_hash: no need for rcu in the hash set destroy path Pablo Neira Ayuso
@ 2014-09-02  9:38 ` Pablo Neira Ayuso
  2014-09-02 11:11   ` Thomas Graf
  2014-09-02  9:38 ` [PATCH 3/3] rhashtable: fix lockdep splat in rhashtable_destroy() Pablo Neira Ayuso
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Pablo Neira Ayuso @ 2014-09-02  9:38 UTC (permalink / raw)
  To: netfilter-devel; +Cc: kaber, tgraf

The sets are released from the rcu callback, after the rule is removed
from the chain list, which implies that nfnetlink cannot update the
rbtree and no packets are walking on the set anymore. Thus, we can get
rid of the spinlock in the set destroy path there.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_rbtree.c |    2 --
 1 file changed, 2 deletions(-)

diff --git a/net/netfilter/nft_rbtree.c b/net/netfilter/nft_rbtree.c
index e1836ff..46214f2 100644
--- a/net/netfilter/nft_rbtree.c
+++ b/net/netfilter/nft_rbtree.c
@@ -234,13 +234,11 @@ static void nft_rbtree_destroy(const struct nft_set *set)
 	struct nft_rbtree_elem *rbe;
 	struct rb_node *node;
 
-	spin_lock_bh(&nft_rbtree_lock);
 	while ((node = priv->root.rb_node) != NULL) {
 		rb_erase(node, &priv->root);
 		rbe = rb_entry(node, struct nft_rbtree_elem, node);
 		nft_rbtree_elem_destroy(set, rbe);
 	}
-	spin_unlock_bh(&nft_rbtree_lock);
 }
 
 static bool nft_rbtree_estimate(const struct nft_set_desc *desc, u32 features,
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/3] rhashtable: fix lockdep splat in rhashtable_destroy()
  2014-09-02  9:38 [PATCH 1/3] netfilter: nft_hash: no need for rcu in the hash set destroy path Pablo Neira Ayuso
  2014-09-02  9:38 ` [PATCH 2/3] netfilter: nft_rbtree: no need for spinlock from " Pablo Neira Ayuso
@ 2014-09-02  9:38 ` Pablo Neira Ayuso
  2014-09-02 11:02   ` Thomas Graf
  2014-09-02 10:14 ` [PATCH 1/3] netfilter: nft_hash: no need for rcu in the hash set destroy path Patrick McHardy
  2014-09-02 11:02 ` Thomas Graf
  3 siblings, 1 reply; 14+ messages in thread
From: Pablo Neira Ayuso @ 2014-09-02  9:38 UTC (permalink / raw)
  To: netfilter-devel; +Cc: kaber, tgraf

No need for rht_dereference() from rhashtable_destroy() since the
existing callers don't hold the mutex when invoking this function
from:

1) Netlink, this is called in case of memory allocation errors in the
   initialization path, no nl_sk_hash_lock is held.
2) Netfilter, this is called from the rcu callback, no nfnl_lock is
   held either.

I think it's reasonable to assume that the caller has to make sure
that no hash resizing may happen before releasing the bucket array.
Therefore, the caller should be responsible for releasing this in a
safe way, document this to make people aware of it.

This resolves a rcu lockdep splat in nft_hash:

===============================
[ INFO: suspicious RCU usage. ]
3.16.0+ #178 Not tainted
-------------------------------
lib/rhashtable.c:596 suspicious rcu_dereference_protected() usage!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 1
1 lock held by ksoftirqd/2/18:
 #0:  (rcu_callback){......}, at: [<ffffffff810918fd>] rcu_process_callbacks+0x27e/0x4c7

stack backtrace:
CPU: 2 PID: 18 Comm: ksoftirqd/2 Not tainted 3.16.0+ #178
Hardware name: LENOVO 23259H1/23259H1, BIOS G2ET32WW (1.12 ) 05/30/2012
 0000000000000001 ffff88011706bb68 ffffffff8143debc 0000000000000000
 ffff880117062610 ffff88011706bb98 ffffffff81077515 ffff8800ca041a50
 0000000000000004 ffff8800ca386480 ffff8800ca041a00 ffff88011706bbb8
Call Trace:
 [<ffffffff8143debc>] dump_stack+0x4e/0x68
 [<ffffffff81077515>] lockdep_rcu_suspicious+0xfa/0x103
 [<ffffffff81228b1b>] rhashtable_destroy+0x46/0x52
 [<ffffffffa06f21a7>] nft_hash_destroy+0x73/0x82 [nft_hash]

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 lib/rhashtable.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index a2c7881..fc0dd8e 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -589,13 +589,13 @@ EXPORT_SYMBOL_GPL(rhashtable_init);
  * rhashtable_destroy - destroy hash table
  * @ht:		the hash table to destroy
  *
- * Frees the bucket array.
+ * Frees the bucket array. This function is not rcu safe, therefore the caller
+ * has to make sure that no resizing may happen by unpublishing the hashtable
+ * and waiting for the quiescent cycle before releasing the bucket array.
  */
 void rhashtable_destroy(const struct rhashtable *ht)
 {
-	const struct bucket_table *tbl = rht_dereference(ht->tbl, ht);
-
-	bucket_table_free(tbl);
+	bucket_table_free(ht->tbl);
 }
 EXPORT_SYMBOL_GPL(rhashtable_destroy);
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] netfilter: nft_hash: no need for rcu in the hash set destroy path
  2014-09-02  9:38 [PATCH 1/3] netfilter: nft_hash: no need for rcu in the hash set destroy path Pablo Neira Ayuso
  2014-09-02  9:38 ` [PATCH 2/3] netfilter: nft_rbtree: no need for spinlock from " Pablo Neira Ayuso
  2014-09-02  9:38 ` [PATCH 3/3] rhashtable: fix lockdep splat in rhashtable_destroy() Pablo Neira Ayuso
@ 2014-09-02 10:14 ` Patrick McHardy
  2014-09-02 10:38   ` Pablo Neira Ayuso
  2014-09-02 11:02 ` Thomas Graf
  3 siblings, 1 reply; 14+ messages in thread
From: Patrick McHardy @ 2014-09-02 10:14 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, tgraf

On Tue, Sep 02, 2014 at 11:38:39AM +0200, Pablo Neira Ayuso wrote:
> The sets are released from the rcu callback, after the rule is removed
> from the chain list, which implies that nfnetlink cannot update the
> hashes (thus, no resizing may occur) and no packets are walking on the
> set anymore.

Unrelated to your patch, but to the RCU destruction: how does that make
sure that nfnetlink notifications are received in the proper order?
I mean, theoretically a new set with the same name could exist at that
time. The same problem exists for all objects that have user defined
identifiers or refer to them.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] netfilter: nft_hash: no need for rcu in the hash set destroy path
  2014-09-02 10:14 ` [PATCH 1/3] netfilter: nft_hash: no need for rcu in the hash set destroy path Patrick McHardy
@ 2014-09-02 10:38   ` Pablo Neira Ayuso
  2014-09-02 10:59     ` Patrick McHardy
  0 siblings, 1 reply; 14+ messages in thread
From: Pablo Neira Ayuso @ 2014-09-02 10:38 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel, tgraf

On Tue, Sep 02, 2014 at 11:14:41AM +0100, Patrick McHardy wrote:
> On Tue, Sep 02, 2014 at 11:38:39AM +0200, Pablo Neira Ayuso wrote:
> > The sets are released from the rcu callback, after the rule is removed
> > from the chain list, which implies that nfnetlink cannot update the
> > hashes (thus, no resizing may occur) and no packets are walking on the
> > set anymore.
> 
> Unrelated to your patch, but to the RCU destruction: how does that make
> sure that nfnetlink notifications are received in the proper order?
> I mean, theoretically a new set with the same name could exist at that
> time. The same problem exists for all objects that have user defined
> identifiers or refer to them.

All the events (with the exception of anonymous sets) are sent in
order from the commit path, so they are delivered in order.

The anonymous sets are problematic, we need to notify this from the
commit path too to ensure the right ordering. I was trying to avoid
some specific notify() interface in expr->ops but it seems we need it
for nft_lookup.c.

Can you think of a better solution?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] netfilter: nft_hash: no need for rcu in the hash set destroy path
  2014-09-02 10:38   ` Pablo Neira Ayuso
@ 2014-09-02 10:59     ` Patrick McHardy
  2014-09-02 12:22       ` Pablo Neira Ayuso
  0 siblings, 1 reply; 14+ messages in thread
From: Patrick McHardy @ 2014-09-02 10:59 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, tgraf

On Tue, Sep 02, 2014 at 12:38:56PM +0200, Pablo Neira Ayuso wrote:
> On Tue, Sep 02, 2014 at 11:14:41AM +0100, Patrick McHardy wrote:
> > On Tue, Sep 02, 2014 at 11:38:39AM +0200, Pablo Neira Ayuso wrote:
> > > The sets are released from the rcu callback, after the rule is removed
> > > from the chain list, which implies that nfnetlink cannot update the
> > > hashes (thus, no resizing may occur) and no packets are walking on the
> > > set anymore.
> > 
> > Unrelated to your patch, but to the RCU destruction: how does that make
> > sure that nfnetlink notifications are received in the proper order?
> > I mean, theoretically a new set with the same name could exist at that
> > time. The same problem exists for all objects that have user defined
> > identifiers or refer to them.
> 
> All the events (with the exception of anonymous sets) are sent in
> order from the commit path, so they are delivered in order.

Sure, I was talking about independant additions:

- delete set X
- RCU callback delayed
- add set X, notify
- RCU callback executes, notifies for delete set X

Same thing applies to all other objects that don't have a unique identifier
chosen by the kernel.

> The anonymous sets are problematic, we need to notify this from the
> commit path too to ensure the right ordering. I was trying to avoid
> some specific notify() interface in expr->ops but it seems we need it
> for nft_lookup.c.
> 
> Can you think of a better solution?

No, unless we can come up with a way that's synchronous.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] netfilter: nft_hash: no need for rcu in the hash set destroy path
  2014-09-02  9:38 [PATCH 1/3] netfilter: nft_hash: no need for rcu in the hash set destroy path Pablo Neira Ayuso
                   ` (2 preceding siblings ...)
  2014-09-02 10:14 ` [PATCH 1/3] netfilter: nft_hash: no need for rcu in the hash set destroy path Patrick McHardy
@ 2014-09-02 11:02 ` Thomas Graf
  3 siblings, 0 replies; 14+ messages in thread
From: Thomas Graf @ 2014-09-02 11:02 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, kaber

On 09/02/14 at 11:38am, Pablo Neira Ayuso wrote:
> The sets are released from the rcu callback, after the rule is removed
> from the chain list, which implies that nfnetlink cannot update the
> hashes (thus, no resizing may occur) and no packets are walking on the
> set anymore.
> 
> This resolves a lockdep splat in the nft_hash_destroy() path since the
> nfnl mutex is not held there.
> 
> ===============================
> [ INFO: suspicious RCU usage. ]
> 3.16.0-rc2+ #168 Not tainted
> -------------------------------
> net/netfilter/nft_hash.c:362 suspicious rcu_dereference_protected() usage!
> 
> other info that might help us debug this:
> 
> rcu_scheduler_active = 1, debug_locks = 1
> 1 lock held by ksoftirqd/0/3:
>  #0:  (rcu_callback){......}, at: [<ffffffff81096393>] rcu_process_callbacks+0x27e/0x4c7
> 
> stack backtrace:
> CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 3.16.0-rc2+ #168
> Hardware name: LENOVO 23259H1/23259H1, BIOS G2ET32WW (1.12 ) 05/30/2012
>  0000000000000001 ffff88011769bb98 ffffffff8142c922 0000000000000006
>  ffff880117694090 ffff88011769bbc8 ffffffff8107c3ff ffff8800cba52400
>  ffff8800c476bea8 ffff8800c476bea8 ffff8800cba52400 ffff88011769bc08
> Call Trace:
>  [<ffffffff8142c922>] dump_stack+0x4e/0x68
>  [<ffffffff8107c3ff>] lockdep_rcu_suspicious+0xfa/0x103
>  [<ffffffffa079931e>] nft_hash_destroy+0x50/0x137 [nft_hash]
>  [<ffffffffa078cd57>] nft_set_destroy+0x11/0x2a [nf_tables]
> 
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Acked-by: Thomas Graf <tgraf@suug.ch>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] rhashtable: fix lockdep splat in rhashtable_destroy()
  2014-09-02  9:38 ` [PATCH 3/3] rhashtable: fix lockdep splat in rhashtable_destroy() Pablo Neira Ayuso
@ 2014-09-02 11:02   ` Thomas Graf
  0 siblings, 0 replies; 14+ messages in thread
From: Thomas Graf @ 2014-09-02 11:02 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, kaber

On 09/02/14 at 11:38am, Pablo Neira Ayuso wrote:
> No need for rht_dereference() from rhashtable_destroy() since the
> existing callers don't hold the mutex when invoking this function
> from:
> 
> 1) Netlink, this is called in case of memory allocation errors in the
>    initialization path, no nl_sk_hash_lock is held.
> 2) Netfilter, this is called from the rcu callback, no nfnl_lock is
>    held either.
> 
> I think it's reasonable to assume that the caller has to make sure
> that no hash resizing may happen before releasing the bucket array.
> Therefore, the caller should be responsible for releasing this in a
> safe way, document this to make people aware of it.
> 
> This resolves a rcu lockdep splat in nft_hash:
> 
> ===============================
> [ INFO: suspicious RCU usage. ]
> 3.16.0+ #178 Not tainted
> -------------------------------
> lib/rhashtable.c:596 suspicious rcu_dereference_protected() usage!
> 
> other info that might help us debug this:
> 
> rcu_scheduler_active = 1, debug_locks = 1
> 1 lock held by ksoftirqd/2/18:
>  #0:  (rcu_callback){......}, at: [<ffffffff810918fd>] rcu_process_callbacks+0x27e/0x4c7
> 
> stack backtrace:
> CPU: 2 PID: 18 Comm: ksoftirqd/2 Not tainted 3.16.0+ #178
> Hardware name: LENOVO 23259H1/23259H1, BIOS G2ET32WW (1.12 ) 05/30/2012
>  0000000000000001 ffff88011706bb68 ffffffff8143debc 0000000000000000
>  ffff880117062610 ffff88011706bb98 ffffffff81077515 ffff8800ca041a50
>  0000000000000004 ffff8800ca386480 ffff8800ca041a00 ffff88011706bbb8
> Call Trace:
>  [<ffffffff8143debc>] dump_stack+0x4e/0x68
>  [<ffffffff81077515>] lockdep_rcu_suspicious+0xfa/0x103
>  [<ffffffff81228b1b>] rhashtable_destroy+0x46/0x52
>  [<ffffffffa06f21a7>] nft_hash_destroy+0x73/0x82 [nft_hash]
> 
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Acked-by: Thomas Graf <tgraf@suug.ch>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] netfilter: nft_rbtree: no need for spinlock from set destroy path
  2014-09-02  9:38 ` [PATCH 2/3] netfilter: nft_rbtree: no need for spinlock from " Pablo Neira Ayuso
@ 2014-09-02 11:11   ` Thomas Graf
  0 siblings, 0 replies; 14+ messages in thread
From: Thomas Graf @ 2014-09-02 11:11 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, kaber

On 09/02/14 at 11:38am, Pablo Neira Ayuso wrote:
> The sets are released from the rcu callback, after the rule is removed
> from the chain list, which implies that nfnetlink cannot update the
> rbtree and no packets are walking on the set anymore. Thus, we can get
> rid of the spinlock in the set destroy path there.
> 
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Reviewied-by: Thomas Graf <tgraf@suug.ch>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] netfilter: nft_hash: no need for rcu in the hash set destroy path
  2014-09-02 10:59     ` Patrick McHardy
@ 2014-09-02 12:22       ` Pablo Neira Ayuso
  2014-09-02 12:36         ` Thomas Graf
  2014-09-02 13:28         ` Patrick McHardy
  0 siblings, 2 replies; 14+ messages in thread
From: Pablo Neira Ayuso @ 2014-09-02 12:22 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel, tgraf

On Tue, Sep 02, 2014 at 11:59:22AM +0100, Patrick McHardy wrote:
> On Tue, Sep 02, 2014 at 12:38:56PM +0200, Pablo Neira Ayuso wrote:
> > On Tue, Sep 02, 2014 at 11:14:41AM +0100, Patrick McHardy wrote:
> > > On Tue, Sep 02, 2014 at 11:38:39AM +0200, Pablo Neira Ayuso wrote:
> > > > The sets are released from the rcu callback, after the rule is removed
> > > > from the chain list, which implies that nfnetlink cannot update the
> > > > hashes (thus, no resizing may occur) and no packets are walking on the
> > > > set anymore.
> > > 
> > > Unrelated to your patch, but to the RCU destruction: how does that make
> > > sure that nfnetlink notifications are received in the proper order?
> > > I mean, theoretically a new set with the same name could exist at that
> > > time. The same problem exists for all objects that have user defined
> > > identifiers or refer to them.
> > 
> > All the events (with the exception of anonymous sets) are sent in
> > order from the commit path, so they are delivered in order.
> 
> Sure, I was talking about independant additions:
> 
> - delete set X
> - RCU callback delayed
> - add set X, notify
> - RCU callback executes, notifies for delete set X

Right, that's indeed a problem for bound-to-rule anonymous sets.

> Same thing applies to all other objects that don't have a unique identifier
> chosen by the kernel.

All other objects are always notified in order from the commit path,
so they seem fine to me.

> > The anonymous sets are problematic, we need to notify this from the
> > commit path too to ensure the right ordering. I was trying to avoid
> > some specific notify() interface in expr->ops but it seems we need it
> > for nft_lookup.c.
> > 
> > Can you think of a better solution?
> 
> No, unless we can come up with a way that's synchronous.

I would really like not to go back to the two nearly consecutive
synchronize_rcu() calls, it's slow.  I've been thinking on replacing
the current check in the packet path by static keys, but I didn't
manage to find the way yet.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] netfilter: nft_hash: no need for rcu in the hash set destroy path
  2014-09-02 12:22       ` Pablo Neira Ayuso
@ 2014-09-02 12:36         ` Thomas Graf
  2014-09-02 13:28         ` Patrick McHardy
  1 sibling, 0 replies; 14+ messages in thread
From: Thomas Graf @ 2014-09-02 12:36 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Patrick McHardy, netfilter-devel

On 09/02/14 at 02:22pm, Pablo Neira Ayuso wrote:
> On Tue, Sep 02, 2014 at 11:59:22AM +0100, Patrick McHardy wrote:
> > > The anonymous sets are problematic, we need to notify this from the
> > > commit path too to ensure the right ordering. I was trying to avoid
> > > some specific notify() interface in expr->ops but it seems we need it
> > > for nft_lookup.c.
> > > 
> > > Can you think of a better solution?
> > 
> > No, unless we can come up with a way that's synchronous.
> 
> I would really like not to go back to the two nearly consecutive
> synchronize_rcu() calls, it's slow.  I've been thinking on replacing
> the current check in the packet path by static keys, but I didn't
> manage to find the way yet.

Slight warning on additional complexity down the road: I'm about to
propose a rhashtable enhancement to move expansion/shrinking to a
async worker thread and protect insertion/removal by an array of
spinlocks. It will allow for much faster parallel inclusion/removal
but will require a sync on destruction to cancel/wait for any
outstanding/ongoing maintenance on the hash table.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] netfilter: nft_hash: no need for rcu in the hash set destroy path
  2014-09-02 12:22       ` Pablo Neira Ayuso
  2014-09-02 12:36         ` Thomas Graf
@ 2014-09-02 13:28         ` Patrick McHardy
  2014-09-02 13:41           ` Pablo Neira Ayuso
  1 sibling, 1 reply; 14+ messages in thread
From: Patrick McHardy @ 2014-09-02 13:28 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, tgraf

On Tue, Sep 02, 2014 at 02:22:07PM +0200, Pablo Neira Ayuso wrote:
> On Tue, Sep 02, 2014 at 11:59:22AM +0100, Patrick McHardy wrote:
> > On Tue, Sep 02, 2014 at 12:38:56PM +0200, Pablo Neira Ayuso wrote:
> > > On Tue, Sep 02, 2014 at 11:14:41AM +0100, Patrick McHardy wrote:
> > > > On Tue, Sep 02, 2014 at 11:38:39AM +0200, Pablo Neira Ayuso wrote:
> > > > > The sets are released from the rcu callback, after the rule is removed
> > > > > from the chain list, which implies that nfnetlink cannot update the
> > > > > hashes (thus, no resizing may occur) and no packets are walking on the
> > > > > set anymore.
> > > > 
> > > > Unrelated to your patch, but to the RCU destruction: how does that make
> > > > sure that nfnetlink notifications are received in the proper order?
> > > > I mean, theoretically a new set with the same name could exist at that
> > > > time. The same problem exists for all objects that have user defined
> > > > identifiers or refer to them.
> > > 
> > > All the events (with the exception of anonymous sets) are sent in
> > > order from the commit path, so they are delivered in order.
> > 
> > Sure, I was talking about independant additions:
> > 
> > - delete set X
> > - RCU callback delayed
> > - add set X, notify
> > - RCU callback executes, notifies for delete set X
> 
> Right, that's indeed a problem for bound-to-rule anonymous sets.
> 
> > Same thing applies to all other objects that don't have a unique identifier
> > chosen by the kernel.
> 
> All other objects are always notified in order from the commit path,
> so they seem fine to me.

You're right, this only seems to affect sets.

> > > The anonymous sets are problematic, we need to notify this from the
> > > commit path too to ensure the right ordering. I was trying to avoid
> > > some specific notify() interface in expr->ops but it seems we need it
> > > for nft_lookup.c.
> > > 
> > > Can you think of a better solution?
> > 
> > No, unless we can come up with a way that's synchronous.
> 
> I would really like not to go back to the two nearly consecutive
> synchronize_rcu() calls, it's slow.  I've been thinking on replacing
> the current check in the packet path by static keys, but I didn't
> manage to find the way yet.

Which check exactly are you referring to?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] netfilter: nft_hash: no need for rcu in the hash set destroy path
  2014-09-02 13:28         ` Patrick McHardy
@ 2014-09-02 13:41           ` Pablo Neira Ayuso
  2014-09-02 14:00             ` Patrick McHardy
  0 siblings, 1 reply; 14+ messages in thread
From: Pablo Neira Ayuso @ 2014-09-02 13:41 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel, tgraf

On Tue, Sep 02, 2014 at 02:28:14PM +0100, Patrick McHardy wrote:
> > > Same thing applies to all other objects that don't have a unique identifier
> > > chosen by the kernel.
> >
> > All other objects are always notified in order from the commit path,
> > so they seem fine to me.
>
> You're right, this only seems to affect sets.

Yes, but only the sets that are released through
nf_tables_unbind_set().

> > > > The anonymous sets are problematic, we need to notify this from the
> > > > commit path too to ensure the right ordering. I was trying to avoid
> > > > some specific notify() interface in expr->ops but it seems we need it
> > > > for nft_lookup.c.
> > > >
> > > > Can you think of a better solution?
> > >
> > > No, unless we can come up with a way that's synchronous.
> >
> > I would really like not to go back to the two nearly consecutive
> > synchronize_rcu() calls, it's slow.  I've been thinking on replacing
> > the current check in the packet path by static keys, but I didn't
> > manage to find the way yet.
>
> Which check exactly are you referring to?

>From the nft_do_chain:

        list_for_each_entry_continue_rcu(rule, &chain->rules, list) {

                /* This rule is not active, skip. */
                if (unlikely(rule->genmask & (1 << gencursor)))
                        continue;

Plus the trick that uses the synchronize_rcu() from the commit path to
make sure all packets have left the previous generation to deactivate
the rule before we start removing them from the list.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] netfilter: nft_hash: no need for rcu in the hash set destroy path
  2014-09-02 13:41           ` Pablo Neira Ayuso
@ 2014-09-02 14:00             ` Patrick McHardy
  0 siblings, 0 replies; 14+ messages in thread
From: Patrick McHardy @ 2014-09-02 14:00 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, tgraf

On Tue, Sep 02, 2014 at 03:41:52PM +0200, Pablo Neira Ayuso wrote:
> On Tue, Sep 02, 2014 at 02:28:14PM +0100, Patrick McHardy wrote:
> > > > Same thing applies to all other objects that don't have a unique identifier
> > > > chosen by the kernel.
> > >
> > > All other objects are always notified in order from the commit path,
> > > so they seem fine to me.
> >
> > You're right, this only seems to affect sets.
> 
> Yes, but only the sets that are released through
> nf_tables_unbind_set().
> 
> > > > > The anonymous sets are problematic, we need to notify this from the
> > > > > commit path too to ensure the right ordering. I was trying to avoid
> > > > > some specific notify() interface in expr->ops but it seems we need it
> > > > > for nft_lookup.c.
> > > > >
> > > > > Can you think of a better solution?
> > > >
> > > > No, unless we can come up with a way that's synchronous.
> > >
> > > I would really like not to go back to the two nearly consecutive
> > > synchronize_rcu() calls, it's slow.  I've been thinking on replacing
> > > the current check in the packet path by static keys, but I didn't
> > > manage to find the way yet.
> >
> > Which check exactly are you referring to?
> 
> >From the nft_do_chain:
> 
>         list_for_each_entry_continue_rcu(rule, &chain->rules, list) {
> 
>                 /* This rule is not active, skip. */
>                 if (unlikely(rule->genmask & (1 << gencursor)))
>                         continue;
> 
> Plus the trick that uses the synchronize_rcu() from the commit path to
> make sure all packets have left the previous generation to deactivate
> the rule before we start removing them from the list.

This won't work. Static keys can't be dynamically allocated, which would
be necessary for this to work. This about it, the code that needs to be
patches is global, but the activeness is a property of a rule.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2014-09-02 14:00 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-02  9:38 [PATCH 1/3] netfilter: nft_hash: no need for rcu in the hash set destroy path Pablo Neira Ayuso
2014-09-02  9:38 ` [PATCH 2/3] netfilter: nft_rbtree: no need for spinlock from " Pablo Neira Ayuso
2014-09-02 11:11   ` Thomas Graf
2014-09-02  9:38 ` [PATCH 3/3] rhashtable: fix lockdep splat in rhashtable_destroy() Pablo Neira Ayuso
2014-09-02 11:02   ` Thomas Graf
2014-09-02 10:14 ` [PATCH 1/3] netfilter: nft_hash: no need for rcu in the hash set destroy path Patrick McHardy
2014-09-02 10:38   ` Pablo Neira Ayuso
2014-09-02 10:59     ` Patrick McHardy
2014-09-02 12:22       ` Pablo Neira Ayuso
2014-09-02 12:36         ` Thomas Graf
2014-09-02 13:28         ` Patrick McHardy
2014-09-02 13:41           ` Pablo Neira Ayuso
2014-09-02 14:00             ` Patrick McHardy
2014-09-02 11:02 ` Thomas Graf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.