All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] nf pull request for net
@ 2014-09-23  9:24 Pablo Neira Ayuso
  2014-09-23  9:24 ` [PATCH 1/5] netfilter: nft_hash: no need for rcu in the hash set destroy path Pablo Neira Ayuso
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Pablo Neira Ayuso @ 2014-09-23  9:24 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

Hi David,

This series contains netfilter fixes for net, they are:

1) Fix lockdep splat in nft_hash when releasing sets from the
   rcu_callback context. We don't the mutex there anymore.

2) Remove unnecessary spinlock_bh in the destroy path of the nf_tables
   rbtree set type from rcu_callback context.

3) Fix another lockdep splat in rhashtable. None of the callers hold
   a mutex when calling rhashtable_destroy.

4) Fix duplicated error reporting from nfnetlink when aborting and
   replaying a batch.

5) Fix a Kconfig issue reported by kbuild robot.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Thanks!

----------------------------------------------------------------

The following changes since commit bec6bfb2437f4676dbaaacba6019e9dafef18962:

  amd-xgbe: Fix initialization of the wrong spin lock (2014-09-02 14:03:37 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git master

for you to fetch changes up to 679ab4ddbdfab8af39104e63819db71f428aefd9:

  netfilter: xt_TPROXY: undefined reference to `udp6_lib_lookup' (2014-09-07 17:25:16 +0200)

----------------------------------------------------------------
Pablo Neira Ayuso (5):
      netfilter: nft_hash: no need for rcu in the hash set destroy path
      netfilter: nft_rbtree: no need for spinlock from set destroy path
      rhashtable: fix lockdep splat in rhashtable_destroy()
      netfilter: nfnetlink: deliver netlink errors on batch completion
      netfilter: xt_TPROXY: undefined reference to `udp6_lib_lookup'

 lib/rhashtable.c           |    8 +++---
 net/netfilter/Kconfig      |    1 +
 net/netfilter/nfnetlink.c  |   64 +++++++++++++++++++++++++++++++++++++++++++-
 net/netfilter/nft_hash.c   |   12 +++++----
 net/netfilter/nft_rbtree.c |    2 --
 5 files changed, 75 insertions(+), 12 deletions(-)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/5] netfilter: nft_hash: no need for rcu in the hash set destroy path
  2014-09-23  9:24 [PATCH 0/5] nf pull request for net Pablo Neira Ayuso
@ 2014-09-23  9:24 ` Pablo Neira Ayuso
  2014-09-23  9:24 ` [PATCH 2/5] netfilter: nft_rbtree: no need for spinlock from " Pablo Neira Ayuso
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Pablo Neira Ayuso @ 2014-09-23  9:24 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

The sets are released from the rcu callback, after the rule is removed
from the chain list, which implies that nfnetlink cannot update the
hashes (thus, no resizing may occur) and no packets are walking on the
set anymore.

This resolves a lockdep splat in the nft_hash_destroy() path since the
nfnl mutex is not held there.

===============================
[ INFO: suspicious RCU usage. ]
3.16.0-rc2+ #168 Not tainted
-------------------------------
net/netfilter/nft_hash.c:362 suspicious rcu_dereference_protected() usage!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 1
1 lock held by ksoftirqd/0/3:
 #0:  (rcu_callback){......}, at: [<ffffffff81096393>] rcu_process_callbacks+0x27e/0x4c7

stack backtrace:
CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 3.16.0-rc2+ #168
Hardware name: LENOVO 23259H1/23259H1, BIOS G2ET32WW (1.12 ) 05/30/2012
 0000000000000001 ffff88011769bb98 ffffffff8142c922 0000000000000006
 ffff880117694090 ffff88011769bbc8 ffffffff8107c3ff ffff8800cba52400
 ffff8800c476bea8 ffff8800c476bea8 ffff8800cba52400 ffff88011769bc08
Call Trace:
 [<ffffffff8142c922>] dump_stack+0x4e/0x68
 [<ffffffff8107c3ff>] lockdep_rcu_suspicious+0xfa/0x103
 [<ffffffffa079931e>] nft_hash_destroy+0x50/0x137 [nft_hash]
 [<ffffffffa078cd57>] nft_set_destroy+0x11/0x2a [nf_tables]

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Thomas Graf <tgraf@suug.ch>
---
 net/netfilter/nft_hash.c |   12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/nft_hash.c b/net/netfilter/nft_hash.c
index 28fb8f3..8892b7b 100644
--- a/net/netfilter/nft_hash.c
+++ b/net/netfilter/nft_hash.c
@@ -180,15 +180,17 @@ static int nft_hash_init(const struct nft_set *set,
 static void nft_hash_destroy(const struct nft_set *set)
 {
 	const struct rhashtable *priv = nft_set_priv(set);
-	const struct bucket_table *tbl;
+	const struct bucket_table *tbl = priv->tbl;
 	struct nft_hash_elem *he, *next;
 	unsigned int i;
 
-	tbl = rht_dereference(priv->tbl, priv);
-	for (i = 0; i < tbl->size; i++)
-		rht_for_each_entry_safe(he, next, tbl->buckets[i], priv, node)
+	for (i = 0; i < tbl->size; i++) {
+		for (he = rht_entry(tbl->buckets[i], struct nft_hash_elem, node);
+		     he != NULL; he = next) {
+			next = rht_entry(he->node.next, struct nft_hash_elem, node);
 			nft_hash_elem_destroy(set, he);
-
+		}
+	}
 	rhashtable_destroy(priv);
 }
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/5] netfilter: nft_rbtree: no need for spinlock from set destroy path
  2014-09-23  9:24 [PATCH 0/5] nf pull request for net Pablo Neira Ayuso
  2014-09-23  9:24 ` [PATCH 1/5] netfilter: nft_hash: no need for rcu in the hash set destroy path Pablo Neira Ayuso
@ 2014-09-23  9:24 ` Pablo Neira Ayuso
  2014-09-23  9:52   ` Eric Dumazet
  2014-09-23  9:24 ` [PATCH 3/5] rhashtable: fix lockdep splat in rhashtable_destroy() Pablo Neira Ayuso
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Pablo Neira Ayuso @ 2014-09-23  9:24 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

The sets are released from the rcu callback, after the rule is removed
from the chain list, which implies that nfnetlink cannot update the
rbtree and no packets are walking on the set anymore. Thus, we can get
rid of the spinlock in the set destroy path there.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Reviewied-by: Thomas Graf <tgraf@suug.ch>
---
 net/netfilter/nft_rbtree.c |    2 --
 1 file changed, 2 deletions(-)

diff --git a/net/netfilter/nft_rbtree.c b/net/netfilter/nft_rbtree.c
index e1836ff..46214f2 100644
--- a/net/netfilter/nft_rbtree.c
+++ b/net/netfilter/nft_rbtree.c
@@ -234,13 +234,11 @@ static void nft_rbtree_destroy(const struct nft_set *set)
 	struct nft_rbtree_elem *rbe;
 	struct rb_node *node;
 
-	spin_lock_bh(&nft_rbtree_lock);
 	while ((node = priv->root.rb_node) != NULL) {
 		rb_erase(node, &priv->root);
 		rbe = rb_entry(node, struct nft_rbtree_elem, node);
 		nft_rbtree_elem_destroy(set, rbe);
 	}
-	spin_unlock_bh(&nft_rbtree_lock);
 }
 
 static bool nft_rbtree_estimate(const struct nft_set_desc *desc, u32 features,
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/5] rhashtable: fix lockdep splat in rhashtable_destroy()
  2014-09-23  9:24 [PATCH 0/5] nf pull request for net Pablo Neira Ayuso
  2014-09-23  9:24 ` [PATCH 1/5] netfilter: nft_hash: no need for rcu in the hash set destroy path Pablo Neira Ayuso
  2014-09-23  9:24 ` [PATCH 2/5] netfilter: nft_rbtree: no need for spinlock from " Pablo Neira Ayuso
@ 2014-09-23  9:24 ` Pablo Neira Ayuso
  2014-09-23  9:24 ` [PATCH 4/5] netfilter: nfnetlink: deliver netlink errors on batch completion Pablo Neira Ayuso
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Pablo Neira Ayuso @ 2014-09-23  9:24 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

No need for rht_dereference() from rhashtable_destroy() since the
existing callers don't hold the mutex when invoking this function
from:

1) Netlink, this is called in case of memory allocation errors in the
   initialization path, no nl_sk_hash_lock is held.
2) Netfilter, this is called from the rcu callback, no nfnl_lock is
   held either.

I think it's reasonable to assume that the caller has to make sure
that no hash resizing may happen before releasing the bucket array.
Therefore, the caller should be responsible for releasing this in a
safe way, document this to make people aware of it.

This resolves a rcu lockdep splat in nft_hash:

===============================
[ INFO: suspicious RCU usage. ]
3.16.0+ #178 Not tainted
-------------------------------
lib/rhashtable.c:596 suspicious rcu_dereference_protected() usage!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 1
1 lock held by ksoftirqd/2/18:
 #0:  (rcu_callback){......}, at: [<ffffffff810918fd>] rcu_process_callbacks+0x27e/0x4c7

stack backtrace:
CPU: 2 PID: 18 Comm: ksoftirqd/2 Not tainted 3.16.0+ #178
Hardware name: LENOVO 23259H1/23259H1, BIOS G2ET32WW (1.12 ) 05/30/2012
 0000000000000001 ffff88011706bb68 ffffffff8143debc 0000000000000000
 ffff880117062610 ffff88011706bb98 ffffffff81077515 ffff8800ca041a50
 0000000000000004 ffff8800ca386480 ffff8800ca041a00 ffff88011706bbb8
Call Trace:
 [<ffffffff8143debc>] dump_stack+0x4e/0x68
 [<ffffffff81077515>] lockdep_rcu_suspicious+0xfa/0x103
 [<ffffffff81228b1b>] rhashtable_destroy+0x46/0x52
 [<ffffffffa06f21a7>] nft_hash_destroy+0x73/0x82 [nft_hash]

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Thomas Graf <tgraf@suug.ch>
---
 lib/rhashtable.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index a2c7881..fc0dd8e 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -589,13 +589,13 @@ EXPORT_SYMBOL_GPL(rhashtable_init);
  * rhashtable_destroy - destroy hash table
  * @ht:		the hash table to destroy
  *
- * Frees the bucket array.
+ * Frees the bucket array. This function is not rcu safe, therefore the caller
+ * has to make sure that no resizing may happen by unpublishing the hashtable
+ * and waiting for the quiescent cycle before releasing the bucket array.
  */
 void rhashtable_destroy(const struct rhashtable *ht)
 {
-	const struct bucket_table *tbl = rht_dereference(ht->tbl, ht);
-
-	bucket_table_free(tbl);
+	bucket_table_free(ht->tbl);
 }
 EXPORT_SYMBOL_GPL(rhashtable_destroy);
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 4/5] netfilter: nfnetlink: deliver netlink errors on batch completion
  2014-09-23  9:24 [PATCH 0/5] nf pull request for net Pablo Neira Ayuso
                   ` (2 preceding siblings ...)
  2014-09-23  9:24 ` [PATCH 3/5] rhashtable: fix lockdep splat in rhashtable_destroy() Pablo Neira Ayuso
@ 2014-09-23  9:24 ` Pablo Neira Ayuso
  2014-09-23  9:24 ` [PATCH 5/5] netfilter: xt_TPROXY: undefined reference to `udp6_lib_lookup' Pablo Neira Ayuso
  2014-09-26 20:21 ` [PATCH 0/5] nf pull request for net David Miller
  5 siblings, 0 replies; 12+ messages in thread
From: Pablo Neira Ayuso @ 2014-09-23  9:24 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

We have to wait until the full batch has been processed to deliver the
netlink error messages to userspace. Otherwise, we may deliver
duplicated errors to userspace in case that we need to abort and replay
the transaction if any of the required modules needs to be autoloaded.

A simple way to reproduce this (assumming nft_meta is not loaded) with
the following test file:

 add table filter
 add chain filter test
 add chain bad test                 # intentional wrong unexistent table
 add rule filter test meta mark 0

Then, when trying to load the batch:

 # nft -f test
 test:4:1-19: Error: Could not process rule: No such file or directory
 add chain bad test
 ^^^^^^^^^^^^^^^^^^^
 test:4:1-19: Error: Could not process rule: No such file or directory
 add chain bad test
 ^^^^^^^^^^^^^^^^^^^

The error is reported twice, once when the batch is aborted due to
missing nft_meta and another when it is fully processed.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nfnetlink.c |   64 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 63 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index c138b8f..f37f071 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -222,6 +222,51 @@ replay:
 	}
 }
 
+struct nfnl_err {
+	struct list_head	head;
+	struct nlmsghdr		*nlh;
+	int			err;
+};
+
+static int nfnl_err_add(struct list_head *list, struct nlmsghdr *nlh, int err)
+{
+	struct nfnl_err *nfnl_err;
+
+	nfnl_err = kmalloc(sizeof(struct nfnl_err), GFP_KERNEL);
+	if (nfnl_err == NULL)
+		return -ENOMEM;
+
+	nfnl_err->nlh = nlh;
+	nfnl_err->err = err;
+	list_add_tail(&nfnl_err->head, list);
+
+	return 0;
+}
+
+static void nfnl_err_del(struct nfnl_err *nfnl_err)
+{
+	list_del(&nfnl_err->head);
+	kfree(nfnl_err);
+}
+
+static void nfnl_err_reset(struct list_head *err_list)
+{
+	struct nfnl_err *nfnl_err, *next;
+
+	list_for_each_entry_safe(nfnl_err, next, err_list, head)
+		nfnl_err_del(nfnl_err);
+}
+
+static void nfnl_err_deliver(struct list_head *err_list, struct sk_buff *skb)
+{
+	struct nfnl_err *nfnl_err, *next;
+
+	list_for_each_entry_safe(nfnl_err, next, err_list, head) {
+		netlink_ack(skb, nfnl_err->nlh, nfnl_err->err);
+		nfnl_err_del(nfnl_err);
+	}
+}
+
 static void nfnetlink_rcv_batch(struct sk_buff *skb, struct nlmsghdr *nlh,
 				u_int16_t subsys_id)
 {
@@ -230,6 +275,7 @@ static void nfnetlink_rcv_batch(struct sk_buff *skb, struct nlmsghdr *nlh,
 	const struct nfnetlink_subsystem *ss;
 	const struct nfnl_callback *nc;
 	bool success = true, done = false;
+	static LIST_HEAD(err_list);
 	int err;
 
 	if (subsys_id >= NFNL_SUBSYS_COUNT)
@@ -287,6 +333,7 @@ replay:
 		type = nlh->nlmsg_type;
 		if (type == NFNL_MSG_BATCH_BEGIN) {
 			/* Malformed: Batch begin twice */
+			nfnl_err_reset(&err_list);
 			success = false;
 			goto done;
 		} else if (type == NFNL_MSG_BATCH_END) {
@@ -333,6 +380,7 @@ replay:
 			 * original skb.
 			 */
 			if (err == -EAGAIN) {
+				nfnl_err_reset(&err_list);
 				ss->abort(skb);
 				nfnl_unlock(subsys_id);
 				kfree_skb(nskb);
@@ -341,11 +389,24 @@ replay:
 		}
 ack:
 		if (nlh->nlmsg_flags & NLM_F_ACK || err) {
+			/* Errors are delivered once the full batch has been
+			 * processed, this avoids that the same error is
+			 * reported several times when replaying the batch.
+			 */
+			if (nfnl_err_add(&err_list, nlh, err) < 0) {
+				/* We failed to enqueue an error, reset the
+				 * list of errors and send OOM to userspace
+				 * pointing to the batch header.
+				 */
+				nfnl_err_reset(&err_list);
+				netlink_ack(skb, nlmsg_hdr(oskb), -ENOMEM);
+				success = false;
+				goto done;
+			}
 			/* We don't stop processing the batch on errors, thus,
 			 * userspace gets all the errors that the batch
 			 * triggers.
 			 */
-			netlink_ack(skb, nlh, err);
 			if (err)
 				success = false;
 		}
@@ -361,6 +422,7 @@ done:
 	else
 		ss->abort(skb);
 
+	nfnl_err_deliver(&err_list, oskb);
 	nfnl_unlock(subsys_id);
 	kfree_skb(nskb);
 }
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 5/5] netfilter: xt_TPROXY: undefined reference to `udp6_lib_lookup'
  2014-09-23  9:24 [PATCH 0/5] nf pull request for net Pablo Neira Ayuso
                   ` (3 preceding siblings ...)
  2014-09-23  9:24 ` [PATCH 4/5] netfilter: nfnetlink: deliver netlink errors on batch completion Pablo Neira Ayuso
@ 2014-09-23  9:24 ` Pablo Neira Ayuso
  2014-09-26 20:21 ` [PATCH 0/5] nf pull request for net David Miller
  5 siblings, 0 replies; 12+ messages in thread
From: Pablo Neira Ayuso @ 2014-09-23  9:24 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

CONFIG_IPV6=m
CONFIG_NETFILTER_XT_TARGET_TPROXY=y

   net/built-in.o: In function `nf_tproxy_get_sock_v6.constprop.11':
>> xt_TPROXY.c:(.text+0x583a1): undefined reference to `udp6_lib_lookup'
   net/built-in.o: In function `tproxy_tg_init':
>> xt_TPROXY.c:(.init.text+0x1dc3): undefined reference to `nf_defrag_ipv6_enable'

This fix is similar to 1a5bbfc ("netfilter: Fix build errors with
xt_socket.c").

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/Kconfig |    1 +
 1 file changed, 1 insertion(+)

diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 4bef6eb..831f960 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -839,6 +839,7 @@ config NETFILTER_XT_TARGET_TPROXY
 	tristate '"TPROXY" target transparent proxying support'
 	depends on NETFILTER_XTABLES
 	depends on NETFILTER_ADVANCED
+	depends on (IPV6 || IPV6=n)
 	depends on IP_NF_MANGLE
 	select NF_DEFRAG_IPV4
 	select NF_DEFRAG_IPV6 if IP6_NF_IPTABLES
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/5] netfilter: nft_rbtree: no need for spinlock from set destroy path
  2014-09-23  9:24 ` [PATCH 2/5] netfilter: nft_rbtree: no need for spinlock from " Pablo Neira Ayuso
@ 2014-09-23  9:52   ` Eric Dumazet
  2014-09-23 11:01     ` Pablo Neira Ayuso
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2014-09-23  9:52 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, davem, netdev

On Tue, 2014-09-23 at 11:24 +0200, Pablo Neira Ayuso wrote:
> The sets are released from the rcu callback, after the rule is removed
> from the chain list, which implies that nfnetlink cannot update the
> rbtree and no packets are walking on the set anymore. Thus, we can get
> rid of the spinlock in the set destroy path there.
> 
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> Reviewied-by: Thomas Graf <tgraf@suug.ch>
> ---
>  net/netfilter/nft_rbtree.c |    2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/net/netfilter/nft_rbtree.c b/net/netfilter/nft_rbtree.c
> index e1836ff..46214f2 100644
> --- a/net/netfilter/nft_rbtree.c
> +++ b/net/netfilter/nft_rbtree.c
> @@ -234,13 +234,11 @@ static void nft_rbtree_destroy(const struct nft_set *set)
>  	struct nft_rbtree_elem *rbe;
>  	struct rb_node *node;
>  
> -	spin_lock_bh(&nft_rbtree_lock);
>  	while ((node = priv->root.rb_node) != NULL) {
>  		rb_erase(node, &priv->root);
>  		rbe = rb_entry(node, struct nft_rbtree_elem, node);
>  		nft_rbtree_elem_destroy(set, rbe);
>  	}
> -	spin_unlock_bh(&nft_rbtree_lock);
>  }
>  
>  static bool nft_rbtree_estimate(const struct nft_set_desc *desc, u32 features,

BTW, do you know if destroying an rbtree is faster this way, or using
rb_first() ?

Most cases I see in the kernel use a rb_first instead of taking the
root.

Examples : (its not an exhaustive list)

net/netfilter/xt_connlimit.c:402
net/sched/sch_netem.c:380
net/sched/sch_fq.c:519
drivers/infiniband/hw/mlx4/cm.c:439
drivers/iommu/iova.c:324
drivers/md/dm-thin.c:1491
drivers/mtd/mtdswap.c:625
drivers/mtd/ubi/attach.c:636

This might be better for large trees, to get better cache locality,
but I have no experimental data.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/5] netfilter: nft_rbtree: no need for spinlock from set destroy path
  2014-09-23  9:52   ` Eric Dumazet
@ 2014-09-23 11:01     ` Pablo Neira Ayuso
  2014-09-23 11:54       ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Pablo Neira Ayuso @ 2014-09-23 11:01 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netfilter-devel, davem, netdev

On Tue, Sep 23, 2014 at 02:52:37AM -0700, Eric Dumazet wrote:
> On Tue, 2014-09-23 at 11:24 +0200, Pablo Neira Ayuso wrote:
> > The sets are released from the rcu callback, after the rule is removed
> > from the chain list, which implies that nfnetlink cannot update the
> > rbtree and no packets are walking on the set anymore. Thus, we can get
> > rid of the spinlock in the set destroy path there.
> > 
> > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> > Reviewied-by: Thomas Graf <tgraf@suug.ch>
> > ---
> >  net/netfilter/nft_rbtree.c |    2 --
> >  1 file changed, 2 deletions(-)
> > 
> > diff --git a/net/netfilter/nft_rbtree.c b/net/netfilter/nft_rbtree.c
> > index e1836ff..46214f2 100644
> > --- a/net/netfilter/nft_rbtree.c
> > +++ b/net/netfilter/nft_rbtree.c
> > @@ -234,13 +234,11 @@ static void nft_rbtree_destroy(const struct nft_set *set)
> >  	struct nft_rbtree_elem *rbe;
> >  	struct rb_node *node;
> >  
> > -	spin_lock_bh(&nft_rbtree_lock);
> >  	while ((node = priv->root.rb_node) != NULL) {
> >  		rb_erase(node, &priv->root);
> >  		rbe = rb_entry(node, struct nft_rbtree_elem, node);
> >  		nft_rbtree_elem_destroy(set, rbe);
> >  	}
> > -	spin_unlock_bh(&nft_rbtree_lock);
> >  }
> >  
> >  static bool nft_rbtree_estimate(const struct nft_set_desc *desc, u32 features,
> 
> BTW, do you know if destroying an rbtree is faster this way, or using
> rb_first() ?
> 
> Most cases I see in the kernel use a rb_first instead of taking the
> root.
> 
> Examples : (its not an exhaustive list)
> 
> net/netfilter/xt_connlimit.c:402
> net/sched/sch_netem.c:380
> net/sched/sch_fq.c:519
> drivers/infiniband/hw/mlx4/cm.c:439
> drivers/iommu/iova.c:324
> drivers/md/dm-thin.c:1491
> drivers/mtd/mtdswap.c:625
> drivers/mtd/ubi/attach.c:636
> 
> This might be better for large trees, to get better cache locality,
> but I have no experimental data.

I'll send a follow up patch for nf-next to use rb_first() in that
patch. Thanks Eric.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/5] netfilter: nft_rbtree: no need for spinlock from set destroy path
  2014-09-23 11:01     ` Pablo Neira Ayuso
@ 2014-09-23 11:54       ` Eric Dumazet
  2014-09-23 16:10         ` Pablo Neira Ayuso
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2014-09-23 11:54 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, davem, netdev

On Tue, 2014-09-23 at 13:01 +0200, Pablo Neira Ayuso wrote:

> I'll send a follow up patch for nf-next to use rb_first() in that
> patch. Thanks Eric.

I did a test, and its indeed a bit faster to use rb_first(), by about 5%

Real win is to be able to build a chain using rb_first()/rb_next(),
(leaving the tree as is), then deleting the items in the chain, and
simply reset rb_root.

This only needs to reuse one pointer to store the item->next pointer.

This is then about ~50% faster, because we do not constantly rebalance
tree for every removed item.




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/5] netfilter: nft_rbtree: no need for spinlock from set destroy path
  2014-09-23 11:54       ` Eric Dumazet
@ 2014-09-23 16:10         ` Pablo Neira Ayuso
  2014-09-23 16:29           ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Pablo Neira Ayuso @ 2014-09-23 16:10 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netfilter-devel, davem, netdev

On Tue, Sep 23, 2014 at 04:54:05AM -0700, Eric Dumazet wrote:
> On Tue, 2014-09-23 at 13:01 +0200, Pablo Neira Ayuso wrote:
> 
> > I'll send a follow up patch for nf-next to use rb_first() in that
> > patch. Thanks Eric.
> 
> I did a test, and its indeed a bit faster to use rb_first(), by about 5%
> 
> Real win is to be able to build a chain using rb_first()/rb_next(),
> (leaving the tree as is), then deleting the items in the chain, and
> simply reset rb_root.
> 
> This only needs to reuse one pointer to store the item->next pointer.
> 
> This is then about ~50% faster, because we do not constantly rebalance
> tree for every removed item.

Indeed.

struct nft_rbtree_elem {
        struct rb_node          node;
        u16                     flags;
        struct nft_data         key;
        struct nft_data         data[];
};

Actually, I could add to nft_data a pointer in the union area, but I'm
not very confortable with adding it for this specific case. At this
moment we're releasing this from rcu_callback which is "hiding" the
deletion time from the netlink interface.

But I'll keep this back in my head if we later on have some pointer
candidate to be reused in a nice way.

I'll send a patch to make the rb_first()/rb_next() conversion though.

Thanks for your comments!

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/5] netfilter: nft_rbtree: no need for spinlock from set destroy path
  2014-09-23 16:10         ` Pablo Neira Ayuso
@ 2014-09-23 16:29           ` Eric Dumazet
  0 siblings, 0 replies; 12+ messages in thread
From: Eric Dumazet @ 2014-09-23 16:29 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, davem, netdev

On Tue, 2014-09-23 at 18:10 +0200, Pablo Neira Ayuso wrote:

> Actually, I could add to nft_data a pointer in the union area, but I'm
> not very confortable with adding it for this specific case. At this
> moment we're releasing this from rcu_callback which is "hiding" the
> deletion time from the netlink interface.
> 
> But I'll keep this back in my head if we later on have some pointer
> candidate to be reused in a nice way.
> 
> I'll send a patch to make the rb_first()/rb_next() conversion though.

Ah, forget what I said, its actually slower to build a list by 7%, I had
an error in my test.

So the rb_first() thing is the easy and fast way.

Thanks

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/5] nf pull request for net
  2014-09-23  9:24 [PATCH 0/5] nf pull request for net Pablo Neira Ayuso
                   ` (4 preceding siblings ...)
  2014-09-23  9:24 ` [PATCH 5/5] netfilter: xt_TPROXY: undefined reference to `udp6_lib_lookup' Pablo Neira Ayuso
@ 2014-09-26 20:21 ` David Miller
  5 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2014-09-26 20:21 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev

From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Tue, 23 Sep 2014 11:24:43 +0200

> This series contains netfilter fixes for net, they are:
> 
> 1) Fix lockdep splat in nft_hash when releasing sets from the
>    rcu_callback context. We don't the mutex there anymore.
> 
> 2) Remove unnecessary spinlock_bh in the destroy path of the nf_tables
>    rbtree set type from rcu_callback context.
> 
> 3) Fix another lockdep splat in rhashtable. None of the callers hold
>    a mutex when calling rhashtable_destroy.
> 
> 4) Fix duplicated error reporting from nfnetlink when aborting and
>    replaying a batch.
> 
> 5) Fix a Kconfig issue reported by kbuild robot.
> 
> You can pull these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Pulled, thanks Pablo.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2014-09-26 20:21 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-23  9:24 [PATCH 0/5] nf pull request for net Pablo Neira Ayuso
2014-09-23  9:24 ` [PATCH 1/5] netfilter: nft_hash: no need for rcu in the hash set destroy path Pablo Neira Ayuso
2014-09-23  9:24 ` [PATCH 2/5] netfilter: nft_rbtree: no need for spinlock from " Pablo Neira Ayuso
2014-09-23  9:52   ` Eric Dumazet
2014-09-23 11:01     ` Pablo Neira Ayuso
2014-09-23 11:54       ` Eric Dumazet
2014-09-23 16:10         ` Pablo Neira Ayuso
2014-09-23 16:29           ` Eric Dumazet
2014-09-23  9:24 ` [PATCH 3/5] rhashtable: fix lockdep splat in rhashtable_destroy() Pablo Neira Ayuso
2014-09-23  9:24 ` [PATCH 4/5] netfilter: nfnetlink: deliver netlink errors on batch completion Pablo Neira Ayuso
2014-09-23  9:24 ` [PATCH 5/5] netfilter: xt_TPROXY: undefined reference to `udp6_lib_lookup' Pablo Neira Ayuso
2014-09-26 20:21 ` [PATCH 0/5] nf pull request for net David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.