All of lore.kernel.org
 help / color / mirror / Atom feed
* [Patch net v3 0/3] net_sched: fix filter chain reference counting
@ 2017-09-11 23:33 Cong Wang
  2017-09-11 23:33 ` [Patch net v3 1/3] net_sched: get rid of tcfa_rcu Cong Wang
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Cong Wang @ 2017-09-11 23:33 UTC (permalink / raw)
  To: netdev; +Cc: jiri, jakub.kicinski, jhs, Cong Wang

This patchset fixes tc filter chain reference counting and nasty race
conditions with RCU callbacks. Please see each patch for details.

---

v3: Rebase on the latest -net
    Add code comment in patch 1
    Improve comment and changelog for patch 2
    Add patch 3

v2: Add patch 1
    Get rid of more ugly code in patch 2

Cong Wang (3):
  net_sched: get rid of tcfa_rcu
  net_sched: fix reference counting of tc filter chain
  net_sched: carefully handle tcf_block_put()

 include/net/act_api.h |  2 --
 net/sched/act_api.c   | 17 +++++++-------
 net/sched/cls_api.c   | 63 ++++++++++++++++++++++++++++++++-------------------
 3 files changed, 48 insertions(+), 34 deletions(-)

-- 
2.13.0

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Patch net v3 1/3] net_sched: get rid of tcfa_rcu
  2017-09-11 23:33 [Patch net v3 0/3] net_sched: fix filter chain reference counting Cong Wang
@ 2017-09-11 23:33 ` Cong Wang
  2017-09-12  9:42   ` Jiri Pirko
  2017-09-11 23:33 ` [Patch net v3 2/3] net_sched: fix reference counting of tc filter chain Cong Wang
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Cong Wang @ 2017-09-11 23:33 UTC (permalink / raw)
  To: netdev; +Cc: jiri, jakub.kicinski, jhs, Cong Wang, Eric Dumazet

gen estimator has been rewritten in commit 1c0d32fde5bd
("net_sched: gen_estimator: complete rewrite of rate estimators"),
the caller is no longer needed to wait for a grace period.
So this patch gets rid of it.

This also completely closes a race condition between action free
path and filter chain add/remove path for the following patch.
Because otherwise the nested RCU callback can't be caught by
rcu_barrier().

Please see also the comments in code.

Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 include/net/act_api.h |  2 --
 net/sched/act_api.c   | 17 ++++++++---------
 2 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index 8f3d5d8b5ae0..b944e0eb93be 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -34,7 +34,6 @@ struct tc_action {
 	struct gnet_stats_queue		tcfa_qstats;
 	struct net_rate_estimator __rcu *tcfa_rate_est;
 	spinlock_t			tcfa_lock;
-	struct rcu_head			tcfa_rcu;
 	struct gnet_stats_basic_cpu __percpu *cpu_bstats;
 	struct gnet_stats_queue __percpu *cpu_qstats;
 	struct tc_cookie	*act_cookie;
@@ -50,7 +49,6 @@ struct tc_action {
 #define tcf_qstats	common.tcfa_qstats
 #define tcf_rate_est	common.tcfa_rate_est
 #define tcf_lock	common.tcfa_lock
-#define tcf_rcu		common.tcfa_rcu
 
 /* Update lastuse only if needed, to avoid dirtying a cache line.
  * We use a temp variable to avoid fetching jiffies twice.
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index a306974e2fb4..fcd7dc7b807a 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -53,10 +53,13 @@ static void tcf_action_goto_chain_exec(const struct tc_action *a,
 	res->goto_tp = rcu_dereference_bh(chain->filter_chain);
 }
 
-static void free_tcf(struct rcu_head *head)
+/* XXX: For standalone actions, we don't need a RCU grace period either, because
+ * actions are always connected to filters and filters are already destroyed in
+ * RCU callbacks, so after a RCU grace period actions are already disconnected
+ * from filters. Readers later can not find us.
+ */
+static void free_tcf(struct tc_action *p)
 {
-	struct tc_action *p = container_of(head, struct tc_action, tcfa_rcu);
-
 	free_percpu(p->cpu_bstats);
 	free_percpu(p->cpu_qstats);
 
@@ -76,11 +79,7 @@ static void tcf_idr_remove(struct tcf_idrinfo *idrinfo, struct tc_action *p)
 	idr_remove_ext(&idrinfo->action_idr, p->tcfa_index);
 	spin_unlock_bh(&idrinfo->lock);
 	gen_kill_estimator(&p->tcfa_rate_est);
-	/*
-	 * gen_estimator est_timer() might access p->tcfa_lock
-	 * or bstats, wait a RCU grace period before freeing p
-	 */
-	call_rcu(&p->tcfa_rcu, free_tcf);
+	free_tcf(p);
 }
 
 int __tcf_idr_release(struct tc_action *p, bool bind, bool strict)
@@ -259,7 +258,7 @@ void tcf_idr_cleanup(struct tc_action *a, struct nlattr *est)
 {
 	if (est)
 		gen_kill_estimator(&a->tcfa_rate_est);
-	call_rcu(&a->tcfa_rcu, free_tcf);
+	free_tcf(a);
 }
 EXPORT_SYMBOL(tcf_idr_cleanup);
 
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [Patch net v3 2/3] net_sched: fix reference counting of tc filter chain
  2017-09-11 23:33 [Patch net v3 0/3] net_sched: fix filter chain reference counting Cong Wang
  2017-09-11 23:33 ` [Patch net v3 1/3] net_sched: get rid of tcfa_rcu Cong Wang
@ 2017-09-11 23:33 ` Cong Wang
  2017-09-12 10:43   ` Jiri Pirko
  2017-09-11 23:33 ` [Patch net v3 3/3] net_sched: carefully handle tcf_block_put() Cong Wang
  2017-09-13  3:41 ` [Patch net v3 0/3] net_sched: fix filter chain reference counting David Miller
  3 siblings, 1 reply; 14+ messages in thread
From: Cong Wang @ 2017-09-11 23:33 UTC (permalink / raw)
  To: netdev; +Cc: jiri, jakub.kicinski, jhs, Cong Wang

This patch fixes the following ugliness of tc filter chain refcnt:

a) tp proto should hold a refcnt to the chain too. This significantly
   simplifies the logic.

b) Chain 0 is no longer special, it is created with refcnt=1 like any
   other chains. All the ugliness in tcf_chain_put() can be gone!

c) No need to handle the flushing oddly, because block still holds
   chain 0, it can not be released, this guarantees block is the last
   user.

d) The race condition with RCU callbacks is easier to handle with just
   a rcu_barrier(). Much easier to understand, nothing to hide. Thanks
   to the previous patch. Please see also the comments in code.

e) Make the code understandable by humans, much less error-prone.

Fixes: 744a4cf63e52 ("net: sched: fix use after free when tcf_chain_destroy is called multiple times")
Fixes: 5bc1701881e3 ("net: sched: introduce multichain support for filters")
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 net/sched/cls_api.c | 47 ++++++++++++++++++++++++++---------------------
 1 file changed, 26 insertions(+), 21 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index c743f03cfebd..d29e79d98a69 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -182,7 +182,7 @@ static struct tcf_chain *tcf_chain_create(struct tcf_block *block,
 	list_add_tail(&chain->list, &block->chain_list);
 	chain->block = block;
 	chain->index = chain_index;
-	chain->refcnt = 0;
+	chain->refcnt = 1;
 	return chain;
 }
 
@@ -194,21 +194,20 @@ static void tcf_chain_flush(struct tcf_chain *chain)
 		RCU_INIT_POINTER(*chain->p_filter_chain, NULL);
 	while ((tp = rtnl_dereference(chain->filter_chain)) != NULL) {
 		RCU_INIT_POINTER(chain->filter_chain, tp->next);
+		tcf_chain_put(chain);
 		tcf_proto_destroy(tp);
 	}
 }
 
 static void tcf_chain_destroy(struct tcf_chain *chain)
 {
-	/* May be already removed from the list by the previous call. */
-	if (!list_empty(&chain->list))
-		list_del_init(&chain->list);
+	list_del(&chain->list);
+	kfree(chain);
+}
 
-	/* There might still be a reference held when we got here from
-	 * tcf_block_put. Wait for the user to drop reference before free.
-	 */
-	if (!chain->refcnt)
-		kfree(chain);
+static void tcf_chain_hold(struct tcf_chain *chain)
+{
+	++chain->refcnt;
 }
 
 struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index,
@@ -217,24 +216,19 @@ struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index,
 	struct tcf_chain *chain;
 
 	list_for_each_entry(chain, &block->chain_list, list) {
-		if (chain->index == chain_index)
-			goto incref;
+		if (chain->index == chain_index) {
+			tcf_chain_hold(chain);
+			return chain;
+		}
 	}
-	chain = create ? tcf_chain_create(block, chain_index) : NULL;
 
-incref:
-	if (chain)
-		chain->refcnt++;
-	return chain;
+	return create ? tcf_chain_create(block, chain_index) : NULL;
 }
 EXPORT_SYMBOL(tcf_chain_get);
 
 void tcf_chain_put(struct tcf_chain *chain)
 {
-	/* Destroy unused chain, with exception of chain 0, which is the
-	 * default one and has to be always present.
-	 */
-	if (--chain->refcnt == 0 && !chain->filter_chain && chain->index != 0)
+	if (--chain->refcnt == 0)
 		tcf_chain_destroy(chain);
 }
 EXPORT_SYMBOL(tcf_chain_put);
@@ -279,10 +273,19 @@ void tcf_block_put(struct tcf_block *block)
 	if (!block)
 		return;
 
+	/* XXX: Standalone actions are not allowed to jump to any chain, and
+	 * bound actions should be all removed after flushing. However,
+	 * filters are destroyed in RCU callbacks, we have to flush and wait
+	 * for them inside the loop, otherwise we race with RCU callbacks on
+	 * this list.
+	 */
 	list_for_each_entry_safe(chain, tmp, &block->chain_list, list) {
 		tcf_chain_flush(chain);
-		tcf_chain_destroy(chain);
+		rcu_barrier();
 	}
+
+	list_for_each_entry_safe(chain, tmp, &block->chain_list, list)
+		tcf_chain_put(chain);
 	kfree(block);
 }
 EXPORT_SYMBOL(tcf_block_put);
@@ -360,6 +363,7 @@ static void tcf_chain_tp_insert(struct tcf_chain *chain,
 		rcu_assign_pointer(*chain->p_filter_chain, tp);
 	RCU_INIT_POINTER(tp->next, tcf_chain_tp_prev(chain_info));
 	rcu_assign_pointer(*chain_info->pprev, tp);
+	tcf_chain_hold(chain);
 }
 
 static void tcf_chain_tp_remove(struct tcf_chain *chain,
@@ -371,6 +375,7 @@ static void tcf_chain_tp_remove(struct tcf_chain *chain,
 	if (chain->p_filter_chain && tp == chain->filter_chain)
 		RCU_INIT_POINTER(*chain->p_filter_chain, next);
 	RCU_INIT_POINTER(*chain_info->pprev, next);
+	tcf_chain_put(chain);
 }
 
 static struct tcf_proto *tcf_chain_tp_find(struct tcf_chain *chain,
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [Patch net v3 3/3] net_sched: carefully handle tcf_block_put()
  2017-09-11 23:33 [Patch net v3 0/3] net_sched: fix filter chain reference counting Cong Wang
  2017-09-11 23:33 ` [Patch net v3 1/3] net_sched: get rid of tcfa_rcu Cong Wang
  2017-09-11 23:33 ` [Patch net v3 2/3] net_sched: fix reference counting of tc filter chain Cong Wang
@ 2017-09-11 23:33 ` Cong Wang
  2017-09-12 10:43   ` Jiri Pirko
  2017-09-13  3:41 ` [Patch net v3 0/3] net_sched: fix filter chain reference counting David Miller
  3 siblings, 1 reply; 14+ messages in thread
From: Cong Wang @ 2017-09-11 23:33 UTC (permalink / raw)
  To: netdev; +Cc: jiri, jakub.kicinski, jhs, Cong Wang

As pointed out by Jiri, there is still a race condition between
tcf_block_put() and tcf_chain_destroy() in a RCU callback. There
is no way to make it correct without proper locking or synchronization,
because both operate on a shared list.

Locking is hard, because the only lock we can pick here is a spinlock,
however, in tc_dump_tfilter() we iterate this list with a sleeping
function called (tcf_chain_dump()), which makes using a lock to protect
chain_list almost impossible.

Jiri suggested the idea of holding a refcnt before flushing, this works
because it guarantees us there would be no parallel tcf_chain_destroy()
during the loop, therefore the race condition is gone. But we have to
be very careful with proper synchronization with RCU callbacks.

Suggested-by: Jiri Pirko <jiri@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 net/sched/cls_api.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index d29e79d98a69..0b2219adf520 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -275,15 +275,27 @@ void tcf_block_put(struct tcf_block *block)
 
 	/* XXX: Standalone actions are not allowed to jump to any chain, and
 	 * bound actions should be all removed after flushing. However,
-	 * filters are destroyed in RCU callbacks, we have to flush and wait
-	 * for them inside the loop, otherwise we race with RCU callbacks on
-	 * this list.
+	 * filters are destroyed in RCU callbacks, we have to hold the chains
+	 * first, otherwise we would always race with RCU callbacks on this list
+	 * without proper locking.
 	 */
-	list_for_each_entry_safe(chain, tmp, &block->chain_list, list) {
+
+	/* Wait for existing RCU callbacks to cool down. */
+	rcu_barrier();
+
+	/* Hold a refcnt for all chains, except 0, in case they are gone. */
+	list_for_each_entry(chain, &block->chain_list, list)
+		if (chain->index)
+			tcf_chain_hold(chain);
+
+	/* No race on the list, because no chain could be destroyed. */
+	list_for_each_entry(chain, &block->chain_list, list)
 		tcf_chain_flush(chain);
-		rcu_barrier();
-	}
 
+	/* Wait for RCU callbacks to release the reference count. */
+	rcu_barrier();
+
+	/* At this point, all the chains should have refcnt == 1. */
 	list_for_each_entry_safe(chain, tmp, &block->chain_list, list)
 		tcf_chain_put(chain);
 	kfree(block);
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [Patch net v3 1/3] net_sched: get rid of tcfa_rcu
  2017-09-11 23:33 ` [Patch net v3 1/3] net_sched: get rid of tcfa_rcu Cong Wang
@ 2017-09-12  9:42   ` Jiri Pirko
  2017-09-12 10:40     ` Jiri Pirko
  0 siblings, 1 reply; 14+ messages in thread
From: Jiri Pirko @ 2017-09-12  9:42 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, jiri, jakub.kicinski, jhs, Eric Dumazet

Tue, Sep 12, 2017 at 01:33:30AM CEST, xiyou.wangcong@gmail.com wrote:
>gen estimator has been rewritten in commit 1c0d32fde5bd
>("net_sched: gen_estimator: complete rewrite of rate estimators"),
>the caller is no longer needed to wait for a grace period.
>So this patch gets rid of it.
>
>This also completely closes a race condition between action free
>path and filter chain add/remove path for the following patch.
>Because otherwise the nested RCU callback can't be caught by
>rcu_barrier().
>
>Please see also the comments in code.

Looks like this is causing a null pointer dereference bug for me, 100%
of the time. Just add and remove any rule with action and you get:


[  598.599825] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
[  598.607782] IP: tcf_action_destroy+0xc0/0x140
[  598.612231] PGD 0 P4D 0 
[  598.614797] Oops: 0000 [#1] SMP KASAN
[  598.618525] Modules linked in: act_gact cls_flower sch_ingress rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl x86_pkg_temp_thermal coretemp mlxsw_spectrum kvm_intel mlxfw kvm parman bridge sunrpc irqbypass iTCO_wdt iTCO_vendor_support stp crct10dif_pclmul llc crc32_pclmul crc32c_intel mlxsw_pci ghash_clmulni_intel mlxsw_core i2c_i801 e1000e pcspkr ptp tpm_tis mei_me pps_core mei tpm_tis_core lpc_ich tpm shpchp video
[  598.659010] CPU: 1 PID: 758 Comm: bash Tainted: G    B           4.13.0jiri+ #70
[  598.666509] Hardware name: Mellanox Technologies Ltd. Mellanox switch/Mellanox x86 mezzanine board, BIOS 4.6.5 08/02/2016
[  598.677630] task: ffff880371624bc0 task.stack: ffff880387808000
[  598.683648] RIP: 0010:tcf_action_destroy+0xc0/0x140
[  598.688617] RSP: 0018:ffff88038d107cb8 EFLAGS: 00010282
[  598.693922] RAX: 0000000000000000 RBX: ffff88038d107d28 RCX: ffffffff820b80e0
[  598.701184] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000000000030
[  598.708405] RBP: ffff88038d107ce8 R08: 0000000000000001 R09: 0000000000000001
[  598.715607] R10: ffff88038d107b27 R11: fffffbfff0bcf36c R12: 0000000000000000
[  598.722816] R13: ffff88038d107d38 R14: ffff88036bf75650 R15: 0000000000000001
[  598.730047] FS:  00007f398050b700(0000) GS:ffff88038d100000(0000) knlGS:0000000000000000
[  598.738253] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  598.744086] CR2: 0000000000000030 CR3: 0000000371ac4001 CR4: 00000000001606e0
[  598.751328] Call Trace:
[  598.753809]  <IRQ>
[  598.755871]  tcf_exts_destroy+0x17f/0x260
[  598.775969]  fl_destroy_filter+0x1d/0x30 [cls_flower]
[  598.781069]  rcu_process_callbacks+0x6b2/0xe00


Kasan says:

[  597.503005] BUG: KASAN: use-after-free in tcf_action_destroy+0xad/0x140
[  597.509751] Read of size 8 at addr ffff88036bf75640 by task bash/758
[  597.516222] 
[  597.517761] CPU: 1 PID: 758 Comm: bash Not tainted 4.13.0jiri+ #70
[  597.524075] Hardware name: Mellanox Technologies Ltd. Mellanox switch/Mellanox x86 mezzanine board, BIOS 4.6.5 08/02/2016
[  597.535132] Call Trace:
[  597.537630]  <IRQ>
[  597.539718]  dump_stack+0xd5/0x150
[  597.554853]  print_address_description+0x86/0x410
[  597.559667]  kasan_report+0x181/0x4c0
[  597.583360]  tcf_action_destroy+0xad/0x140
[  597.587551]  tcf_exts_destroy+0x17f/0x260


Ubsan says:

[  598.184033] UBSAN: Undefined behaviour in net/sched/act_api.c:523:4
[  598.190409] member access within null pointer of type 'const struct tc_action_ops'
[  598.198076] CPU: 1 PID: 758 Comm: bash Tainted: G    B           4.13.0jiri+ #70
[  598.205570] Hardware name: Mellanox Technologies Ltd. Mellanox switch/Mellanox x86 mezzanine board, BIOS 4.6.5 08/02/2016
[  598.216669] Call Trace:
[  598.219157]  <IRQ>
[  598.221245]  dump_stack+0xd5/0x150
[  598.228703]  ubsan_epilogue+0xd/0x4e
[  598.232333]  __ubsan_handle_type_mismatch+0xf2/0x293
[  598.252880]  tcf_action_destroy+0x115/0x140
[  598.257151]  tcf_exts_destroy+0x17f/0x260
[  598.277336]  fl_destroy_filter+0x1d/0x30 [cls_flower]
[  598.282472]  rcu_process_callbacks+0x6b2/0xe00

Looks like you need to save owner of the module before you call
__tcf_idr_release so you can later on use it for module_put

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Patch net v3 1/3] net_sched: get rid of tcfa_rcu
  2017-09-12  9:42   ` Jiri Pirko
@ 2017-09-12 10:40     ` Jiri Pirko
  2017-09-12 21:10       ` Cong Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Jiri Pirko @ 2017-09-12 10:40 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, jiri, jakub.kicinski, jhs, Eric Dumazet

Tue, Sep 12, 2017 at 11:42:15AM CEST, jiri@resnulli.us wrote:
>Tue, Sep 12, 2017 at 01:33:30AM CEST, xiyou.wangcong@gmail.com wrote:
>>gen estimator has been rewritten in commit 1c0d32fde5bd
>>("net_sched: gen_estimator: complete rewrite of rate estimators"),
>>the caller is no longer needed to wait for a grace period.
>>So this patch gets rid of it.
>>
>>This also completely closes a race condition between action free
>>path and filter chain add/remove path for the following patch.
>>Because otherwise the nested RCU callback can't be caught by
>>rcu_barrier().
>>
>>Please see also the comments in code.
>
>Looks like this is causing a null pointer dereference bug for me, 100%
>of the time. Just add and remove any rule with action and you get:
>

[...]

>
>Looks like you need to save owner of the module before you call
>__tcf_idr_release so you can later on use it for module_put

This patch helps:

diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index fcd7dc7..de73e71 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -514,13 +514,15 @@ EXPORT_SYMBOL(tcf_action_exec);
 
 int tcf_action_destroy(struct list_head *actions, int bind)
 {
+	const struct tc_action_ops *ops;
 	struct tc_action *a, *tmp;
 	int ret = 0;
 
 	list_for_each_entry_safe(a, tmp, actions, list) {
+		ops = a->ops;
 		ret = __tcf_idr_release(a, bind, true);
 		if (ret == ACT_P_DELETED)
-			module_put(a->ops->owner);
+			module_put(ops->owner);
 		else if (ret < 0)
 			return ret;
 	}

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [Patch net v3 2/3] net_sched: fix reference counting of tc filter chain
  2017-09-11 23:33 ` [Patch net v3 2/3] net_sched: fix reference counting of tc filter chain Cong Wang
@ 2017-09-12 10:43   ` Jiri Pirko
  0 siblings, 0 replies; 14+ messages in thread
From: Jiri Pirko @ 2017-09-12 10:43 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, jiri, jakub.kicinski, jhs

Tue, Sep 12, 2017 at 01:33:31AM CEST, xiyou.wangcong@gmail.com wrote:
>This patch fixes the following ugliness of tc filter chain refcnt:
>
>a) tp proto should hold a refcnt to the chain too. This significantly
>   simplifies the logic.
>
>b) Chain 0 is no longer special, it is created with refcnt=1 like any
>   other chains. All the ugliness in tcf_chain_put() can be gone!
>
>c) No need to handle the flushing oddly, because block still holds
>   chain 0, it can not be released, this guarantees block is the last
>   user.
>
>d) The race condition with RCU callbacks is easier to handle with just
>   a rcu_barrier(). Much easier to understand, nothing to hide. Thanks
>   to the previous patch. Please see also the comments in code.
>
>e) Make the code understandable by humans, much less error-prone.
>
>Fixes: 744a4cf63e52 ("net: sched: fix use after free when tcf_chain_destroy is called multiple times")
>Fixes: 5bc1701881e3 ("net: sched: introduce multichain support for filters")
>Cc: Jiri Pirko <jiri@mellanox.com>
>Cc: Jamal Hadi Salim <jhs@mojatatu.com>
>Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Looking good to me. Thanks!

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Patch net v3 3/3] net_sched: carefully handle tcf_block_put()
  2017-09-11 23:33 ` [Patch net v3 3/3] net_sched: carefully handle tcf_block_put() Cong Wang
@ 2017-09-12 10:43   ` Jiri Pirko
  0 siblings, 0 replies; 14+ messages in thread
From: Jiri Pirko @ 2017-09-12 10:43 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, jiri, jakub.kicinski, jhs

Tue, Sep 12, 2017 at 01:33:32AM CEST, xiyou.wangcong@gmail.com wrote:
>As pointed out by Jiri, there is still a race condition between
>tcf_block_put() and tcf_chain_destroy() in a RCU callback. There
>is no way to make it correct without proper locking or synchronization,
>because both operate on a shared list.
>
>Locking is hard, because the only lock we can pick here is a spinlock,
>however, in tc_dump_tfilter() we iterate this list with a sleeping
>function called (tcf_chain_dump()), which makes using a lock to protect
>chain_list almost impossible.
>
>Jiri suggested the idea of holding a refcnt before flushing, this works
>because it guarantees us there would be no parallel tcf_chain_destroy()
>during the loop, therefore the race condition is gone. But we have to
>be very careful with proper synchronization with RCU callbacks.
>
>Suggested-by: Jiri Pirko <jiri@mellanox.com>
>Cc: Jamal Hadi Salim <jhs@mojatatu.com>
>Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Acked-by: Jiri Pirko <jiri@mellanox.com>

Thanks!

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Patch net v3 1/3] net_sched: get rid of tcfa_rcu
  2017-09-12 10:40     ` Jiri Pirko
@ 2017-09-12 21:10       ` Cong Wang
  2017-09-12 21:36         ` Jiri Pirko
  0 siblings, 1 reply; 14+ messages in thread
From: Cong Wang @ 2017-09-12 21:10 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Linux Kernel Network Developers, Jiri Pirko, Jakub Kicinski,
	Jamal Hadi Salim, Eric Dumazet

On Tue, Sep 12, 2017 at 3:40 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> Tue, Sep 12, 2017 at 11:42:15AM CEST, jiri@resnulli.us wrote:
>>Tue, Sep 12, 2017 at 01:33:30AM CEST, xiyou.wangcong@gmail.com wrote:
>>>gen estimator has been rewritten in commit 1c0d32fde5bd
>>>("net_sched: gen_estimator: complete rewrite of rate estimators"),
>>>the caller is no longer needed to wait for a grace period.
>>>So this patch gets rid of it.
>>>
>>>This also completely closes a race condition between action free
>>>path and filter chain add/remove path for the following patch.
>>>Because otherwise the nested RCU callback can't be caught by
>>>rcu_barrier().
>>>
>>>Please see also the comments in code.
>>
>>Looks like this is causing a null pointer dereference bug for me, 100%
>>of the time. Just add and remove any rule with action and you get:
>>
>
> [...]
>
>>
>>Looks like you need to save owner of the module before you call
>>__tcf_idr_release so you can later on use it for module_put

Why do you believe it is this patch introduces the bug?

That code has been there since the beginning of git history:

+       for (a = act; a; a = act) {
+               if (a->ops && a->ops->cleanup) {
+                       DPRINTK("tcf_action_destroy destroying %p next %p\n",
+                               a, a->next);
+                       if (a->ops->cleanup(a, bind) == ACT_P_DELETED)
+                               module_put(a->ops->owner);
+                       act = act->next;

Seems to be a very old one. The reason why it exposes, I guess,
is call_rcu() somehow delays the free after module_put().


>
> This patch helps:

Looks good to me. Please feel free to submit a formal patch.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Patch net v3 1/3] net_sched: get rid of tcfa_rcu
  2017-09-12 21:10       ` Cong Wang
@ 2017-09-12 21:36         ` Jiri Pirko
  2017-09-12 21:53           ` Cong Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Jiri Pirko @ 2017-09-12 21:36 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, Jiri Pirko, Jakub Kicinski,
	Jamal Hadi Salim, Eric Dumazet

Tue, Sep 12, 2017 at 11:10:22PM CEST, xiyou.wangcong@gmail.com wrote:
>On Tue, Sep 12, 2017 at 3:40 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> Tue, Sep 12, 2017 at 11:42:15AM CEST, jiri@resnulli.us wrote:
>>>Tue, Sep 12, 2017 at 01:33:30AM CEST, xiyou.wangcong@gmail.com wrote:
>>>>gen estimator has been rewritten in commit 1c0d32fde5bd
>>>>("net_sched: gen_estimator: complete rewrite of rate estimators"),
>>>>the caller is no longer needed to wait for a grace period.
>>>>So this patch gets rid of it.
>>>>
>>>>This also completely closes a race condition between action free
>>>>path and filter chain add/remove path for the following patch.
>>>>Because otherwise the nested RCU callback can't be caught by
>>>>rcu_barrier().
>>>>
>>>>Please see also the comments in code.
>>>
>>>Looks like this is causing a null pointer dereference bug for me, 100%
>>>of the time. Just add and remove any rule with action and you get:
>>>
>>
>> [...]
>>
>>>
>>>Looks like you need to save owner of the module before you call
>>>__tcf_idr_release so you can later on use it for module_put
>
>Why do you believe it is this patch introduces the bug?
>
>That code has been there since the beginning of git history:
>
>+       for (a = act; a; a = act) {
>+               if (a->ops && a->ops->cleanup) {
>+                       DPRINTK("tcf_action_destroy destroying %p next %p\n",
>+                               a, a->next);
>+                       if (a->ops->cleanup(a, bind) == ACT_P_DELETED)
>+                               module_put(a->ops->owner);
>+                       act = act->next;
>
>Seems to be a very old one. The reason why it exposes, I guess,
>is call_rcu() somehow delays the free after module_put().

Yeah, looks like the race was just hard to hit. However with your patch,
it is very easy to hit.


>
>
>>
>> This patch helps:
>
>Looks good to me. Please feel free to submit a formal patch.

Okay, I will send the patch to you formally so you can add it as a first
patch of your patchset.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Patch net v3 1/3] net_sched: get rid of tcfa_rcu
  2017-09-12 21:36         ` Jiri Pirko
@ 2017-09-12 21:53           ` Cong Wang
  2017-09-13  6:13             ` Jiri Pirko
  0 siblings, 1 reply; 14+ messages in thread
From: Cong Wang @ 2017-09-12 21:53 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Linux Kernel Network Developers, Jiri Pirko, Jakub Kicinski,
	Jamal Hadi Salim, Eric Dumazet

On Tue, Sep 12, 2017 at 2:36 PM, Jiri Pirko <jiri@resnulli.us> wrote:
> Tue, Sep 12, 2017 at 11:10:22PM CEST, xiyou.wangcong@gmail.com wrote:
>>On Tue, Sep 12, 2017 at 3:40 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>> This patch helps:
>>
>>Looks good to me. Please feel free to submit a formal patch.
>
> Okay, I will send the patch to you formally so you can add it as a first
> patch of your patchset.

I can carry it by myself if it fits to this patchset. However, I believe it
should be independent since it has to be backported much further
than this patchset. I don't know why no one triggered the crash
before call_rcu() was introduced there.

Anyway, I believe you should submit your patch alone, either before
or after this patchset, there should be no conflict.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Patch net v3 0/3] net_sched: fix filter chain reference counting
  2017-09-11 23:33 [Patch net v3 0/3] net_sched: fix filter chain reference counting Cong Wang
                   ` (2 preceding siblings ...)
  2017-09-11 23:33 ` [Patch net v3 3/3] net_sched: carefully handle tcf_block_put() Cong Wang
@ 2017-09-13  3:41 ` David Miller
  2017-09-13  6:13   ` Jiri Pirko
  3 siblings, 1 reply; 14+ messages in thread
From: David Miller @ 2017-09-13  3:41 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: netdev, jiri, jakub.kicinski, jhs

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Mon, 11 Sep 2017 16:33:29 -0700

> This patchset fixes tc filter chain reference counting and nasty race
> conditions with RCU callbacks. Please see each patch for details.

Series applied, thanks Cong.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Patch net v3 1/3] net_sched: get rid of tcfa_rcu
  2017-09-12 21:53           ` Cong Wang
@ 2017-09-13  6:13             ` Jiri Pirko
  0 siblings, 0 replies; 14+ messages in thread
From: Jiri Pirko @ 2017-09-13  6:13 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, Jiri Pirko, Jakub Kicinski,
	Jamal Hadi Salim, Eric Dumazet

Tue, Sep 12, 2017 at 11:53:09PM CEST, xiyou.wangcong@gmail.com wrote:
>On Tue, Sep 12, 2017 at 2:36 PM, Jiri Pirko <jiri@resnulli.us> wrote:
>> Tue, Sep 12, 2017 at 11:10:22PM CEST, xiyou.wangcong@gmail.com wrote:
>>>On Tue, Sep 12, 2017 at 3:40 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>>> This patch helps:
>>>
>>>Looks good to me. Please feel free to submit a formal patch.
>>
>> Okay, I will send the patch to you formally so you can add it as a first
>> patch of your patchset.
>
>I can carry it by myself if it fits to this patchset. However, I believe it
>should be independent since it has to be backported much further
>than this patchset. I don't know why no one triggered the crash
>before call_rcu() was introduced there.
>
>Anyway, I believe you should submit your patch alone, either before
>or after this patchset, there should be no conflict.

Okay. Will to it before the patchset. Thanks!

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Patch net v3 0/3] net_sched: fix filter chain reference counting
  2017-09-13  3:41 ` [Patch net v3 0/3] net_sched: fix filter chain reference counting David Miller
@ 2017-09-13  6:13   ` Jiri Pirko
  0 siblings, 0 replies; 14+ messages in thread
From: Jiri Pirko @ 2017-09-13  6:13 UTC (permalink / raw)
  To: David Miller; +Cc: xiyou.wangcong, netdev, jiri, jakub.kicinski, jhs

Wed, Sep 13, 2017 at 05:41:18AM CEST, davem@davemloft.net wrote:
>From: Cong Wang <xiyou.wangcong@gmail.com>
>Date: Mon, 11 Sep 2017 16:33:29 -0700
>
>> This patchset fixes tc filter chain reference counting and nasty race
>> conditions with RCU callbacks. Please see each patch for details.
>
>Series applied, thanks Cong.

Ha, so after :)

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2017-09-13  6:13 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-11 23:33 [Patch net v3 0/3] net_sched: fix filter chain reference counting Cong Wang
2017-09-11 23:33 ` [Patch net v3 1/3] net_sched: get rid of tcfa_rcu Cong Wang
2017-09-12  9:42   ` Jiri Pirko
2017-09-12 10:40     ` Jiri Pirko
2017-09-12 21:10       ` Cong Wang
2017-09-12 21:36         ` Jiri Pirko
2017-09-12 21:53           ` Cong Wang
2017-09-13  6:13             ` Jiri Pirko
2017-09-11 23:33 ` [Patch net v3 2/3] net_sched: fix reference counting of tc filter chain Cong Wang
2017-09-12 10:43   ` Jiri Pirko
2017-09-11 23:33 ` [Patch net v3 3/3] net_sched: carefully handle tcf_block_put() Cong Wang
2017-09-12 10:43   ` Jiri Pirko
2017-09-13  3:41 ` [Patch net v3 0/3] net_sched: fix filter chain reference counting David Miller
2017-09-13  6:13   ` Jiri Pirko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.