All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net 0/3] Fix Qdisc destroy issues caused by adding fine-grained locking to filter API
@ 2019-09-18  7:31 Vlad Buslov
  2019-09-18  7:31 ` [PATCH net 1/3] net: sched: sch_htb: don't call qdisc_put() while holding tree lock Vlad Buslov
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Vlad Buslov @ 2019-09-18  7:31 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, Vlad Buslov

TC filter API unlocking introduced several new fine-grained locks. The
change caused sleeping-while-atomic BUGs in several Qdiscs that call cls
APIs which need to obtain new mutex while holding sch tree spinlock. This
series fixes affected Qdiscs by ensuring that cls API that became sleeping
is only called outside of sch tree lock critical section.

Vlad Buslov (3):
  net: sched: sch_htb: don't call qdisc_put() while holding tree lock
  net: sched: multiq: don't call qdisc_put() while holding tree lock
  net: sched: sch_sfb: don't call qdisc_put() while holding tree lock

 net/sched/sch_htb.c    |  4 +++-
 net/sched/sch_multiq.c | 12 +++++++-----
 net/sched/sch_sfb.c    |  5 +++--
 3 files changed, 13 insertions(+), 8 deletions(-)

-- 
2.21.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH net 1/3] net: sched: sch_htb: don't call qdisc_put() while holding tree lock
  2019-09-18  7:31 [PATCH net 0/3] Fix Qdisc destroy issues caused by adding fine-grained locking to filter API Vlad Buslov
@ 2019-09-18  7:31 ` Vlad Buslov
  2019-09-18  7:32 ` [PATCH net 2/3] net: sched: multiq: " Vlad Buslov
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Vlad Buslov @ 2019-09-18  7:31 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, Vlad Buslov

Recent changes that removed rtnl dependency from rules update path of tc
also made tcf_block_put() function sleeping. This function is called from
ops->destroy() of several Qdisc implementations, which in turn is called by
qdisc_put(). Some Qdiscs call qdisc_put() while holding sch tree spinlock,
which results sleeping-while-atomic BUG.

Steps to reproduce for htb:

tc qdisc add dev ens1f0 root handle 1: htb default 12
tc class add dev ens1f0 parent 1: classid 1:1 htb rate 100kbps ceil 100kbps
tc qdisc add dev ens1f0 parent 1:1 handle 40: sfq perturb 10
tc class add dev ens1f0 parent 1:1 classid 1:2 htb rate 100kbps ceil 100kbps

Resulting dmesg:

[ 4791.148551] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:909
[ 4791.151354] in_atomic(): 1, irqs_disabled(): 0, pid: 27273, name: tc
[ 4791.152805] INFO: lockdep is turned off.
[ 4791.153605] CPU: 19 PID: 27273 Comm: tc Tainted: G        W         5.3.0-rc8+ #721
[ 4791.154336] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[ 4791.155075] Call Trace:
[ 4791.155803]  dump_stack+0x85/0xc0
[ 4791.156529]  ___might_sleep.cold+0xac/0xbc
[ 4791.157251]  __mutex_lock+0x5b/0x960
[ 4791.157966]  ? console_unlock+0x363/0x5d0
[ 4791.158676]  ? tcf_chain0_head_change_cb_del.isra.0+0x1b/0xf0
[ 4791.159395]  ? tcf_chain0_head_change_cb_del.isra.0+0x1b/0xf0
[ 4791.160103]  tcf_chain0_head_change_cb_del.isra.0+0x1b/0xf0
[ 4791.160815]  tcf_block_put_ext.part.0+0x21/0x50
[ 4791.161530]  tcf_block_put+0x50/0x70
[ 4791.162233]  sfq_destroy+0x15/0x50 [sch_sfq]
[ 4791.162936]  qdisc_destroy+0x5f/0x160
[ 4791.163642]  htb_change_class.cold+0x5df/0x69d [sch_htb]
[ 4791.164505]  tc_ctl_tclass+0x19d/0x480
[ 4791.165360]  rtnetlink_rcv_msg+0x170/0x4b0
[ 4791.166191]  ? netlink_deliver_tap+0x95/0x400
[ 4791.166907]  ? rtnl_dellink+0x2d0/0x2d0
[ 4791.167625]  netlink_rcv_skb+0x49/0x110
[ 4791.168345]  netlink_unicast+0x171/0x200
[ 4791.169058]  netlink_sendmsg+0x224/0x3f0
[ 4791.169771]  sock_sendmsg+0x5e/0x60
[ 4791.170475]  ___sys_sendmsg+0x2ae/0x330
[ 4791.171183]  ? ___sys_recvmsg+0x159/0x1f0
[ 4791.171894]  ? do_wp_page+0x9c/0x790
[ 4791.172595]  ? __handle_mm_fault+0xcd3/0x19e0
[ 4791.173309]  __sys_sendmsg+0x59/0xa0
[ 4791.174024]  do_syscall_64+0x5c/0xb0
[ 4791.174725]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 4791.175435] RIP: 0033:0x7f0aa41497b8
[ 4791.176129] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 89 5
4
[ 4791.177532] RSP: 002b:00007fff4e37d588 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 4791.178243] RAX: ffffffffffffffda RBX: 000000005d8132f7 RCX: 00007f0aa41497b8
[ 4791.178947] RDX: 0000000000000000 RSI: 00007fff4e37d5f0 RDI: 0000000000000003
[ 4791.179662] RBP: 0000000000000000 R08: 0000000000000001 R09: 00000000020149a0
[ 4791.180382] R10: 0000000000404eda R11: 0000000000000246 R12: 0000000000000001
[ 4791.181100] R13: 000000000047f640 R14: 0000000000000000 R15: 0000000000000000

Fix the issue by saving parent->leaf.q to local temporary variable and
calling qdisc_put() with it after sch tree lock is released.

Fixes: c266f64dbfa2 ("net: sched: protect block state with mutex")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
---
 net/sched/sch_htb.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 7bcf20ef9145..8184c87da8be 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -1302,6 +1302,7 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
 	struct htb_class *cl = (struct htb_class *)*arg, *parent;
 	struct nlattr *opt = tca[TCA_OPTIONS];
 	struct nlattr *tb[TCA_HTB_MAX + 1];
+	struct Qdisc *parent_qdisc = NULL;
 	struct tc_htb_opt *hopt;
 	u64 rate64, ceil64;
 	int warn = 0;
@@ -1401,7 +1402,7 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
 		if (parent && !parent->level) {
 			/* turn parent into inner node */
 			qdisc_purge_queue(parent->leaf.q);
-			qdisc_put(parent->leaf.q);
+			parent_qdisc = parent->leaf.q;
 			if (parent->prio_activity)
 				htb_deactivate(q, parent);
 
@@ -1480,6 +1481,7 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
 	cl->cbuffer = PSCHED_TICKS2NS(hopt->cbuffer);
 
 	sch_tree_unlock(sch);
+	qdisc_put(parent_qdisc);
 
 	if (warn)
 		pr_warn("HTB: quantum of class %X is %s. Consider r2q change.\n",
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH net 2/3] net: sched: multiq: don't call qdisc_put() while holding tree lock
  2019-09-18  7:31 [PATCH net 0/3] Fix Qdisc destroy issues caused by adding fine-grained locking to filter API Vlad Buslov
  2019-09-18  7:31 ` [PATCH net 1/3] net: sched: sch_htb: don't call qdisc_put() while holding tree lock Vlad Buslov
@ 2019-09-18  7:32 ` Vlad Buslov
  2019-09-18 22:56   ` Cong Wang
  2019-09-18  7:32 ` [PATCH net 3/3] net: sched: sch_sfb: " Vlad Buslov
  2019-09-18 22:50 ` [PATCH net 0/3] Fix Qdisc destroy issues caused by adding fine-grained locking to filter API Cong Wang
  3 siblings, 1 reply; 9+ messages in thread
From: Vlad Buslov @ 2019-09-18  7:32 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, Vlad Buslov

Recent changes that removed rtnl dependency from rules update path of tc
also made tcf_block_put() function sleeping. This function is called from
ops->destroy() of several Qdisc implementations, which in turn is called by
qdisc_put(). Some Qdiscs call qdisc_put() while holding sch tree spinlock,
which results sleeping-while-atomic BUG.

Steps to reproduce for multiq:

tc qdisc add dev ens1f0 root handle 1: multiq
tc qdisc add dev ens1f0 parent 1:10 handle 50: sfq perturb 10
ethtool -L ens1f0 combined 2
tc qdisc change dev ens1f0 root handle 1: multiq

Resulting dmesg:

[ 5539.419344] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:909
[ 5539.420945] in_atomic(): 1, irqs_disabled(): 0, pid: 27658, name: tc
[ 5539.422435] INFO: lockdep is turned off.
[ 5539.423904] CPU: 21 PID: 27658 Comm: tc Tainted: G        W         5.3.0-rc8+ #721
[ 5539.425400] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[ 5539.426911] Call Trace:
[ 5539.428380]  dump_stack+0x85/0xc0
[ 5539.429823]  ___might_sleep.cold+0xac/0xbc
[ 5539.431262]  __mutex_lock+0x5b/0x960
[ 5539.432682]  ? tcf_chain0_head_change_cb_del.isra.0+0x1b/0xf0
[ 5539.434103]  ? __nla_validate_parse+0x51/0x840
[ 5539.435493]  ? tcf_chain0_head_change_cb_del.isra.0+0x1b/0xf0
[ 5539.436903]  tcf_chain0_head_change_cb_del.isra.0+0x1b/0xf0
[ 5539.438327]  tcf_block_put_ext.part.0+0x21/0x50
[ 5539.439752]  tcf_block_put+0x50/0x70
[ 5539.441165]  sfq_destroy+0x15/0x50 [sch_sfq]
[ 5539.442570]  qdisc_destroy+0x5f/0x160
[ 5539.444000]  multiq_tune+0x14a/0x420 [sch_multiq]
[ 5539.445421]  tc_modify_qdisc+0x324/0x840
[ 5539.446841]  rtnetlink_rcv_msg+0x170/0x4b0
[ 5539.448269]  ? netlink_deliver_tap+0x95/0x400
[ 5539.449691]  ? rtnl_dellink+0x2d0/0x2d0
[ 5539.451116]  netlink_rcv_skb+0x49/0x110
[ 5539.452522]  netlink_unicast+0x171/0x200
[ 5539.453914]  netlink_sendmsg+0x224/0x3f0
[ 5539.455304]  sock_sendmsg+0x5e/0x60
[ 5539.456686]  ___sys_sendmsg+0x2ae/0x330
[ 5539.458071]  ? ___sys_recvmsg+0x159/0x1f0
[ 5539.459461]  ? do_wp_page+0x9c/0x790
[ 5539.460846]  ? __handle_mm_fault+0xcd3/0x19e0
[ 5539.462263]  __sys_sendmsg+0x59/0xa0
[ 5539.463661]  do_syscall_64+0x5c/0xb0
[ 5539.465044]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 5539.466454] RIP: 0033:0x7f1fe08177b8
[ 5539.467863] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 89 5
4
[ 5539.470906] RSP: 002b:00007ffe812de5d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 5539.472483] RAX: ffffffffffffffda RBX: 000000005d8135e3 RCX: 00007f1fe08177b8
[ 5539.474069] RDX: 0000000000000000 RSI: 00007ffe812de640 RDI: 0000000000000003
[ 5539.475655] RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000182e9b0
[ 5539.477203] R10: 0000000000404eda R11: 0000000000000246 R12: 0000000000000001
[ 5539.478699] R13: 000000000047f640 R14: 0000000000000000 R15: 0000000000000000

Rearrange locking in multiq_tune() in following ways:

- Don't hold sch tree lock for whole duration of the loop that removes
  Qdiscs from queues that have been disabled. Obtain the lock each time
  parent qdisc is changed and release it afterwards.

- Move call to qdisc_put() after sch_tree_unlock(sch) in loop that adds
  Qdiscs on newly enabled tx queues. Since qdisc_put() implementation
  already verifies that argument Qdisc is not of builtin type before
  processing it we don't need additional conditional around the call here.

Fixes: c266f64dbfa2 ("net: sched: protect block state with mutex")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
---
 net/sched/sch_multiq.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
index e1087746f6a2..4cfa9a7bd29e 100644
--- a/net/sched/sch_multiq.c
+++ b/net/sched/sch_multiq.c
@@ -187,18 +187,21 @@ static int multiq_tune(struct Qdisc *sch, struct nlattr *opt,
 
 	sch_tree_lock(sch);
 	q->bands = qopt->bands;
+	sch_tree_unlock(sch);
+
 	for (i = q->bands; i < q->max_bands; i++) {
 		if (q->queues[i] != &noop_qdisc) {
 			struct Qdisc *child = q->queues[i];
 
+			sch_tree_lock(sch);
 			q->queues[i] = &noop_qdisc;
 			qdisc_tree_flush_backlog(child);
+			sch_tree_unlock(sch);
+
 			qdisc_put(child);
 		}
 	}
 
-	sch_tree_unlock(sch);
-
 	for (i = 0; i < q->bands; i++) {
 		if (q->queues[i] == &noop_qdisc) {
 			struct Qdisc *child, *old;
@@ -213,11 +216,10 @@ static int multiq_tune(struct Qdisc *sch, struct nlattr *opt,
 				if (child != &noop_qdisc)
 					qdisc_hash_add(child, true);
 
-				if (old != &noop_qdisc) {
+				if (old != &noop_qdisc)
 					qdisc_tree_flush_backlog(old);
-					qdisc_put(old);
-				}
 				sch_tree_unlock(sch);
+				qdisc_put(old);
 			}
 		}
 	}
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH net 3/3] net: sched: sch_sfb: don't call qdisc_put() while holding tree lock
  2019-09-18  7:31 [PATCH net 0/3] Fix Qdisc destroy issues caused by adding fine-grained locking to filter API Vlad Buslov
  2019-09-18  7:31 ` [PATCH net 1/3] net: sched: sch_htb: don't call qdisc_put() while holding tree lock Vlad Buslov
  2019-09-18  7:32 ` [PATCH net 2/3] net: sched: multiq: " Vlad Buslov
@ 2019-09-18  7:32 ` Vlad Buslov
  2019-09-18 22:50 ` [PATCH net 0/3] Fix Qdisc destroy issues caused by adding fine-grained locking to filter API Cong Wang
  3 siblings, 0 replies; 9+ messages in thread
From: Vlad Buslov @ 2019-09-18  7:32 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, Vlad Buslov,
	syzbot+ac54455281db908c581e

Recent changes that removed rtnl dependency from rules update path of tc
also made tcf_block_put() function sleeping. This function is called from
ops->destroy() of several Qdisc implementations, which in turn is called by
qdisc_put(). Some Qdiscs call qdisc_put() while holding sch tree spinlock,
which results sleeping-while-atomic BUG.

Steps to reproduce for sfb:

tc qdisc add dev ens1f0 handle 1: root sfb
tc qdisc add dev ens1f0 parent 1:10 handle 50: sfq perturb 10
tc qdisc change dev ens1f0 root handle 1: sfb

Resulting dmesg:

[ 7265.938717] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:909
[ 7265.940152] in_atomic(): 1, irqs_disabled(): 0, pid: 28579, name: tc
[ 7265.941455] INFO: lockdep is turned off.
[ 7265.942744] CPU: 11 PID: 28579 Comm: tc Tainted: G        W         5.3.0-rc8+ #721
[ 7265.944065] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[ 7265.945396] Call Trace:
[ 7265.946709]  dump_stack+0x85/0xc0
[ 7265.947994]  ___might_sleep.cold+0xac/0xbc
[ 7265.949282]  __mutex_lock+0x5b/0x960
[ 7265.950543]  ? tcf_chain0_head_change_cb_del.isra.0+0x1b/0xf0
[ 7265.951803]  ? tcf_chain0_head_change_cb_del.isra.0+0x1b/0xf0
[ 7265.953022]  tcf_chain0_head_change_cb_del.isra.0+0x1b/0xf0
[ 7265.954248]  tcf_block_put_ext.part.0+0x21/0x50
[ 7265.955478]  tcf_block_put+0x50/0x70
[ 7265.956694]  sfq_destroy+0x15/0x50 [sch_sfq]
[ 7265.957898]  qdisc_destroy+0x5f/0x160
[ 7265.959099]  sfb_change+0x175/0x330 [sch_sfb]
[ 7265.960304]  tc_modify_qdisc+0x324/0x840
[ 7265.961503]  rtnetlink_rcv_msg+0x170/0x4b0
[ 7265.962692]  ? netlink_deliver_tap+0x95/0x400
[ 7265.963876]  ? rtnl_dellink+0x2d0/0x2d0
[ 7265.965064]  netlink_rcv_skb+0x49/0x110
[ 7265.966251]  netlink_unicast+0x171/0x200
[ 7265.967427]  netlink_sendmsg+0x224/0x3f0
[ 7265.968595]  sock_sendmsg+0x5e/0x60
[ 7265.969753]  ___sys_sendmsg+0x2ae/0x330
[ 7265.970916]  ? ___sys_recvmsg+0x159/0x1f0
[ 7265.972074]  ? do_wp_page+0x9c/0x790
[ 7265.973233]  ? __handle_mm_fault+0xcd3/0x19e0
[ 7265.974407]  __sys_sendmsg+0x59/0xa0
[ 7265.975591]  do_syscall_64+0x5c/0xb0
[ 7265.976753]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 7265.977938] RIP: 0033:0x7f229069f7b8
[ 7265.979117] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 89 5
4
[ 7265.981681] RSP: 002b:00007ffd7ed2d158 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 7265.983001] RAX: ffffffffffffffda RBX: 000000005d813ca1 RCX: 00007f229069f7b8
[ 7265.984336] RDX: 0000000000000000 RSI: 00007ffd7ed2d1c0 RDI: 0000000000000003
[ 7265.985682] RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000165c9a0
[ 7265.987021] R10: 0000000000404eda R11: 0000000000000246 R12: 0000000000000001
[ 7265.988309] R13: 000000000047f640 R14: 0000000000000000 R15: 0000000000000000

Save Qdisc that is being removed to temporary local variable and call
qdisc_put() with it after sch tree lock is released.

Reported-by: syzbot+ac54455281db908c581e@syzkaller.appspotmail.com
Fixes: c266f64dbfa2 ("net: sched: protect block state with mutex")
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
---
 net/sched/sch_sfb.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
index 1dff8506a715..802a34afece0 100644
--- a/net/sched/sch_sfb.c
+++ b/net/sched/sch_sfb.c
@@ -488,7 +488,7 @@ static int sfb_change(struct Qdisc *sch, struct nlattr *opt,
 		      struct netlink_ext_ack *extack)
 {
 	struct sfb_sched_data *q = qdisc_priv(sch);
-	struct Qdisc *child;
+	struct Qdisc *child, *old;
 	struct nlattr *tb[TCA_SFB_MAX + 1];
 	const struct tc_sfb_qopt *ctl = &sfb_default_ops;
 	u32 limit;
@@ -519,7 +519,7 @@ static int sfb_change(struct Qdisc *sch, struct nlattr *opt,
 	sch_tree_lock(sch);
 
 	qdisc_tree_flush_backlog(q->qdisc);
-	qdisc_put(q->qdisc);
+	old = q->qdisc;
 	q->qdisc = child;
 
 	q->rehash_interval = msecs_to_jiffies(ctl->rehash_interval);
@@ -542,6 +542,7 @@ static int sfb_change(struct Qdisc *sch, struct nlattr *opt,
 	sfb_init_perturbation(1, q);
 
 	sch_tree_unlock(sch);
+	qdisc_put(old);
 
 	return 0;
 }
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH net 0/3] Fix Qdisc destroy issues caused by adding fine-grained locking to filter API
  2019-09-18  7:31 [PATCH net 0/3] Fix Qdisc destroy issues caused by adding fine-grained locking to filter API Vlad Buslov
                   ` (2 preceding siblings ...)
  2019-09-18  7:32 ` [PATCH net 3/3] net: sched: sch_sfb: " Vlad Buslov
@ 2019-09-18 22:50 ` Cong Wang
  2019-09-19  8:53   ` Vlad Buslov
  3 siblings, 1 reply; 9+ messages in thread
From: Cong Wang @ 2019-09-18 22:50 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller

On Wed, Sep 18, 2019 at 12:32 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>
> TC filter API unlocking introduced several new fine-grained locks. The
> change caused sleeping-while-atomic BUGs in several Qdiscs that call cls
> APIs which need to obtain new mutex while holding sch tree spinlock. This
> series fixes affected Qdiscs by ensuring that cls API that became sleeping
> is only called outside of sch tree lock critical section.

Sorry I just took a deeper look. It seems harder than just moving it
out of the critical section.

qdisc_destroy() calls ops->reset() which usually purges queues,
I don't see how it is safe to move it out of tree spinlock without
respecting fast path.

What do you think?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net 2/3] net: sched: multiq: don't call qdisc_put() while holding tree lock
  2019-09-18  7:32 ` [PATCH net 2/3] net: sched: multiq: " Vlad Buslov
@ 2019-09-18 22:56   ` Cong Wang
  2019-09-19  8:56     ` Vlad Buslov
  0 siblings, 1 reply; 9+ messages in thread
From: Cong Wang @ 2019-09-18 22:56 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller

On Wed, Sep 18, 2019 at 12:32 AM Vlad Buslov <vladbu@mellanox.com> wrote:
> diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
> index e1087746f6a2..4cfa9a7bd29e 100644
> --- a/net/sched/sch_multiq.c
> +++ b/net/sched/sch_multiq.c
> @@ -187,18 +187,21 @@ static int multiq_tune(struct Qdisc *sch, struct nlattr *opt,
>
>         sch_tree_lock(sch);
>         q->bands = qopt->bands;
> +       sch_tree_unlock(sch);
> +
>         for (i = q->bands; i < q->max_bands; i++) {
>                 if (q->queues[i] != &noop_qdisc) {
>                         struct Qdisc *child = q->queues[i];
>
> +                       sch_tree_lock(sch);
>                         q->queues[i] = &noop_qdisc;
>                         qdisc_tree_flush_backlog(child);
> +                       sch_tree_unlock(sch);
> +
>                         qdisc_put(child);
>                 }
>         }

Repeatedly acquiring and releasing a spinlock in a loop
does not seem to be a good idea. Is it possible to save
those qdisc pointers to an array or something similar?

Thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net 0/3] Fix Qdisc destroy issues caused by adding fine-grained locking to filter API
  2019-09-18 22:50 ` [PATCH net 0/3] Fix Qdisc destroy issues caused by adding fine-grained locking to filter API Cong Wang
@ 2019-09-19  8:53   ` Vlad Buslov
  2019-09-19 19:13     ` Cong Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Vlad Buslov @ 2019-09-19  8:53 UTC (permalink / raw)
  To: Cong Wang
  Cc: Vlad Buslov, Linux Kernel Network Developers, Jamal Hadi Salim,
	Jiri Pirko, David Miller


On Thu 19 Sep 2019 at 01:50, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Wed, Sep 18, 2019 at 12:32 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>>
>> TC filter API unlocking introduced several new fine-grained locks. The
>> change caused sleeping-while-atomic BUGs in several Qdiscs that call cls
>> APIs which need to obtain new mutex while holding sch tree spinlock. This
>> series fixes affected Qdiscs by ensuring that cls API that became sleeping
>> is only called outside of sch tree lock critical section.
>
> Sorry I just took a deeper look. It seems harder than just moving it
> out of the critical section.
>
> qdisc_destroy() calls ops->reset() which usually purges queues,
> I don't see how it is safe to move it out of tree spinlock without
> respecting fast path.
>
> What do you think?

Hmm, maybe we can split qdisc destruction in two stage process for
affected qdiscs? Rough sketch:

1. Call qdisc_reset() (or qdisc_purge_queue()) on qdisc that are being
   deleted under sch tree lock protection.

2. Call new qdisc_put_empty() function after releasing the lock. This
   function would implement same functionality as a regular qdisc_put()
   besides resetting the Qdisc and freeing skb in its queues (already
   done by qdisc_reset())

In fact, affected queues already do the same or something similar:

- htb_change_class() calls qdisc_purge_queue() that calls qdisc_reset(),
  which makes reset inside qdisc_destroy() redundant.

- multiq_tune() calls qdisc_tree_flush_backlog() that has the same
  implementation as qdisc_purge_queue() minus actually resetting the
  Qdisc. Can we substitute first function with the second one here?

- sfb_change() - same as multiq_tune().

Do you think that would work?

Also, I'm kind of surprised that it worked before. Even though syzbot
complains about mutex that I added, cls API never honored expectation by
such Qdiscs that tcf_block_put doesn't sleep. From the top of my head
there are several places in the cls code where it has never been the
case: chain head change for ingress/clsact Qdiscs calls
mini_qdisc_pair_swap() which is a sleeping function,
tcf_block_flush_all_chains() eventually calls driver callbacks for any
offloaded filters which may sleep.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net 2/3] net: sched: multiq: don't call qdisc_put() while holding tree lock
  2019-09-18 22:56   ` Cong Wang
@ 2019-09-19  8:56     ` Vlad Buslov
  0 siblings, 0 replies; 9+ messages in thread
From: Vlad Buslov @ 2019-09-19  8:56 UTC (permalink / raw)
  To: Cong Wang
  Cc: Vlad Buslov, Linux Kernel Network Developers, Jamal Hadi Salim,
	Jiri Pirko, David Miller


On Thu 19 Sep 2019 at 01:56, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Wed, Sep 18, 2019 at 12:32 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>> diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
>> index e1087746f6a2..4cfa9a7bd29e 100644
>> --- a/net/sched/sch_multiq.c
>> +++ b/net/sched/sch_multiq.c
>> @@ -187,18 +187,21 @@ static int multiq_tune(struct Qdisc *sch, struct nlattr *opt,
>>
>>         sch_tree_lock(sch);
>>         q->bands = qopt->bands;
>> +       sch_tree_unlock(sch);
>> +
>>         for (i = q->bands; i < q->max_bands; i++) {
>>                 if (q->queues[i] != &noop_qdisc) {
>>                         struct Qdisc *child = q->queues[i];
>>
>> +                       sch_tree_lock(sch);
>>                         q->queues[i] = &noop_qdisc;
>>                         qdisc_tree_flush_backlog(child);
>> +                       sch_tree_unlock(sch);
>> +
>>                         qdisc_put(child);
>>                 }
>>         }
>
> Repeatedly acquiring and releasing a spinlock in a loop
> does not seem to be a good idea. Is it possible to save
> those qdisc pointers to an array or something similar?
>
> Thanks.

Sure. I implemented it the way I did because following loop in
multiq_tune() is implemented in exactly the same way: it repeatedly
acquires and releases sch tree lock for each new default Qdisc that it
creates.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net 0/3] Fix Qdisc destroy issues caused by adding fine-grained locking to filter API
  2019-09-19  8:53   ` Vlad Buslov
@ 2019-09-19 19:13     ` Cong Wang
  0 siblings, 0 replies; 9+ messages in thread
From: Cong Wang @ 2019-09-19 19:13 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller

On Thu, Sep 19, 2019 at 1:53 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>
>
> On Thu 19 Sep 2019 at 01:50, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > On Wed, Sep 18, 2019 at 12:32 AM Vlad Buslov <vladbu@mellanox.com> wrote:
> >>
> >> TC filter API unlocking introduced several new fine-grained locks. The
> >> change caused sleeping-while-atomic BUGs in several Qdiscs that call cls
> >> APIs which need to obtain new mutex while holding sch tree spinlock. This
> >> series fixes affected Qdiscs by ensuring that cls API that became sleeping
> >> is only called outside of sch tree lock critical section.
> >
> > Sorry I just took a deeper look. It seems harder than just moving it
> > out of the critical section.
> >
> > qdisc_destroy() calls ops->reset() which usually purges queues,
> > I don't see how it is safe to move it out of tree spinlock without
> > respecting fast path.
> >
> > What do you think?
>
> Hmm, maybe we can split qdisc destruction in two stage process for
> affected qdiscs? Rough sketch:
>
> 1. Call qdisc_reset() (or qdisc_purge_queue()) on qdisc that are being
>    deleted under sch tree lock protection.
>
> 2. Call new qdisc_put_empty() function after releasing the lock. This
>    function would implement same functionality as a regular qdisc_put()
>    besides resetting the Qdisc and freeing skb in its queues (already
>    done by qdisc_reset())
>
> In fact, affected queues already do the same or something similar:
>
> - htb_change_class() calls qdisc_purge_queue() that calls qdisc_reset(),
>   which makes reset inside qdisc_destroy() redundant.
>
> - multiq_tune() calls qdisc_tree_flush_backlog() that has the same
>   implementation as qdisc_purge_queue() minus actually resetting the
>   Qdisc. Can we substitute first function with the second one here?
>
> - sfb_change() - same as multiq_tune().
>
> Do you think that would work?

I think they have to call qdisc_purge_queue() or whatever that calls
qdisc_reset() to reset all queues including qdisc->gso_skb and
qdisc->skb_bad_txq before releasing sch tree lock.

qdisc_tree_flush_backlog() is not sufficient.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-09-19 19:14 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-18  7:31 [PATCH net 0/3] Fix Qdisc destroy issues caused by adding fine-grained locking to filter API Vlad Buslov
2019-09-18  7:31 ` [PATCH net 1/3] net: sched: sch_htb: don't call qdisc_put() while holding tree lock Vlad Buslov
2019-09-18  7:32 ` [PATCH net 2/3] net: sched: multiq: " Vlad Buslov
2019-09-18 22:56   ` Cong Wang
2019-09-19  8:56     ` Vlad Buslov
2019-09-18  7:32 ` [PATCH net 3/3] net: sched: sch_sfb: " Vlad Buslov
2019-09-18 22:50 ` [PATCH net 0/3] Fix Qdisc destroy issues caused by adding fine-grained locking to filter API Cong Wang
2019-09-19  8:53   ` Vlad Buslov
2019-09-19 19:13     ` Cong Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.