* [PATCH net v2] net/sched: sch_ets: don't peek at classes beyond 'nbands'
@ 2021-11-23 13:53 Davide Caratti
From: Davide Caratti @ 2021-11-23 13:53 UTC (permalink / raw)
To: Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S. Miller,
Jakub Kicinski, Petr Machata
Cc: netdev, Hangbin Liu
When the number of DRR classes decreases, the round-robin active list can
contain elements that have already been freed in ets_qdisc_change(). As a
consequence, it's possible to see a NULL dereference crash, caused by the
attempt to call cl->qdisc->ops->peek(cl->qdisc) when cl->qdisc is NULL:
BUG: kernel NULL pointer dereference, address: 0000000000000018
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 1 PID: 910 Comm: mausezahn Not tainted 5.16.0-rc1+ #475
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
RIP: 0010:ets_qdisc_dequeue+0x129/0x2c0 [sch_ets]
Code: c5 01 41 39 ad e4 02 00 00 0f 87 18 ff ff ff 49 8b 85 c0 02 00 00 49 39 c4 0f 84 ba 00 00 00 49 8b ad c0 02 00 00 48 8b 7d 10 <48> 8b 47 18 48 8b 40 38 0f ae e8 ff d0 48 89 c3 48 85 c0 0f 84 9d
RSP: 0000:ffffbb36c0b5fdd8 EFLAGS: 00010287
RAX: ffff956678efed30 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000002 RSI: ffffffff9b938dc9 RDI: 0000000000000000
RBP: ffff956678efed30 R08: e2f3207fe360129c R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffff956678efeac0
R13: ffff956678efe800 R14: ffff956611545000 R15: ffff95667ac8f100
FS: 00007f2aa9120740(0000) GS:ffff95667b800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000018 CR3: 000000011070c000 CR4: 0000000000350ee0
Call Trace:
<TASK>
qdisc_peek_dequeued+0x29/0x70 [sch_ets]
tbf_dequeue+0x22/0x260 [sch_tbf]
__qdisc_run+0x7f/0x630
net_tx_action+0x290/0x4c0
__do_softirq+0xee/0x4f8
irq_exit_rcu+0xf4/0x130
sysvec_apic_timer_interrupt+0x52/0xc0
asm_sysvec_apic_timer_interrupt+0x12/0x20
RIP: 0033:0x7f2aa7fc9ad4
Code: b9 ff ff 48 8b 54 24 18 48 83 c4 08 48 89 ee 48 89 df 5b 5d e9 ed fc ff ff 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa <53> 48 83 ec 10 48 8b 05 10 64 33 00 48 8b 00 48 85 c0 0f 85 84 00
RSP: 002b:00007ffe5d33fab8 EFLAGS: 00000202
RAX: 0000000000000002 RBX: 0000561f72c31460 RCX: 0000561f72c31720
RDX: 0000000000000002 RSI: 0000561f72c31722 RDI: 0000561f72c31720
RBP: 000000000000002a R08: 00007ffe5d33fa40 R09: 0000000000000014
R10: 0000000000000000 R11: 0000000000000246 R12: 0000561f7187e380
R13: 0000000000000000 R14: 0000000000000000 R15: 0000561f72c31460
</TASK>
Modules linked in: sch_ets sch_tbf dummy rfkill iTCO_wdt intel_rapl_msr iTCO_vendor_support intel_rapl_common joydev virtio_balloon lpc_ich i2c_i801 i2c_smbus pcspkr ip_tables xfs libcrc32c crct10dif_pclmul crc32_pclmul crc32c_intel ahci libahci ghash_clmulni_intel serio_raw libata virtio_blk virtio_console virtio_net net_failover failover sunrpc dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000018
Ensuring that 'alist' was never zeroed [1] was not sufficient: we also need
to check for possible NULL 'qdisc' pointers in the leaf class.
[1] https://lore.kernel.org/netdev/60d274838bf09777f0371253416e8af71360bc08.1633609148.git.dcaratti@redhat.com/
v2: when a NULL qdisc is found in the DRR active list, try to dequeue an skb
from the next list item.
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Fixes: dcc68b4d8084 ("net: sch_ets: Add a new Qdisc")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
---
net/sched/sch_ets.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/net/sched/sch_ets.c b/net/sched/sch_ets.c
index 0eae9ff5edf6..ecb569ffb3f1 100644
--- a/net/sched/sch_ets.c
+++ b/net/sched/sch_ets.c
@@ -476,10 +476,15 @@ static struct sk_buff *ets_qdisc_dequeue(struct Qdisc *sch)
 				return ets_qdisc_dequeue_skb(sch, skb);
 		}
 
+drr_dequeue:
 		if (list_empty(&q->active))
 			goto out;
 
 		cl = list_first_entry(&q->active, struct ets_class, alist);
+		if (!cl->qdisc) {
+			list_del(&cl->alist);
+			goto drr_dequeue;
+		}
 		skb = cl->qdisc->ops->peek(cl->qdisc);
 		if (!skb) {
 			qdisc_warn_nonwc(__func__, cl->qdisc);
--
2.31.1
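
The control flow added by the hunk above can be illustrated outside the kernel. What follows is only a minimal userspace sketch, not the kernel code: `struct list_node`, `struct model_class`, the `has_qdisc` flag and every function name are hypothetical stand-ins invented for this illustration (the real code uses `struct list_head`, `struct ets_class` and `cl->qdisc`). It is single-threaded, so it shows the "skip and unlink stale entries" loop only and says nothing about concurrent updates of the class array.

```c
#include <assert.h>

/* Userspace stand-in for the kernel's circular doubly linked list. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

static int list_is_empty(const struct list_node *head)
{
	return head->next == head;
}

static void list_append(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_unlink(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
}

/* Hypothetical model of a class: 'has_qdisc' plays the role of cl->qdisc != NULL. */
struct model_class {
	struct list_node alist;	/* first member, so a node pointer is also a class pointer */
	int has_qdisc;
	int band;
};

/*
 * Return the band of the first active class that still has a qdisc,
 * unlinking any stale entries found in front of it; -1 if none is left.
 */
static int model_first_usable(struct list_node *active)
{
	while (!list_is_empty(active)) {
		struct model_class *cl = (struct model_class *)active->next;

		if (cl->has_qdisc)
			return cl->band;
		list_unlink(&cl->alist);	/* same role as list_del(&cl->alist) */
	}
	return -1;
}

/* Demo: a stale band 6 sits in front of a live band 2 on the active list. */
static int demo_skip_stale(void)
{
	struct list_node active;
	struct model_class stale = { .has_qdisc = 0, .band = 6 };
	struct model_class live = { .has_qdisc = 1, .band = 2 };

	list_init(&active);
	list_append(&stale.alist, &active);
	list_append(&live.alist, &active);

	assert(!list_is_empty(&active));
	return model_first_usable(&active);
}
```

Calling `demo_skip_stale()` from a `main()` returns the band of the live class, with the stale entry unlinked along the way.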
* Re: [PATCH net v2] net/sched: sch_ets: don't peek at classes beyond 'nbands'
@ 2021-11-24 0:44 ` Cong Wang
From: Cong Wang @ 2021-11-24 0:44 UTC (permalink / raw)
To: Davide Caratti
Cc: Jamal Hadi Salim, Jiri Pirko, David S. Miller, Jakub Kicinski,
Petr Machata, Linux Kernel Network Developers, Hangbin Liu
On Tue, Nov 23, 2021 at 5:54 AM Davide Caratti <dcaratti@redhat.com> wrote:
>
> when the number of DRR classes decreases, the round-robin active list can
> contain elements that have already been freed in ets_qdisc_change(). As a
> consequence, it's possible to see a NULL dereference crash, caused by the
> attempt to call cl->qdisc->ops->peek(cl->qdisc) when cl->qdisc is NULL:
Where exactly is it set to NULL? In line 688?
686 for (i = q->nbands; i < oldbands; i++) {
687 qdisc_put(q->classes[i].qdisc);
688 q->classes[i].qdisc = NULL;
689 q->classes[i].quantum = 0;
690 q->classes[i].deficit = 0;
691 gnet_stats_basic_sync_init(&q->classes[i].bstats);
692 memset(&q->classes[i].qstats, 0,
sizeof(q->classes[i].qstats));
693 }
If so, your patch is not sufficient as the NULL assignment can happen
after the check you add here?
Thanks.
* Re: [PATCH net v2] net/sched: sch_ets: don't peek at classes beyond 'nbands'
@ 2021-11-24 11:07 ` Davide Caratti
From: Davide Caratti @ 2021-11-24 11:07 UTC (permalink / raw)
To: Cong Wang
Cc: Jamal Hadi Salim, Jiri Pirko, David S. Miller, Jakub Kicinski,
Petr Machata, Linux Kernel Network Developers, Hangbin Liu
Hello Cong, thanks for reviewing!
On Tue, Nov 23, 2021 at 04:44:46PM -0800, Cong Wang wrote:
> On Tue, Nov 23, 2021 at 5:54 AM Davide Caratti <dcaratti@redhat.com> wrote:
> >
> > when the number of DRR classes decreases, the round-robin active list can
> > contain elements that have already been freed in ets_qdisc_change(). As a
> > consequence, it's possible to see a NULL dereference crash, caused by the
> > attempt to call cl->qdisc->ops->peek(cl->qdisc) when cl->qdisc is NULL:
>
> Where exactly is it set to NULL? In line 688?
Yes. At least, yes with the test I'm running to reproduce the crash:
# tc qdisc add dev ddd0 handle 10: parent 1: ets bands 8 strict 4 priomap 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7
# mausezahn ddd0 -A 10.10.10.1 -B 10.10.10.2 -c 0 -a own -b 00:c1:a0:c1:a0:00 -t udp &
# tc qdisc change dev ddd0 handle 10: ets bands 4 strict 2 quanta 2500 2500 priomap 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
>
> 686 for (i = q->nbands; i < oldbands; i++) {
> 687 qdisc_put(q->classes[i].qdisc);
> 688 q->classes[i].qdisc = NULL;
> 689 q->classes[i].quantum = 0;
> 690 q->classes[i].deficit = 0;
> 691 gnet_stats_basic_sync_init(&q->classes[i].bstats);
> 692 memset(&q->classes[i].qstats, 0,
> sizeof(q->classes[i].qstats));
> 693 }
>
> If so, your patch is not sufficient as the NULL assignment can happen
> after the check you add here?
I think you are right, thanks for noticing. We can probably keep this
NULL assignment outside the sch_tree_lock() / sch_tree_unlock() section:
it has been there since the beginning and it's not harmful.
We can "heal" the active list in ets_qdisc_change() so that it does not
contain elements beyond 'nbands': this is probably better as it doesn't
need to add code to the traffic path.
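
The "heal the active list on change" idea can be sketched in userspace as follows. This is an illustration under stated assumptions, not the v3 patch: `struct model_sched`, `struct model_class`, the `on_active` flag, `MODEL_MAX_BANDS` and all function names are hypothetical, and the locking that a real ets_qdisc_change() would need around the list update is not modelled at all.

```c
#include <assert.h>

/* Userspace stand-in for the kernel's circular doubly linked list. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

static void list_append(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_unlink(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
}

#define MODEL_MAX_BANDS 16	/* hypothetical upper bound for this model */

struct model_class {
	struct list_node alist;
	int on_active;		/* nonzero while linked into the active list */
};

struct model_sched {
	struct list_node active;
	struct model_class classes[MODEL_MAX_BANDS];
	unsigned int nbands;
};

static void model_init(struct model_sched *q, unsigned int nbands)
{
	unsigned int i;

	list_init(&q->active);
	for (i = 0; i < MODEL_MAX_BANDS; i++) {
		list_init(&q->classes[i].alist);
		q->classes[i].on_active = 0;
	}
	q->nbands = nbands;
}

static void model_activate(struct model_sched *q, unsigned int band)
{
	if (band < q->nbands && !q->classes[band].on_active) {
		list_append(&q->classes[band].alist, &q->active);
		q->classes[band].on_active = 1;
	}
}

/*
 * Change the number of bands (caller keeps nbands <= MODEL_MAX_BANDS).
 * Classes at index >= nbands are unlinked from the active list here,
 * so the dequeue side never meets them.
 */
static void model_change_bands(struct model_sched *q, unsigned int nbands)
{
	unsigned int i;

	for (i = nbands; i < q->nbands; i++) {
		if (q->classes[i].on_active) {
			list_unlink(&q->classes[i].alist);
			q->classes[i].on_active = 0;
		}
	}
	q->nbands = nbands;
}

static unsigned int model_active_count(const struct model_sched *q)
{
	const struct list_node *pos;
	unsigned int n = 0;

	for (pos = q->active.next; pos != &q->active; pos = pos->next)
		n++;
	return n;
}

/* Demo: bands 3, 5 and 7 are active, then the qdisc shrinks from 8 to 4 bands. */
static unsigned int demo_heal_on_change(void)
{
	struct model_sched q;

	model_init(&q, 8);
	model_activate(&q, 3);
	model_activate(&q, 5);
	model_activate(&q, 7);
	model_change_bands(&q, 4);

	assert(q.active.next == &q.classes[3].alist);
	return model_active_count(&q);
}
```

In this model the dequeue side needs no extra check, because only band 3 is left on the active list after the change; the cost is paid once in the configuration path instead of on every dequeue.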
I will send a v3 soon.
--
davide