All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] net: Fix potential NULL pointer dereference in __skb_try_recv_datagram
@ 2015-12-29 19:10 Jacob Siverskog
  2015-12-29 19:42 ` Rainer Weikusat
  0 siblings, 1 reply; 22+ messages in thread
From: Jacob Siverskog @ 2015-12-29 19:10 UTC (permalink / raw)
  To: netdev
  Cc: David S . Miller, Herbert Xu, Rainer Weikusat, Eric Dumazet,
	Konstantin Khlebnikov, Al Viro, linux-kernel, Jacob Siverskog

This should fix a NULL pointer dereference I encountered (dump
below). Since __skb_unlink is called while walking,
skb_queue_walk_safe should be used.

I investigated the oops and it seems like skb->next was NULL.

Oops:
Unable to handle kernel NULL pointer dereference at virtual address 00000004
pgd = c5a7c000
[00000004] *pgd=85ab6831, *pte=00000000, *ppte=00000000
Internal error: Oops: 817 [#1] PREEMPT ARM
Modules linked in: xt_tcpudp xt_state xt_nat wlcore_sdio wl18xx wl12xx wlcore rfcomm nf_log_arp nf_log_common iptable_nat nf_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ipt_REJECT nh
CPU: 0 PID: 11902 Comm: mdnsd Not tainted 4.3.0 #2
Hardware name: Generic AM33XX (Flattened Device Tree)udpv6_recvmsg
task: c7123040 ti: c58ce000 task.ti: c58ce000
PC is at __skb_recv_datagram+0x428/0x598
LR is at udpv6_recvmsg+0x1d0/0x6d0
pc : [<c02fc0a8>]    lr : [<c0398f1c>]    psr: 60000093
sp : c58cfd98  ip : 20000013  fp : c7781040
r10: c58ce000  r9 : 00000000  r8 : c58cfe20
r7 : c58cfe1c  r6 : c06351d0  r5 : c77810ac  r4 : c583eac0
r3 : 00000000  r2 : 00000000  r1 : 00000000  r0 : 20000013
Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5387d  Table: 85a7c019  DAC: 00000051
Process mdnsd (pid: 11902, stack limit = 0xc58ce208)
Stack: (0xc58cfd98 to 0xc58d0000)
fd80:                                                       00000029 00000008
fda0: c02fbc64 00000000 00000000 00000040 c778115c c03a9ed8 c58cfdc4 00000065
fdc0: 0000005d 000000ff 00000000 00000000 fb000000 00000004 00000000 c0398d4c
fde0: c7781040 c58cff6c 000022f8 be8383ec c06351d0 00000000 c06351d0 c0398f1c
fe00: c58cfe24 c58cfe70 be8383e4 00000040 00000000 c77812a8 00000000 00000000
fe20: 00000000 c58cfe2c 00000004 c0398d4c c58cff6c c6a25c00 c58cfeb8 be8383ec
fe40: be838408 00000000 00000000 c0367a2c 00000000 c58cfe5c c58cff6c 00000000
fe60: c6a25c00 c58cff6c 00000040 c02efff4 00000000 be838874 be8388c0 000022f8
fe80: c0618d78 00000000 ff14338a 9abb1afe b2b617e7 00000ab4 b2b99e16 00000ab4
fea0: c6a1fcc0 c0608374 c0608374 00000000 00000000 00000001 e914000a 00000000
fec0: 000080fe 00000000 ff14338a 9abb1afe 00000004 ffffffff 00000000 c00499ac
fee0: c77ac000 00000002 c063a848 00000000 00000000 c0049c9c 00000000 c5a74c80
ff00: c77ac018 c7123040 00002dfb 0c95f107 c58cff78 05106300 00002dfb 00000051
ff20: be83ac08 00000001 00000018 be83ac08 00000008 00000000 00000004 be8383ec
ff40: c6a25c00 00000000 00000129 c000f3a4 c58ce000 c02f0d74 00000000 be83ac08
ff60: c58cff78 00000000 fffffff7 c58cfeb8 00000000 00000000 00000000 000022f8
ff80: c58cfe78 00000001 be838408 00000400 00000000 00000000 be838890 be83883f
ffa0: ffffffff c000f1e0 be838890 be83883f 0000000e be8383ec 00000000 00000000
ffc0: be838890 be83883f ffffffff 00000129 be838848 00000000 be838844 00000000
ffe0: 0006a228 be8383c8 0000ea5c b6f39fd8 60000010 0000000e 00000000 00000000
[<c02fc0a8>] (__skb_recv_datagram) from [<c0398f1c>] (udpv6_recvmsg+0x1d0/0x6d0)
[<c0398f1c>] (udpv6_recvmsg) from [<c0367a2c>] (inet_recvmsg+0x38/0x4c)
[<c0367a2c>] (inet_recvmsg) from [<c02efff4>] (___sys_recvmsg+0x94/0x170)
[<c02efff4>] (___sys_recvmsg) from [<c02f0d74>] (__sys_recvmsg+0x3c/0x6c)
[<c02f0d74>] (__sys_recvmsg) from [<c000f1e0>] (ret_fast_syscall+0x0/0x3c)
Code: e58b3074 e894000c e5841000 e5841004 (e5823004)
---[ end trace d16f25559f81d2a9 ]---
note: mdnsd[11902] exited with preempt_count 1
Unable to handle kernel NULL pointer dereference at virtual address 00000004
pgd = c0004000
[00000004] *pgd=00000000
Internal error: Oops: 817 [#2] PREEMPT ARM
Modules linked in: xt_tcpudp xt_state xt_nat wlcore_sdio wl18xx wl12xx wlcore rfcomm nf_log_arp nf_log_common iptable_nat nf_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ipt_REJECT nh
CPU: 0 PID: 11902 Comm: mdnsd Tainted: G      D         4.3.0 #2
Hardware name: Generic AM33XX (Flattened Device Tree)
task: c7123040 ti: c58ce000 task.ti: c58ce000
PC is at inet_sock_destruct+0x40/0x1d0
LR is at sk_destruct+0x18/0xe0
pc : [<c0367ac0>]    lr : [<c02f3d7c>]    psr: a0000113
sp : c58cfb28  ip : 001180f9  fp : c7123040
r10: c5941a08  r9 : 00000000  r8 : 00000008
r7 : c7051950  r6 : c77810ac  r5 : 00000000  r4 : c7781040
r3 : c583eac0  r2 : 00000000  r1 : 00000000  r0 : c583eac0
Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5387d  Table: 858fc019  DAC: 00000051
Process mdnsd (pid: 11902, stack limit = 0xc58ce208)
Stack: (0xc58cfb28 to 0xc58d0000)
fb20:                   c7781040 c6a25c00 c6a534c8 c02f3d7c c06351d0 c7781040
fb40: c6a25c00 c0367f0c c037cfa8 c6a25c00 00000000 c02ee878 c02ee8f0 c5941a00
fb60: c6a25c20 c02ee8fc c02ee8f0 c00c2f04 00000000 00000000 c7123338 c5a8a700
fb80: c7123040 c063bd08 c5a74cb4 00000001 c06079d0 c0047600 0000007f 00000000
fba0: c06079d0 60000193 c58cfbb8 c0030308 c063be28 c0087514 c0668f2c c063a9c4
fbc0: c06079d0 c063a9c4 c06079d0 60000193 0000000b 00000001 c02fc0aa c06079d0
fbe0: c0534fa0 c00127e4 c58ce208 0000000b 00000000 00000000 00000008 c063a9c4
fc00: c58ce000 bf000000 658cfd48 33623835 20343730 34393865 63303030 38356520
fc20: 30303134 35652030 30313438 28203430 32383565 34303033 c0002029 c0087514
fc40: 00000000 00000004 00000817 c5a74c80 c58cfd48 00000004 00000000 c58ce000
fc60: c7781040 c0019aa4 00000817 c00164f8 00000000 00000000 00000000 00000000
fc80: 00000000 00000817 c0016288 00000004 c0607b14 c58cfd48 00000000 c58ce000
fca0: c7781040 c00092c0 00000000 c5957a00 c7738cf0 c7744940 c7158000 c7738cf0
fcc0: 00000000 c0321e3c c5aa287c 00000000 00000000 004c9b81 00000000 c7744940
fce0: c58ce000 00000000 00000000 c03080a8 c5957a68 00000001 c06351d0 00000000
fd00: c5941340 c5aa2800 c5aa287c c7744940 0000000e 00000000 c58ce000 00000000
fd20: c5aa287c c037e5f0 00000065 00000000 c02fc0a8 60000093 ffffffff c58cfd7c
fd40: c58cfe20 c0013120 20000013 00000000 00000000 00000000 c583eac0 c77810ac
fd60: c06351d0 c58cfe1c c58cfe20 00000000 c58ce000 c7781040 20000013 c58cfd98
fd80: c0398f1c c02fc0a8 60000093 ffffffff 00000051 00000010 00000029 00000008
fda0: c02fbc64 00000000 00000000 00000040 c778115c c03a9ed8 c58cfdc4 00000065
fdc0: 0000005d 000000ff 00000000 00000000 fb000000 00000004 00000000 c0398d4c
fde0: c7781040 c58cff6c 000022f8 be8383ec c06351d0 00000000 c06351d0 c0398f1c
fe00: c58cfe24 c58cfe70 be8383e4 00000040 00000000 c77812a8 00000000 00000000
fe20: 00000000 c58cfe2c 00000004 c0398d4c c58cff6c c6a25c00 c58cfeb8 be8383ec
fe40: be838408 00000000 00000000 c0367a2c 00000000 c58cfe5c c58cff6c 00000000
fe60: c6a25c00 c58cff6c 00000040 c02efff4 00000000 be838874 be8388c0 000022f8
fe80: c0618d78 00000000 ff14338a 9abb1afe b2b617e7 00000ab4 b2b99e16 00000ab4
fea0: c6a1fcc0 c0608374 c0608374 00000000 00000000 00000001 e914000a 00000000
fec0: 000080fe 00000000 ff14338a 9abb1afe 00000004 ffffffff 00000000 c00499ac
fee0: c77ac000 00000002 c063a848 00000000 00000000 c0049c9c 00000000 c5a74c80
ff00: c77ac018 c7123040 00002dfb 0c95f107 c58cff78 05106300 00002dfb 00000051
ff20: be83ac08 00000001 00000018 be83ac08 00000008 00000000 00000004 be8383ec
ff40: c6a25c00 00000000 00000129 c000f3a4 c58ce000 c02f0d74 00000000 be83ac08
ff60: c58cff78 00000000 fffffff7 c58cfeb8 00000000 00000000 00000000 000022f8
ff80: c58cfe78 00000001 be838408 00000400 00000000 00000000 be838890 be83883f
ffa0: ffffffff c000f1e0 be838890 be83883f 0000000e be8383ec 00000000 00000000
ffc0: be838890 be83883f ffffffff 00000129 be838848 00000000 be838844 00000000
ffe0: 0006a228 be8383c8 0000ea5c b6f39fd8 60000010 0000000e 00000000 00000000
[<c0367ac0>] (inet_sock_destruct) from [<c02f3d7c>] (sk_destruct+0x18/0xe0)
[<c02f3d7c>] (sk_destruct) from [<c0367f0c>] (inet_release+0x44/0x70)
[<c0367f0c>] (inet_release) from [<c02ee878>] (sock_release+0x20/0x98)
[<c02ee878>] (sock_release) from [<c02ee8fc>] (sock_close+0xc/0x14)
[<c02ee8fc>] (sock_close) from [<c00c2f04>] (__fput+0x80/0x1fc)
[<c00c2f04>] (__fput) from [<c0047600>] (task_work_run+0x6c/0x9c)
[<c0047600>] (task_work_run) from [<c0030308>] (do_exit+0x2ac/0x95c)
[<c0030308>] (do_exit) from [<c00127e4>] (die+0x180/0x3b0)
[<c00127e4>] (die) from [<c0019aa4>] (__do_kernel_fault.part.0+0x64/0x1e4)
[<c0019aa4>] (__do_kernel_fault.part.0) from [<c00164f8>] (do_page_fault+0x270/0x298)
[<c00164f8>] (do_page_fault) from [<c00092c0>] (do_DataAbort+0x38/0xb4)
[<c00092c0>] (do_DataAbort) from [<c0013120>] (__dabt_svc+0x40/0x60)
Exception stack(0xc58cfd48 to 0xc58cfd90)
fd40:                   20000013 00000000 00000000 00000000 c583eac0 c77810ac
fd60: c06351d0 c58cfe1c c58cfe20 00000000 c58ce000 c7781040 20000013 c58cfd98
fd80: c0398f1c c02fc0a8 60000093 ffffffff
[<c0013120>] (__dabt_svc) from [<c02fc0a8>] (__skb_recv_datagram+0x428/0x598)
[<c02fc0a8>] (__skb_recv_datagram) from [<c0398f1c>] (udpv6_recvmsg+0x1d0/0x6d0)
[<c0398f1c>] (udpv6_recvmsg) from [<c0367a2c>] (inet_recvmsg+0x38/0x4c)
[<c0367a2c>] (inet_recvmsg) from [<c02efff4>] (___sys_recvmsg+0x94/0x170)
[<c02efff4>] (___sys_recvmsg) from [<c02f0d74>] (__sys_recvmsg+0x3c/0x6c)
[<c02f0d74>] (__sys_recvmsg) from [<c000f1e0>] (ret_fast_syscall+0x0/0x3c)
Code: e5842074 e8930006 e5835000 e5835004 (e5812004)

Signed-off-by: Jacob Siverskog <jacob@teenage.engineering>
---
 net/core/datagram.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/datagram.c b/net/core/datagram.c
index fa9dc64..e8b3cab 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -201,7 +201,7 @@ struct sk_buff *__skb_try_recv_datagram(struct sock *sk, unsigned int flags,
 					struct sk_buff **last)
 {
 	struct sk_buff_head *queue = &sk->sk_receive_queue;
-	struct sk_buff *skb;
+	struct sk_buff *skb, *next;
 	unsigned long cpu_flags;
 	/*
 	 * Caller is allowed not to check sk->sk_err before skb_recv_datagram()
@@ -222,7 +222,7 @@ struct sk_buff *__skb_try_recv_datagram(struct sock *sk, unsigned int flags,
 
 		*last = (struct sk_buff *)queue;
 		spin_lock_irqsave(&queue->lock, cpu_flags);
-		skb_queue_walk(queue, skb) {
+		skb_queue_walk_safe(queue, skb, next) {
 			*last = skb;
 			*peeked = skb->peeked;
 			if (flags & MSG_PEEK) {
-- 
2.6.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] net: Fix potential NULL pointer dereference in __skb_try_recv_datagram
  2015-12-29 19:10 [PATCH] net: Fix potential NULL pointer dereference in __skb_try_recv_datagram Jacob Siverskog
@ 2015-12-29 19:42 ` Rainer Weikusat
  2015-12-29 20:08   ` David Miller
  0 siblings, 1 reply; 22+ messages in thread
From: Rainer Weikusat @ 2015-12-29 19:42 UTC (permalink / raw)
  To: Jacob Siverskog
  Cc: netdev, David S . Miller, Herbert Xu, Rainer Weikusat,
	Eric Dumazet, Konstantin Khlebnikov, Al Viro, linux-kernel

Jacob Siverskog <jacob@teenage.engineering> writes:
> This should fix a NULL pointer dereference I encountered (dump
> below). Since __skb_unlink is called while walking,
> skb_queue_walk_safe should be used.

The code in question is:

skb_queue_walk(queue, skb) {
	*last = skb;
	*peeked = skb->peeked;
	if (flags & MSG_PEEK) {
		if (_off >= skb->len && (skb->len || _off ||
					 skb->peeked)) {
			_off -= skb->len;
			continue;
		}

		skb = skb_set_peeked(skb);
		error = PTR_ERR(skb);
		if (IS_ERR(skb)) {
			spin_unlock_irqrestore(&queue->lock,
					       cpu_flags);
			goto no_packet;
		}
                
		atomic_inc(&skb->users);
	}  else
		__skb_unlink(skb, queue);

	spin_unlock_irqrestore(&queue->lock, cpu_flags);
	*off = _off;
	return skb;
}

__skb_unlink is only called prior to returning from the function.
Consequently, it won't affect the skb_queue_walk code.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] net: Fix potential NULL pointer dereference in __skb_try_recv_datagram
  2015-12-29 19:42 ` Rainer Weikusat
@ 2015-12-29 20:08   ` David Miller
  2015-12-30 11:14     ` Jacob Siverskog
  0 siblings, 1 reply; 22+ messages in thread
From: David Miller @ 2015-12-29 20:08 UTC (permalink / raw)
  To: rweikusat
  Cc: jacob, netdev, herbert, edumazet, khlebnikov, viro, linux-kernel

From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Date: Tue, 29 Dec 2015 19:42:36 +0000

> Jacob Siverskog <jacob@teenage.engineering> writes:
>> This should fix a NULL pointer dereference I encountered (dump
>> below). Since __skb_unlink is called while walking,
>> skb_queue_walk_safe should be used.
> 
> The code in question is:
 ...
> __skb_unlink is only called prior to returning from the function.
> Consequently, it won't affect the skb_queue_walk code.

Agreed, this patch doesn't fix anything.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] net: Fix potential NULL pointer dereference in __skb_try_recv_datagram
  2015-12-29 20:08   ` David Miller
@ 2015-12-30 11:14     ` Jacob Siverskog
  2015-12-30 13:26       ` Eric Dumazet
  2015-12-30 15:32       ` Rainer Weikusat
  0 siblings, 2 replies; 22+ messages in thread
From: Jacob Siverskog @ 2015-12-30 11:14 UTC (permalink / raw)
  To: David Miller
  Cc: Rainer Weikusat, netdev, Herbert Xu, Eric Dumazet,
	Konstantin Khlebnikov, Al Viro, LKML

On Tue, Dec 29, 2015 at 9:08 PM, David Miller <davem@davemloft.net> wrote:
> From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
> Date: Tue, 29 Dec 2015 19:42:36 +0000
>
>> Jacob Siverskog <jacob@teenage.engineering> writes:
>>> This should fix a NULL pointer dereference I encountered (dump
>>> below). Since __skb_unlink is called while walking,
>>> skb_queue_walk_safe should be used.
>>
>> The code in question is:
>  ...
>> __skb_unlink is only called prior to returning from the function.
>> Consequently, it won't affect the skb_queue_walk code.
>
> Agreed, this patch doesn't fix anything.

Ok. Thanks for your feedback. How do you believe the issue could be
solved? Investigating it gives:

static inline void __skb_unlink(struct sk_buff *skb, struct sk_buff_head *list)
{
struct sk_buff *next, *prev;

list->qlen--;
     51c: e2433001 sub r3, r3, #1
     520: e58b3074 str r3, [fp, #116] ; 0x74
next   = skb->next;
prev   = skb->prev;
     524: e894000c ldm r4, {r2, r3}
skb->next  = skb->prev = NULL;
     528: e5841000 str r1, [r4]
     52c: e5841004 str r1, [r4, #4]
next->prev = prev;
     530: e5823004 str r3, [r2, #4]                          <--
trapping instruction (r2 NULL)

Register contents:
r7 : c58cfe1c  r6 : c06351d0  r5 : c77810ac  r4 : c583eac0
r3 : 00000000  r2 : 00000000  r1 : 00000000  r0 : 20000013

If I understand this correctly, then r4 = skb, r2 = next, r3 = prev.

Should there be a check for this in __skb_try_recv_datagram?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] net: Fix potential NULL pointer dereference in __skb_try_recv_datagram
  2015-12-30 11:14     ` Jacob Siverskog
@ 2015-12-30 13:26       ` Eric Dumazet
  2015-12-30 14:30         ` Jacob Siverskog
  2015-12-30 15:32       ` Rainer Weikusat
  1 sibling, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2015-12-30 13:26 UTC (permalink / raw)
  To: Jacob Siverskog
  Cc: David Miller, Rainer Weikusat, netdev, Herbert Xu,
	Konstantin Khlebnikov, Al Viro, LKML

On Wed, Dec 30, 2015 at 6:14 AM, Jacob Siverskog
<jacob@teenage.engineering> wrote:

> Ok. Thanks for your feedback. How do you believe the issue could be
> solved? Investigating it gives:
>
> static inline void __skb_unlink(struct sk_buff *skb, struct sk_buff_head *list)
> {
> struct sk_buff *next, *prev;
>
> list->qlen--;
>      51c: e2433001 sub r3, r3, #1
>      520: e58b3074 str r3, [fp, #116] ; 0x74
> next   = skb->next;
> prev   = skb->prev;
>      524: e894000c ldm r4, {r2, r3}
> skb->next  = skb->prev = NULL;
>      528: e5841000 str r1, [r4]
>      52c: e5841004 str r1, [r4, #4]
> next->prev = prev;
>      530: e5823004 str r3, [r2, #4]                          <--
> trapping instruction (r2 NULL)
>
> Register contents:
> r7 : c58cfe1c  r6 : c06351d0  r5 : c77810ac  r4 : c583eac0
> r3 : 00000000  r2 : 00000000  r1 : 00000000  r0 : 20000013
>
> If I understand this correctly, then r4 = skb, r2 = next, r3 = prev.
>
> Should there be a check for this in __skb_try_recv_datagram?

At this point corruption already happened.
We can not possibly detect every possible corruption caused by bugs
elsewhere in the kernel and just 'recover' at this point.
We must indeed find the root cause and fix it, instead of trying to hide it.

How often can you trigger this bug ?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] net: Fix potential NULL pointer dereference in __skb_try_recv_datagram
  2015-12-30 13:26       ` Eric Dumazet
@ 2015-12-30 14:30         ` Jacob Siverskog
  2015-12-30 14:38           ` Eric Dumazet
  2015-12-30 22:30           ` Cong Wang
  0 siblings, 2 replies; 22+ messages in thread
From: Jacob Siverskog @ 2015-12-30 14:30 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Rainer Weikusat, netdev, Herbert Xu,
	Konstantin Khlebnikov, Al Viro, LKML

On Wed, Dec 30, 2015 at 2:26 PM, Eric Dumazet <edumazet@google.com> wrote:
> On Wed, Dec 30, 2015 at 6:14 AM, Jacob Siverskog
> <jacob@teenage.engineering> wrote:
>
>> Ok. Thanks for your feedback. How do you believe the issue could be
>> solved? Investigating it gives:
>>
>> static inline void __skb_unlink(struct sk_buff *skb, struct sk_buff_head *list)
>> {
>> struct sk_buff *next, *prev;
>>
>> list->qlen--;
>>      51c: e2433001 sub r3, r3, #1
>>      520: e58b3074 str r3, [fp, #116] ; 0x74
>> next   = skb->next;
>> prev   = skb->prev;
>>      524: e894000c ldm r4, {r2, r3}
>> skb->next  = skb->prev = NULL;
>>      528: e5841000 str r1, [r4]
>>      52c: e5841004 str r1, [r4, #4]
>> next->prev = prev;
>>      530: e5823004 str r3, [r2, #4]                          <--
>> trapping instruction (r2 NULL)
>>
>> Register contents:
>> r7 : c58cfe1c  r6 : c06351d0  r5 : c77810ac  r4 : c583eac0
>> r3 : 00000000  r2 : 00000000  r1 : 00000000  r0 : 20000013
>>
>> If I understand this correctly, then r4 = skb, r2 = next, r3 = prev.
>>
>> Should there be a check for this in __skb_try_recv_datagram?
>
> At this point corruption already happened.
> We can not possibly detect every possible corruption caused by bugs
> elsewhere in the kernel and just 'recover' at this point.
> We must indeed find the root cause and fix it, instead of trying to hide it.
>
> How often can you trigger this bug ?

Ok. I don't have a good repro to trigger it unfortunately, I've seen it just a
few times when bringing up/down network interfaces. Does the trace
give any clue?

[<c02fc0a8>] (__skb_recv_datagram) from [<c0398f1c>] (udpv6_recvmsg+0x1d0/0x6d0)
[<c0398f1c>] (udpv6_recvmsg) from [<c0367a2c>] (inet_recvmsg+0x38/0x4c)
[<c0367a2c>] (inet_recvmsg) from [<c02efff4>] (___sys_recvmsg+0x94/0x170)
[<c02efff4>] (___sys_recvmsg) from [<c02f0d74>] (__sys_recvmsg+0x3c/0x6c)
[<c02f0d74>] (__sys_recvmsg) from [<c000f1e0>] (ret_fast_syscall+0x0/0x3c)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] net: Fix potential NULL pointer dereference in __skb_try_recv_datagram
  2015-12-30 14:30         ` Jacob Siverskog
@ 2015-12-30 14:38           ` Eric Dumazet
  2015-12-30 22:30           ` Cong Wang
  1 sibling, 0 replies; 22+ messages in thread
From: Eric Dumazet @ 2015-12-30 14:38 UTC (permalink / raw)
  To: Jacob Siverskog
  Cc: David Miller, Rainer Weikusat, netdev, Herbert Xu,
	Konstantin Khlebnikov, Al Viro, LKML

On Wed, Dec 30, 2015 at 9:30 AM, Jacob Siverskog
<jacob@teenage.engineering> wrote:
> On Wed, Dec 30, 2015 at 2:26 PM, Eric Dumazet <edumazet@google.com> wrote:

>> At this point corruption already happened.
>> We can not possibly detect every possible corruption caused by bugs
>> elsewhere in the kernel and just 'recover' at this point.
>> We must indeed find the root cause and fix it, instead of trying to hide it.
>>
>> How often can you trigger this bug ?
>
> Ok. I don't have a good repro to trigger it unfortunately, I've seen it just a
> few times when bringing up/down network interfaces. Does the trace
> give any clue?
>
> [<c02fc0a8>] (__skb_recv_datagram) from [<c0398f1c>] (udpv6_recvmsg+0x1d0/0x6d0)
> [<c0398f1c>] (udpv6_recvmsg) from [<c0367a2c>] (inet_recvmsg+0x38/0x4c)
> [<c0367a2c>] (inet_recvmsg) from [<c02efff4>] (___sys_recvmsg+0x94/0x170)
> [<c02efff4>] (___sys_recvmsg) from [<c02f0d74>] (__sys_recvmsg+0x3c/0x6c)
> [<c02f0d74>] (__sys_recvmsg) from [<c000f1e0>] (ret_fast_syscall+0x0/0x3c)

Not really : it only shows the point where the corruption is detected,
not the point where the corruption happened.

This might be caused by a netfilter module, a buggy driver... it is
hard to know.

You might add some traces on the skb itself, like its length or/and content.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] net: Fix potential NULL pointer dereference in __skb_try_recv_datagram
  2015-12-30 11:14     ` Jacob Siverskog
  2015-12-30 13:26       ` Eric Dumazet
@ 2015-12-30 15:32       ` Rainer Weikusat
  1 sibling, 0 replies; 22+ messages in thread
From: Rainer Weikusat @ 2015-12-30 15:32 UTC (permalink / raw)
  To: Jacob Siverskog
  Cc: David Miller, Rainer Weikusat, netdev, Herbert Xu, Eric Dumazet,
	Konstantin Khlebnikov, Al Viro, LKML

Jacob Siverskog <jacob@teenage.engineering> writes:
> On Tue, Dec 29, 2015 at 9:08 PM, David Miller <davem@davemloft.net> wrote:
>> From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
>> Date: Tue, 29 Dec 2015 19:42:36 +0000
>>
>>> Jacob Siverskog <jacob@teenage.engineering> writes:
>>>> This should fix a NULL pointer dereference I encountered (dump
>>>> below). Since __skb_unlink is called while walking,
>>>> skb_queue_walk_safe should be used.
>>>
>>> The code in question is:
>>  ...
>>> __skb_unlink is only called prior to returning from the function.
>>> Consequently, it won't affect the skb_queue_walk code.
>>
>> Agreed, this patch doesn't fix anything.
>
> Ok. Thanks for your feedback. How do you believe the issue could be
> solved? Investigating it gives:
>
> static inline void __skb_unlink(struct sk_buff *skb, struct sk_buff_head *list)
> {

[...]

> next->prev = prev;
>      530: e5823004 str r3, [r2, #4]                          <--
> trapping instruction (r2 NULL)
>
> Register contents:
> r7 : c58cfe1c  r6 : c06351d0  r5 : c77810ac  r4 : c583eac0
> r3 : 00000000  r2 : 00000000  r1 : 00000000  r0 : 20000013
>
> If I understand this correctly, then r4 = skb, r2 = next, r3 = prev.

Some additional information which may be helpful: The next->prev = prev
was pretty obvious from the original error message alone: The invalid
access happened at 4 but no register contained 4. Considering that this
is for ARM, this must have been caused by an instruction using an
address of the form

[Rx, #4]

ie, value of register x + 4. And the next->prev = prev is the only
access to something located 4 bytes beyond something else.

> Should there be a check for this in __skb_try_recv_datagram?

These lists are supposed to be circular, ie, the next pointer of the
last element should point to the first and the prev pointer of the first
to the last. If there's an element with ->next == NULL on the list,
something either didn't do inserts correctly or corrupted an originally
intact list.

General advice: The original error occurred with 4.3.0. Had this
happened to me, I'd either tried to locate the error in the same kernel
version or to reproduce the bug with the one I was planning to
modify. Trying to fix a 'strange memory access' error which was observed
with version x.y by modifying version x.z is IMHO needlessly moving on
shaky ground.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] net: Fix potential NULL pointer dereference in __skb_try_recv_datagram
  2015-12-30 14:30         ` Jacob Siverskog
  2015-12-30 14:38           ` Eric Dumazet
@ 2015-12-30 22:30           ` Cong Wang
  2016-01-04  9:10             ` Jacob Siverskog
  1 sibling, 1 reply; 22+ messages in thread
From: Cong Wang @ 2015-12-30 22:30 UTC (permalink / raw)
  To: Jacob Siverskog
  Cc: Eric Dumazet, David Miller, Rainer Weikusat, netdev, Herbert Xu,
	Konstantin Khlebnikov, Al Viro, LKML

On Wed, Dec 30, 2015 at 6:30 AM, Jacob Siverskog
<jacob@teenage.engineering> wrote:
> On Wed, Dec 30, 2015 at 2:26 PM, Eric Dumazet <edumazet@google.com> wrote:
>> How often can you trigger this bug ?
>
> Ok. I don't have a good repro to trigger it unfortunately, I've seen it just a
> few times when bringing up/down network interfaces. Does the trace
> give any clue?
>

A little bit. You need to help people to narrow down the problem
because there are too many places using skb->next and skb->prev.

Since you mentioned it seems related to network interface flip,
what network interfaces are you using? What's is your TC setup?

Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] net: Fix potential NULL pointer dereference in __skb_try_recv_datagram
  2015-12-30 22:30           ` Cong Wang
@ 2016-01-04  9:10             ` Jacob Siverskog
  2016-01-04 15:25               ` Eric Dumazet
  0 siblings, 1 reply; 22+ messages in thread
From: Jacob Siverskog @ 2016-01-04  9:10 UTC (permalink / raw)
  To: Cong Wang
  Cc: Eric Dumazet, David Miller, Rainer Weikusat, netdev, Herbert Xu,
	Konstantin Khlebnikov, Al Viro, LKML

On Wed, Dec 30, 2015 at 11:30 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Wed, Dec 30, 2015 at 6:30 AM, Jacob Siverskog
> <jacob@teenage.engineering> wrote:
>> On Wed, Dec 30, 2015 at 2:26 PM, Eric Dumazet <edumazet@google.com> wrote:
>>> How often can you trigger this bug ?
>>
>> Ok. I don't have a good repro to trigger it unfortunately, I've seen it just a
>> few times when bringing up/down network interfaces. Does the trace
>> give any clue?
>>
>
> A little bit. You need to help people to narrow down the problem
> because there are too many places using skb->next and skb->prev.
>
> Since you mentioned it seems related to network interface flip,
> what network interfaces are you using? What's is your TC setup?
>
> Thanks.

The system contains only one physical network interface (TI WL1837,
wl18xx module).
The state prior to the crash was as follows:
- One virtual network interface active (as STA, associated with access point)
- Bluetooth (BLE only) active (same physical chip, co-existence,
btwilink/st_drv modules)

Actions made around the time of the crash:
- Bluetooth disabled
- One additional virtual network interface brought up (also as STA)

I believe the crash occurred between these two actions. I just saw
that there are some interesting events in the log prior to the crash:
kernel: Bluetooth: Unable to push skb to HCI core(-6)
kernel: (stc):  proto stack 4's ->recv failed
kernel: (stc): remove_channel_from_table: id 3
kernel: (stc): remove_channel_from_table: id 2
kernel: (stc): remove_channel_from_table: id 4
kernel: (stc):  all chnl_ids unregistered
kernel: (stk) :ldisc_install = 0(stc): st_tty_close

The first print is from btwilink.c. However, I can't see the
connection between Bluetooth (BLE) and UDP/IPv6 (we're not using
6LoWPAN or anything similar).

Thanks, Jacob

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] net: Fix potential NULL pointer dereference in __skb_try_recv_datagram
  2016-01-04  9:10             ` Jacob Siverskog
@ 2016-01-04 15:25               ` Eric Dumazet
  2016-01-04 16:14                 ` Rainer Weikusat
  2016-01-05 11:07                 ` Jacob Siverskog
  0 siblings, 2 replies; 22+ messages in thread
From: Eric Dumazet @ 2016-01-04 15:25 UTC (permalink / raw)
  To: Jacob Siverskog
  Cc: Cong Wang, Eric Dumazet, David Miller, Rainer Weikusat, netdev,
	Herbert Xu, Konstantin Khlebnikov, Al Viro, LKML

On Mon, 2016-01-04 at 10:10 +0100, Jacob Siverskog wrote:
> On Wed, Dec 30, 2015 at 11:30 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > On Wed, Dec 30, 2015 at 6:30 AM, Jacob Siverskog
> > <jacob@teenage.engineering> wrote:
> >> On Wed, Dec 30, 2015 at 2:26 PM, Eric Dumazet <edumazet@google.com> wrote:
> >>> How often can you trigger this bug ?
> >>
> >> Ok. I don't have a good repro to trigger it unfortunately, I've seen it just a
> >> few times when bringing up/down network interfaces. Does the trace
> >> give any clue?
> >>
> >
> > A little bit. You need to help people to narrow down the problem
> > because there are too many places using skb->next and skb->prev.
> >
> > Since you mentioned it seems related to network interface flip,
> > what network interfaces are you using? What's is your TC setup?
> >
> > Thanks.
> 
> The system contains only one physical network interface (TI WL1837,
> wl18xx module).
> The state prior to the crash was as follows:
> - One virtual network interface active (as STA, associated with access point)
> - Bluetooth (BLE only) active (same physical chip, co-existence,
> btwilink/st_drv modules)
> 
> Actions made around the time of the crash:
> - Bluetooth disabled
> - One additional virtual network interface brought up (also as STA)
> 
> I believe the crash occurred between these two actions. I just saw
> that there are some interesting events in the log prior to the crash:
> kernel: Bluetooth: Unable to push skb to HCI core(-6)
> kernel: (stc):  proto stack 4's ->recv failed
> kernel: (stc): remove_channel_from_table: id 3
> kernel: (stc): remove_channel_from_table: id 2
> kernel: (stc): remove_channel_from_table: id 4
> kernel: (stc):  all chnl_ids unregistered
> kernel: (stk) :ldisc_install = 0(stc): st_tty_close
> 
> The first print is from btwilink.c. However, I can't see the
> connection between Bluetooth (BLE) and UDP/IPv6 (we're not using
> 6LoWPAN or anything similar).
> 
> Thanks, Jacob

Definitely these details are useful ;)

Could you try :

diff --git a/drivers/misc/ti-st/st_core.c b/drivers/misc/ti-st/st_core.c
index 6e3af8b42cdd..0c99a74fb895 100644
--- a/drivers/misc/ti-st/st_core.c
+++ b/drivers/misc/ti-st/st_core.c
@@ -912,7 +912,9 @@ void st_core_exit(struct st_data_s *st_gdata)
 		skb_queue_purge(&st_gdata->txq);
 		skb_queue_purge(&st_gdata->tx_waitq);
 		kfree_skb(st_gdata->rx_skb);
+		st_gdata->rx_skb = NULL;
 		kfree_skb(st_gdata->tx_skb);
+		st_gdata->tx_skb = NULL;
 		/* TTY ldisc cleanup */
 		err = tty_unregister_ldisc(N_TI_WL);
 		if (err)



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] net: Fix potential NULL pointer dereference in __skb_try_recv_datagram
  2016-01-04 15:25               ` Eric Dumazet
@ 2016-01-04 16:14                 ` Rainer Weikusat
  2016-01-04 16:45                   ` Eric Dumazet
  2016-01-05 11:07                 ` Jacob Siverskog
  1 sibling, 1 reply; 22+ messages in thread
From: Rainer Weikusat @ 2016-01-04 16:14 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jacob Siverskog, Cong Wang, Eric Dumazet, David Miller,
	Rainer Weikusat, netdev, Herbert Xu, Konstantin Khlebnikov,
	Al Viro, LKML

Eric Dumazet <eric.dumazet@gmail.com> writes:
> On Mon, 2016-01-04 at 10:10 +0100, Jacob Siverskog wrote:

[...]

>> I believe the crash occurred between these two actions. I just saw
>> that there are some interesting events in the log prior to the crash:
>> kernel: Bluetooth: Unable to push skb to HCI core(-6)
>> kernel: (stc):  proto stack 4's ->recv failed
>> kernel: (stc): remove_channel_from_table: id 3
>> kernel: (stc): remove_channel_from_table: id 2
>> kernel: (stc): remove_channel_from_table: id 4
>> kernel: (stc):  all chnl_ids unregistered
>> kernel: (stk) :ldisc_install = 0(stc): st_tty_close
>> 
>> The first print is from btwilink.c. However, I can't see the
>> connection between Bluetooth (BLE) and UDP/IPv6 (we're not using
>> 6LoWPAN or anything similar).
>> 
>> Thanks, Jacob
>
> Definitely these details are useful ;)
>
> Could you try :
>
> diff --git a/drivers/misc/ti-st/st_core.c b/drivers/misc/ti-st/st_core.c
> index 6e3af8b42cdd..0c99a74fb895 100644
> --- a/drivers/misc/ti-st/st_core.c
> +++ b/drivers/misc/ti-st/st_core.c
> @@ -912,7 +912,9 @@ void st_core_exit(struct st_data_s *st_gdata)
>  		skb_queue_purge(&st_gdata->txq);
>  		skb_queue_purge(&st_gdata->tx_waitq);
>  		kfree_skb(st_gdata->rx_skb);
> +		st_gdata->rx_skb = NULL;
>  		kfree_skb(st_gdata->tx_skb);
> +		st_gdata->tx_skb = NULL;
>  		/* TTY ldisc cleanup */
>  		err = tty_unregister_ldisc(N_TI_WL);
>  		if (err)

Hmm ... the code continues with

		err = tty_unregister_ldisc(N_TI_WL);
		if (err)
			pr_err("unable to un-register ldisc %ld", err);
		/* free the global data pointer */
		kfree(st_gdata);

So who would ever see that the rx_skb and tx_skb pointers were cleared
prior to freeing the data structure containing them?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] net: Fix potential NULL pointer dereference in __skb_try_recv_datagram
  2016-01-04 16:14                 ` Rainer Weikusat
@ 2016-01-04 16:45                   ` Eric Dumazet
  0 siblings, 0 replies; 22+ messages in thread
From: Eric Dumazet @ 2016-01-04 16:45 UTC (permalink / raw)
  To: Rainer Weikusat
  Cc: Jacob Siverskog, Cong Wang, Eric Dumazet, David Miller, netdev,
	Herbert Xu, Konstantin Khlebnikov, Al Viro, LKML

On Mon, 2016-01-04 at 16:14 +0000, Rainer Weikusat wrote:
> Eric Dumazet <eric.dumazet@gmail.com> writes:
> > On Mon, 2016-01-04 at 10:10 +0100, Jacob Siverskog wrote:
> 
> [...]
> 
> >> I believe the crash occurred between these two actions. I just saw
> >> that there are some interesting events in the log prior to the crash:
> >> kernel: Bluetooth: Unable to push skb to HCI core(-6)
> >> kernel: (stc):  proto stack 4's ->recv failed
> >> kernel: (stc): remove_channel_from_table: id 3
> >> kernel: (stc): remove_channel_from_table: id 2
> >> kernel: (stc): remove_channel_from_table: id 4
> >> kernel: (stc):  all chnl_ids unregistered
> >> kernel: (stk) :ldisc_install = 0(stc): st_tty_close
> >> 
> >> The first print is from btwilink.c. However, I can't see the
> >> connection between Bluetooth (BLE) and UDP/IPv6 (we're not using
> >> 6LoWPAN or anything similar).
> >> 
> >> Thanks, Jacob
> >
> > Definitely these details are useful ;)
> >
> > Could you try :
> >
> > diff --git a/drivers/misc/ti-st/st_core.c b/drivers/misc/ti-st/st_core.c
> > index 6e3af8b42cdd..0c99a74fb895 100644
> > --- a/drivers/misc/ti-st/st_core.c
> > +++ b/drivers/misc/ti-st/st_core.c
> > @@ -912,7 +912,9 @@ void st_core_exit(struct st_data_s *st_gdata)
> >  		skb_queue_purge(&st_gdata->txq);
> >  		skb_queue_purge(&st_gdata->tx_waitq);
> >  		kfree_skb(st_gdata->rx_skb);
> > +		st_gdata->rx_skb = NULL;
> >  		kfree_skb(st_gdata->tx_skb);
> > +		st_gdata->tx_skb = NULL;
> >  		/* TTY ldisc cleanup */
> >  		err = tty_unregister_ldisc(N_TI_WL);
> >  		if (err)
> 
> Hmm ... the code continues with
> 
> 		err = tty_unregister_ldisc(N_TI_WL);
> 		if (err)
> 			pr_err("unable to un-register ldisc %ld", err);
> 		/* free the global data pointer */
> 		kfree(st_gdata);
> 
> So who would ever see that the rx_skb and tx_skb pointers were cleared
> prior to freeing the data structure containing them?

This is the theory, but I suspect a use after free.

kfree(st_gdata) does not clear all content with 0, unless you use
special SLUB/SLAB debugging features.




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] net: Fix potential NULL pointer dereference in __skb_try_recv_datagram
  2016-01-04 15:25               ` Eric Dumazet
  2016-01-04 16:14                 ` Rainer Weikusat
@ 2016-01-05 11:07                 ` Jacob Siverskog
  2016-01-05 14:14                   ` Eric Dumazet
  1 sibling, 1 reply; 22+ messages in thread
From: Jacob Siverskog @ 2016-01-05 11:07 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Cong Wang, Eric Dumazet, David Miller, Rainer Weikusat, netdev,
	Herbert Xu, Konstantin Khlebnikov, Al Viro, LKML

On Mon, Jan 4, 2016 at 4:25 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2016-01-04 at 10:10 +0100, Jacob Siverskog wrote:
>> On Wed, Dec 30, 2015 at 11:30 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> > On Wed, Dec 30, 2015 at 6:30 AM, Jacob Siverskog
>> > <jacob@teenage.engineering> wrote:
>> >> On Wed, Dec 30, 2015 at 2:26 PM, Eric Dumazet <edumazet@google.com> wrote:
>> >>> How often can you trigger this bug ?
>> >>
>> >> Ok. I don't have a good repro to trigger it unfortunately, I've seen it just a
>> >> few times when bringing up/down network interfaces. Does the trace
>> >> give any clue?
>> >>
>> >
>> > A little bit. You need to help people to narrow down the problem
>> > because there are too many places using skb->next and skb->prev.
>> >
>> > Since you mentioned it seems related to network interface flip,
>> > what network interfaces are you using? What's is your TC setup?
>> >
>> > Thanks.
>>
>> The system contains only one physical network interface (TI WL1837,
>> wl18xx module).
>> The state prior to the crash was as follows:
>> - One virtual network interface active (as STA, associated with access point)
>> - Bluetooth (BLE only) active (same physical chip, co-existence,
>> btwilink/st_drv modules)
>>
>> Actions made around the time of the crash:
>> - Bluetooth disabled
>> - One additional virtual network interface brought up (also as STA)
>>
>> I believe the crash occurred between these two actions. I just saw
>> that there are some interesting events in the log prior to the crash:
>> kernel: Bluetooth: Unable to push skb to HCI core(-6)
>> kernel: (stc):  proto stack 4's ->recv failed
>> kernel: (stc): remove_channel_from_table: id 3
>> kernel: (stc): remove_channel_from_table: id 2
>> kernel: (stc): remove_channel_from_table: id 4
>> kernel: (stc):  all chnl_ids unregistered
>> kernel: (stk) :ldisc_install = 0(stc): st_tty_close
>>
>> The first print is from btwilink.c. However, I can't see the
>> connection between Bluetooth (BLE) and UDP/IPv6 (we're not using
>> 6LoWPAN or anything similar).
>>
>> Thanks, Jacob
>
> Definitely these details are useful ;)
>
> Could you try :
>
> diff --git a/drivers/misc/ti-st/st_core.c b/drivers/misc/ti-st/st_core.c
> index 6e3af8b42cdd..0c99a74fb895 100644
> --- a/drivers/misc/ti-st/st_core.c
> +++ b/drivers/misc/ti-st/st_core.c
> @@ -912,7 +912,9 @@ void st_core_exit(struct st_data_s *st_gdata)
>                 skb_queue_purge(&st_gdata->txq);
>                 skb_queue_purge(&st_gdata->tx_waitq);
>                 kfree_skb(st_gdata->rx_skb);
> +               st_gdata->rx_skb = NULL;
>                 kfree_skb(st_gdata->tx_skb);
> +               st_gdata->tx_skb = NULL;
>                 /* TTY ldisc cleanup */
>                 err = tty_unregister_ldisc(N_TI_WL);
>                 if (err)
>
>

Sure. Since I don't have a good way to trigger the initial issue, I
can't really know if there is a difference with your patch. However,
normal usage seems to work as expected with your patch. I've tried to
reproduce the initial issue with and without your patch repeatedly for
hours and have not seen any crash in any of the runs so far.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] net: Fix potential NULL pointer dereference in __skb_try_recv_datagram
  2016-01-05 11:07                 ` Jacob Siverskog
@ 2016-01-05 14:14                   ` Eric Dumazet
  2016-01-05 14:34                     ` Jacob Siverskog
  0 siblings, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2016-01-05 14:14 UTC (permalink / raw)
  To: Jacob Siverskog
  Cc: Cong Wang, Eric Dumazet, David Miller, Rainer Weikusat, netdev,
	Herbert Xu, Konstantin Khlebnikov, Al Viro, LKML

On Tue, 2016-01-05 at 12:07 +0100, Jacob Siverskog wrote:
> On Mon, Jan 4, 2016 at 4:25 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Mon, 2016-01-04 at 10:10 +0100, Jacob Siverskog wrote:
> >> On Wed, Dec 30, 2015 at 11:30 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >> > On Wed, Dec 30, 2015 at 6:30 AM, Jacob Siverskog
> >> > <jacob@teenage.engineering> wrote:
> >> >> On Wed, Dec 30, 2015 at 2:26 PM, Eric Dumazet <edumazet@google.com> wrote:
> >> >>> How often can you trigger this bug ?
> >> >>
> >> >> Ok. I don't have a good repro to trigger it unfortunately, I've seen it just a
> >> >> few times when bringing up/down network interfaces. Does the trace
> >> >> give any clue?
> >> >>
> >> >
> >> > A little bit. You need to help people to narrow down the problem
> >> > because there are too many places using skb->next and skb->prev.
> >> >
> >> > Since you mentioned it seems related to network interface flip,
> >> > what network interfaces are you using? What's is your TC setup?
> >> >
> >> > Thanks.
> >>
> >> The system contains only one physical network interface (TI WL1837,
> >> wl18xx module).
> >> The state prior to the crash was as follows:
> >> - One virtual network interface active (as STA, associated with access point)
> >> - Bluetooth (BLE only) active (same physical chip, co-existence,
> >> btwilink/st_drv modules)
> >>
> >> Actions made around the time of the crash:
> >> - Bluetooth disabled
> >> - One additional virtual network interface brought up (also as STA)
> >>
> >> I believe the crash occurred between these two actions. I just saw
> >> that there are some interesting events in the log prior to the crash:
> >> kernel: Bluetooth: Unable to push skb to HCI core(-6)
> >> kernel: (stc):  proto stack 4's ->recv failed
> >> kernel: (stc): remove_channel_from_table: id 3
> >> kernel: (stc): remove_channel_from_table: id 2
> >> kernel: (stc): remove_channel_from_table: id 4
> >> kernel: (stc):  all chnl_ids unregistered
> >> kernel: (stk) :ldisc_install = 0(stc): st_tty_close
> >>
> >> The first print is from btwilink.c. However, I can't see the
> >> connection between Bluetooth (BLE) and UDP/IPv6 (we're not using
> >> 6LoWPAN or anything similar).
> >>
> >> Thanks, Jacob
> >
> > Definitely these details are useful ;)
> >
> > Could you try :
> >
> > diff --git a/drivers/misc/ti-st/st_core.c b/drivers/misc/ti-st/st_core.c
> > index 6e3af8b42cdd..0c99a74fb895 100644
> > --- a/drivers/misc/ti-st/st_core.c
> > +++ b/drivers/misc/ti-st/st_core.c
> > @@ -912,7 +912,9 @@ void st_core_exit(struct st_data_s *st_gdata)
> >                 skb_queue_purge(&st_gdata->txq);
> >                 skb_queue_purge(&st_gdata->tx_waitq);
> >                 kfree_skb(st_gdata->rx_skb);
> > +               st_gdata->rx_skb = NULL;
> >                 kfree_skb(st_gdata->tx_skb);
> > +               st_gdata->tx_skb = NULL;
> >                 /* TTY ldisc cleanup */
> >                 err = tty_unregister_ldisc(N_TI_WL);
> >                 if (err)
> >
> >
> 
> Sure. Since I don't have a good way to trigger the initial issue, I
> can't really know if there is a difference with your patch. However,
> normal usage seems to work as expected with your patch. I've tried to
> reproduce the initial issue with and without your patch repeatedly for
> hours and have not seen any crash in any of the runs so far.
> --

You might build a kernel with KASAN support to get maybe more chances to
trigger the bug.

( https://www.kernel.org/doc/Documentation/kasan.txt )




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] net: Fix potential NULL pointer dereference in __skb_try_recv_datagram
  2016-01-05 14:14                   ` Eric Dumazet
@ 2016-01-05 14:34                     ` Jacob Siverskog
  2016-01-05 14:39                       ` Eric Dumazet
  2016-01-20 16:38                       ` Peter Hurley
  0 siblings, 2 replies; 22+ messages in thread
From: Jacob Siverskog @ 2016-01-05 14:34 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Cong Wang, Eric Dumazet, David Miller, Rainer Weikusat, netdev,
	Herbert Xu, Konstantin Khlebnikov, Al Viro, LKML

On Tue, Jan 5, 2016 at 3:14 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2016-01-05 at 12:07 +0100, Jacob Siverskog wrote:
>> On Mon, Jan 4, 2016 at 4:25 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > On Mon, 2016-01-04 at 10:10 +0100, Jacob Siverskog wrote:
>> >> On Wed, Dec 30, 2015 at 11:30 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> >> > On Wed, Dec 30, 2015 at 6:30 AM, Jacob Siverskog
>> >> > <jacob@teenage.engineering> wrote:
>> >> >> On Wed, Dec 30, 2015 at 2:26 PM, Eric Dumazet <edumazet@google.com> wrote:
>> >> >>> How often can you trigger this bug ?
>> >> >>
>> >> >> Ok. I don't have a good repro to trigger it unfortunately, I've seen it just a
>> >> >> few times when bringing up/down network interfaces. Does the trace
>> >> >> give any clue?
>> >> >>
>> >> >
>> >> > A little bit. You need to help people to narrow down the problem
>> >> > because there are too many places using skb->next and skb->prev.
>> >> >
>> >> > Since you mentioned it seems related to network interface flip,
>> >> > what network interfaces are you using? What's is your TC setup?
>> >> >
>> >> > Thanks.
>> >>
>> >> The system contains only one physical network interface (TI WL1837,
>> >> wl18xx module).
>> >> The state prior to the crash was as follows:
>> >> - One virtual network interface active (as STA, associated with access point)
>> >> - Bluetooth (BLE only) active (same physical chip, co-existence,
>> >> btwilink/st_drv modules)
>> >>
>> >> Actions made around the time of the crash:
>> >> - Bluetooth disabled
>> >> - One additional virtual network interface brought up (also as STA)
>> >>
>> >> I believe the crash occurred between these two actions. I just saw
>> >> that there are some interesting events in the log prior to the crash:
>> >> kernel: Bluetooth: Unable to push skb to HCI core(-6)
>> >> kernel: (stc):  proto stack 4's ->recv failed
>> >> kernel: (stc): remove_channel_from_table: id 3
>> >> kernel: (stc): remove_channel_from_table: id 2
>> >> kernel: (stc): remove_channel_from_table: id 4
>> >> kernel: (stc):  all chnl_ids unregistered
>> >> kernel: (stk) :ldisc_install = 0(stc): st_tty_close
>> >>
>> >> The first print is from btwilink.c. However, I can't see the
>> >> connection between Bluetooth (BLE) and UDP/IPv6 (we're not using
>> >> 6LoWPAN or anything similar).
>> >>
>> >> Thanks, Jacob
>> >
>> > Definitely these details are useful ;)
>> >
>> > Could you try :
>> >
>> > diff --git a/drivers/misc/ti-st/st_core.c b/drivers/misc/ti-st/st_core.c
>> > index 6e3af8b42cdd..0c99a74fb895 100644
>> > --- a/drivers/misc/ti-st/st_core.c
>> > +++ b/drivers/misc/ti-st/st_core.c
>> > @@ -912,7 +912,9 @@ void st_core_exit(struct st_data_s *st_gdata)
>> >                 skb_queue_purge(&st_gdata->txq);
>> >                 skb_queue_purge(&st_gdata->tx_waitq);
>> >                 kfree_skb(st_gdata->rx_skb);
>> > +               st_gdata->rx_skb = NULL;
>> >                 kfree_skb(st_gdata->tx_skb);
>> > +               st_gdata->tx_skb = NULL;
>> >                 /* TTY ldisc cleanup */
>> >                 err = tty_unregister_ldisc(N_TI_WL);
>> >                 if (err)
>> >
>> >
>>
>> Sure. Since I don't have a good way to trigger the initial issue, I
>> can't really know if there is a difference with your patch. However,
>> normal usage seems to work as expected with your patch. I've tried to
>> reproduce the initial issue with and without your patch repeatedly for
>> hours and have not seen any crash in any of the runs so far.
>> --
>
> You might build a kernel with KASAN support to get maybe more chances to
> trigger the bug.
>
> ( https://www.kernel.org/doc/Documentation/kasan.txt )
>

Ah. Doesn't seem to be supported on arm(32) unfortunately.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] net: Fix potential NULL pointer dereference in __skb_try_recv_datagram
  2016-01-05 14:34                     ` Jacob Siverskog
@ 2016-01-05 14:39                       ` Eric Dumazet
  2016-01-20 15:06                         ` Jacob Siverskog
  2016-01-20 16:38                       ` Peter Hurley
  1 sibling, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2016-01-05 14:39 UTC (permalink / raw)
  To: Jacob Siverskog
  Cc: Cong Wang, Eric Dumazet, David Miller, Rainer Weikusat, netdev,
	Herbert Xu, Konstantin Khlebnikov, Al Viro, LKML

On Tue, 2016-01-05 at 15:34 +0100, Jacob Siverskog wrote:
> On Tue, Jan 5, 2016 at 3:14 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:

> >
> > You might build a kernel with KASAN support to get maybe more chances to
> > trigger the bug.
> >
> > ( https://www.kernel.org/doc/Documentation/kasan.txt )
> >
> 
> Ah. Doesn't seem to be supported on arm(32) unfortunately.

Then you could at least use standard debugging features :

CONFIG_SLAB=y
CONFIG_SLABINFO=y
CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_SLAB_LEAK=y

(Or equivalent SLUB options)

and

CONFIG_DEBUG_PAGEALLOC=y

(If arm(32) has CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y)




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] net: Fix potential NULL pointer dereference in __skb_try_recv_datagram
  2016-01-05 14:39                       ` Eric Dumazet
@ 2016-01-20 15:06                         ` Jacob Siverskog
  2016-01-20 15:48                           ` Eric Dumazet
  0 siblings, 1 reply; 22+ messages in thread
From: Jacob Siverskog @ 2016-01-20 15:06 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Cong Wang, Eric Dumazet, David Miller, Rainer Weikusat, netdev,
	Herbert Xu, Konstantin Khlebnikov, Al Viro, LKML

On Tue, Jan 5, 2016 at 3:39 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2016-01-05 at 15:34 +0100, Jacob Siverskog wrote:
>> On Tue, Jan 5, 2016 at 3:14 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>> >
>> > You might build a kernel with KASAN support to get maybe more chances to
>> > trigger the bug.
>> >
>> > ( https://www.kernel.org/doc/Documentation/kasan.txt )
>> >
>>
>> Ah. Doesn't seem to be supported on arm(32) unfortunately.
>
> Then you could at least use standard debugging features :
>
> CONFIG_SLAB=y
> CONFIG_SLABINFO=y
> CONFIG_DEBUG_SLAB=y
> CONFIG_DEBUG_SLAB_LEAK=y
>
> (Or equivalent SLUB options)
>
> and
>
> CONFIG_DEBUG_PAGEALLOC=y
>
> (If arm(32) has CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y)

I tried with those enabled and while toggling power on the Bluetooth
interface I usually get this after a few iterations:
kernel: Bluetooth: Unable to push skb to HCI core(-6)
kernel: (stc):  proto stack 4's ->recv failed
kernel: Slab corruption (Not tainted): skbuff_head_cache start=c08a8a00, len=176
kernel: 0a0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b a5  kkkkkkkkkkkkjkk.
kernel: Prev obj: start=c08a8940, len=176
kernel: 000: 00 00 00 00 00 00 00 00 31 73 52 00 43 17 2b 14  ........1sR.C.+.
kernel: 010: 00 00 00 00 00 00 00 00 04 00 00 00 01 00 00 00  ................
kernel: Next obj: start=c08a8ac0, len=176
kernel: 000: 00 00 00 00 00 00 00 00 01 42 f6 50 36 17 2b 14  .........B.P6.+.
kernel: 010: 00 00 00 00 00 00 00 00 04 00 00 00 01 00 00 00  ................

The "Unable to push skb" and "recv failed" lines always appear before
the corruption.

Unfortunately, the corruptions occur also with your patch.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] net: Fix potential NULL pointer dereference in __skb_try_recv_datagram
  2016-01-20 15:06                         ` Jacob Siverskog
@ 2016-01-20 15:48                           ` Eric Dumazet
  2016-01-20 16:17                             ` Jacob Siverskog
  0 siblings, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2016-01-20 15:48 UTC (permalink / raw)
  To: Jacob Siverskog
  Cc: Cong Wang, Eric Dumazet, David Miller, Rainer Weikusat, netdev,
	Herbert Xu, Konstantin Khlebnikov, Al Viro, LKML

On Wed, 2016-01-20 at 16:06 +0100, Jacob Siverskog wrote:
> On Tue, Jan 5, 2016 at 3:39 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Tue, 2016-01-05 at 15:34 +0100, Jacob Siverskog wrote:
> >> On Tue, Jan 5, 2016 at 3:14 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> >> >
> >> > You might build a kernel with KASAN support to get maybe more chances to
> >> > trigger the bug.
> >> >
> >> > ( https://www.kernel.org/doc/Documentation/kasan.txt )
> >> >
> >>
> >> Ah. Doesn't seem to be supported on arm(32) unfortunately.
> >
> > Then you could at least use standard debugging features :
> >
> > CONFIG_SLAB=y
> > CONFIG_SLABINFO=y
> > CONFIG_DEBUG_SLAB=y
> > CONFIG_DEBUG_SLAB_LEAK=y
> >
> > (Or equivalent SLUB options)
> >
> > and
> >
> > CONFIG_DEBUG_PAGEALLOC=y
> >
> > (If arm(32) has CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y)
> 
> I tried with those enabled and while toggling power on the Bluetooth
> interface I usually get this after a few iterations:
> kernel: Bluetooth: Unable to push skb to HCI core(-6)

Well, this code seems to be quite buggy.

I do not have time to audit it, but 5 minutes are enough to spot 2
issues.

skb, once given to another queue/layer should not be accessed anymore.

diff --git a/drivers/bluetooth/btwilink.c b/drivers/bluetooth/btwilink.c
index 24a652f9252b..2d3092aa6cfe 100644
--- a/drivers/bluetooth/btwilink.c
+++ b/drivers/bluetooth/btwilink.c
@@ -98,6 +98,7 @@ static void st_reg_completion_cb(void *priv_data, char data)
 static long st_receive(void *priv_data, struct sk_buff *skb)
 {
 	struct ti_st *lhst = priv_data;
+	unsigned int len;
 	int err;
 
 	if (!skb)
@@ -109,13 +110,14 @@ static long st_receive(void *priv_data, struct sk_buff *skb)
 	}
 
 	/* Forward skb to HCI core layer */
+	len = skb->len;
 	err = hci_recv_frame(lhst->hdev, skb);
 	if (err < 0) {
 		BT_ERR("Unable to push skb to HCI core(%d)", err);
 		return err;
 	}
 
-	lhst->hdev->stat.byte_rx += skb->len;
+	lhst->hdev->stat.byte_rx += len;
 
 	return 0;
 }
@@ -245,6 +247,7 @@ static int ti_st_send_frame(struct hci_dev *hdev, struct sk_buff *skb)
 {
 	struct ti_st *hst;
 	long len;
+	u8 pkt_type;
 
 	hst = hci_get_drvdata(hdev);
 
@@ -258,6 +261,7 @@ static int ti_st_send_frame(struct hci_dev *hdev, struct sk_buff *skb)
 	 * Freeing skb memory is taken care in shared transport layer,
 	 * so don't free skb memory here.
 	 */
+	pkt_type = hci_skb_pkt_type(skb);
 	len = hst->st_write(skb);
 	if (len < 0) {
 		kfree_skb(skb);
@@ -268,7 +272,7 @@ static int ti_st_send_frame(struct hci_dev *hdev, struct sk_buff *skb)
 
 	/* ST accepted our skb. So, Go ahead and do rest */
 	hdev->stat.byte_tx += len;
-	ti_st_tx_complete(hst, hci_skb_pkt_type(skb));
+	ti_st_tx_complete(hst, pkt_type);
 
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] net: Fix potential NULL pointer dereference in __skb_try_recv_datagram
  2016-01-20 15:48                           ` Eric Dumazet
@ 2016-01-20 16:17                             ` Jacob Siverskog
  2016-01-20 17:02                               ` Eric Dumazet
  0 siblings, 1 reply; 22+ messages in thread
From: Jacob Siverskog @ 2016-01-20 16:17 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Cong Wang, Eric Dumazet, David Miller, Rainer Weikusat, netdev,
	Herbert Xu, Konstantin Khlebnikov, Al Viro, LKML

On Wed, Jan 20, 2016 at 4:48 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2016-01-20 at 16:06 +0100, Jacob Siverskog wrote:
>> On Tue, Jan 5, 2016 at 3:39 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > On Tue, 2016-01-05 at 15:34 +0100, Jacob Siverskog wrote:
>> >> On Tue, Jan 5, 2016 at 3:14 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> >
>> >> >
>> >> > You might build a kernel with KASAN support to get maybe more chances to
>> >> > trigger the bug.
>> >> >
>> >> > ( https://www.kernel.org/doc/Documentation/kasan.txt )
>> >> >
>> >>
>> >> Ah. Doesn't seem to be supported on arm(32) unfortunately.
>> >
>> > Then you could at least use standard debugging features :
>> >
>> > CONFIG_SLAB=y
>> > CONFIG_SLABINFO=y
>> > CONFIG_DEBUG_SLAB=y
>> > CONFIG_DEBUG_SLAB_LEAK=y
>> >
>> > (Or equivalent SLUB options)
>> >
>> > and
>> >
>> > CONFIG_DEBUG_PAGEALLOC=y
>> >
>> > (If arm(32) has CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y)
>>
>> I tried with those enabled and while toggling power on the Bluetooth
>> interface I usually get this after a few iterations:
>> kernel: Bluetooth: Unable to push skb to HCI core(-6)
>
> Well, this code seems to be quite buggy.
>
> I do not have time to audit it, but 5 minutes are enough to spot 2
> issues.
>
> skb, once given to another queue/layer should not be accessed anymore.
>

Ok. Unfortunately I still see the slab corruption even with your changes.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] net: Fix potential NULL pointer dereference in __skb_try_recv_datagram
  2016-01-05 14:34                     ` Jacob Siverskog
  2016-01-05 14:39                       ` Eric Dumazet
@ 2016-01-20 16:38                       ` Peter Hurley
  1 sibling, 0 replies; 22+ messages in thread
From: Peter Hurley @ 2016-01-20 16:38 UTC (permalink / raw)
  To: Jacob Siverskog, Eric Dumazet
  Cc: Cong Wang, Eric Dumazet, David Miller, Rainer Weikusat, netdev,
	Herbert Xu, Konstantin Khlebnikov, Al Viro, LKML

Hi Jacob,

On 01/05/2016 06:34 AM, Jacob Siverskog wrote:
> On Tue, Jan 5, 2016 at 3:14 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Tue, 2016-01-05 at 12:07 +0100, Jacob Siverskog wrote:
>>> On Mon, Jan 4, 2016 at 4:25 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>> On Mon, 2016-01-04 at 10:10 +0100, Jacob Siverskog wrote:
>>>>> On Wed, Dec 30, 2015 at 11:30 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>>>>> On Wed, Dec 30, 2015 at 6:30 AM, Jacob Siverskog
>>>>>> <jacob@teenage.engineering> wrote:
>>>>>>> On Wed, Dec 30, 2015 at 2:26 PM, Eric Dumazet <edumazet@google.com> wrote:
>>>>>>>> How often can you trigger this bug ?
>>>>>>>
>>>>>>> Ok. I don't have a good repro to trigger it unfortunately, I've seen it just a
>>>>>>> few times when bringing up/down network interfaces. Does the trace
>>>>>>> give any clue?
>>>>>>>
>>>>>>
>>>>>> A little bit. You need to help people to narrow down the problem
>>>>>> because there are too many places using skb->next and skb->prev.
>>>>>>
>>>>>> Since you mentioned it seems related to network interface flip,
>>>>>> what network interfaces are you using? What's is your TC setup?
>>>>>>
>>>>>> Thanks.
>>>>>
>>>>> The system contains only one physical network interface (TI WL1837,
>>>>> wl18xx module).
>>>>> The state prior to the crash was as follows:
>>>>> - One virtual network interface active (as STA, associated with access point)
>>>>> - Bluetooth (BLE only) active (same physical chip, co-existence,
>>>>> btwilink/st_drv modules)
>>>>>
>>>>> Actions made around the time of the crash:
>>>>> - Bluetooth disabled
>>>>> - One additional virtual network interface brought up (also as STA)
>>>>>
>>>>> I believe the crash occurred between these two actions. I just saw
>>>>> that there are some interesting events in the log prior to the crash:
>>>>> kernel: Bluetooth: Unable to push skb to HCI core(-6)
>>>>> kernel: (stc):  proto stack 4's ->recv failed
>>>>> kernel: (stc): remove_channel_from_table: id 3
>>>>> kernel: (stc): remove_channel_from_table: id 2
>>>>> kernel: (stc): remove_channel_from_table: id 4
>>>>> kernel: (stc):  all chnl_ids unregistered
>>>>> kernel: (stk) :ldisc_install = 0(stc): st_tty_close
>>>>>
>>>>> The first print is from btwilink.c. However, I can't see the
>>>>> connection between Bluetooth (BLE) and UDP/IPv6 (we're not using
>>>>> 6LoWPAN or anything similar).
>>>>>
>>>>> Thanks, Jacob
>>>>
>>>> Definitely these details are useful ;)
>>>>
>>>> Could you try :
>>>>
>>>> diff --git a/drivers/misc/ti-st/st_core.c b/drivers/misc/ti-st/st_core.c
>>>> index 6e3af8b42cdd..0c99a74fb895 100644
>>>> --- a/drivers/misc/ti-st/st_core.c
>>>> +++ b/drivers/misc/ti-st/st_core.c
>>>> @@ -912,7 +912,9 @@ void st_core_exit(struct st_data_s *st_gdata)
>>>>                 skb_queue_purge(&st_gdata->txq);
>>>>                 skb_queue_purge(&st_gdata->tx_waitq);
>>>>                 kfree_skb(st_gdata->rx_skb);
>>>> +               st_gdata->rx_skb = NULL;
>>>>                 kfree_skb(st_gdata->tx_skb);
>>>> +               st_gdata->tx_skb = NULL;
>>>>                 /* TTY ldisc cleanup */
>>>>                 err = tty_unregister_ldisc(N_TI_WL);
>>>>                 if (err)

FWIW,

You don't need that ti-st junk to get the WL1837 working; the WL1837 only
has BT channels. Unfortunately, that's really all I can say about it; sorry.

Regards,
Peter Hurley

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] net: Fix potential NULL pointer dereference in __skb_try_recv_datagram
  2016-01-20 16:17                             ` Jacob Siverskog
@ 2016-01-20 17:02                               ` Eric Dumazet
  0 siblings, 0 replies; 22+ messages in thread
From: Eric Dumazet @ 2016-01-20 17:02 UTC (permalink / raw)
  To: Jacob Siverskog
  Cc: Cong Wang, Eric Dumazet, David Miller, Rainer Weikusat, netdev,
	Herbert Xu, Konstantin Khlebnikov, Al Viro, LKML

On Wed, 2016-01-20 at 17:17 +0100, Jacob Siverskog wrote:
> On Wed, Jan 20, 2016 at 4:48 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Wed, 2016-01-20 at 16:06 +0100, Jacob Siverskog wrote:
> >> On Tue, Jan 5, 2016 at 3:39 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >> > On Tue, 2016-01-05 at 15:34 +0100, Jacob Siverskog wrote:
> >> >> On Tue, Jan 5, 2016 at 3:14 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >> >
> >> >> >
> >> >> > You might build a kernel with KASAN support to get maybe more chances to
> >> >> > trigger the bug.
> >> >> >
> >> >> > ( https://www.kernel.org/doc/Documentation/kasan.txt )
> >> >> >
> >> >>
> >> >> Ah. Doesn't seem to be supported on arm(32) unfortunately.
> >> >
> >> > Then you could at least use standard debugging features :
> >> >
> >> > CONFIG_SLAB=y
> >> > CONFIG_SLABINFO=y
> >> > CONFIG_DEBUG_SLAB=y
> >> > CONFIG_DEBUG_SLAB_LEAK=y
> >> >
> >> > (Or equivalent SLUB options)
> >> >
> >> > and
> >> >
> >> > CONFIG_DEBUG_PAGEALLOC=y
> >> >
> >> > (If arm(32) has CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y)
> >>
> >> I tried with those enabled and while toggling power on the Bluetooth
> >> interface I usually get this after a few iterations:
> >> kernel: Bluetooth: Unable to push skb to HCI core(-6)
> >
> > Well, this code seems to be quite buggy.
> >
> > I do not have time to audit it, but 5 minutes are enough to spot 2
> > issues.
> >
> > skb, once given to another queue/layer should not be accessed anymore.
> >
> 
> Ok. Unfortunately I still see the slab corruption even with your changes.

Patch was only showing potential _reads_ after free, which do not
generally corrupt memory.

As I said, a full audit is needed, and I don't have time for this.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2016-01-20 17:02 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-29 19:10 [PATCH] net: Fix potential NULL pointer dereference in __skb_try_recv_datagram Jacob Siverskog
2015-12-29 19:42 ` Rainer Weikusat
2015-12-29 20:08   ` David Miller
2015-12-30 11:14     ` Jacob Siverskog
2015-12-30 13:26       ` Eric Dumazet
2015-12-30 14:30         ` Jacob Siverskog
2015-12-30 14:38           ` Eric Dumazet
2015-12-30 22:30           ` Cong Wang
2016-01-04  9:10             ` Jacob Siverskog
2016-01-04 15:25               ` Eric Dumazet
2016-01-04 16:14                 ` Rainer Weikusat
2016-01-04 16:45                   ` Eric Dumazet
2016-01-05 11:07                 ` Jacob Siverskog
2016-01-05 14:14                   ` Eric Dumazet
2016-01-05 14:34                     ` Jacob Siverskog
2016-01-05 14:39                       ` Eric Dumazet
2016-01-20 15:06                         ` Jacob Siverskog
2016-01-20 15:48                           ` Eric Dumazet
2016-01-20 16:17                             ` Jacob Siverskog
2016-01-20 17:02                               ` Eric Dumazet
2016-01-20 16:38                       ` Peter Hurley
2015-12-30 15:32       ` Rainer Weikusat

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.