* mm/sched/net: BUG when running simple code
@ 2014-06-13  2:56 ` Sasha Levin
  0 siblings, 0 replies; 23+ messages in thread
From: Sasha Levin @ 2014-06-13  2:56 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Ingo Molnar, Peter Zijlstra, LKML, netdev, Dave Jones

Hi all,

Okay, I'm really lost. I got the following when fuzzing, and can't really explain what's
going on. It seems that we get an "unable to handle kernel paging request" when running
rather simple code, and I can't figure out how that code would cause it.

The code in question is (in net/netlink/af_netlink.c):

static int netlink_getsockopt(struct socket *sock, int level, int optname,
                              char __user *optval, int __user *optlen)
{
        struct sock *sk = sock->sk;
        struct netlink_sock *nlk = nlk_sk(sk);
        int len, val, err;

        if (level != SOL_NETLINK)
                return -ENOPROTOOPT;

        if (get_user(len, optlen))
                return -EFAULT;
        if (len < 0)  <==== THIS
                return -EINVAL;

The disassembly I got shows:

        if (get_user(len, optlen))
     b1f:       e8 00 00 00 00          callq  b24 <netlink_getsockopt+0x44>
                        b20: R_X86_64_PC32      might_fault-0x4
     b24:       4c 89 e0                mov    %r12,%rax
     b27:       e8 00 00 00 00          callq  b2c <netlink_getsockopt+0x4c>
                        b28: R_X86_64_PC32      __get_user_4-0x4
     b2c:       85 c0                   test   %eax,%eax
     b2e:       74 10                   je     b40 <netlink_getsockopt+0x60>
                return -EFAULT;
     b30:       bb f2 ff ff ff          mov    $0xfffffff2,%ebx
     b35:       e9 06 01 00 00          jmpq   c40 <netlink_getsockopt+0x160>
     b3a:       66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
        if (len < 0)
     b40:       85 d2                   test   %edx,%edx
     b42:       0f 88 f0 00 00 00       js     c38 <netlink_getsockopt+0x158>
                return -EINVAL;
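
For reference, the 32-bit immediates loaded into %ebx in this listing are
negative errno values in two's complement. A minimal sketch for decoding them,
assuming the usual Linux errno numbers (the helper name and the small lookup
table are made up for illustration):

```python
# Illustrative helper, not part of any kernel tooling.
# Assumed Linux errno numbers: EFAULT=14, EINVAL=22, ENOPROTOOPT=92.
ERRNO_NAMES = {14: "EFAULT", 22: "EINVAL", 92: "ENOPROTOOPT"}


def decode_imm32(value):
    """Interpret a 32-bit immediate as signed and name it if it is a known -errno."""
    signed = value - (1 << 32) if value & 0x80000000 else value
    name = ERRNO_NAMES.get(-signed)
    label = ("-" + name) if name else None
    return signed, label


print(decode_imm32(0xfffffff2))  # immediate from the mov at b30
```

With these assumptions 0xfffffff2 decodes to -14, i.e. -EFAULT, which matches
the "return -EFAULT" source line interleaved with the disassembly.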

Which agrees with the trace I got:

[  516.309720] BUG: unable to handle kernel paging request at ffffffffa0f12560
[  516.309720] IP: netlink_getsockopt (net/netlink/af_netlink.c:2271)
[  516.309720] PGD 22031067 PUD 22032063 PMD 8000000020e001e1
[  516.309720] Oops: 0003 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[  516.309720] Dumping ftrace buffer:
[  516.309720]    (ftrace buffer empty)
[  516.309720] Modules linked in:
[  516.309720] CPU: 11 PID: 9212 Comm: trinity-c11 Tainted: G        W     3.15.0-next-20140612-sasha-00022-g5e4db85-dirty #645
[  516.309720] task: ffff8803fc860000 ti: ffff8803fc85c000 task.ti: ffff8803fc85c000
[  516.309720] RIP: netlink_getsockopt (net/netlink/af_netlink.c:2271)
[  516.309720] RSP: 0018:ffff8803fc85fed8  EFLAGS: 00010216
[  516.309720] RAX: ffffffffa0f12560 RBX: 00000000ffffffa4 RCX: 0000000000000003
[  516.309720] RDX: 00000000ffff9002 RSI: 0000000049908020 RDI: ffff88025c16a100
[  516.309720] RBP: ffff8803fc85ff18 R08: 0000000000000001 R09: c900000000fd37ff
[  516.309720] R10: 0000000000000001 R11: 0000000000000000 R12: ffffffffffff9002
[  516.309720] R13: ffff88025c16a100 R14: 0000000000000001 R15: ffff88025bfa9bd8
[  516.309720] FS:  00007f54be0a7700(0000) GS:ffff8802c8e00000(0000) knlGS:0000000000000000
[  516.309720] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  516.309720] CR2: ffffffffa0f12560 CR3: 000000040b1fb000 CR4: 00000000000006a0
[  516.309720] Stack:
[  516.309720]  ffff8803fc85ff18 ffff8803fc85ff18 ffff8803fc85fef8 8900200549908020
[  516.309720]  ffff8803fc85ff18 ffffffff9ff66470 ffff8803fc85ff18 0000000000000037
[  516.309720]  ffff8803fc85ff78 ffffffff9ff69d26 0000000000000037 0000000000000004
[  516.309720] Call Trace:
[  516.309720] ? sockfd_lookup_light (net/socket.c:457)
[  516.309720] SyS_getsockopt (net/socket.c:1945 net/socket.c:1929)
[  516.309720] tracesys (arch/x86/kernel/entry_64.S:542)
[ 516.309720] Code: b2 fd 85 c0 74 10 bb f2 ff ff ff e9 06 01 00 00 66 0f 1f 44 00 00 85 d2 0f 88 f0 00 00 00 41 83 fd 04 74 42 41 83 fd 05 0f 84 88 <00> 00 00 41 83 fd 03 0f 85 de 00 00 00 83 fa 03 bb ea ff ff ff
All code
========
   0:	b2 fd                	mov    $0xfd,%dl
   2:	85 c0                	test   %eax,%eax
   4:	74 10                	je     0x16
   6:	bb f2 ff ff ff       	mov    $0xfffffff2,%ebx
   b:	e9 06 01 00 00       	jmpq   0x116
  10:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
  16:	85 d2                	test   %edx,%edx
  18:*	0f 88 f0 00 00 00    	js     0x10e		<-- trapping instruction
  1e:	41 83 fd 04          	cmp    $0x4,%r13d
  22:	74 42                	je     0x66
  24:	41 83 fd 05          	cmp    $0x5,%r13d
  28:	0f 84 88 00 00 00    	je     0xb6
  2e:	41 83 fd 03          	cmp    $0x3,%r13d
  32:	0f 85 de 00 00 00    	jne    0x116
  38:	83 fa 03             	cmp    $0x3,%edx
  3b:	bb ea ff ff ff       	mov    $0xffffffea,%ebx
	...

Code starting with the faulting instruction
===========================================
   0:	00 00                	add    %al,(%rax)
   2:	00 41 83             	add    %al,-0x7d(%rcx)
   5:	fd                   	std
   6:	03 0f                	add    (%rdi),%ecx
   8:	85 de                	test   %ebx,%esi
   a:	00 00                	add    %al,(%rax)
   c:	00 83 fa 03 bb ea    	add    %al,-0x1544fc06(%rbx)
  12:	ff                   	(bad)
  13:	ff                   	(bad)
  14:	ff 00                	incl   (%rax)
[  516.309720] RIP netlink_getsockopt (net/netlink/af_netlink.c:2271)
[  516.309720]  RSP <ffff8803fc85fed8>
[  516.309720] CR2: ffffffffa0f12560
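
As a cross-check of the "Code:" line against the objdump output, here is a
rough sketch that locates the byte marked with <..> (by convention the byte at
the reported RIP) and searches for a byte sequence taken from the disassembly.
The function names are hypothetical; the byte string is copied from the trace
above.

```python
# Sketch: inspect the "Code:" line of an oops.
CODE_LINE = ("b2 fd 85 c0 74 10 bb f2 ff ff ff e9 06 01 00 00 66 0f 1f 44 00 00 "
             "85 d2 0f 88 f0 00 00 00 41 83 fd 04 74 42 41 83 fd 05 0f 84 88 <00> "
             "00 00 41 83 fd 03 0f 85 de 00 00 00 83 fa 03 bb ea ff ff ff")


def marked_offset(code_line):
    """Return the zero-based index of the <..> token, or None if there is none."""
    for index, token in enumerate(code_line.split()):
        if token.startswith("<") and token.endswith(">"):
            return index
    return None


def find_bytes(code_line, needle):
    """Return the index where the byte sequence `needle` starts, or -1."""
    tokens = [t.strip("<>") for t in code_line.split()]
    want = needle.split()
    for start in range(len(tokens) - len(want) + 1):
        if tokens[start:start + len(want)] == want:
            return start
    return -1


print(hex(marked_offset(CODE_LINE)))
# "test %edx,%edx" followed by the js, as shown by objdump at b40
print(hex(find_bytes(CODE_LINE, "85 d2 0f 88 f0 00 00 00")))
```

On this trace the test/js pair is found at offset 0x16, consistent with the
decoded listing, while the marked byte is at offset 0x2b. In the listing that
offset falls inside the 6-byte je that starts at 0x28 rather than on the js at
0x18, so the "trapping instruction" annotation may deserve a second look.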

The only theory I had so far is that netlink is a module and has gone away while the code
was executing, but netlink isn't a module on my kernel.



Thanks,
Sasha

* Re: mm/sched/net: BUG when running simple code
  2014-06-13  2:56 ` Sasha Levin
@ 2014-06-13  3:27   ` Dan Aloni
  -1 siblings, 0 replies; 23+ messages in thread
From: Dan Aloni @ 2014-06-13  3:27 UTC (permalink / raw)
  To: Sasha Levin
  Cc: linux-mm, Andrew Morton, Ingo Molnar, Peter Zijlstra, LKML,
	netdev, Dave Jones

On Thu, Jun 12, 2014 at 10:56:16PM -0400, Sasha Levin wrote:
> Hi all,
> 
> Okay, I'm really lost. I got the following when fuzzing, and can't really explain what's
> going on. It seems that we get a "unable to handle kernel paging request" when running
> rather simple code, and I can't figure out how it would cause it.
[..]
> Which agrees with the trace I got:
> 
> [  516.309720] BUG: unable to handle kernel paging request at ffffffffa0f12560
> [  516.309720] IP: netlink_getsockopt (net/netlink/af_netlink.c:2271)
[..]
> [  516.309720] RIP netlink_getsockopt (net/netlink/af_netlink.c:2271)
> [  516.309720]  RSP <ffff8803fc85fed8>
> [  516.309720] CR2: ffffffffa0f12560
> 
> They only theory I had so far is that netlink is a module, and has gone away while the code
> was executing, but netlink isn't a module on my kernel.

The RIP - 0xffffffffa0f12560 is in the range (from Documentation/x86/x86_64/mm.txt):

    ffffffffa0000000 - ffffffffff5fffff (=1525 MB) module mapping space

So it seems it was in a module.
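
The range check can be sketched as follows. The table holds the module range
quoted above plus the kernel text range from the same mm.txt; both are
assumptions about this particular layout, and other kernel configurations may
place these regions differently. The function name is made up.

```python
# Sketch: classify an address against two x86_64 ranges from
# Documentation/x86/x86_64/mm.txt. The kernel text range is documented as
# ffffffff80000000 - ffffffffa0000000; its end is treated as exclusive here.
RANGES = [
    ("kernel text mapping", 0xffffffff80000000, 0xffffffff9fffffff),
    ("module mapping space", 0xffffffffa0000000, 0xffffffffff5fffff),
]


def classify(addr):
    """Return the name of the range containing addr, or None."""
    for name, start, end in RANGES:
        if start <= addr <= end:
            return name
    return None


# Faulting address and one of the return addresses from the stack dump.
for addr in (0xffffffffa0f12560, 0xffffffff9ff69d26):
    print(hex(addr), "->", classify(addr))
```

Under that table the faulting address lands in module mapping space, while the
return address taken from the stack dump lands in the kernel text mapping.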

-- 
Dan Aloni

* Re: mm/sched/net: BUG when running simple code
  2014-06-13  3:27   ` Dan Aloni
@ 2014-06-13  4:01     ` Sasha Levin
  -1 siblings, 0 replies; 23+ messages in thread
From: Sasha Levin @ 2014-06-13  4:01 UTC (permalink / raw)
  To: Dan Aloni
  Cc: linux-mm, Andrew Morton, Ingo Molnar, Peter Zijlstra, LKML,
	netdev, Dave Jones

On 06/12/2014 11:27 PM, Dan Aloni wrote:
> On Thu, Jun 12, 2014 at 10:56:16PM -0400, Sasha Levin wrote:
>> > Hi all,
>> > 
>> > Okay, I'm really lost. I got the following when fuzzing, and can't really explain what's
>> > going on. It seems that we get a "unable to handle kernel paging request" when running
>> > rather simple code, and I can't figure out how it would cause it.
> [..]
>> > Which agrees with the trace I got:
>> > 
>> > [  516.309720] BUG: unable to handle kernel paging request at ffffffffa0f12560
>> > [  516.309720] IP: netlink_getsockopt (net/netlink/af_netlink.c:2271)
> [..]
>> > [  516.309720] RIP netlink_getsockopt (net/netlink/af_netlink.c:2271)
>> > [  516.309720]  RSP <ffff8803fc85fed8>
>> > [  516.309720] CR2: ffffffffa0f12560
>> > 
>> > They only theory I had so far is that netlink is a module, and has gone away while the code
>> > was executing, but netlink isn't a module on my kernel.
> The RIP - 0xffffffffa0f12560 is in the range (from Documentation/x86/x86_64/mm.txt):
> 
>     ffffffffa0000000 - ffffffffff5fffff (=1525 MB) module mapping space
> 
> So seems it was in a module.

Yup, that's why that theory came up, but when I checked my config:

$ cat .config | grep NETLINK
CONFIG_COMPAT_NETLINK_MESSAGES=y
CONFIG_NETFILTER_NETLINK=y
CONFIG_NETFILTER_NETLINK_ACCT=y
CONFIG_NETFILTER_NETLINK_QUEUE=y
CONFIG_NETFILTER_NETLINK_LOG=y
CONFIG_NF_CT_NETLINK=y
CONFIG_NF_CT_NETLINK_TIMEOUT=y
CONFIG_NF_CT_NETLINK_HELPER=y
CONFIG_NETFILTER_NETLINK_QUEUE_CT=y
CONFIG_NETLINK_MMAP=y
CONFIG_NETLINK_DIAG=y
CONFIG_SCSI_NETLINK=y
CONFIG_QUOTA_NETLINK_INTERFACE=y

that theory went away. (also confirmed by not finding a netlink module.)

What about the kernel .text overflowing into the modules space? The loader
checks for that, but can something like that happen after everything is
up and running? I'll look into that tomorrow.


Thanks,
Sasha

* Re: mm/sched/net: BUG when running simple code
  2014-06-13  4:01     ` Sasha Levin
@ 2014-06-13  4:13       ` Dave Jones
  -1 siblings, 0 replies; 23+ messages in thread
From: Dave Jones @ 2014-06-13  4:13 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Dan Aloni, linux-mm, Andrew Morton, Ingo Molnar, Peter Zijlstra,
	LKML, netdev

On Fri, Jun 13, 2014 at 12:01:37AM -0400, Sasha Levin wrote:
 > On 06/12/2014 11:27 PM, Dan Aloni wrote:
 > > On Thu, Jun 12, 2014 at 10:56:16PM -0400, Sasha Levin wrote:
 > >> > Hi all,
 > >> > 
 > >> > Okay, I'm really lost. I got the following when fuzzing, and can't really explain what's
 > >> > going on. It seems that we get a "unable to handle kernel paging request" when running
 > >> > rather simple code, and I can't figure out how it would cause it.
 > > [..]
 > >> > Which agrees with the trace I got:
 > >> > 
 > >> > [  516.309720] BUG: unable to handle kernel paging request at ffffffffa0f12560
 > >> > [  516.309720] IP: netlink_getsockopt (net/netlink/af_netlink.c:2271)
 > > [..]
 > >> > [  516.309720] RIP netlink_getsockopt (net/netlink/af_netlink.c:2271)
 > >> > [  516.309720]  RSP <ffff8803fc85fed8>
 > >> > [  516.309720] CR2: ffffffffa0f12560
 > >> > 
 > >> > They only theory I had so far is that netlink is a module, and has gone away while the code
 > >> > was executing, but netlink isn't a module on my kernel.
 > > The RIP - 0xffffffffa0f12560 is in the range (from Documentation/x86/x86_64/mm.txt):
 > > 
 > >     ffffffffa0000000 - ffffffffff5fffff (=1525 MB) module mapping space
 > > 
 > > So seems it was in a module.
 > 
 > Yup, that's why that theory came up, but when I checked my config:
 > ... 
 > that theory went away. (also confirmed by not finding a netlink module.)
 > 
 > What about the kernel .text overflowing into the modules space? The loader
 > checks for that, but can something like that happen after everything is
 > up and running? I'll look into that tomorrow.

Another theory: Trinity can sometimes generate plausible-looking module
addresses and pass those in structs etc.

I wonder if there's somewhere in that path that isn't checking that the address
in the optval it got is actually a userspace address before it tries to write to it.
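
To make the check in question concrete, here is a userspace sketch of the idea
behind the kernel's access_ok() on x86_64. It is not the kernel
implementation; USER_LIMIT assumes 47-bit user addresses with 4 KiB pages, and
the function name is hypothetical.

```python
# Sketch: does [addr, addr + size) lie entirely below the user address limit?
# Assumption: limit is (1 << 47) - 4096, the usual x86_64 value with
# 4-level paging.
USER_LIMIT = (1 << 47) - 4096


def looks_like_user_range(addr, size, limit=USER_LIMIT):
    """True if the whole range is below the user address limit."""
    if addr < 0 or size < 0:
        return False
    return addr + size <= limit


print(looks_like_user_range(0x00007f54be0a7700, 4))  # FS base from the dump
print(looks_like_user_range(0xffffffffa0f12560, 4))  # faulting address
```

A module-looking address such as the faulting one fails this test, so any path
that writes through a user-supplied pointer without an equivalent check would
be a candidate.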

	Dave


* Re: mm/sched/net: BUG when running simple code
  2014-06-13  4:01     ` Sasha Levin
@ 2014-06-13  4:55       ` Dan Aloni
  -1 siblings, 0 replies; 23+ messages in thread
From: Dan Aloni @ 2014-06-13  4:55 UTC (permalink / raw)
  To: Sasha Levin
  Cc: linux-mm, Andrew Morton, Ingo Molnar, Peter Zijlstra, LKML,
	netdev, Dave Jones

On Fri, Jun 13, 2014 at 12:01:37AM -0400, Sasha Levin wrote:
> On 06/12/2014 11:27 PM, Dan Aloni wrote:
> > On Thu, Jun 12, 2014 at 10:56:16PM -0400, Sasha Levin wrote:
> >> > Hi all,
> >> > 
> >> > Okay, I'm really lost. I got the following when fuzzing, and can't really explain what's
> >> > going on. It seems that we get a "unable to handle kernel paging request" when running
> >> > rather simple code, and I can't figure out how it would cause it.
> > [..]
> >> > Which agrees with the trace I got:
> >> > 
> >> > [  516.309720] BUG: unable to handle kernel paging request at ffffffffa0f12560
> >> > [  516.309720] IP: netlink_getsockopt (net/netlink/af_netlink.c:2271)
> > [..]
> >> > [  516.309720] RIP netlink_getsockopt (net/netlink/af_netlink.c:2271)
> >> > [  516.309720]  RSP <ffff8803fc85fed8>
> >> > [  516.309720] CR2: ffffffffa0f12560
> >> > 
> >> > They only theory I had so far is that netlink is a module, and has gone away while the code
> >> > was executing, but netlink isn't a module on my kernel.
> > The RIP - 0xffffffffa0f12560 is in the range (from Documentation/x86/x86_64/mm.txt):
> > 
> >     ffffffffa0000000 - ffffffffff5fffff (=1525 MB) module mapping space
> > 
> > So seems it was in a module.
> 
> Yup, that's why that theory came up, but when I checked my config:
> 
> $ cat .config | grep NETLINK
> CONFIG_COMPAT_NETLINK_MESSAGES=y
> CONFIG_NETFILTER_NETLINK=y
> CONFIG_NETFILTER_NETLINK_ACCT=y
> CONFIG_NETFILTER_NETLINK_QUEUE=y
> CONFIG_NETFILTER_NETLINK_LOG=y
> CONFIG_NF_CT_NETLINK=y
> CONFIG_NF_CT_NETLINK_TIMEOUT=y
> CONFIG_NF_CT_NETLINK_HELPER=y
> CONFIG_NETFILTER_NETLINK_QUEUE_CT=y
> CONFIG_NETLINK_MMAP=y
> CONFIG_NETLINK_DIAG=y
> CONFIG_SCSI_NETLINK=y
> CONFIG_QUOTA_NETLINK_INTERFACE=y
> 
> that theory went away. (also confirmed by not finding a netlink module.)
> 
> What about the kernel .text overflowing into the modules space? The loader
> checks for that, but can something like that happen after everything is
> up and running? I'll look into that tomorrow.

The kernel .text needs to be more than 512MB for the overlap to happen. 

    ffffffff80000000 - ffffffffa0000000 (=512 MB)  kernel text mapping, from phys 0

Also, it is bizarre that symbol resolution mapped ffffffffa0f12560, an address
in module space, to a symbol from af_netlink.o, which is surely built in because of
"obj-y := af_netlink.o" in the Makefile.

What does your /proc/kallsyms show when sorted with regard to the symbols
in question?
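
One way to do that lookup, sketched under the assumption that the input is in
the usual "address type name" kallsyms format. SAMPLE is made-up data for
illustration only and is not taken from the reporter's machine; the function
names are hypothetical too.

```python
# Sketch: find the symbol covering an address in /proc/kallsyms-style text.
import bisect

# Made-up sample, NOT real data from this report.
SAMPLE = """\
ffffffff81000000 T _text
ffffffff81632000 T example_func_a
ffffffff81632100 t example_func_b
ffffffff81a00000 T _etext
"""


def parse_kallsyms(text):
    """Return a list of (address, type, name) tuples sorted by address."""
    symbols = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            symbols.append((int(fields[0], 16), fields[1], fields[2]))
    return sorted(symbols)


def lookup(symbols, addr):
    """Return (name, offset) for the last symbol at or below addr, or None."""
    addresses = [s[0] for s in symbols]
    index = bisect.bisect_right(addresses, addr) - 1
    if index < 0:
        return None
    base, _type, name = symbols[index]
    return name, addr - base


symbols = parse_kallsyms(SAMPLE)
print(lookup(symbols, 0xffffffff81632144))
```

On a real system the text would come from reading /proc/kallsyms, where
addresses may be shown as zeros to unprivileged users depending on the
kptr_restrict setting.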

Also curious are the addresses you have on the stack:

> [  516.309720] Stack:
> [  516.309720]  ffff8803fc85ff18 ffff8803fc85ff18 ffff8803fc85fef8 8900200549908020
> [  516.309720]  ffff8803fc85ff18 ffffffff9ff66470 ffff8803fc85ff18 0000000000000037
> [  516.309720]  ffff8803fc85ff78 ffffffff9ff69d26 0000000000000037 0000000000000004

0xffffffff9ff69d26 sits just below the beginning of the module
mapping space, at the very end of the kernel text mapping. Unless there are
some tricks on those mappings, that area should be unused. Or perhaps
CONFIG_DEBUG_PAGEALLOC is at play here?

And also, the Oops code of 0003 (PF_PROT and PF_WRITE, i.e. a write to a page
that is present but not writable) might hint at what Dave wrote.
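
For reference, a small sketch that decodes the error code printed on the
"Oops:" line. The bit positions are those of the x86 page-fault error code,
and the names follow the PF_* enum in arch/x86/mm/fault.c of that era; the
function name is made up.

```python
# Sketch: decode an x86 page-fault error code into flag names.
PF_FLAGS = [
    (1 << 0, "PF_PROT"),   # 0: page not present, 1: protection violation
    (1 << 1, "PF_WRITE"),  # 0: read access, 1: write access
    (1 << 2, "PF_USER"),   # 0: kernel mode, 1: user mode
    (1 << 3, "PF_RSVD"),   # reserved bit set in a page-table entry
    (1 << 4, "PF_INSTR"),  # fault was an instruction fetch
]


def decode_pf_error(code):
    """Return the names of all flags set in the error code."""
    return [name for bit, name in PF_FLAGS if code & bit]


print(decode_pf_error(0x0003))
```

For 0003 this yields PF_PROT and PF_WRITE with PF_USER clear: a kernel-mode
write that hit a protection violation on a present page.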

-- 
Dan Aloni

* Re: mm/sched/net: BUG when running simple code
  2014-06-13  4:55       ` Dan Aloni
@ 2014-06-13  5:26         ` Dan Aloni
  -1 siblings, 0 replies; 23+ messages in thread
From: Dan Aloni @ 2014-06-13  5:26 UTC (permalink / raw)
  To: Sasha Levin
  Cc: linux-mm, Andrew Morton, Ingo Molnar, Peter Zijlstra, LKML,
	netdev, Dave Jones

On Fri, Jun 13, 2014 at 07:55:55AM +0300, Dan Aloni wrote:
> > that theory went away. (also confirmed by not finding a netlink module.)
> > 
> > What about the kernel .text overflowing into the modules space? The loader
> > checks for that, but can something like that happen after everything is
> > up and running? I'll look into that tomorrow.
> 
> The kernel .text needs to be more than 512MB for the overlap to happen. 
> 
>     ffffffff80000000 - ffffffffa0000000 (=512 MB)  kernel text mapping, from phys 0
> 
> Also, it is bizarre that symbol resolution resolved ffffffffa0f12560 to 
> a symbol that is in module space where af_netlink.o is surely not because of 
> "obj-y := af_netlink.o" in the Makefile. 
> 
> What does your /proc/kallsyms show when sorted with regards to the symbols
> in question?
> 
> Also curious are the addresses you have on the stack:
> 
> > [  516.309720] Stack:
> > [  516.309720]  ffff8803fc85ff18 ffff8803fc85ff18 ffff8803fc85fef8 8900200549908020
> > [  516.309720]  ffff8803fc85ff18 ffffffff9ff66470 ffff8803fc85ff18 0000000000000037
> > [  516.309720]  ffff8803fc85ff78 ffffffff9ff69d26 0000000000000037 0000000000000004
>[..]

Oh, I just remembered the new kASLR feature that got enabled
recently. It explains the addresses, but there was supposed to be a
line for it in the Oops, so I'm puzzled.

-- 
Dan Aloni

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: mm/sched/net: BUG when running simple code
  2014-06-13  4:55       ` Dan Aloni
@ 2014-06-13  5:31         ` Dan Aloni
  -1 siblings, 0 replies; 23+ messages in thread
From: Dan Aloni @ 2014-06-13  5:31 UTC (permalink / raw)
  To: Sasha Levin
  Cc: linux-mm, Andrew Morton, Ingo Molnar, Peter Zijlstra, LKML,
	netdev, Dave Jones

On Fri, Jun 13, 2014 at 07:55:55AM +0300, Dan Aloni wrote:
> And also, the Oops code of 0003 (PF_WRITE and PF_USER) might hint at
> what Dave wrote.

Scratch what I wrote about that, it's PF_PROT | PF_WRITE.
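
For reference, the error code can be decoded bit by bit. The sketch below uses the x86 page-fault error code bit names as I recall them from arch/x86/mm/fault.c of that era; `decode_pf_error()` itself is a hypothetical helper written for this illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* x86 page-fault error code bits. */
#define PF_PROT  (1u << 0) /* 0: page not present, 1: protection fault */
#define PF_WRITE (1u << 1) /* 0: read access,      1: write access     */
#define PF_USER  (1u << 2) /* 0: kernel mode,      1: user mode        */
#define PF_RSVD  (1u << 3) /* reserved bit was set in a page table     */
#define PF_INSTR (1u << 4) /* fault was an instruction fetch           */

/* decode_pf_error(): hypothetical helper; writes "A | B | ..." into buf. */
static void decode_pf_error(unsigned int code, char *buf, size_t len)
{
    static const struct {
        unsigned int bit;
        const char *name;
    } bits[] = {
        { PF_PROT,  "PF_PROT"  },
        { PF_WRITE, "PF_WRITE" },
        { PF_USER,  "PF_USER"  },
        { PF_RSVD,  "PF_RSVD"  },
        { PF_INSTR, "PF_INSTR" },
    };
    size_t used = 0;

    if (len == 0)
        return;
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof(bits) / sizeof(bits[0]); i++) {
        int n;

        if (!(code & bits[i].bit))
            continue;
        n = snprintf(buf + used, len - used, "%s%s",
                     used ? " | " : "", bits[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break; /* buffer too small: stop with a truncated string */
        used += (size_t)n;
    }
}
```

Decoding 0x3 this way yields "PF_PROT | PF_WRITE", whereas PF_WRITE together with PF_USER would have been 0x6.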

-- 
Dan Aloni

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: mm/sched/net: BUG when running simple code
  2014-06-13  4:13       ` Dave Jones
@ 2014-06-13 15:13         ` Sasha Levin
  -1 siblings, 0 replies; 23+ messages in thread
From: Sasha Levin @ 2014-06-13 15:13 UTC (permalink / raw)
  To: Dave Jones, Dan Aloni, linux-mm, Andrew Morton, Ingo Molnar,
	Peter Zijlstra, LKML, netdev

On 06/13/2014 12:13 AM, Dave Jones wrote:
> On Fri, Jun 13, 2014 at 12:01:37AM -0400, Sasha Levin wrote:
>  > On 06/12/2014 11:27 PM, Dan Aloni wrote:
>  > > On Thu, Jun 12, 2014 at 10:56:16PM -0400, Sasha Levin wrote:
>  > >> > Hi all,
>  > >> > 
>  > >> > Okay, I'm really lost. I got the following when fuzzing, and can't really explain what's
>  > >> > going on. It seems that we get a "unable to handle kernel paging request" when running
>  > >> > rather simple code, and I can't figure out how it would cause it.
>  > > [..]
>  > >> > Which agrees with the trace I got:
>  > >> > 
>  > >> > [  516.309720] BUG: unable to handle kernel paging request at ffffffffa0f12560
>  > >> > [  516.309720] IP: netlink_getsockopt (net/netlink/af_netlink.c:2271)
>  > > [..]
>  > >> > [  516.309720] RIP netlink_getsockopt (net/netlink/af_netlink.c:2271)
>  > >> > [  516.309720]  RSP <ffff8803fc85fed8>
>  > >> > [  516.309720] CR2: ffffffffa0f12560
>  > >> > 
>  > >> > They only theory I had so far is that netlink is a module, and has gone away while the code
>  > >> > was executing, but netlink isn't a module on my kernel.
>  > > The RIP - 0xffffffffa0f12560 is in the range (from Documentation/x86/x86_64/mm.txt):
>  > > 
>  > >     ffffffffa0000000 - ffffffffff5fffff (=1525 MB) module mapping space
>  > > 
>  > > So seems it was in a module.
>  > 
>  > Yup, that's why that theory came up, but when I checked my config:
>  > ... 
>  > that theory went away. (also confirmed by not finding a netlink module.)
>  > 
>  > What about the kernel .text overflowing into the modules space? The loader
>  > checks for that, but can something like that happen after everything is
>  > up and running? I'll look into that tomorrow.
> 
> another theory: Trinity can sometimes generate plausible looking module
> addresses and pass those in structs etc.
> 
> I wonder if there's somewhere in that path that isn't checking that the address
> in the optval it got is actually a userspace address before it tries to write to it.

Thing is, the access happened way before touching optval. The only thing that happened
before is reading optlen from userspace, but that happened using get_user() which should
mean that it was safe.

According to that trace, we died when *executing* a piece of code, not when accessing
some other memory. None of the instructions around the one we failed on touch
memory at all, for that matter.
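
The reason get_user() is expected to be safe is that it range-checks the user pointer before dereferencing it. The following user-space sketch shows only the idea of that check. `user_range_ok()` is a hypothetical name, and `ASSUMED_USER_LIMIT` is an assumption standing in for the task's address limit (here the usual x86_64 user-space ceiling of 2^47 minus one page), not a value taken from this report.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-in for the per-task address limit on x86_64. */
#define ASSUMED_USER_LIMIT 0x00007ffffffff000ULL

/*
 * user_range_ok(): hypothetical helper. Returns true when the `size`
 * bytes starting at `addr` lie entirely below `limit`. `size` must be
 * at least 1. The wrap-around test rejects ranges that overflow past
 * the top of the 64-bit address space.
 */
static bool user_range_ok(uint64_t addr, uint64_t size, uint64_t limit)
{
    uint64_t last = addr + (size - 1);

    if (last < addr) /* address arithmetic wrapped around */
        return false;
    return last < limit;
}
```

Under these assumptions a 4-byte read at a kernel-looking pointer such as ffffffffa0f12560 is rejected up front, so a bad optlen pointer should produce -EFAULT rather than a kernel paging request.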


Thanks,
Sasha


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: mm/sched/net: BUG when running simple code
  2014-06-13  4:13       ` Dave Jones
@ 2014-06-17  3:17         ` Sasha Levin
  -1 siblings, 0 replies; 23+ messages in thread
From: Sasha Levin @ 2014-06-17  3:17 UTC (permalink / raw)
  To: Dave Jones, Dan Aloni, linux-mm, Andrew Morton, Ingo Molnar,
	Peter Zijlstra, LKML, netdev

On 06/13/2014 12:13 AM, Dave Jones wrote:
> On Fri, Jun 13, 2014 at 12:01:37AM -0400, Sasha Levin wrote:
>  > On 06/12/2014 11:27 PM, Dan Aloni wrote:
>  > > On Thu, Jun 12, 2014 at 10:56:16PM -0400, Sasha Levin wrote:
>  > >> > Hi all,
>  > >> > 
>  > >> > Okay, I'm really lost. I got the following when fuzzing, and can't really explain what's
>  > >> > going on. It seems that we get a "unable to handle kernel paging request" when running
>  > >> > rather simple code, and I can't figure out how it would cause it.
>  > > [..]
>  > >> > Which agrees with the trace I got:
>  > >> > 
>  > >> > [  516.309720] BUG: unable to handle kernel paging request at ffffffffa0f12560
>  > >> > [  516.309720] IP: netlink_getsockopt (net/netlink/af_netlink.c:2271)
>  > > [..]
>  > >> > [  516.309720] RIP netlink_getsockopt (net/netlink/af_netlink.c:2271)
>  > >> > [  516.309720]  RSP <ffff8803fc85fed8>
>  > >> > [  516.309720] CR2: ffffffffa0f12560
>  > >> > 
>  > >> > They only theory I had so far is that netlink is a module, and has gone away while the code
>  > >> > was executing, but netlink isn't a module on my kernel.
>  > > The RIP - 0xffffffffa0f12560 is in the range (from Documentation/x86/x86_64/mm.txt):
>  > > 
>  > >     ffffffffa0000000 - ffffffffff5fffff (=1525 MB) module mapping space
>  > > 
>  > > So seems it was in a module.
>  > 
>  > Yup, that's why that theory came up, but when I checked my config:
>  > ... 
>  > that theory went away. (also confirmed by not finding a netlink module.)
>  > 
>  > What about the kernel .text overflowing into the modules space? The loader
>  > checks for that, but can something like that happen after everything is
>  > up and running? I'll look into that tomorrow.
> 
> another theory: Trinity can sometimes generate plausible looking module
> addresses and pass those in structs etc.
> 
> I wonder if there's somewhere in that path that isn't checking that the address
> in the optval it got is actually a userspace address before it tries to write to it.

It happened again, and this time I've left the kernel addresses in, and it's quite
interesting:

[   88.837926] Call Trace:
[   88.837926]  [<ffffffff9ff6a792>] __sock_create+0x292/0x3c0
[   88.837926]  [<ffffffff9ff6a610>] ? __sock_create+0x110/0x3c0
[   88.837926]  [<ffffffff9ff6a920>] sock_create+0x30/0x40
[   88.837926]  [<ffffffff9ff6ad4c>] SyS_socket+0x2c/0x70
[   88.837926]  [<ffffffffa0561c30>] ? tracesys+0x7e/0xe6
[   88.837926]  [<ffffffffa0561c93>] tracesys+0xe1/0xe6

tracesys() seems to live inside the module space here?


Thanks,
Sasha


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: mm/sched/net: BUG when running simple code
  2014-06-17  3:17         ` Sasha Levin
@ 2014-06-17  4:30           ` Dan Aloni
  -1 siblings, 0 replies; 23+ messages in thread
From: Dan Aloni @ 2014-06-17  4:30 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Dave Jones, linux-mm, Andrew Morton, Ingo Molnar, Peter Zijlstra,
	LKML, netdev

On Mon, Jun 16, 2014 at 11:17:55PM -0400, Sasha Levin wrote:
> On 06/13/2014 12:13 AM, Dave Jones wrote:
> > On Fri, Jun 13, 2014 at 12:01:37AM -0400, Sasha Levin wrote:
> > another theory: Trinity can sometimes generate plausible looking module
> > addresses and pass those in structs etc.
> > 
> > I wonder if there's somewhere in that path that isn't checking that the address
> > in the optval it got is actually a userspace address before it tries to write to it.
> 
> It happened again, and this time I've left the kernel addresses in, and it's quite
> interesting:
> 
> [   88.837926] Call Trace:
> [   88.837926]  [<ffffffff9ff6a792>] __sock_create+0x292/0x3c0
> [   88.837926]  [<ffffffff9ff6a610>] ? __sock_create+0x110/0x3c0
> [   88.837926]  [<ffffffff9ff6a920>] sock_create+0x30/0x40
> [   88.837926]  [<ffffffff9ff6ad4c>] SyS_socket+0x2c/0x70
> [   88.837926]  [<ffffffffa0561c30>] ? tracesys+0x7e/0xe6
> [   88.837926]  [<ffffffffa0561c93>] tracesys+0xe1/0xe6
> 
> tracesys() seems to live inside a module space here?

I think it's more likely kASLR. The Documentation/x86/x86_64/mm.txt doc needs updating.
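
A quick way to sanity-check the kASLR explanation is to compute how far each traced address sits above the base of the kernel text mapping and compare that with the size of the text window. In the sketch below, the 512 MB window is the one quoted from mm.txt earlier in the thread. The 1 GB window is my assumption about what a kASLR-enabled configuration of that period allows, and it should be verified against the actual kernel config. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define START_KERNEL_MAP 0xffffffff80000000ULL /* base of kernel text mapping */
#define MiB              (1024ULL * 1024ULL)

#define WINDOW_DOCUMENTED    (512ULL * MiB)  /* per mm.txt as quoted above */
#define WINDOW_ASSUMED_KASLR (1024ULL * MiB) /* assumption, see lead-in    */

/* Offset of addr from the base of the kernel text mapping. */
static uint64_t text_offset(uint64_t addr)
{
    return addr - START_KERNEL_MAP;
}

/* True if addr falls inside a text window of window_bytes bytes. */
static bool fits_text_window(uint64_t addr, uint64_t window_bytes)
{
    return addr >= START_KERNEL_MAP && text_offset(addr) < window_bytes;
}
```

For the trace above, __sock_create at ffffffff9ff6a792 is at offset 0x1ff6a792 and fits the documented 512 MB window, while tracesys at ffffffffa0561c93 is at offset 0x20561c93 (about 517 MiB), which fits only if the window is larger than documented.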

-- 
Dan Aloni

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: mm/sched/net: BUG when running simple code
  2014-06-13  2:56 ` Sasha Levin
  (?)
  (?)
@ 2014-07-08 14:51 ` Peter Zijlstra
  2014-07-08 15:25     ` Sasha Levin
  -1 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2014-07-08 14:51 UTC (permalink / raw)
  To: Sasha Levin
  Cc: linux-mm, Andrew Morton, Ingo Molnar, LKML, netdev, Dave Jones

On Thu, Jun 12, 2014 at 10:56:16PM -0400, Sasha Levin wrote:
> Hi all,
> 
> Okay, I'm really lost. I got the following when fuzzing, and can't really explain what's
> going on. It seems that we get a "unable to handle kernel paging request" when running
> rather simple code, and I can't figure out how it would cause it.
> 

Are you running on AMD hardware? If so, check out this thread:

  http://marc.info/?i=53B02CEB.7010607@web.de

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: mm/sched/net: BUG when running simple code
  2014-07-08 14:51 ` Peter Zijlstra
@ 2014-07-08 15:25     ` Sasha Levin
  0 siblings, 0 replies; 23+ messages in thread
From: Sasha Levin @ 2014-07-08 15:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-mm, Andrew Morton, Ingo Molnar, LKML, netdev, Dave Jones

On 07/08/2014 10:51 AM, Peter Zijlstra wrote:
> On Thu, Jun 12, 2014 at 10:56:16PM -0400, Sasha Levin wrote:
>> Hi all,
>> 
>> Okay, I'm really lost. I got the following when fuzzing, and can't really explain what's going on. It seems that we get a "unable to handle kernel paging request" when running rather simple code, and I can't figure out how it would cause it.
>> 
> 
> Are you running on AMD hardware? If so; check out this thread:
> 
> http://marc.info/?i=53B02CEB.7010607@web.de
> 

Unfortunately (luckily?) it's all Intel over here.


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2014-07-08 15:26 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
2014-06-13  2:56 mm/sched/net: BUG when running simple code Sasha Levin
2014-06-13  2:56 ` Sasha Levin
2014-06-13  3:27 ` Dan Aloni
2014-06-13  3:27   ` Dan Aloni
2014-06-13  4:01   ` Sasha Levin
2014-06-13  4:01     ` Sasha Levin
2014-06-13  4:13     ` Dave Jones
2014-06-13  4:13       ` Dave Jones
2014-06-13 15:13       ` Sasha Levin
2014-06-13 15:13         ` Sasha Levin
2014-06-17  3:17       ` Sasha Levin
2014-06-17  3:17         ` Sasha Levin
2014-06-17  4:30         ` Dan Aloni
2014-06-17  4:30           ` Dan Aloni
2014-06-13  4:55     ` Dan Aloni
2014-06-13  4:55       ` Dan Aloni
2014-06-13  5:26       ` Dan Aloni
2014-06-13  5:26         ` Dan Aloni
2014-06-13  5:31       ` Dan Aloni
2014-06-13  5:31         ` Dan Aloni
2014-07-08 14:51 ` Peter Zijlstra
2014-07-08 15:25   ` Sasha Levin
2014-07-08 15:25     ` Sasha Levin
