3.6rc6 slab corruption.

* 3.6rc6 slab corruption.
@ 2012-09-18 14:35 Dave Jones
  2012-09-18 18:38 ` Linus Torvalds
  0 siblings, 1 reply; 33+ messages in thread
From: Dave Jones @ 2012-09-18 14:35 UTC (permalink / raw)
  To: Linux Kernel; +Cc: Linus Torvalds

I was chasing a networking bug, and had trinity reduced to just making read & setsockopt calls,
and let that run overnight.  I woke up to 800mb of traces from a different bug..

The traces look mostly like this..

=============================================================================
BUG kmalloc-64 (Not tainted): Redzone overwritten
-----------------------------------------------------------------------------

INFO: 0xffff88001f4b4970-0xffff88001f4b4977. First byte 0xbb instead of 0xcc 
INFO: Allocated in u32_array_read+0xd1/0x110 age=0 cpu=6 pid=32767
        __slab_alloc+0x516/0x5a5
        __kmalloc+0x213/0x2c0
        u32_array_read+0xd1/0x110
        vfs_read+0xac/0x180
        sys_read+0x4d/0x90
        system_call_fastpath+0x1a/0x1f
INFO: Freed in u32_array_read+0x99/0x110 age=0 cpu=0 pid=32749
        __slab_free+0x3f/0x3bf
        kfree+0x2d5/0x310
        u32_array_read+0x99/0x110
        vfs_read+0xac/0x180
        sys_read+0x4d/0x90
        system_call_fastpath+0x1a/0x1f
INFO: Slab 0xffffea00007d2d00 objects=41 used=14 fp=0xffff88001f4b7410 flags=0x10000000004081
INFO: Object 0xffff88001f4b4930 @offset=2352 fp=0xffff88001f4b7410

Bytes b4 ffff88001f4b4920: 1b 20 1c 00 01 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a  . ......ZZZZZZZZ
Object ffff88001f4b4930: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Object ffff88001f4b4940: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Object ffff88001f4b4950: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Object ffff88001f4b4960: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5  kkkkkkkkkkkkkkk.
Redzone ffff88001f4b4970: bb bb bb bb bb bb bb bb                          ........
Padding ffff88001f4b4ab0: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
Pid: 32756, comm: trinity-child52 Not tainted 3.6.0-rc6+ #44
Call Trace:
 [<ffffffff811c10ad>] ? print_section+0x3d/0x40
 [<ffffffff811c23fe>] print_trailer+0xfe/0x160
 [<ffffffff811c2592>] check_bytes_and_report+0xe2/0x120
 [<ffffffff81023b79>] ? native_sched_clock+0x19/0x80
 [<ffffffff811c2b5b>] check_object+0x18b/0x250
 [<ffffffff8169b7d7>] free_debug_processing+0xc0/0x1fd
 [<ffffffff812d2e29>] ? u32_array_read+0x99/0x110
 [<ffffffff8169ba5c>] __slab_free+0x3f/0x3bf
 [<ffffffff81365a1c>] ? debug_check_no_obj_freed+0x16c/0x210
 [<ffffffff810db04f>] ? lock_release_holdtime.part.26+0xf/0x180
 [<ffffffff812d2e29>] ? u32_array_read+0x99/0x110
 [<ffffffff811c5725>] kfree+0x2d5/0x310
 [<ffffffff812d2e29>] u32_array_read+0x99/0x110
 [<ffffffff811df88c>] vfs_read+0xac/0x180
 [<ffffffff811df9ad>] sys_read+0x4d/0x90
 [<ffffffff816aea2d>] system_call_fastpath+0x1a/0x1f
FIX kmalloc-64: Restoring 0xffff88001f4b4970-0xffff88001f4b4977=0xcc
=============================================================================

Which looks like we read some file (probably something in sysfs/procfs) that corrupted some internal state.
Any ideas on what I could do to narrow this down ?

The full traces are at http://www.codemonkey.org.uk/junk/slab-corrupt.txt
They vary a little later, but it looks like it's probably all the same problem to me.
Sometimes it flip-flops between "First byte 0xbb instead of 0xcc" and "First byte 0xcc instead of 0xbb"

The one outlier being this weird message..

Sep 18 02:00:13 bitcrush kernel: [36617.487681] hrtimer: interrupt took 242337 ns

Which is weird, but probably unrelated.

	Dave

^ permalink raw reply	[flat|nested] 33+ messages in thread