For some time I've been getting weird-looking spinlock hangs during bootup - a bug I was unable to reproduce on the mainline kernel, until today. The lockup looks like this, reproduced on vanilla v3.8-rc7:

Setting up Logical Volume Management: [   13.140000] BUG: spinlock lockup suspected on CPU#1, lvm.static/139
[   13.140000] BUG: spinlock lockup suspected on CPU#1, lvm.static/139
[   13.140000]  lock: 0x97fe9fc0, .magic: dead4ead, .owner: <none>/-1, .owner_cpu: -1
[   13.140000] Pid: 139, comm: lvm.static Not tainted 3.8.0-rc7 #216702
[   13.140000] Call Trace:
[   13.140000]  [<792b5e66>] spin_dump+0x73/0x7d
[   13.140000]  [<7916a347>] do_raw_spin_lock+0xb2/0xe8
[   13.140000]  [<792b9412>] _raw_spin_lock_irqsave+0x35/0x3e
[   13.140000]  [<790391e8>] ? prepare_to_wait+0x18/0x57
[   13.140000]  [<790391e8>] prepare_to_wait+0x18/0x57
[   13.140000]  [<7914a320>] ? generic_make_request+0x80/0xb5
[   13.140000]  [<790e4f30>] ? unmap_underlying_metadata+0x2e/0x2e
[   13.140000]  [<792b7158>] __wait_on_bit+0x20/0x5f
[   13.140000]  [<7914a427>] ? submit_bio+0xd2/0xfb
[   13.140000]  [<792b71eb>] out_of_line_wait_on_bit+0x54/0x5c
[   13.140000]  [<790e4f30>] ? unmap_underlying_metadata+0x2e/0x2e
[   13.140000]  [<79039086>] ? autoremove_wake_function+0x31/0x31
[   13.140000]  [<790e4eff>] __wait_on_buffer+0x1b/0x1e
[   13.140000]  [<7910f615>] __ext3_get_inode_loc+0x1ef/0x254
[   13.140000]  [<7911015f>] ext3_iget+0x45/0x346
[   13.140000]  [<79114e03>] ext3_lookup+0x97/0xa6
[   13.140000]  [<790c9c0e>] lookup_real+0x20/0x35
[   13.140000]  [<790ca1d2>] __lookup_hash+0x2a/0x31
[   13.140000]  [<790caa0e>] lookup_slow+0x36/0x8c
[   13.140000]  [<790cbdda>] path_lookupat+0xf9/0x5c5
[   13.140000]  [<790cc2c5>] filename_lookup+0x1f/0x84
[   13.140000]  [<790cde98>] user_path_at_empty+0x3f/0x65
[   13.140000]  [<790cdecb>] user_path_at+0xd/0xf
[   13.140000]  [<790c6585>] vfs_fstatat+0x40/0x88
[   13.140000]  [<7903f844>] ? lg_local_unlock+0x31/0x47
[   13.140000]  [<790c65f8>] vfs_stat+0x13/0x15
[   13.140000]  [<790c67e2>] sys_stat64+0x11/0x22
[   13.140000]  [<790c3c47>] ? __fput+0x187/0x1a0
[   13.140000]  [<792b9d37>] ? restore_all+0xf/0xf
[   13.140000]  [<79165bb4>] ? trace_hardirqs_on_thunk+0xc/0x10
[   13.140000]  [<792b9d04>] syscall_call+0x7/0xb
[   13.140000] sending NMI to all CPUs:

NOTE: the reason why I was unable to reproduce it on mainline was that in my -tip testing I randomize not only .configs but also commonly used boot parameters. It turns out that in this particular case the randomized boot parameters appear to make a difference:

CONFIG_CMDLINE="nmi_watchdog=0 nolapic_timer hpet=disable idle=poll highmem=512m acpi=off"

(config attached.)

Removing that line makes the kernel boot. Either one of those parameters turns on a broken mode of timing or memory management (such as highmem - this is a 32-bit config), or it is just some other random detail: timing/races or kernel image layout. (This also makes me suspect that bisection might not be very useful.)

Due to this uncertainty the range of suspected subsystems is pretty large: it could be ext3, VFS, block IO, timers, locking, the scheduler or x86/mm - or something entirely different.

I saw this lockup a few dozen times in the past few weeks. It almost always had a very similar pattern: a spinlock lockup in ext3_iget(). The NMI dump from the other CPU is not very helpful:

[  118.510000] Call Trace:
[  118.510000]  [<7900844b>] cpu_idle+0x86/0xb4
[  118.510000]  [<792a91df>] rest_init+0x103/0x108
[  118.510000]  [<794558cc>] start_kernel+0x2c7/0x2cc
[  118.510000]  [<7945528e>] i386_start_kernel+0x44/0x46

(The system has two cores.)

I am about 80% certain that the bug went upstream sometime during the v3.8 merge window, or shortly afterwards: I never triggered this lockup before the merge window, while afterwards it triggered once every day or so.

Full bootlog attached.
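Since only the whole parameter set is known to trigger the lockup, one (hypothetical) way to narrow it down would be a per-parameter sweep, to see whether any single option is sufficient on its own - the boot-and-check step below is just a placeholder, not an actual reproducer:

```shell
#!/bin/sh
# Sketch: try one suspect boot parameter at a time; each iteration
# would rebuild/boot the test kernel with only that parameter set
# via CONFIG_CMDLINE and watch the serial log for the lockup.
for p in "nmi_watchdog=0" "nolapic_timer" "hpet=disable" \
         "idle=poll" "highmem=512m" "acpi=off"; do
	echo "test boot with: CONFIG_CMDLINE=\"$p\""
	# (kick off the test boot here and check for the spinlock splat)
done
```

Pairwise combinations would be the next step if no single parameter reproduces it.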
(Ignore the duplicated lines in the oops output - they are caused by double consoles.)

Any ideas, anyone?

Thanks,

	Ingo