linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* latest -git: suspend: unable to handle kernel paging request (was Re: no_console_suspend doesn't work?)
@ 2008-08-21 17:28 Vegard Nossum
  2008-08-21 18:10 ` Rafael J. Wysocki
  0 siblings, 1 reply; 21+ messages in thread
From: Vegard Nossum @ 2008-08-21 17:28 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux Kernel Mailing List

On Thu, Aug 21, 2008 at 6:49 PM, Vegard Nossum <vegard.nossum@gmail.com> wrote:
> But I think it would be a lot more useful to see _everything_. Why
> doesn't output to ttyS0 work? It works with tty0... I have tried
> various combinations (leaving out console=tty0, leaving out
> earlyprintk, using netconsole, kexec crashdump), but nothing seems to
> be able to help. Do you have any suggestions?

I did it!

With a little debug patch, I got the oops to ttyS0, but with each
printk() duplicated (I removed some of them). This is from resume:

BUG: unable to handle kernel BUG: unable to handle kernel paging
requestpaging request at 00200200
 at 00200200
IP:IP: [<c038accc>] list_del+0xc/0x90
 [<c038accc>] list_del+0xc/0x90
*pdpt = 00000000331a0001 *pde = 0000000000000000
Oops: 0000 [#1] Oops: 0000 [#1] PREEMPT PREEMPT SMP SMP
DEBUG_PAGEALLOCDEBUG_PAGEALLOC
Pid: 3473, comm: bash Not tainted (2.6.27-rc4-00003-ga798564-dirty #30)
EIP: 0060:[<c038accc>] EFLAGS: 00210082 CPU: 0
EIP is at list_del+0xc/0x90
EAX: 00200200 EBX: f77fcc00 ECX: 00000001 EDX: f31a6000
ESI: c088e814 EDI: c088e83c EBP: f31a7be8 ESP: f31a7bd0
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process bash (pid: 3473, ti=f31a6000 task=f30ea700 task.ti=f31a6000)
Stack: Stack: 00000000 00000000 00000002 00000002 00000000 00000000
c01b39be c01b39be f77cefa0 f77cefa0 f77fcbd0 f77fcbd0 f31a7bfc
f31a7bfc c01b39ef c01b39ef
              f77cefa0 f77cefa0 00000000 00000000 c1133598 c1133598
f31a7c3c f31a7c3c c01b3ebc c01b3ebc c0b530c0 c0b530c0 f31a7c18
f31a7c18 c015b60e c015b60e
              f31a7c18 f31a7c18 ffffffff ffffffff 00000020 00000020
c088e800 c088e800 00200046 00200046 00000000 00000000 00000000
00000000 00000000 00000000
Call Trace:
 [<c01b39be>]  [<c01b39be>] ? ? get_partial_node+0x3e/0xc0
 [<c01b39ef>]  [<c01b39ef>] ? ? get_partial_node+0x6f/0xc0
 [<c01b3ebc>]  [<c01b3ebc>] ? ? __slab_alloc+0x11c/0x4e0
 [<c015b60e>]  [<c015b60e>] ? ? get_lock_stats+0x1e/0x50
 [<c01b5ffc>]  [<c01b5ffc>] ? ? __kmalloc+0x11c/0x130
 [<c03ceeb2>]  [<c03ceeb2>] ? ? tty_buffer_request_room+0xe2/0x130
 [<c03ceeb2>]  [<c03ceeb2>] ? ? tty_buffer_request_room+0xe2/0x130
 [<c03ceeb2>]  [<c03ceeb2>] ? ? tty_buffer_request_room+0xe2/0x130
 [<c03d131d>]  [<c03d131d>] ? ? tty_insert_flip_string_flags+0x2d/0xa0
 [<c03efcf1>]  [<c03efcf1>] ? ? receive_chars+0x161/0x290
 [<c03f0d24>]  [<c03f0d24>] ? ? serial8250_interrupt+0x134/0x150
 [<c017db28>]  [<c017db28>] ? ? handle_IRQ_event+0x28/0x70
 [<c017f0ef>]  [<c017f0ef>] ? ? handle_edge_irq+0xaf/0x140
 [<c0108138>]  [<c0108138>] ? ? do_IRQ+0x48/0xa0
 [<c037b7a4>]  [<c037b7a4>] ? ? trace_hardirqs_off_thunk+0xc/0x18
 [<c0105a3c>]  [<c0105a3c>] ? ? common_interrupt+0x28/0x30
 [<c013b2d1>]  [<c013b2d1>] ? ? vprintk+0x151/0x3c0
 [<c015e884>]  [<c015e884>] ? ? trace_hardirqs_on_caller+0xd4/0x160
 [<c015e91b>]  [<c015e91b>] ? ? trace_hardirqs_on+0xb/0x10
 [<c0687983>]  [<c0687983>] ? ? _spin_unlock_irqrestore+0x43/0x70
 [<c038ef8b>]  [<c038ef8b>] ? ? pci_bus_read_config_byte+0x5b/0x70
 [<c013b55b>]  [<c013b55b>] ? ? printk+0x1b/0x20
 [<c059a748>]  [<c059a748>] ? ? pcibios_set_master+0x68/0xa0
 [<c03925c4>]  [<c03925c4>] ? ? pci_set_master+0x54/0x60
 [<c0513d9c>]  [<c0513d9c>] ? ? usb_hcd_pci_resume+0x4c/0xf0
 [<c015e91b>]  [<c015e91b>] ? ? trace_hardirqs_on+0xb/0x10
 [<c0393cf6>]  [<c0393cf6>] ? ? pci_legacy_resume+0x16/0x30
 [<c0393d5a>]  [<c0393d5a>] ? ? pci_pm_restore+0x4a/0x60
 [<c03f83e5>]  [<c03f83e5>] ? ? pm_op+0x115/0x130
 [<c03f8986>]  [<c03f8986>] ? ? device_resume+0xd6/0x380
 [<c0168791>]  [<c0168791>] ? ? hibernation_snapshot+0xa1/0x220
 [<c013b55b>]  [<c013b55b>] ? ? printk+0x1b/0x20
 [<c01689f0>]  [<c01689f0>] ? ? hibernate+0xe0/0x180
 [<c01674a0>]  [<c01674a0>] ? ? state_store+0x0/0xd0
 [<c016755f>]  [<c016755f>] ? ? state_store+0xbf/0xd0
 [<c01674a0>]  [<c01674a0>] ? ? state_store+0x0/0xd0
 [<c0375ef4>]  [<c0375ef4>] ? ? kobj_attr_store+0x24/0x30
 [<c01fa432>]  [<c01fa432>] ? ? sysfs_write_file+0xa2/0x100
 [<c01bbf06>]  [<c01bbf06>] ? ? vfs_write+0x96/0x130
 [<c01fa390>]  [<c01fa390>] ? ? sysfs_write_file+0x0/0x100
 [<c01bc44d>]  [<c01bc44d>] ? ? sys_write+0x3d/0x70
 [<c0104f3b>]  [<c0104f3b>] ? ? sysenter_do_call+0x12/0x3f
 =======================
Code: Code: e8 e8 01 01 89 89 44 44 24 24 04 04 e8 e8 94 94 08 08 db
db ff ff 8b 8b 55 55 04 04 b8 b8 80 80 45 45 7a 7a c0 c0 e8 e8 e7 e7
c2 c2 dd dd ff ff e8 e8
 a2 a2 c7 c7 d7 d7 ff ff eb eb a9 a9 55 55 89 89 e5 e5 53 53 89 89 c3
c3 83 83 ec ec 14 14 8b 8b 40 40 04 04 <8b> <8b> 00 00 39 39 d8 d8 75
75 24 24 8b 8b 13 13
8b 8b 42 42 04 04 39 39 d8 d8 75 75 41 41 8b 8b 43 43 04 04 89 89 42 42 04 04
EIP: [<c038accc>] EIP: [<c038accc>] list_del+0xc/0x90list_del+0xc/0x90
SS:ESP 0068:f31a7bd0
Kernel panic - not syncing: Fatal exception in interrupt

The EIP corresponds to the first line of list_del, lib/list_debug.c:46:

void list_del(struct list_head *entry)
{
        WARN(entry->prev->next != entry,

I actually have one theory: Data arrives on the serial port at the
wrong moment, so interrupt happens before everything is
restored/resumed. You can see the interrupt in the stack trace. Does
this seem possible?


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: latest -git: suspend: unable to handle kernel paging request (was Re: no_console_suspend doesn't work?)
  2008-08-21 17:28 latest -git: suspend: unable to handle kernel paging request (was Re: no_console_suspend doesn't work?) Vegard Nossum
@ 2008-08-21 18:10 ` Rafael J. Wysocki
  2008-08-21 18:21   ` Vegard Nossum
  0 siblings, 1 reply; 21+ messages in thread
From: Rafael J. Wysocki @ 2008-08-21 18:10 UTC (permalink / raw)
  To: Vegard Nossum; +Cc: Linux Kernel Mailing List

On Thursday, 21 of August 2008, Vegard Nossum wrote:
> On Thu, Aug 21, 2008 at 6:49 PM, Vegard Nossum <vegard.nossum@gmail.com> wrote:
> > But I think it would be a lot more useful to see _everything_. Why
> > doesn't output to ttyS0 work? It works with tty0... I have tried
> > various combinations (leaving out console=tty0, leaving out
> > earlyprintk, using netconsole, kexec crashdump), but nothing seems to
> > be able to help. Do you have any suggestions?
> 
> I did it!
> 
> With a little debug patch, I got the oops to ttyS0,

What exactly did you do to achieve that?

> but with each printk() duplicated (I removed some of them). This is from resume:
> 
> BUG: unable to handle kernel BUG: unable to handle kernel paging
> requestpaging request at 00200200
>  at 00200200
> IP:IP: [<c038accc>] list_del+0xc/0x90
>  [<c038accc>] list_del+0xc/0x90
> *pdpt = 00000000331a0001 *pde = 0000000000000000
> Oops: 0000 [#1] Oops: 0000 [#1] PREEMPT PREEMPT SMP SMP
> DEBUG_PAGEALLOCDEBUG_PAGEALLOC
> Pid: 3473, comm: bash Not tainted (2.6.27-rc4-00003-ga798564-dirty #30)
> EIP: 0060:[<c038accc>] EFLAGS: 00210082 CPU: 0
> EIP is at list_del+0xc/0x90
> EAX: 00200200 EBX: f77fcc00 ECX: 00000001 EDX: f31a6000
> ESI: c088e814 EDI: c088e83c EBP: f31a7be8 ESP: f31a7bd0
>  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> Process bash (pid: 3473, ti=f31a6000 task=f30ea700 task.ti=f31a6000)
> Stack: Stack: 00000000 00000000 00000002 00000002 00000000 00000000
> c01b39be c01b39be f77cefa0 f77cefa0 f77fcbd0 f77fcbd0 f31a7bfc
> f31a7bfc c01b39ef c01b39ef
>               f77cefa0 f77cefa0 00000000 00000000 c1133598 c1133598
> f31a7c3c f31a7c3c c01b3ebc c01b3ebc c0b530c0 c0b530c0 f31a7c18
> f31a7c18 c015b60e c015b60e
>               f31a7c18 f31a7c18 ffffffff ffffffff 00000020 00000020
> c088e800 c088e800 00200046 00200046 00000000 00000000 00000000
> 00000000 00000000 00000000
> Call Trace:
>  [<c01b39be>]  [<c01b39be>] ? ? get_partial_node+0x3e/0xc0
>  [<c01b39ef>]  [<c01b39ef>] ? ? get_partial_node+0x6f/0xc0
>  [<c01b3ebc>]  [<c01b3ebc>] ? ? __slab_alloc+0x11c/0x4e0
>  [<c015b60e>]  [<c015b60e>] ? ? get_lock_stats+0x1e/0x50
>  [<c01b5ffc>]  [<c01b5ffc>] ? ? __kmalloc+0x11c/0x130
>  [<c03ceeb2>]  [<c03ceeb2>] ? ? tty_buffer_request_room+0xe2/0x130
>  [<c03ceeb2>]  [<c03ceeb2>] ? ? tty_buffer_request_room+0xe2/0x130
>  [<c03ceeb2>]  [<c03ceeb2>] ? ? tty_buffer_request_room+0xe2/0x130
>  [<c03d131d>]  [<c03d131d>] ? ? tty_insert_flip_string_flags+0x2d/0xa0
>  [<c03efcf1>]  [<c03efcf1>] ? ? receive_chars+0x161/0x290
>  [<c03f0d24>]  [<c03f0d24>] ? ? serial8250_interrupt+0x134/0x150
>  [<c017db28>]  [<c017db28>] ? ? handle_IRQ_event+0x28/0x70
>  [<c017f0ef>]  [<c017f0ef>] ? ? handle_edge_irq+0xaf/0x140
>  [<c0108138>]  [<c0108138>] ? ? do_IRQ+0x48/0xa0
>  [<c037b7a4>]  [<c037b7a4>] ? ? trace_hardirqs_off_thunk+0xc/0x18
>  [<c0105a3c>]  [<c0105a3c>] ? ? common_interrupt+0x28/0x30
>  [<c013b2d1>]  [<c013b2d1>] ? ? vprintk+0x151/0x3c0
>  [<c015e884>]  [<c015e884>] ? ? trace_hardirqs_on_caller+0xd4/0x160
>  [<c015e91b>]  [<c015e91b>] ? ? trace_hardirqs_on+0xb/0x10
>  [<c0687983>]  [<c0687983>] ? ? _spin_unlock_irqrestore+0x43/0x70
>  [<c038ef8b>]  [<c038ef8b>] ? ? pci_bus_read_config_byte+0x5b/0x70
>  [<c013b55b>]  [<c013b55b>] ? ? printk+0x1b/0x20
>  [<c059a748>]  [<c059a748>] ? ? pcibios_set_master+0x68/0xa0
>  [<c03925c4>]  [<c03925c4>] ? ? pci_set_master+0x54/0x60
>  [<c0513d9c>]  [<c0513d9c>] ? ? usb_hcd_pci_resume+0x4c/0xf0
>  [<c015e91b>]  [<c015e91b>] ? ? trace_hardirqs_on+0xb/0x10
>  [<c0393cf6>]  [<c0393cf6>] ? ? pci_legacy_resume+0x16/0x30
>  [<c0393d5a>]  [<c0393d5a>] ? ? pci_pm_restore+0x4a/0x60
>  [<c03f83e5>]  [<c03f83e5>] ? ? pm_op+0x115/0x130
>  [<c03f8986>]  [<c03f8986>] ? ? device_resume+0xd6/0x380
>  [<c0168791>]  [<c0168791>] ? ? hibernation_snapshot+0xa1/0x220
>  [<c013b55b>]  [<c013b55b>] ? ? printk+0x1b/0x20
>  [<c01689f0>]  [<c01689f0>] ? ? hibernate+0xe0/0x180
>  [<c01674a0>]  [<c01674a0>] ? ? state_store+0x0/0xd0
>  [<c016755f>]  [<c016755f>] ? ? state_store+0xbf/0xd0
>  [<c01674a0>]  [<c01674a0>] ? ? state_store+0x0/0xd0
>  [<c0375ef4>]  [<c0375ef4>] ? ? kobj_attr_store+0x24/0x30
>  [<c01fa432>]  [<c01fa432>] ? ? sysfs_write_file+0xa2/0x100
>  [<c01bbf06>]  [<c01bbf06>] ? ? vfs_write+0x96/0x130
>  [<c01fa390>]  [<c01fa390>] ? ? sysfs_write_file+0x0/0x100
>  [<c01bc44d>]  [<c01bc44d>] ? ? sys_write+0x3d/0x70
>  [<c0104f3b>]  [<c0104f3b>] ? ? sysenter_do_call+0x12/0x3f
>  =======================
> Code: Code: e8 e8 01 01 89 89 44 44 24 24 04 04 e8 e8 94 94 08 08 db
> db ff ff 8b 8b 55 55 04 04 b8 b8 80 80 45 45 7a 7a c0 c0 e8 e8 e7 e7
> c2 c2 dd dd ff ff e8 e8
>  a2 a2 c7 c7 d7 d7 ff ff eb eb a9 a9 55 55 89 89 e5 e5 53 53 89 89 c3
> c3 83 83 ec ec 14 14 8b 8b 40 40 04 04 <8b> <8b> 00 00 39 39 d8 d8 75
> 75 24 24 8b 8b 13 13
> 8b 8b 42 42 04 04 39 39 d8 d8 75 75 41 41 8b 8b 43 43 04 04 89 89 42 42 04 04
> EIP: [<c038accc>] EIP: [<c038accc>] list_del+0xc/0x90list_del+0xc/0x90
> SS:ESP 0068:f31a7bd0
> Kernel panic - not syncing: Fatal exception in interrupt
> 
> The EIP corresponds to the first line of list_del, lib/list_debug.c:46:
> 
> void list_del(struct list_head *entry)
> {
>         WARN(entry->prev->next != entry,
> 
> I actually have one theory: Data arrives on the serial port at the
> wrong moment, so interrupt happens before everything is
> restored/resumed. You can see the interrupt in the stack trace. Does
> this seem possible?

To me, it does, but the question is if that's not caused by the serial console
itself ...

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: latest -git: suspend: unable to handle kernel paging request (was Re: no_console_suspend doesn't work?)
  2008-08-21 18:10 ` Rafael J. Wysocki
@ 2008-08-21 18:21   ` Vegard Nossum
  2008-08-21 18:45     ` Vegard Nossum
  2008-08-21 21:27     ` Pekka Enberg
  0 siblings, 2 replies; 21+ messages in thread
From: Vegard Nossum @ 2008-08-21 18:21 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux Kernel Mailing List

On Thu, Aug 21, 2008 at 8:10 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Thursday, 21 of August 2008, Vegard Nossum wrote:
>> On Thu, Aug 21, 2008 at 6:49 PM, Vegard Nossum <vegard.nossum@gmail.com> wrote:
>> > But I think it would be a lot more useful to see _everything_. Why
>> > doesn't output to ttyS0 work? It works with tty0... I have tried
>> > various combinations (leaving out console=tty0, leaving out
>> > earlyprintk, using netconsole, kexec crashdump), but nothing seems to
>> > be able to help. Do you have any suggestions?
>>
>> I did it!
>>
>> With a little debug patch, I got the oops to ttyS0,
>
> What exactly did you do to achieve that?

Actually, it was not my patch that did it. I appended acpi=off to the
command line. I get similar oopses now, but they make it to the ttyS0.
Don't ask me why this makes a difference.

>> I actually have one theory: Data arrives on the serial port at the
>> wrong moment, so interrupt happens before everything is
>> restored/resumed. You can see the interrupt in the stack trace. Does
>> this seem possible?
>
> To me, it does, but the question is if that's not caused by the serial console
> itself ...

I got a "proper" oops this time (brace yourself, it's long ;-)), and
it does not have serial on the stack:

BUG: unable to handle kernel paging request at 00100104
IP: [<c038ad65>] __list_add+0x15/0x90
*pdpt = 00000000325bb001 *pde = 0000000000000000
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Pid: 3597, comm: bash Not tainted (2.6.27-rc4-00003-ga798564-dirty #30)
EIP: 0060:[<c038ad65>] EFLAGS: 00210082 CPU: 0
EIP is at __list_add+0x15/0x90
EAX: f7455764 EBX: 00100100 ECX: 00100100 EDX: f7455764
ESI: f7455764 EDI: f7455764 EBP: f25e7c18 ESP: f25e7bf4
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process bash (pid: 3597, ti=f25e6000 task=f25aa700 task.ti=f25e6000)
Stack: f68120bc f6c00400 c0687133 00000000 00000002 00000000 f7455734 f7455764
       f7455764 f25e7c40 c018b954 0000001f 00000000 f6c00400 f6c00444 00000001
       f68120bc f68120b0 00200002 f25e7cb4 c018d317 f68120bc 00000000 f25aac0c
Call Trace:
 [<c0687133>] ? _spin_lock+0x63/0x70
 [<c018b954>] ? rmqueue_bulk+0x54/0x80
 [<c018d317>] ? get_page_from_freelist+0x5a7/0x720
 [<c01600ea>] ? __lock_acquire+0x27a/0xa00
 [<c018dd50>] ? __alloc_pages_internal+0xa0/0x450
 [<c01acd4b>] ? alloc_pages_current+0x7b/0xc0
 [<c01b37fb>] ? new_slab+0x1bb/0x2d0
 [<c0687877>] ? _spin_unlock+0x27/0x50
 [<c01b40ca>] ? __slab_alloc+0x32a/0x4e0
 [<c010b335>] ? native_sched_clock+0xb5/0x110
 [<c01b4424>] ? kmem_cache_alloc+0xb4/0xe0
 [<c018969e>] ? mempool_alloc_slab+0xe/0x10
 [<c018969e>] ? mempool_alloc_slab+0xe/0x10
 [<c018969e>] ? mempool_alloc_slab+0xe/0x10
 [<c01897a1>] ? mempool_alloc+0x31/0xf0
 [<c015e884>] ? trace_hardirqs_on_caller+0xd4/0x160
 [<c015e91b>] ? trace_hardirqs_on+0xb/0x10
 [<c0368c7e>] ? get_request+0xae/0x2c0
 [<c036935c>] ? get_request_wait+0x1c/0xd0
 [<c0687462>] ? _spin_lock_irq+0x72/0x80
 [<c0369442>] ? blk_get_request+0x32/0x70
 [<c0471c1c>] ? generic_ide_resume+0x5c/0xf0
 [<c03f8bde>] ? device_resume+0x32e/0x380
 [<c0168791>] ? hibernation_snapshot+0xa1/0x220
 [<c013b55b>] ? printk+0x1b/0x20
 [<c01689f0>] ? hibernate+0xe0/0x180
 [<c01674a0>] ? state_store+0x0/0xd0
 [<c016755f>] ? state_store+0xbf/0xd0
 [<c01674a0>] ? state_store+0x0/0xd0
 [<c0375ef4>] ? kobj_attr_store+0x24/0x30
 [<c01fa432>] ? sysfs_write_file+0xa2/0x100
 [<c01bbf06>] ? vfs_write+0x96/0x130
 [<c01fa390>] ? sysfs_write_file+0x0/0x100
 [<c01bc44d>] ? sys_write+0x3d/0x70
 [<c0104f3b>] ? sysenter_do_call+0x12/0x3f
 =======================
Code: c0 e8 d0 f7 da ff 8b 13 eb 97 8d b6 00 00 00 00 8d bf 00 00 00
00 55 89 e5 83 ec 24 89 5d f4 89 cb 89 75 f8 89 d6 89 7d fc 89 c7 <8b>
41 04 39 d0 75 1d 8b 06 39 d8 75 41 89 7b 04 89 1f 8b 5d f4
EIP: [<c038ad65>] __list_add+0x15/0x90 SS:ESP 0068:f25e7bf4
---[ end trace e3ed674f2f20c5d3 ]---
note: bash[3597] exited with preempt_count 2
Eeek! page_mapcount(page) went negative! (-1)
  page pfn = 3e74f
  page->flags = 210007c
  page->count = 1
  page->mapping = f6439028
  vma->vm_ops = generic_file_vm_ops+0x0/0x20
  vma->vm_ops->fault = filemap_fault+0x0/0x480
  vma->vm_file->f_op->mmap = generic_file_mmap+0x0/0x50
------------[ cut here ]------------
kernel BUG at /uio/arkimedes/s29/vegardno/git-working/linux-2.6/mm/rmap.c:662!
invalid opcode: 0000 [#2] PREEMPT SMP DEBUG_PAGEALLOC
Pid: 3597, comm: bash Tainted: G      D   (2.6.27-rc4-00003-ga798564-dirty #30)
EIP: 0060:[<c01a0f39>] EFLAGS: 00210286 CPU: 0
EIP is at page_remove_rmap+0x109/0x120
EAX: 0000003b EBX: f7aa5684 ECX: f25e6000 EDX: 00000005
ESI: f25b94c8 EDI: 3e74f025 EBP: f25e7990 ESP: f25e7980
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process bash (pid: 3597, ti=f25e6000 task=f25aa700 task.ti=f25e6000)
Stack: c077ed7c f6439028 00000000 00000025 f25e7a38 c0198721 3e74f025 00000000
       f25e79c0 c015c5ea 00000025 00000000 325a3067 00000000 00000000 f25b94c8
       f25e7a50 00000000 00000000 3e74f025 00000000 00007000 0097a000 00000000
Call Trace:
 [<c0198721>] ? unmap_vmas+0x4b1/0x8b0
 [<c015c5ea>] ? print_lock_contention_bug+0x1a/0xe0
 [<c019d504>] ? exit_mmap+0x84/0x120
 [<c0138538>] ? mmput+0x48/0xa0
 [<c013c3d7>] ? exit_mm+0xe7/0x110
 [<c013d7a4>] ? do_exit+0x184/0x890
 [<c013b55b>] ? printk+0x1b/0x20
 [<c013a50a>] ? print_oops_end_marker+0x2a/0x30
 [<c01060f1>] ? oops_end+0xb1/0xc0
 [<c01067c0>] ? die+0x50/0x70
 [<c0122b4f>] ? do_page_fault+0x1ef/0xa20
 [<c010b335>] ? native_sched_clock+0xb5/0x110
 [<c01600ea>] ? __lock_acquire+0x27a/0xa00
 [<c0122960>] ? do_page_fault+0x0/0xa20
 [<c0687d3a>] ? error_code+0x72/0x78
 [<c038ad65>] ? __list_add+0x15/0x90
 [<c0687133>] ? _spin_lock+0x63/0x70
 [<c018b954>] ? rmqueue_bulk+0x54/0x80
 [<c018d317>] ? get_page_from_freelist+0x5a7/0x720
 [<c01600ea>] ? __lock_acquire+0x27a/0xa00
 [<c018dd50>] ? __alloc_pages_internal+0xa0/0x450
 [<c01acd4b>] ? alloc_pages_current+0x7b/0xc0
 [<c01b37fb>] ? new_slab+0x1bb/0x2d0
 [<c0687877>] ? _spin_unlock+0x27/0x50
 [<c01b40ca>] ? __slab_alloc+0x32a/0x4e0
 [<c010b335>] ? native_sched_clock+0xb5/0x110
 [<c01b4424>] ? kmem_cache_alloc+0xb4/0xe0
 [<c018969e>] ? mempool_alloc_slab+0xe/0x10
 [<c018969e>] ? mempool_alloc_slab+0xe/0x10
 [<c018969e>] ? mempool_alloc_slab+0xe/0x10
 [<c01897a1>] ? mempool_alloc+0x31/0xf0
 [<c015e884>] ? trace_hardirqs_on_caller+0xd4/0x160
 [<c015e91b>] ? trace_hardirqs_on+0xb/0x10
 [<c0368c7e>] ? get_request+0xae/0x2c0
 [<c036935c>] ? get_request_wait+0x1c/0xd0
 [<c0687462>] ? _spin_lock_irq+0x72/0x80
 [<c0369442>] ? blk_get_request+0x32/0x70
 [<c0471c1c>] ? generic_ide_resume+0x5c/0xf0
 [<c03f8bde>] ? device_resume+0x32e/0x380
 [<c0168791>] ? hibernation_snapshot+0xa1/0x220
 [<c013b55b>] ? printk+0x1b/0x20
 [<c01689f0>] ? hibernate+0xe0/0x180
 [<c01674a0>] ? state_store+0x0/0xd0
 [<c016755f>] ? state_store+0xbf/0xd0
 [<c01674a0>] ? state_store+0x0/0xd0
 [<c0375ef4>] ? kobj_attr_store+0x24/0x30
 [<c01fa432>] ? sysfs_write_file+0xa2/0x100
 [<c01bbf06>] ? vfs_write+0x96/0x130
 [<c01fa390>] ? sysfs_write_file+0x0/0x100
 [<c01bc44d>] ? sys_write+0x3d/0x70
 [<c0104f3b>] ? sysenter_do_call+0x12/0x3f
 =======================
Code: c0 74 0d 8b 50 08 b8 ac ed 77 c0 e8 82 60 fc ff 8b 46 4c 85 c0
74 14 8b 40 10 85 c0 74 0d 8b 50 2c b8 d8 d1 77 c0 e8 67 60 fc ff <0f>
0b eb fe 8b 53 0c eb 95 8d b4 26 00 00 00 00 8d bc 27 00 00
EIP: [<c01a0f39>] page_remove_rmap+0x109/0x120 SS:ESP 0068:f25e7980
---[ end trace e3ed674f2f20c5d3 ]---
Fixing recursive fault but reboot is needed!
=============================================================================
BUG blkdev_ioc: Invalid object pointer 0xf5cdaca8
-----------------------------------------------------------------------------
INFO: Slab 0xf789e318 objects=14 used=14 fp=0x00000000 flags=0x2082083
Pid: 3597, comm: bash Tainted: G      D   2.6.27-rc4-00003-ga798564-dirty #30
 [<c01b2576>] slab_err+0x46/0x50
 [<c01b2766>] ? check_slab+0xd6/0xf0
 [<c0181aef>] ? call_rcu+0x6f/0x80
 [<c015e91b>] ? trace_hardirqs_on+0xb/0x10
 [<c01b3c78>] __slab_free+0x238/0x360
 [<c01b4749>] kmem_cache_free+0xa9/0x120
 [<c036b773>] ? put_io_context+0x53/0x70
 [<c036b773>] ? put_io_context+0x53/0x70
 [<c036b773>] put_io_context+0x53/0x70
 [<c036b82e>] exit_io_context+0x6e/0x80
 [<c013de6e>] do_exit+0x84e/0x890
 [<c037b794>] ? trace_hardirqs_on_thunk+0xc/0x10
 [<c013b55b>] ? printk+0x1b/0x20
 [<c013a50a>] ? print_oops_end_marker+0x2a/0x30
 [<c01060f1>] oops_end+0xb1/0xc0
 [<c01067c0>] die+0x50/0x70
 [<c0106871>] do_trap+0x91/0xc0
 [<c0106940>] ? do_invalid_op+0x0/0xa0
 [<c01069c8>] do_invalid_op+0x88/0xa0
 [<c01a0f39>] ? page_remove_rmap+0x109/0x120
 [<c013b2d1>] ? vprintk+0x151/0x3c0
 [<c013b45b>] ? vprintk+0x2db/0x3c0
 [<c015c5ea>] ? print_lock_contention_bug+0x1a/0xe0
 [<c015c5ea>] ? print_lock_contention_bug+0x1a/0xe0
 [<c0687d3a>] error_code+0x72/0x78
 [<c013007b>] ? sched_rt_period_timer+0x21b/0x270
 [<c01a0f39>] ? page_remove_rmap+0x109/0x120
 [<c0198721>] unmap_vmas+0x4b1/0x8b0
 [<c015c5ea>] ? print_lock_contention_bug+0x1a/0xe0
 [<c019d504>] exit_mmap+0x84/0x120
 [<c0138538>] mmput+0x48/0xa0
 [<c013c3d7>] exit_mm+0xe7/0x110
 [<c013d7a4>] do_exit+0x184/0x890
 [<c013b55b>] ? printk+0x1b/0x20
 [<c013a50a>] ? print_oops_end_marker+0x2a/0x30
 [<c01060f1>] oops_end+0xb1/0xc0
 [<c01067c0>] die+0x50/0x70
 [<c0122b4f>] do_page_fault+0x1ef/0xa20
 [<c010b335>] ? native_sched_clock+0xb5/0x110
 [<c01600ea>] ? __lock_acquire+0x27a/0xa00
 [<c0122960>] ? do_page_fault+0x0/0xa20
 [<c0687d3a>] error_code+0x72/0x78
 [<c038ad65>] ? __list_add+0x15/0x90
 [<c0687133>] ? _spin_lock+0x63/0x70
 [<c018b954>] rmqueue_bulk+0x54/0x80
 [<c018d317>] get_page_from_freelist+0x5a7/0x720
 [<c01600ea>] ? __lock_acquire+0x27a/0xa00
 [<c018dd50>] __alloc_pages_internal+0xa0/0x450
 [<c01acd4b>] alloc_pages_current+0x7b/0xc0
 [<c01b37fb>] new_slab+0x1bb/0x2d0
 [<c0687877>] ? _spin_unlock+0x27/0x50
 [<c01b40ca>] __slab_alloc+0x32a/0x4e0
 [<c010b335>] ? native_sched_clock+0xb5/0x110
 [<c01b4424>] kmem_cache_alloc+0xb4/0xe0
 [<c018969e>] ? mempool_alloc_slab+0xe/0x10
 [<c018969e>] ? mempool_alloc_slab+0xe/0x10
 [<c018969e>] mempool_alloc_slab+0xe/0x10
 [<c01897a1>] mempool_alloc+0x31/0xf0
 [<c015e884>] ? trace_hardirqs_on_caller+0xd4/0x160
 [<c015e91b>] ? trace_hardirqs_on+0xb/0x10
 [<c0368c7e>] get_request+0xae/0x2c0
 [<c036935c>] get_request_wait+0x1c/0xd0
 [<c0687462>] ? _spin_lock_irq+0x72/0x80
 [<c0369442>] blk_get_request+0x32/0x70
 [<c0471c1c>] generic_ide_resume+0x5c/0xf0
 [<c03f8bde>] device_resume+0x32e/0x380
 [<c0168791>] hibernation_snapshot+0xa1/0x220
 [<c013b55b>] ? printk+0x1b/0x20
 [<c01689f0>] hibernate+0xe0/0x180
 [<c01674a0>] ? state_store+0x0/0xd0
 [<c016755f>] state_store+0xbf/0xd0
 [<c01674a0>] ? state_store+0x0/0xd0
 [<c0375ef4>] kobj_attr_store+0x24/0x30
 [<c01fa432>] sysfs_write_file+0xa2/0x100
 [<c01bbf06>] vfs_write+0x96/0x130
 [<c01fa390>] ? sysfs_write_file+0x0/0x100
 [<c01bc44d>] sys_write+0x3d/0x70
 [<c0104f3b>] sysenter_do_call+0x12/0x3f
 =======================
FIX blkdev_ioc: Object at 0xf5cdaca8 not freed
BUG: scheduling while atomic: bash/3597/0x00000006
INFO: lockdep is turned off.
Pid: 3597, comm: bash Tainted: G      D   2.6.27-rc4-00003-ga798564-dirty #30
 [<c0135467>] __schedule_bug+0x77/0x80
 [<c0684ce2>] schedule+0x852/0x8f0
 [<c010509e>] ? restore_nocheck_notrace+0x0/0xe
 [<c01b4779>] ? kmem_cache_free+0xd9/0x120
 [<c036b773>] ? put_io_context+0x53/0x70
 [<c036b773>] ? put_io_context+0x53/0x70
 [<c013de81>] do_exit+0x861/0x890
 [<c037b794>] ? trace_hardirqs_on_thunk+0xc/0x10
 [<c013b55b>] ? printk+0x1b/0x20
 [<c013a50a>] ? print_oops_end_marker+0x2a/0x30
 [<c01060f1>] oops_end+0xb1/0xc0
 [<c01067c0>] die+0x50/0x70
 [<c0106871>] do_trap+0x91/0xc0
 [<c0106940>] ? do_invalid_op+0x0/0xa0
 [<c01069c8>] do_invalid_op+0x88/0xa0
 [<c01a0f39>] ? page_remove_rmap+0x109/0x120
 [<c013b2d1>] ? vprintk+0x151/0x3c0
 [<c013b45b>] ? vprintk+0x2db/0x3c0
 [<c015c5ea>] ? print_lock_contention_bug+0x1a/0xe0
 [<c015c5ea>] ? print_lock_contention_bug+0x1a/0xe0
 [<c0687d3a>] error_code+0x72/0x78
 [<c013007b>] ? sched_rt_period_timer+0x21b/0x270
 [<c01a0f39>] ? page_remove_rmap+0x109/0x120
 [<c0198721>] unmap_vmas+0x4b1/0x8b0
 [<c015c5ea>] ? print_lock_contention_bug+0x1a/0xe0
 [<c019d504>] exit_mmap+0x84/0x120
 [<c0138538>] mmput+0x48/0xa0
 [<c013c3d7>] exit_mm+0xe7/0x110
 [<c013d7a4>] do_exit+0x184/0x890
 [<c013b55b>] ? printk+0x1b/0x20
 [<c013a50a>] ? print_oops_end_marker+0x2a/0x30
 [<c01060f1>] oops_end+0xb1/0xc0
 [<c01067c0>] die+0x50/0x70
 [<c0122b4f>] do_page_fault+0x1ef/0xa20
 [<c010b335>] ? native_sched_clock+0xb5/0x110
 [<c01600ea>] ? __lock_acquire+0x27a/0xa00
 [<c0122960>] ? do_page_fault+0x0/0xa20
 [<c0687d3a>] error_code+0x72/0x78
 [<c038ad65>] ? __list_add+0x15/0x90
 [<c0687133>] ? _spin_lock+0x63/0x70
 [<c018b954>] rmqueue_bulk+0x54/0x80
 [<c018d317>] get_page_from_freelist+0x5a7/0x720
 [<c01600ea>] ? __lock_acquire+0x27a/0xa00
 [<c018dd50>] __alloc_pages_internal+0xa0/0x450
 [<c01acd4b>] alloc_pages_current+0x7b/0xc0
 [<c01b37fb>] new_slab+0x1bb/0x2d0
 [<c0687877>] ? _spin_unlock+0x27/0x50
 [<c01b40ca>] __slab_alloc+0x32a/0x4e0
 [<c010b335>] ? native_sched_clock+0xb5/0x110
 [<c01b4424>] kmem_cache_alloc+0xb4/0xe0
 [<c018969e>] ? mempool_alloc_slab+0xe/0x10
 [<c018969e>] ? mempool_alloc_slab+0xe/0x10
 [<c018969e>] mempool_alloc_slab+0xe/0x10
 [<c01897a1>] mempool_alloc+0x31/0xf0
 [<c015e884>] ? trace_hardirqs_on_caller+0xd4/0x160
 [<c015e91b>] ? trace_hardirqs_on+0xb/0x10
 [<c0368c7e>] get_request+0xae/0x2c0
 [<c036935c>] get_request_wait+0x1c/0xd0
 [<c0687462>] ? _spin_lock_irq+0x72/0x80
 [<c0369442>] blk_get_request+0x32/0x70
 [<c0471c1c>] generic_ide_resume+0x5c/0xf0
 [<c03f8bde>] device_resume+0x32e/0x380
 [<c0168791>] hibernation_snapshot+0xa1/0x220
 [<c013b55b>] ? printk+0x1b/0x20
 [<c01689f0>] hibernate+0xe0/0x180
 [<c01674a0>] ? state_store+0x0/0xd0
 [<c016755f>] state_store+0xbf/0xd0
 [<c01674a0>] ? state_store+0x0/0xd0
 [<c0375ef4>] kobj_attr_store+0x24/0x30
 [<c01fa432>] sysfs_write_file+0xa2/0x100

I can look up addresses in the vmlinux for accurate line numbers if needed.


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: latest -git: suspend: unable to handle kernel paging request (was Re: no_console_suspend doesn't work?)
  2008-08-21 18:21   ` Vegard Nossum
@ 2008-08-21 18:45     ` Vegard Nossum
  2008-08-21 19:13       ` Rafael J. Wysocki
  2008-08-21 21:27     ` Pekka Enberg
  1 sibling, 1 reply; 21+ messages in thread
From: Vegard Nossum @ 2008-08-21 18:45 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux Kernel Mailing List

On Thu, Aug 21, 2008 at 8:21 PM, Vegard Nossum <vegard.nossum@gmail.com> wrote:
> I got a "proper" oops this time (brace yourself, it's long ;-)), and
> it does not have serial on the stack:
>
> BUG: unable to handle kernel paging request at 00100104
> IP: [<c038ad65>] __list_add+0x15/0x90
> *pdpt = 00000000325bb001 *pde = 0000000000000000
> Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> Pid: 3597, comm: bash Not tainted (2.6.27-rc4-00003-ga798564-dirty #30)
> EIP: 0060:[<c038ad65>] EFLAGS: 00210082 CPU: 0
> EIP is at __list_add+0x15/0x90

This one is slightly different:

WARNING: at lib/list_debug.c:26 __list_add+0x61/0x90()
list_add corruption. next->prev should be prev (c088e34c), but was
00200200. (next=f7814774).
Pid: 3572, comm: bash Not tainted 2.6.27-rc4-00003-ga798564-dirty #30
 [<c013a56e>] warn_slowpath+0x5e/0x80
 [<c015f7bf>] ? validate_chain+0x7df/0xe90
 [<c014c4e8>] ? __kernel_text_address+0x8/0x20
 [<c01600ea>] ? __lock_acquire+0x27a/0xa00
 [<c015b60e>] ? get_lock_stats+0x1e/0x50
 [<c015b64d>] ? put_lock_stats+0xd/0x30
 [<c038adb1>] __list_add+0x61/0x90
 [<c01b3208>] deactivate_slab+0x158/0x1b0
 [<c01b3ea5>] __slab_alloc+0x105/0x4e0
 [<c015b60e>] ? get_lock_stats+0x1e/0x50
 [<c01b4424>] kmem_cache_alloc+0xb4/0xe0
 [<c052a51b>] ? serio_queue_event+0x6b/0xe0
 [<c052a51b>] ? serio_queue_event+0x6b/0xe0
 [<c052a51b>] serio_queue_event+0x6b/0xe0
 [<c015e91b>] ? trace_hardirqs_on+0xb/0x10
 [<c052a5b8>] serio_resume+0x28/0x30
 [<c03f8bde>] device_resume+0x32e/0x380
 [<c0168791>] hibernation_snapshot+0xa1/0x220
 [<c013b55b>] ? printk+0x1b/0x20
 [<c01689f0>] hibernate+0xe0/0x180
 [<c01674a0>] ? state_store+0x0/0xd0
 [<c016755f>] state_store+0xbf/0xd0
 [<c01674a0>] ? state_store+0x0/0xd0
 [<c0375ef4>] kobj_attr_store+0x24/0x30
 [<c01fa432>] sysfs_write_file+0xa2/0x100
 [<c01bbf06>] vfs_write+0x96/0x130
 [<c01fa390>] ? sysfs_write_file+0x0/0x100
 [<c01bc44d>] sys_write+0x3d/0x70
 [<c0104f3b>] sysenter_do_call+0x12/0x3f
 =======================
---[ end trace 0bc52e2837e4b8b8 ]---

...followed by the "unable to handle kernel paging request" as in
previous e-mail. The numbers 00100104 and 00200200 seem to appear all
the time.

At least it is 100% reproducible now and with nice oopses too :-)


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: latest -git: suspend: unable to handle kernel paging request (was Re: no_console_suspend doesn't work?)
  2008-08-21 18:45     ` Vegard Nossum
@ 2008-08-21 19:13       ` Rafael J. Wysocki
  2008-08-21 19:16         ` Vegard Nossum
  0 siblings, 1 reply; 21+ messages in thread
From: Rafael J. Wysocki @ 2008-08-21 19:13 UTC (permalink / raw)
  To: Vegard Nossum; +Cc: Linux Kernel Mailing List, Andrew Morton

On Thursday, 21 of August 2008, Vegard Nossum wrote:
> On Thu, Aug 21, 2008 at 8:21 PM, Vegard Nossum <vegard.nossum@gmail.com> wrote:
> > I got a "proper" oops this time (brace yourself, it's long ;-)), and
> > it does not have serial on the stack:
> >
> > BUG: unable to handle kernel paging request at 00100104
> > IP: [<c038ad65>] __list_add+0x15/0x90
> > *pdpt = 00000000325bb001 *pde = 0000000000000000
> > Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > Pid: 3597, comm: bash Not tainted (2.6.27-rc4-00003-ga798564-dirty #30)
> > EIP: 0060:[<c038ad65>] EFLAGS: 00210082 CPU: 0
> > EIP is at __list_add+0x15/0x90
> 
> This one is slightly different:
> 
> WARNING: at lib/list_debug.c:26 __list_add+0x61/0x90()
> list_add corruption. next->prev should be prev (c088e34c), but was
> 00200200. (next=f7814774).
> Pid: 3572, comm: bash Not tainted 2.6.27-rc4-00003-ga798564-dirty #30
>  [<c013a56e>] warn_slowpath+0x5e/0x80
>  [<c015f7bf>] ? validate_chain+0x7df/0xe90
>  [<c014c4e8>] ? __kernel_text_address+0x8/0x20
>  [<c01600ea>] ? __lock_acquire+0x27a/0xa00
>  [<c015b60e>] ? get_lock_stats+0x1e/0x50
>  [<c015b64d>] ? put_lock_stats+0xd/0x30
>  [<c038adb1>] __list_add+0x61/0x90
>  [<c01b3208>] deactivate_slab+0x158/0x1b0
>  [<c01b3ea5>] __slab_alloc+0x105/0x4e0
>  [<c015b60e>] ? get_lock_stats+0x1e/0x50
>  [<c01b4424>] kmem_cache_alloc+0xb4/0xe0
>  [<c052a51b>] ? serio_queue_event+0x6b/0xe0
>  [<c052a51b>] ? serio_queue_event+0x6b/0xe0
>  [<c052a51b>] serio_queue_event+0x6b/0xe0
>  [<c015e91b>] ? trace_hardirqs_on+0xb/0x10
>  [<c052a5b8>] serio_resume+0x28/0x30
>  [<c03f8bde>] device_resume+0x32e/0x380
>  [<c0168791>] hibernation_snapshot+0xa1/0x220
>  [<c013b55b>] ? printk+0x1b/0x20
>  [<c01689f0>] hibernate+0xe0/0x180
>  [<c01674a0>] ? state_store+0x0/0xd0
>  [<c016755f>] state_store+0xbf/0xd0
>  [<c01674a0>] ? state_store+0x0/0xd0
>  [<c0375ef4>] kobj_attr_store+0x24/0x30
>  [<c01fa432>] sysfs_write_file+0xa2/0x100
>  [<c01bbf06>] vfs_write+0x96/0x130
>  [<c01fa390>] ? sysfs_write_file+0x0/0x100
>  [<c01bc44d>] sys_write+0x3d/0x70
>  [<c0104f3b>] sysenter_do_call+0x12/0x3f
>  =======================
> ---[ end trace 0bc52e2837e4b8b8 ]---
> 
> ...followed by the "unable to handle kernel paging request" as in
> previous e-mail. The numbers 00100104 and 00200200 seem to appear all
> the time.
> 
> At least it is 100% reproducible now and with nice oopses too :-)

Hm, slab corruption, it seems.

Do you use SLAB or SLUB?

Rafael

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: latest -git: suspend: unable to handle kernel paging request (was Re: no_console_suspend doesn't work?)
  2008-08-21 19:13       ` Rafael J. Wysocki
@ 2008-08-21 19:16         ` Vegard Nossum
  2008-08-21 19:37           ` Rafael J. Wysocki
  0 siblings, 1 reply; 21+ messages in thread
From: Vegard Nossum @ 2008-08-21 19:16 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux Kernel Mailing List, Andrew Morton

On Thu, Aug 21, 2008 at 9:13 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>> ...followed by the "unable to handle kernel paging request" as in
>> previous e-mail. The numbers 00100104 and 00200200 seem to appear all
>> the time.
>>
>> At least it is 100% reproducible now and with nice oopses too :-)
>
> Hm, slab corruption, it seems.
>
> Do you use SLAB or SLUB?

SLUB, with all possible debugging:

CONFIG_SLUB_DEBUG=y
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
CONFIG_SLABINFO=y
CONFIG_SLUB_DEBUG_ON=y
CONFIG_SLUB_STATS=y

(Thanks for helping out.)


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: latest -git: suspend: unable to handle kernel paging request (was Re: no_console_suspend doesn't work?)
  2008-08-21 19:16         ` Vegard Nossum
@ 2008-08-21 19:37           ` Rafael J. Wysocki
  2008-08-21 20:08             ` Vegard Nossum
  0 siblings, 1 reply; 21+ messages in thread
From: Rafael J. Wysocki @ 2008-08-21 19:37 UTC (permalink / raw)
  To: Vegard Nossum; +Cc: Linux Kernel Mailing List, Andrew Morton

On Thursday, 21 of August 2008, Vegard Nossum wrote:
> On Thu, Aug 21, 2008 at 9:13 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> >> ...followed by the "unable to handle kernel paging request" as in
> >> previous e-mail. The numbers 00100104 and 00200200 seem to appear all
> >> the time.
> >>
> >> At least it is 100% reproducible now and with nice oopses too :-)
> >
> > Hm, slab corruption, it seems.
> >
> > Do you use SLAB or SLUB?
> 
> SLUB, with all possible debugging:
> 
> CONFIG_SLUB_DEBUG=y
> # CONFIG_SLAB is not set
> CONFIG_SLUB=y
> # CONFIG_SLOB is not set
> CONFIG_SLABINFO=y
> CONFIG_SLUB_DEBUG_ON=y
> CONFIG_SLUB_STATS=y
> 
> (Thanks for helping out.)

(You're welcome.)

Can you switch to SLAB and retest?  I don't really think SLUB is the issue
here, but SLAB may give us additional information.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: latest -git: suspend: unable to handle kernel paging request (was Re: no_console_suspend doesn't work?)
  2008-08-21 19:37           ` Rafael J. Wysocki
@ 2008-08-21 20:08             ` Vegard Nossum
  2008-08-21 20:22               ` Rafael J. Wysocki
  2008-08-21 21:21               ` Pekka Enberg
  0 siblings, 2 replies; 21+ messages in thread
From: Vegard Nossum @ 2008-08-21 20:08 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux Kernel Mailing List, Andrew Morton

On Thu, Aug 21, 2008 at 9:37 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> Can you switch to SLAB and retest?  I don't really think SLUB is the issue
> here, but SLAB may give us additional information.

Got this with SLAB:

BUG: unable to handle kernel NULL pointer dereference at 00000000
IP: [<c03897ac>] list_del+0xc/0x90
*pdpt = 0000000031480001 *pde = 0000000000000000
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Pid: 7, comm: events/0 Not tainted (2.6.27-rc4-00003-gef9b1bc #33)
EIP: 0060:[<c03897ac>] EFLAGS: 00010082 CPU: 0
EIP is at list_del+0xc/0x90
EAX: 00000000 EBX: f556afa0 ECX: f54a5380 EDX: f54e3128
ESI: f54f1ef0 EDI: f556afa0 EBP: f68b7ee0 ESP: f68b7ec8
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process events/0 (pid: 7, ti=f68b6000 task=f689c338 task.ti=f68b6000)
Stack: c015b62e f68b7ed4 c015b66d f68b7efc 00000046 f54f1f20 f68b7f1c c01b4054
       00000000 f689c338 f54e315c 00000001 f54f1f20 f54a5380 00000000 f54e3128
       f55b3348 00000000 f54f1f20 f54f1ef0 00000001 f68b7f3c c01b41ba 00000000
Call Trace:
 [<c015b62e>] ? get_lock_stats+0x1e/0x50
 [<c015b66d>] ? put_lock_stats+0xd/0x30
 [<c01b4054>] ? free_block+0xa4/0x1b0
 [<c01b41ba>] ? drain_array+0x5a/0xb0
 [<c01b4f82>] ? cache_reap+0x72/0x230
 [<c014ada7>] ? run_workqueue+0x107/0x200
 [<c014ae0a>] ? run_workqueue+0x16a/0x200
 [<c014ada7>] ? run_workqueue+0x107/0x200
 [<c01b4f10>] ? cache_reap+0x0/0x230
 [<c014b94d>] ? worker_thread+0x7d/0xe0
 [<c014e4f0>] ? autoremove_wake_function+0x0/0x50
 [<c014b8d0>] ? worker_thread+0x0/0xe0
 [<c014e1e2>] ? kthread+0x42/0x70
 [<c014e1a0>] ? kthread+0x0/0x70
 [<c0105cf3>] ? kernel_thread_helper+0x7/0x14
 =======================
Code: e8 01 89 44 24 04 e8 d4 1d db ff 8b 55 04 b8 48 2f 7a c0 e8 47 d8 dd ff e8
 c2 dc d7 ff eb a9 55 89 e5 53 89 c3 83 ec 14 8b 40 04 <8b> 00 39 d8 75 24 8b 13
 8b 42 04 39 d8 75 41 8b 43 04 89 42 04
EIP: [<c03897ac>] list_del+0xc/0x90 SS:ESP 0068:f68b7ec8
---[ end trace 958cea1a710a109a ]---
note: events/0[7] exited with preempt_count 1
uhci_hcd 0000:00:1d.2: setting latency timer to 64
uhci_hcd 0000:00:1d.3: setting latency timer to 64
ehci_hcd 0000:00:1d.7: setting latency timer to 64
pci 0000:00:1e.0: setting latency timer to 64
------------[ cut here ]------------
WARNING: at /uio/arkimedes/s29/vegardno/git-working/linux-2.6/lib/list_debug.c:26
__list_add+0x61/0x90()
list_add corruption. next->prev should be prev (f6c00490), but was
f73ecea8. (next=f73ecea8).
Pid: 201, comm: rcu_torture_rea Tainted: G      D
2.6.27-rc4-00003-gef9b1bc #33
 [<c013a58e>] warn_slowpath+0x5e/0x80
 [<c015b60b>] ? trace_hardirqs_off+0xb/0x10
 [<c010b325>] ? native_sched_clock+0xb5/0x110
 [<c0125c36>] ? kernel_map_pages+0xa6/0x130
 [<c0389891>] __list_add+0x61/0x90
 [<c018c978>] __free_pages_ok+0x368/0x410
 [<c018ccf2>] __free_pages+0x22/0x40
 [<c018cd58>] free_pages+0x48/0x50
 [<c010be69>] free_thread_info+0x19/0x20
 [<c0138929>] free_task+0x19/0x30
 [<c0139e31>] __put_task_struct+0x51/0xa0
 [<c013c0b7>] delayed_put_task_struct+0x27/0x30
 [<c018206c>] rcu_process_callbacks+0x6c/0xb0
 [<c013f5b3>] __do_softirq+0x83/0x100
 [<c013f6d5>] do_softirq+0xa5/0xb0
 [<c013f975>] irq_exit+0x95/0xa0
 [<c010813d>] do_IRQ+0x4d/0xa0
 [<c037a264>] ? trace_hardirqs_off_thunk+0xc/0x18
 [<c0105a3c>] common_interrupt+0x28/0x30
 [<c0683b44>] ? schedule+0x734/0x8f0
 [<c068007b>] ? start_secondary+0x19b/0x1c0
 [<c06800d8>] ? check_tsc_warp+0x38/0x1e0
 [<c068695b>] ? _spin_unlock_irq+0x2b/0x60
 [<c0683b44>] schedule+0x734/0x8f0
 [<c010509e>] ? restore_nocheck_notrace+0x0/
 [<c015e93b>] ? trace_hardirqs_on+0xb/0x10
 [<c0180c88>] rcu_torture_reader+0x148/0x230
 [<c0181460>] ? rcu_torture_timer+0x0/0x100
 [<c0180d19>] ? rcu_torture_reader+0x1d9/0x230
 [<c0180b40>] ? rcu_torture_reader+0x0/0x230
 [<c014e1e2>] kthread+0x42/0x70
 [<c014e1a0>] ? kthread+0x0/0x70
 [<c0105cf3>] kernel_thread_helper+0x7/0x14
 =======================
---[ end trace 958cea1a710a109a ]---
eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
eth1: link down
hda: host max PIO4 wanted PIO255(auto-tune) selected PIO4
hda: UDMA/100 mode selected
------------[ cut here ]------------
kernel BUG at /uio/arkimedes/s29/vegardno/git-working/linux-2.6/mm/slab.c:590!
invalid opcode: 0000 [#2] PREEMPT SMP DEBUG_PAGEALLOC
Pid: 3257, comm: S99local Tainted: G      D W (2.6.27-rc4-00003-gef9b1bc #33)
EIP: 0060:[<c01b24c0>] EFLAGS: 00210002 CPU: 0
EIP is at cache_free_debugcheck+0x2e0/0x310
EAX: 00000000 EBX: f4c1f688 ECX: c018976e EDX: f785f744
ESI: f6864180 EDI: f4c1f688 EBP: f25b1de8 ESP: f25b1d94
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process S99local (pid: 3257, ti=f25b0000 task=f253f338 task.ti=f25b0000)
Stack: 00200200 f4c1f688 00000000 f25b1e20 f25b1dac c0683402 f25b1e58 c018976e
       00000001 c036a7b0 00000002 00000000 00000000 c088c190 f253f338 f4c1f688
       f253f338 f25b1df8 f6864180 f6960ef0 f4c1f688 f25b1e0c c01b3c9c f253f7f4
Call Trace:
 [<c0683402>] ? wait_for_completion+0x12/0x20
 [<c018976e>] ? mempool_free_slab+0xe/0x10
 [<c036a7b0>] ? blk_end_sync_rq+0x0/0x30
 [<c01b3c9c>] ? kmem_cache_free+0x5c/0x200
 [<c018976e>] ? mempool_free_slab+0xe/0x10
 [<c018979c>] ? mempool_free+0x2c/0x90
 [<c0367252>] ? __blk_put_request+0x62/0x90
 [<c036764c>] ? blk_put_request+0x2c/0x50
 [<c0470732>] ? generic_ide_resume+0xa2/0xf0
 [<c03f767e>] ? device_resume+0x32e/0x380
 [<c01687d1>] ? hibernation_snapshot+0xa1/0x220
 [<c013b57b>] ? printk+0x1b/0x20
 [<c0168a30>] ? hibernate+0xe0/0x180
 [<c01674e0>] ? state_store+0x0/0xd0
 [<c016759f>] ? state_store+0xbf/0xd0
 [<c01674e0>] ? state_store+0x0/0xd0
 [<c03749d4>] ? kobj_attr_store+0x24/0x30
 [<c01f8ef2>] ? sysfs_write_file+0xa2/0x100
 [<c01ba946>] ? vfs_write+0x96/0x130
 [<c01f8e50>] ? sysfs_write_file+0x0/0x100
 [<c01bae8d>] ? sys_write+0x3d/0x70
 [<c0104f3b>] ? sysenter_do_call+0x12/0x3f
 =======================
Code: 00 00 75 cd c7 03 21 43 65 87 8b 5e 28 e9 78 ff ff ff 0f 0b eb fe 90 8d 74
 26 00 8b 52 0c e9 ca fd ff ff 0f 0b eb fe 8d 74 26 00 <0f> 0b eb fe 0f 0b eb fe
 0f 0b eb fe 8d 74 26 00 8b 52 0c 8b 02
EIP: [<c01b24c0>] cache_free_debugcheck+0x2e0/0x310 SS:ESP 0068:f25b1d94
---[ end trace 958cea1a710a109a ]---
note: S99local[3257] exited with preempt_count 1
BUG: sleeping function called from invalid context at
/uio/arkimedes/s29/vegardno/git-working/linux-2.6/kernel/rwsem.c:21
in_atomic():1, irqs_disabled():0
INFO: lockdep is turned off.
Pid: 3257, comm: S99local Tainted: G      D W 2.6.27-rc4-00003-gef9b1bc #33
 [<c01312c1>] __might_sleep+0xf1/0x140
 [<c0684fe9>] down_read+0x19/0x80
 [<c03e007b>] ? tty_audit_add_data+0x1bb/0x2e0
 [<c013c33b>] exit_mm+0x2b/0x110
 [<c013d7c4>] do_exit+0x184/0x890
 [<c013b57b>] ? printk+0x1b/0x20
 [<c013a52a>] ? print_oops_end_marker+0x2a/0x30
 [<c01060f1>] oops_end+0xb1/0xc0
 [<c01067c0>] die+0x50/0x70
 [<c0106871>] do_trap+0x91/0xc0
BUG: unable to handle kernel paging request at 00100104
IP: [<c0389845>] __list_add+0x15/0x90
*pdpt = 0000000031480001 *pde = 0000000000000000
Oops: 0000 [#3] PREEMPT SMP DEBUG_PAGEALLOC
Pid: 3257, comm: S99local Tainted: G      D W (2.6.27-rc4-00003-gef9b1bc #33)
EIP: 0060:[<c0389845>] EFLAGS: 00210086 CPU: 0
EIP is at __list_add+0x15/0x90
EAX: f73ece6c EBX: 00100100 ECX: 00100100 EDX: f73ece6c
ESI: f73ece6c EDI: f73ece6c EBP: f25b16dc ESP: f25b16b8
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process S99local (pid: 3257, ti=f25b0000 task=f253f338 task.ti=f25b0000)
Stack: f6814e44 f6c00400 c06860b3 00000000 00000002 00000000 f73ece3c f73ece6c
       f73ece6c f25b1704 c018b9e4 0000001f 00000000 f6c00400 f6c00444 00000001
       f6814e44 f6814e38 00200002 f25b1778 c018d3a7 f6814e44 00000000 00000000
Call Trace:
 [<c06860b3>] ? _spin_lock+0x63/0x70
 [<c018b9e4>] ? rmqueue_bulk+0x54/0x80
 [<c018d3a7>] ? get_page_from_freelist+0x5a7/0x720
 [<c018dde0>] ? __alloc_pages_internal+0xa0/0x450
 [<c01b25a2>] ? kmem_getpages+0x62/0x110
 [<c01b2f4c>] ? cache_grow+0x39c/0x3b0
 [<c03890b6>] ? _raw_spin_unlock+0x46/0x80
 [<c01b3503>] ? cache_alloc_refill+0x1f3/0x230
 [<c01b36e5>] ? __kmalloc+0x1a5/0x1e0
 [<c03cd9b2>] ? tty_buffer_request_room+0xe2/0x130
 [<c03cd9b2>] ? tty_buffer_request_room+0xe2/0x130
 [<c03cfe0d>] ? tty_insert_flip_string_flags+0x2d/0xa0
 [<c03ee7b1>] ? receive_chars+0x161/0x290
 [<c03ef7e4>] ? serial8250_interrupt+0x134/0x150
 [<c017dc08>] ? handle_IRQ_event+0x28/0x70
 [<c017f1cf>] ? handle_edge_irq+0xaf/0x140
 [<c0108138>] ? do_IRQ+0x48/0xa0
 [<c037a264>] ? trace_hardirqs_off_thunk+0xc/0x18
 [<c0105a3c>] ? common_interrupt+0x28/0x30
 [<c013b2f1>] ? vprintk+0x151/0x3c0
 [<c0143e5f>] ? run_timer_softirq+0x19f/0x1d0
 [<c0143e5f>] ? run_timer_softirq+0x19f/0x1d0
 [<c013b57b>] ? printk+0x1b/0x20
 [<c016700a>] ? __print_symbol+0x2a/0x40
 [<c037a254>] ? trace_hardirqs_on_thunk+0xc/0x10
 [<c010509e>] ? restore_nocheck_notrace+0x0/0xe
 [<c013b2f1>] ? vprintk+0x151/0x3c0
 [<c013b47b>] ? vprintk+0x2db/0x3c0
 [<c013f91a>] ? irq_exit+0x3a/0xa0
 [<c010509e>] ? restore_nocheck_notrace+0x0/0xe
 [<c013b2f1>] ? vprintk+0x151/0x3c0
 [<c0106871>] ? do_trap+0x91/0xc0
 [<c013b57b>] ? printk+0x1b/0x20
 [<c0106871>] ? do_trap+0x91/0xc0
 [<c01061f0>] ? print_trace_address+0x40/0x50
 [<c0106871>] ? do_trap+0x91/0xc0
 [<c01062aa>] ? dump_trace+0xaa/0x120
 [<c0106346>] ? show_trace_log_lvl+0x26/0x40
 [<c0106b8a>] ? show_trace+0x1a/0x20
 [<c01074d2>] ? dump_stack+0x72/0x80
 [<c01312c1>] ? __might_sleep+0xf1/0x140
 [<c0684fe9>] ? down_read+0x19/0x80
 [<c03e007b>] ? tty_audit_add_data+0x1bb/0x2e0
 [<c013c33b>] ? exit_mm+0x2b/0x110
 [<c013d7c4>] ? do_exit+0x184/0x890
 [<c013b57b>] ? printk+0x1b/0x20
 [<c013a52a>] ? print_oops_end_marker+0x2a/0x30
 [<c01060f1>] ? oops_end+0xb1/0xc0
 [<c01067c0>] ? die+0x50/0x70
 [<c0106871>] ? do_trap+0x91/0xc0
 [<c0106940>] ? do_invalid_op+0x0/0xa0
 [<c01069c8>] ? do_invalid_op+0x88/0xa0
 [<c01b24c0>] ? cache_free_debugcheck+0x2e0/0x310
 [<c015c60a>] ? print_lock_contention_bug+0x1a/0xe0
 [<c0182527>] ? rcu_irq_exit+0x17/0x90
 [<c0686cba>] ? error_code+0x72/0x78
 [<c018976e>] ? mempool_free_slab+0xe/0x10
 [<c068007b>] ? start_secondary+0x19b/0x1c0
 [<c06800d8>] ? check_tsc_warp+0x38/0x1e0
 [<c01b24c0>] ? cache_free_debugcheck+0x2e0/0x310
 [<c0683402>] ? wait_for_completion+0x12/0x20
 [<c018976e>] ? mempool_free_slab+0xe/0x10
 [<c036a7b0>] ? blk_end_sync_rq+0x0/0x30
 [<c01b3c9c>] ? kmem_cache_free+0x5c/0x200
 [<c018976e>] ? mempool_free_slab+0xe/0x10
 [<c018979c>] ? mempool_free+0x2c/0x90
 [<c0367252>] ? __blk_put_request+0x62/0x90
 [<c036764c>] ? blk_put_request+0x2c/0x50
 [<c0470732>] ? generic_ide_resume+0xa2/0xf0
 [<c03f767e>] ? device_resume+0x32e/0x380
 [<c01687d1>] ? hibernation_snapshot+0xa1/0x220
 [<c013b57b>] ? printk+0x1b/0x20
 [<c0168a30>] ? hibernate+0xe0/0x180
 [<c01674e0>] ? state_store+0x0/0xd0
 [<c016759f>] ? state_store+0xbf/0xd0
 [<c01674e0>] ? state_store+0x0/0xd0
 [<c03749d4>] ? kobj_attr_store+0x24/0x30
 [<c01f8ef2>] ? sysfs_write_file+0xa2/0x100
 [<c01ba946>] ? vfs_write+0x96/0x130
 [<c01f8e50>] ? sysfs_write_file+0x0/0x100
 [<c01bae8d>] ? sys_write+0x3d/0x70
 [<c0104f3b>] ? sysenter_do_call+0x12/0x3f
 =======================
Code: c0 e8 10 0d db ff 8b 13 eb 97 8d b6 00 00 00 00 8d bf 00 00 00 00 55 89 e5
 83 ec 24 89 5d f4 89 cb 89 75 f8 89 d6 89 7d fc 89 c7 <8b> 41 04 39 d0 75 1d 8b
 06 39 d8 75 41 89 7b 04 89 1f 8b 5d f4
EIP: [<c0389845>] __list_add+0x15/0x90 SS:ESP 0068:f25b16b8
Kernel panic - not syncing: Fatal exception in interrupt
------------[ cut here ]------------
WARNING: at /uio/arkimedes/s29/vegardno/git-working/linux-2.6/kernel/smp.c:328
smp_call_function_mask+0x1b1/0x1c0()
Pid: 3257, comm: S99local Tainted: G      D W 2.6.27-rc4-00f9b1bc #33
 [<c013a97f>] warn_on_slowpath+0x4f/0x80
 [<c0397ff4>] ? vgacon_set_cursor_size+0xa4/0x120
 [<c015b60b>] ? trace_hardirqs_off+0xb/0x10
 [<c018214b>] ? __rcu_read_unlock+0x9b/0xc0
 [<c015332c>] ? __atomic_notifier_call_chain+0x3c/0x50
 [<c03db7fd>] ? vt_console_print+0x1dd/0x2a0
 [<c015335a>] ? atomic_notifier_call_chain+0x1a/0x20
 [<c03890b6>] ? _raw_spin_unlock+0x46/0x80
 [<c068680c>] ? _spin_unlock+0x3c/0x50
 [<c03db7fd>] ? vt_console_print+0x1dd/0x2a0
 [<c015b60b>] ? trace_hardirqs_off+0xb/0x10
 [<c013af61>] ? release_console_sem+0x1c1/0x1f0
 [<c015b60b>] ? trace_hardirqs_off+0xb/0x10
 [<c0165f81>] smp_call_function_mask+0x1b1/0x1c0
 [<c0119280>] ? stop_this_cpu+0x0/0x50
 [<c0684808>] ? mutex_unlock+0x8/0x10
 [<c015b60b>] ? trace_hardirqs_off+0xb/0x10
 [<c06847f1>] ? __mutex_unlock_slowpath+0x151/0x160
 [<c0684808>] ? mutex_unlock+0x8/0x10
 [<c016c46d>] ? crash_kexec+0x6d/0xc0
 [<c0684808>] ? mutex_unlock+0x8/0x10
 [<c016c46d>] ? crash_kexec+0x6d/0xc0
 [<c0119280>] ? stop_this_cpu+0x0/0x50
 [<c0165fc0>] smp_call_function+0x30/0x60
 [<c011937e>] native_smp_send_stop+0x1e/0x70
 [<c013a879>] panic+0x69/0x120
 [<c01060e5>] oops_end+0xa5/0xc0
 [<c01067c0>] die+0x50/0x70
 [<c0122b4f>] do_page_fault+0x1ef/0xa20
 [<c0122960>] ? do_page_fault+0x0/0xa20
 [<c0686cba>] error_code+0x72/0x78
 [<c0389845>] ? __list_add+0x15/0x90
 [<c06860b3>] ? _spin_lock+0x63/0x70
 [<c018b9e4>] rmqueue_bulk+0x54/0x80
 [<c018d3a7>] get_page_from_freelist+0x5a7/0x720
 [<c018dde0>] __alloc_pages_internal+0xa0/0x450
 [<c01b25a2>] kmem_getpages+0x62/0x110
 [<c01b2f4c>] cache_grow+0x39c/0x3b0
 [<c03890b6>] ? _raw_spin_unlock+0x46/0x80
 [<c01b3503>] cache_alloc_refill+0x1f3/0x230
 [<c01b36e5>] __kmalloc+0x1a5/0x1e0
 [<c03cd9b2>] ? tty_buffer_request_room+0xe2/0x130
 [<c03cd9b2>] tty_buffer_request_room+0xe2/0x130
 [<c03cfe0d>] tty_insert_flip_string_flags+0x2d/0xa0
 [<c03ee7b1>] receive_chars+0x161/0x290
 [<c03ef7e4>] serial8250_interrupt+0x134/0x150
 [<c017dc08>] handle_IRQ_event+0x28/0x70
 [<c017f1cf>] handle_edge_irq+0xaf/0x140
 [<c0108138>] do_IRQ+0x48/0xa0
 [<c037a264>] ? trace_hardirqs_off_thunk+0xc/0x18
 [<c0105a3c>] common_interrupt+0x28/0x30
 [<c013b2f1>] ? vprintk+0x151/0x3c0
 [<c0143e5f>] ? run_timer_softirq+0x19f/0x1d0
 [<c0143e5f>] ? run_timer_softirq+0x19f/0x1d0
 [<c013b57b>] printk+0x1b/0x20
 [<c016700a>] __print_symbol+0x2a/0x40
 [<c037a254>] ? trace_hardirqs_on_thunk+0xc/0x10
 [<c010509e>] ? restore_nocheck_notrace+0x0/0xe
 [<c013b2f1>] ? vprintk+0x151/0x3c0
 [<c013b47b>] ? vprintk+0x2db/0x3c0
 [<c013f91a>] ? irq_exit+0x3a/0xa0
 [<c010509e>] ? restore_nocheck_notrace+0x0/0xe
 [<c013b2f1>] ? vprintk+0x151/0x3c0
 [<c0106871>] ? do_trap+0x91/0xc0
 [<c013b57b>] ? printk+0x1b/0x20
 [<c0106871>] ? do_trap+0x91/0xc0
 [<c01061f0>] print_trace_address+0x40/0x50
 [<c0106871>] ? do_trap+0x91/0xc0
 [<c01062aa>] dump_trace+0xaa/0x120
 [<c0106346>] show_trace_log_lvl+0x26/0x40
 [<c0106b8a>] show_trace+0x1a/0x20
 [<c01074d2>] dump_stack+0x72/0x80
 [<c01312c1>] __might_sleep+0xf1/0x140
 [<c0684fe9>] down_read+0x19/0x80
 [<c03e007b>] ? tty_audit_add_data+0x1bb/0x2e0
 [<c013c33b>] exit_mm+0x2b/0x110
 [<c013d7c4>] do_exit+0x184/0x890
 [<c013b57b>] ? printk+0x1b/0x20
 [<c013a52a>] ? print_oops_end_marker+0x2a/0x30
 [<c01060f1>] oops_end+0xb1/0xc0
 [<c01067c0>] die+0x50/0x70
 [<c0106871>] do_trap+0x91/0xc0
 [<c0106940>] ? do_invalid_op+0x0/0xa0
 [<c01069c8>] do_invalid_op+0x88/0xa0
 [<c01b24c0>] ? cache_free_debugcheck+0x2e0/0x310
 [<c015c60a>] ? print_lock_contention_bug+0x1a/0xe0
 [<c0182527>] ? rcu_irq_exit+0x17/0x90
 [<c0686cba>] error_code+0x72/0x78
 [<c018976e>] ? mempool_free_slab+0xe/0x10
 [<c068007b>] ? start_secondary+0x19b/0x1c0
 [<c06800d8>] ? check_tsc_warp+0x38/0x1e0
 [<c01b24c0>] ? cache_free_debugcheck+0x2e0/0x310
 [<c0683402>] ? wait_for_completion+0x12/0x20
 [<c018976e>] ? mempool_free_slab+0xe/0x10
 [<c036a7b0>] ? blk_end_sync_rq+0x0/0x30
 [<c01b3c9c>] kmem_cache_free+0x5c/0x200
 [<c018976e>] mempool_free_slab+0xe/0x10
 [<c018979c>] mempool_free+0x2c/0x90
 [<c0367252>] __blk_put_request+0x62/0x90
 [<c036764c>] blk_put_request+0x2c/0x50
 [<c0470732>] generic_ide_resume+0xa2/0xf0
 [<c03f767e>] device_resume+0x32e/0x380
 [<c01687d1>] hibernation_snapshot+0xa1/0x220
 [<c013b57b>] ? printk+0x1b/0x20
 [<c0168a30>] hibernate+0xe0/0x180
 [<c01674e0>] ? state_store+0x0/0xd0
 [<c016759f>] state_store+0xbf/0xd0
 [<c01674e0>] ? state_store+0x0/0xd0
 [<c03749d4>] kobj_attr_store+0x24/0x30
 [<c01f8ef2>] sysfs_write_file+0xa2/0x100
 [<c01ba946>] vfs_write+0x96/0x130
 [<c01f8e50>] ? sysfs_write_file+0x0/0x100
 [<c01bae8d>] sys_write+0x3d/0x70
 [<c0104f3b>] sysenter_do_call+0x12/0x3f
 =======================
---[ end trace 958cea1a710a109a ]---
Rebooting in 10 seconds..Initializing cgroup subsys cpuset


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: latest -git: suspend: unable to handle kernel paging request (was Re: no_console_suspend doesn't work?)
  2008-08-21 20:08             ` Vegard Nossum
@ 2008-08-21 20:22               ` Rafael J. Wysocki
  2008-08-21 20:26                 ` Andrew Morton
  2008-08-21 21:21               ` Pekka Enberg
  1 sibling, 1 reply; 21+ messages in thread
From: Rafael J. Wysocki @ 2008-08-21 20:22 UTC (permalink / raw)
  To: Vegard Nossum, Andrew Morton; +Cc: Linux Kernel Mailing List

On Thursday, 21 of August 2008, Vegard Nossum wrote:
> On Thu, Aug 21, 2008 at 9:37 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > Can you switch to SLAB and retest?  I don't really think SLUB is the issue
> > here, but SLAB may give us additional information.
> 
> Got this with SLAB:

Well, it looks like this happened while devices were suspended, but I can't say
what's going wrong.  Andrew?  Who's the right person to CC?

> BUG: unable to handle kernel NULL pointer dereference at 00000000
> IP: [<c03897ac>] list_del+0xc/0x90
> *pdpt = 0000000031480001 *pde = 0000000000000000
> Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> Pid: 7, comm: events/0 Not tainted (2.6.27-rc4-00003-gef9b1bc #33)
> EIP: 0060:[<c03897ac>] EFLAGS: 00010082 CPU: 0
> EIP is at list_del+0xc/0x90
> EAX: 00000000 EBX: f556afa0 ECX: f54a5380 EDX: f54e3128
> ESI: f54f1ef0 EDI: f556afa0 EBP: f68b7ee0 ESP: f68b7ec8
>  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> Process events/0 (pid: 7, ti=f68b6000 task=f689c338 task.ti=f68b6000)
> Stack: c015b62e f68b7ed4 c015b66d f68b7efc 00000046 f54f1f20 f68b7f1c c01b4054
>        00000000 f689c338 f54e315c 00000001 f54f1f20 f54a5380 00000000 f54e3128
>        f55b3348 00000000 f54f1f20 f54f1ef0 00000001 f68b7f3c c01b41ba 00000000
> Call Trace:
>  [<c015b62e>] ? get_lock_stats+0x1e/0x50
>  [<c015b66d>] ? put_lock_stats+0xd/0x30
>  [<c01b4054>] ? free_block+0xa4/0x1b0
>  [<c01b41ba>] ? drain_array+0x5a/0xb0
>  [<c01b4f82>] ? cache_reap+0x72/0x230
>  [<c014ada7>] ? run_workqueue+0x107/0x200
>  [<c014ae0a>] ? run_workqueue+0x16a/0x200
>  [<c014ada7>] ? run_workqueue+0x107/0x200
>  [<c01b4f10>] ? cache_reap+0x0/0x230
>  [<c014b94d>] ? worker_thread+0x7d/0xe0
>  [<c014e4f0>] ? autoremove_wake_function+0x0/0x50
>  [<c014b8d0>] ? worker_thread+0x0/0xe0
>  [<c014e1e2>] ? kthread+0x42/0x70
>  [<c014e1a0>] ? kthread+0x0/0x70
>  [<c0105cf3>] ? kernel_thread_helper+0x7/0x14
>  =======================
> Code: e8 01 89 44 24 04 e8 d4 1d db ff 8b 55 04 b8 48 2f 7a c0 e8 47 d8 dd ff e8
>  c2 dc d7 ff eb a9 55 89 e5 53 89 c3 83 ec 14 8b 40 04 <8b> 00 39 d8 75 24 8b 13
>  8b 42 04 39 d8 75 41 8b 43 04 89 42 04
> EIP: [<c03897ac>] list_del+0xc/0x90 SS:ESP 0068:f68b7ec8
> ---[ end trace 958cea1a710a109a ]---
> note: events/0[7] exited with preempt_count 1
> uhci_hcd 0000:00:1d.2: setting latency timer to 64
> uhci_hcd 0000:00:1d.3: setting latency timer to 64
> ehci_hcd 0000:00:1d.7: setting latency timer to 64
> pci 0000:00:1e.0: setting latency timer to 64
> ------------[ cut here ]------------
> WARNING: at /uio/arkimedes/s29/vegardno/git-working/linux-2.6/lib/list_debug.c:26
> __list_add+0x61/0x90()
> list_add corruption. next->prev should be prev (f6c00490), but was
> f73ecea8. (next=f73ecea8).
> Pid: 201, comm: rcu_torture_rea Tainted: G      D
> 2.6.27-rc4-00003-gef9b1bc #33
>  [<c013a58e>] warn_slowpath+0x5e/0x80
>  [<c015b60b>] ? trace_hardirqs_off+0xb/0x10
>  [<c010b325>] ? native_sched_clock+0xb5/0x110
>  [<c0125c36>] ? kernel_map_pages+0xa6/0x130
>  [<c0389891>] __list_add+0x61/0x90
>  [<c018c978>] __free_pages_ok+0x368/0x410
>  [<c018ccf2>] __free_pages+0x22/0x40
>  [<c018cd58>] free_pages+0x48/0x50
>  [<c010be69>] free_thread_info+0x19/0x20
>  [<c0138929>] free_task+0x19/0x30
>  [<c0139e31>] __put_task_struct+0x51/0xa0
>  [<c013c0b7>] delayed_put_task_struct+0x27/0x30
>  [<c018206c>] rcu_process_callbacks+0x6c/0xb0
>  [<c013f5b3>] __do_softirq+0x83/0x100
>  [<c013f6d5>] do_softirq+0xa5/0xb0
>  [<c013f975>] irq_exit+0x95/0xa0
>  [<c010813d>] do_IRQ+0x4d/0xa0
>  [<c037a264>] ? trace_hardirqs_off_thunk+0xc/0x18
>  [<c0105a3c>] common_interrupt+0x28/0x30
>  [<c0683b44>] ? schedule+0x734/0x8f0
>  [<c068007b>] ? start_secondary+0x19b/0x1c0
>  [<c06800d8>] ? check_tsc_warp+0x38/0x1e0
>  [<c068695b>] ? _spin_unlock_irq+0x2b/0x60
>  [<c0683b44>] schedule+0x734/0x8f0
>  [<c010509e>] ? restore_nocheck_notrace+0x0/
>  [<c015e93b>] ? trace_hardirqs_on+0xb/0x10
>  [<c0180c88>] rcu_torture_reader+0x148/0x230
>  [<c0181460>] ? rcu_torture_timer+0x0/0x100
>  [<c0180d19>] ? rcu_torture_reader+0x1d9/0x230
>  [<c0180b40>] ? rcu_torture_reader+0x0/0x230
>  [<c014e1e2>] kthread+0x42/0x70
>  [<c014e1a0>] ? kthread+0x0/0x70
>  [<c0105cf3>] kernel_thread_helper+0x7/0x14
>  =======================
> ---[ end trace 958cea1a710a109a ]---
> eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
> eth1: link down
> hda: host max PIO4 wanted PIO255(auto-tune) selected PIO4
> hda: UDMA/100 mode selected
> ------------[ cut here ]------------
> kernel BUG at /uio/arkimedes/s29/vegardno/git-working/linux-2.6/mm/slab.c:590!
> invalid opcode: 0000 [#2] PREEMPT SMP DEBUG_PAGEALLOC
> Pid: 3257, comm: S99local Tainted: G      D W (2.6.27-rc4-00003-gef9b1bc #33)
> EIP: 0060:[<c01b24c0>] EFLAGS: 00210002 CPU: 0
> EIP is at cache_free_debugcheck+0x2e0/0x310
> EAX: 00000000 EBX: f4c1f688 ECX: c018976e EDX: f785f744
> ESI: f6864180 EDI: f4c1f688 EBP: f25b1de8 ESP: f25b1d94
>  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> Process S99local (pid: 3257, ti=f25b0000 task=f253f338 task.ti=f25b0000)
> Stack: 00200200 f4c1f688 00000000 f25b1e20 f25b1dac c0683402 f25b1e58 c018976e
>        00000001 c036a7b0 00000002 00000000 00000000 c088c190 f253f338 f4c1f688
>        f253f338 f25b1df8 f6864180 f6960ef0 f4c1f688 f25b1e0c c01b3c9c f253f7f4
> Call Trace:
>  [<c0683402>] ? wait_for_completion+0x12/0x20
>  [<c018976e>] ? mempool_free_slab+0xe/0x10
>  [<c036a7b0>] ? blk_end_sync_rq+0x0/0x30
>  [<c01b3c9c>] ? kmem_cache_free+0x5c/0x200
>  [<c018976e>] ? mempool_free_slab+0xe/0x10
>  [<c018979c>] ? mempool_free+0x2c/0x90
>  [<c0367252>] ? __blk_put_request+0x62/0x90
>  [<c036764c>] ? blk_put_request+0x2c/0x50
>  [<c0470732>] ? generic_ide_resume+0xa2/0xf0
>  [<c03f767e>] ? device_resume+0x32e/0x380
>  [<c01687d1>] ? hibernation_snapshot+0xa1/0x220
>  [<c013b57b>] ? printk+0x1b/0x20
>  [<c0168a30>] ? hibernate+0xe0/0x180
>  [<c01674e0>] ? state_store+0x0/0xd0
>  [<c016759f>] ? state_store+0xbf/0xd0
>  [<c01674e0>] ? state_store+0x0/0xd0
>  [<c03749d4>] ? kobj_attr_store+0x24/0x30
>  [<c01f8ef2>] ? sysfs_write_file+0xa2/0x100
>  [<c01ba946>] ? vfs_write+0x96/0x130
>  [<c01f8e50>] ? sysfs_write_file+0x0/0x100
>  [<c01bae8d>] ? sys_write+0x3d/0x70
>  [<c0104f3b>] ? sysenter_do_call+0x12/0x3f
>  =======================
> Code: 00 00 75 cd c7 03 21 43 65 87 8b 5e 28 e9 78 ff ff ff 0f 0b eb fe 90 8d 74
>  26 00 8b 52 0c e9 ca fd ff ff 0f 0b eb fe 8d 74 26 00 <0f> 0b eb fe 0f 0b eb fe
>  0f 0b eb fe 8d 74 26 00 8b 52 0c 8b 02
> EIP: [<c01b24c0>] cache_free_debugcheck+0x2e0/0x310 SS:ESP 0068:f25b1d94
> ---[ end trace 958cea1a710a109a ]---
> note: S99local[3257] exited with preempt_count 1
> BUG: sleeping function called from invalid context at
> /uio/arkimedes/s29/vegardno/git-working/linux-2.6/kernel/rwsem.c:21
> in_atomic():1, irqs_disabled():0
> INFO: lockdep is turned off.
> Pid: 3257, comm: S99local Tainted: G      D W 2.6.27-rc4-00003-gef9b1bc #33
>  [<c01312c1>] __might_sleep+0xf1/0x140
>  [<c0684fe9>] down_read+0x19/0x80
>  [<c03e007b>] ? tty_audit_add_data+0x1bb/0x2e0
>  [<c013c33b>] exit_mm+0x2b/0x110
>  [<c013d7c4>] do_exit+0x184/0x890
>  [<c013b57b>] ? printk+0x1b/0x20
>  [<c013a52a>] ? print_oops_end_marker+0x2a/0x30
>  [<c01060f1>] oops_end+0xb1/0xc0
>  [<c01067c0>] die+0x50/0x70
>  [<c0106871>] do_trap+0x91/0xc0
> BUG: unable to handle kernel paging request at 00100104
> IP: [<c0389845>] __list_add+0x15/0x90
> *pdpt = 0000000031480001 *pde = 0000000000000000
> Oops: 0000 [#3] PREEMPT SMP DEBUG_PAGEALLOC
> Pid: 3257, comm: S99local Tainted: G      D W (2.6.27-rc4-00003-gef9b1bc #33)
> EIP: 0060:[<c0389845>] EFLAGS: 00210086 CPU: 0
> EIP is at __list_add+0x15/0x90
> EAX: f73ece6c EBX: 00100100 ECX: 00100100 EDX: f73ece6c
> ESI: f73ece6c EDI: f73ece6c EBP: f25b16dc ESP: f25b16b8
>  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> Process S99local (pid: 3257, ti=f25b0000 task=f253f338 task.ti=f25b0000)
> Stack: f6814e44 f6c00400 c06860b3 00000000 00000002 00000000 f73ece3c f73ece6c
>        f73ece6c f25b1704 c018b9e4 0000001f 00000000 f6c00400 f6c00444 00000001
>        f6814e44 f6814e38 00200002 f25b1778 c018d3a7 f6814e44 00000000 00000000
> Call Trace:
>  [<c06860b3>] ? _spin_lock+0x63/0x70
>  [<c018b9e4>] ? rmqueue_bulk+0x54/0x80
>  [<c018d3a7>] ? get_page_from_freelist+0x5a7/0x720
>  [<c018dde0>] ? __alloc_pages_internal+0xa0/0x450
>  [<c01b25a2>] ? kmem_getpages+0x62/0x110
>  [<c01b2f4c>] ? cache_grow+0x39c/0x3b0
>  [<c03890b6>] ? _raw_spin_unlock+0x46/0x80
>  [<c01b3503>] ? cache_alloc_refill+0x1f3/0x230
>  [<c01b36e5>] ? __kmalloc+0x1a5/0x1e0
>  [<c03cd9b2>] ? tty_buffer_request_room+0xe2/0x130
>  [<c03cd9b2>] ? tty_buffer_request_room+0xe2/0x130
>  [<c03cfe0d>] ? tty_insert_flip_string_flags+0x2d/0xa0
>  [<c03ee7b1>] ? receive_chars+0x161/0x290
>  [<c03ef7e4>] ? serial8250_interrupt+0x134/0x150
>  [<c017dc08>] ? handle_IRQ_event+0x28/0x70
>  [<c017f1cf>] ? handle_edge_irq+0xaf/0x140
>  [<c0108138>] ? do_IRQ+0x48/0xa0
>  [<c037a264>] ? trace_hardirqs_off_thunk+0xc/0x18
>  [<c0105a3c>] ? common_interrupt+0x28/0x30
>  [<c013b2f1>] ? vprintk+0x151/0x3c0
>  [<c0143e5f>] ? run_timer_softirq+0x19f/0x1d0
>  [<c0143e5f>] ? run_timer_softirq+0x19f/0x1d0
>  [<c013b57b>] ? printk+0x1b/0x20
>  [<c016700a>] ? __print_symbol+0x2a/0x40
>  [<c037a254>] ? trace_hardirqs_on_thunk+0xc/0x10
>  [<c010509e>] ? restore_nocheck_notrace+0x0/0xe
>  [<c013b2f1>] ? vprintk+0x151/0x3c0
>  [<c013b47b>] ? vprintk+0x2db/0x3c0
>  [<c013f91a>] ? irq_exit+0x3a/0xa0
>  [<c010509e>] ? restore_nocheck_notrace+0x0/0xe
>  [<c013b2f1>] ? vprintk+0x151/0x3c0
>  [<c0106871>] ? do_trap+0x91/0xc0
>  [<c013b57b>] ? printk+0x1b/0x20
>  [<c0106871>] ? do_trap+0x91/0xc0
>  [<c01061f0>] ? print_trace_address+0x40/0x50
>  [<c0106871>] ? do_trap+0x91/0xc0
>  [<c01062aa>] ? dump_trace+0xaa/0x120
>  [<c0106346>] ? show_trace_log_lvl+0x26/0x40
>  [<c0106b8a>] ? show_trace+0x1a/0x20
>  [<c01074d2>] ? dump_stack+0x72/0x80
>  [<c01312c1>] ? __might_sleep+0xf1/0x140
>  [<c0684fe9>] ? down_read+0x19/0x80
>  [<c03e007b>] ? tty_audit_add_data+0x1bb/0x2e0
>  [<c013c33b>] ? exit_mm+0x2b/0x110
>  [<c013d7c4>] ? do_exit+0x184/0x890
>  [<c013b57b>] ? printk+0x1b/0x20
>  [<c013a52a>] ? print_oops_end_marker+0x2a/0x30
>  [<c01060f1>] ? oops_end+0xb1/0xc0
>  [<c01067c0>] ? die+0x50/0x70
>  [<c0106871>] ? do_trap+0x91/0xc0
>  [<c0106940>] ? do_invalid_op+0x0/0xa0
>  [<c01069c8>] ? do_invalid_op+0x88/0xa0
>  [<c01b24c0>] ? cache_free_debugcheck+0x2e0/0x310
>  [<c015c60a>] ? print_lock_contention_bug+0x1a/0xe0
>  [<c0182527>] ? rcu_irq_exit+0x17/0x90
>  [<c0686cba>] ? error_code+0x72/0x78
>  [<c018976e>] ? mempool_free_slab+0xe/0x10
>  [<c068007b>] ? start_secondary+0x19b/0x1c0
>  [<c06800d8>] ? check_tsc_warp+0x38/0x1e0
>  [<c01b24c0>] ? cache_free_debugcheck+0x2e0/0x310
>  [<c0683402>] ? wait_for_completion+0x12/0x20
>  [<c018976e>] ? mempool_free_slab+0xe/0x10
>  [<c036a7b0>] ? blk_end_sync_rq+0x0/0x30
>  [<c01b3c9c>] ? kmem_cache_free+0x5c/0x200
>  [<c018976e>] ? mempool_free_slab+0xe/0x10
>  [<c018979c>] ? mempool_free+0x2c/0x90
>  [<c0367252>] ? __blk_put_request+0x62/0x90
>  [<c036764c>] ? blk_put_request+0x2c/0x50
>  [<c0470732>] ? generic_ide_resume+0xa2/0xf0
>  [<c03f767e>] ? device_resume+0x32e/0x380
>  [<c01687d1>] ? hibernation_snapshot+0xa1/0x220
>  [<c013b57b>] ? printk+0x1b/0x20
>  [<c0168a30>] ? hibernate+0xe0/0x180
>  [<c01674e0>] ? state_store+0x0/0xd0
>  [<c016759f>] ? state_store+0xbf/0xd0
>  [<c01674e0>] ? state_store+0x0/0xd0
>  [<c03749d4>] ? kobj_attr_store+0x24/0x30
>  [<c01f8ef2>] ? sysfs_write_file+0xa2/0x100
>  [<c01ba946>] ? vfs_write+0x96/0x130
>  [<c01f8e50>] ? sysfs_write_file+0x0/0x100
>  [<c01bae8d>] ? sys_write+0x3d/0x70
>  [<c0104f3b>] ? sysenter_do_call+0x12/0x3f
>  =======================
> Code: c0 e8 10 0d db ff 8b 13 eb 97 8d b6 00 00 00 00 8d bf 00 00 00 00 55 89 e5
>  83 ec 24 89 5d f4 89 cb 89 75 f8 89 d6 89 7d fc 89 c7 <8b> 41 04 39 d0 75 1d 8b
>  06 39 d8 75 41 89 7b 04 89 1f 8b 5d f4
> EIP: [<c0389845>] __list_add+0x15/0x90 SS:ESP 0068:f25b16b8
> Kernel panic - not syncing: Fatal exception in interrupt
> ------------[ cut here ]------------
> WARNING: at /uio/arkimedes/s29/vegardno/git-working/linux-2.6/kernel/smp.c:328
> smp_call_function_mask+0x1b1/0x1c0()
> Pid: 3257, comm: S99local Tainted: G      D W 2.6.27-rc4-00f9b1bc #33
>  [<c013a97f>] warn_on_slowpath+0x4f/0x80
>  [<c0397ff4>] ? vgacon_set_cursor_size+0xa4/0x120
>  [<c015b60b>] ? trace_hardirqs_off+0xb/0x10
>  [<c018214b>] ? __rcu_read_unlock+0x9b/0xc0
>  [<c015332c>] ? __atomic_notifier_call_chain+0x3c/0x50
>  [<c03db7fd>] ? vt_console_print+0x1dd/0x2a0
>  [<c015335a>] ? atomic_notifier_call_chain+0x1a/0x20
>  [<c03890b6>] ? _raw_spin_unlock+0x46/0x80
>  [<c068680c>] ? _spin_unlock+0x3c/0x50
>  [<c03db7fd>] ? vt_console_print+0x1dd/0x2a0
>  [<c015b60b>] ? trace_hardirqs_off+0xb/0x10
>  [<c013af61>] ? release_console_sem+0x1c1/0x1f0
>  [<c015b60b>] ? trace_hardirqs_off+0xb/0x10
>  [<c0165f81>] smp_call_function_mask+0x1b1/0x1c0
>  [<c0119280>] ? stop_this_cpu+0x0/0x50
>  [<c0684808>] ? mutex_unlock+0x8/0x10
>  [<c015b60b>] ? trace_hardirqs_off+0xb/0x10
>  [<c06847f1>] ? __mutex_unlock_slowpath+0x151/0x160
>  [<c0684808>] ? mutex_unlock+0x8/0x10
>  [<c016c46d>] ? crash_kexec+0x6d/0xc0
>  [<c0684808>] ? mutex_unlock+0x8/0x10
>  [<c016c46d>] ? crash_kexec+0x6d/0xc0
>  [<c0119280>] ? stop_this_cpu+0x0/0x50
>  [<c0165fc0>] smp_call_function+0x30/0x60
>  [<c011937e>] native_smp_send_stop+0x1e/0x70
>  [<c013a879>] panic+0x69/0x120
>  [<c01060e5>] oops_end+0xa5/0xc0
>  [<c01067c0>] die+0x50/0x70
>  [<c0122b4f>] do_page_fault+0x1ef/0xa20
>  [<c0122960>] ? do_page_fault+0x0/0xa20
>  [<c0686cba>] error_code+0x72/0x78
>  [<c0389845>] ? __list_add+0x15/0x90
>  [<c06860b3>] ? _spin_lock+0x63/0x70
>  [<c018b9e4>] rmqueue_bulk+0x54/0x80
>  [<c018d3a7>] get_page_from_freelist+0x5a7/0x720
>  [<c018dde0>] __alloc_pages_internal+0xa0/0x450
>  [<c01b25a2>] kmem_getpages+0x62/0x110
>  [<c01b2f4c>] cache_grow+0x39c/0x3b0
>  [<c03890b6>] ? _raw_spin_unlock+0x46/0x80
>  [<c01b3503>] cache_alloc_refill+0x1f3/0x230
>  [<c01b36e5>] __kmalloc+0x1a5/0x1e0
>  [<c03cd9b2>] ? tty_buffer_request_room+0xe2/0x130
>  [<c03cd9b2>] tty_buffer_request_room+0xe2/0x130
>  [<c03cfe0d>] tty_insert_flip_string_flags+0x2d/0xa0
>  [<c03ee7b1>] receive_chars+0x161/0x290
>  [<c03ef7e4>] serial8250_interrupt+0x134/0x150
>  [<c017dc08>] handle_IRQ_event+0x28/0x70
>  [<c017f1cf>] handle_edge_irq+0xaf/0x140
>  [<c0108138>] do_IRQ+0x48/0xa0
>  [<c037a264>] ? trace_hardirqs_off_thunk+0xc/0x18
>  [<c0105a3c>] common_interrupt+0x28/0x30
>  [<c013b2f1>] ? vprintk+0x151/0x3c0
>  [<c0143e5f>] ? run_timer_softirq+0x19f/0x1d0
>  [<c0143e5f>] ? run_timer_softirq+0x19f/0x1d0
>  [<c013b57b>] printk+0x1b/0x20
>  [<c016700a>] __print_symbol+0x2a/0x40
>  [<c037a254>] ? trace_hardirqs_on_thunk+0xc/0x10
>  [<c010509e>] ? restore_nocheck_notrace+0x0/0xe
>  [<c013b2f1>] ? vprintk+0x151/0x3c0
>  [<c013b47b>] ? vprintk+0x2db/0x3c0
>  [<c013f91a>] ? irq_exit+0x3a/0xa0
>  [<c010509e>] ? restore_nocheck_notrace+0x0/0xe
>  [<c013b2f1>] ? vprintk+0x151/0x3c0
>  [<c0106871>] ? do_trap+0x91/0xc0
>  [<c013b57b>] ? printk+0x1b/0x20
>  [<c0106871>] ? do_trap+0x91/0xc0
>  [<c01061f0>] print_trace_address+0x40/0x50
>  [<c0106871>] ? do_trap+0x91/0xc0
>  [<c01062aa>] dump_trace+0xaa/0x120
>  [<c0106346>] show_trace_log_lvl+0x26/0x40
>  [<c0106b8a>] show_trace+0x1a/0x20
>  [<c01074d2>] dump_stack+0x72/0x80
>  [<c01312c1>] __might_sleep+0xf1/0x140
>  [<c0684fe9>] down_read+0x19/0x80
>  [<c03e007b>] ? tty_audit_add_data+0x1bb/0x2e0
>  [<c013c33b>] exit_mm+0x2b/0x110
>  [<c013d7c4>] do_exit+0x184/0x890
>  [<c013b57b>] ? printk+0x1b/0x20
>  [<c013a52a>] ? print_oops_end_marker+0x2a/0x30
>  [<c01060f1>] oops_end+0xb1/0xc0
>  [<c01067c0>] die+0x50/0x70
>  [<c0106871>] do_trap+0x91/0xc0
>  [<c0106940>] ? do_invalid_op+0x0/0xa0
>  [<c01069c8>] do_invalid_op+0x88/0xa0
>  [<c01b24c0>] ? cache_free_debugcheck+0x2e0/0x310
>  [<c015c60a>] ? print_lock_contention_bug+0x1a/0xe0
>  [<c0182527>] ? rcu_irq_exit+0x17/0x90
>  [<c0686cba>] error_code+0x72/0x78
>  [<c018976e>] ? mempool_free_slab+0xe/0x10
>  [<c068007b>] ? start_secondary+0x19b/0x1c0
>  [<c06800d8>] ? check_tsc_warp+0x38/0x1e0
>  [<c01b24c0>] ? cache_free_debugcheck+0x2e0/0x310
>  [<c0683402>] ? wait_for_completion+0x12/0x20
>  [<c018976e>] ? mempool_free_slab+0xe/0x10
>  [<c036a7b0>] ? blk_end_sync_rq+0x0/0x30
>  [<c01b3c9c>] kmem_cache_free+0x5c/0x200
>  [<c018976e>] mempool_free_slab+0xe/0x10
>  [<c018979c>] mempool_free+0x2c/0x90
>  [<c0367252>] __blk_put_request+0x62/0x90
>  [<c036764c>] blk_put_request+0x2c/0x50
>  [<c0470732>] generic_ide_resume+0xa2/0xf0
>  [<c03f767e>] device_resume+0x32e/0x380
>  [<c01687d1>] hibernation_snapshot+0xa1/0x220
>  [<c013b57b>] ? printk+0x1b/0x20
>  [<c0168a30>] hibernate+0xe0/0x180
>  [<c01674e0>] ? state_store+0x0/0xd0
>  [<c016759f>] state_store+0xbf/0xd0
>  [<c01674e0>] ? state_store+0x0/0xd0
>  [<c03749d4>] kobj_attr_store+0x24/0x30
>  [<c01f8ef2>] sysfs_write_file+0xa2/0x100
>  [<c01ba946>] vfs_write+0x96/0x130
>  [<c01f8e50>] ? sysfs_write_file+0x0/0x100
>  [<c01bae8d>] sys_write+0x3d/0x70
>  [<c0104f3b>] sysenter_do_call+0x12/0x3f
>  =======================
> ---[ end trace 958cea1a710a109a ]---
> Rebooting in 10 seconds..Initializing cgroup subsys cpuset
> 
> 
> Vegard
> 



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: latest -git: suspend: unable to handle kernel paging request (was Re: no_console_suspend doesn't work?)
  2008-08-21 20:22               ` Rafael J. Wysocki
@ 2008-08-21 20:26                 ` Andrew Morton
  0 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2008-08-21 20:26 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: vegard.nossum, linux-kernel, Pekka Enberg

On Thu, 21 Aug 2008 22:22:44 +0200
"Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> On Thursday, 21 of August 2008, Vegard Nossum wrote:
> > On Thu, Aug 21, 2008 at 9:37 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > > Can you switch to SLAB and retest?  I don't really think SLUB is the issue
> > > here, but SLAB may give us additional information.
> > 
> > Got this with SLAB:
> 
> Well, it looks like this happened while devices were suspended, but I can't say
> what's going wrong.  Andrew?  Who's the right person to CC?

Pekka, I guess.

> > BUG: unable to handle kernel NULL pointer dereference at 00000000
> > IP: [<c03897ac>] list_del+0xc/0x90
> > *pdpt = 0000000031480001 *pde = 0000000000000000
> > Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > Pid: 7, comm: events/0 Not tainted (2.6.27-rc4-00003-gef9b1bc #33)
> > EIP: 0060:[<c03897ac>] EFLAGS: 00010082 CPU: 0
> > EIP is at list_del+0xc/0x90
> > EAX: 00000000 EBX: f556afa0 ECX: f54a5380 EDX: f54e3128
> > ESI: f54f1ef0 EDI: f556afa0 EBP: f68b7ee0 ESP: f68b7ec8
> >  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> > Process events/0 (pid: 7, ti=f68b6000 task=f689c338 task.ti=f68b6000)
> > Stack: c015b62e f68b7ed4 c015b66d f68b7efc 00000046 f54f1f20 f68b7f1c c01b4054
> >        00000000 f689c338 f54e315c 00000001 f54f1f20 f54a5380 00000000 f54e3128
> >        f55b3348 00000000 f54f1f20 f54f1ef0 00000001 f68b7f3c c01b41ba 00000000
> > Call Trace:
> >  [<c015b62e>] ? get_lock_stats+0x1e/0x50
> >  [<c015b66d>] ? put_lock_stats+0xd/0x30
> >  [<c01b4054>] ? free_block+0xa4/0x1b0
> >  [<c01b41ba>] ? drain_array+0x5a/0xb0
> >  [<c01b4f82>] ? cache_reap+0x72/0x230
> >  [<c014ada7>] ? run_workqueue+0x107/0x200
> >  [<c014ae0a>] ? run_workqueue+0x16a/0x200
> >  [<c014ada7>] ? run_workqueue+0x107/0x200
> >  [<c01b4f10>] ? cache_reap+0x0/0x230
> >  [<c014b94d>] ? worker_thread+0x7d/0xe0
> >  [<c014e4f0>] ? autoremove_wake_function+0x0/0x50
> >  [<c014b8d0>] ? worker_thread+0x0/0xe0
> >  [<c014e1e2>] ? kthread+0x42/0x70
> >  [<c014e1a0>] ? kthread+0x0/0x70
> >  [<c0105cf3>] ? kernel_thread_helper+0x7/0x14
> >  =======================
> > Code: e8 01 89 44 24 04 e8 d4 1d db ff 8b 55 04 b8 48 2f 7a c0 e8 47 d8 dd ff e8
> >  c2 dc d7 ff eb a9 55 89 e5 53 89 c3 83 ec 14 8b 40 04 <8b> 00 39 d8 75 24 8b 13
> >  8b 42 04 39 d8 75 41 8b 43 04 89 42 04
> > EIP: [<c03897ac>] list_del+0xc/0x90 SS:ESP 0068:f68b7ec8

But slab hasn't changed in quite some time.  I expect something else
has reached over and scribbled on it.   Bisection time?



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: latest -git: suspend: unable to handle kernel paging request (was Re: no_console_suspend doesn't work?)
  2008-08-21 20:08             ` Vegard Nossum
  2008-08-21 20:22               ` Rafael J. Wysocki
@ 2008-08-21 21:21               ` Pekka Enberg
  2008-08-21 22:10                 ` Rafael J. Wysocki
  1 sibling, 1 reply; 21+ messages in thread
From: Pekka Enberg @ 2008-08-21 21:21 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Andrew Morton, Jens Axboe

Hi Vegard,

On Thu, Aug 21, 2008 at 11:08 PM, Vegard Nossum <vegard.nossum@gmail.com> wrote:
> On Thu, Aug 21, 2008 at 9:37 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>> Can you switch to SLAB and retest?  I don't really think SLUB is the issue
>> here, but SLAB may give us additional information.
>
> Got this with SLAB:
>
> BUG: unable to handle kernel NULL pointer dereference at 00000000
> IP: [<c03897ac>] list_del+0xc/0x90
> *pdpt = 0000000031480001 *pde = 0000000000000000
> Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> Pid: 7, comm: events/0 Not tainted (2.6.27-rc4-00003-gef9b1bc #33)
> EIP: 0060:[<c03897ac>] EFLAGS: 00010082 CPU: 0
> EIP is at list_del+0xc/0x90
> EAX: 00000000 EBX: f556afa0 ECX: f54a5380 EDX: f54e3128
> ESI: f54f1ef0 EDI: f556afa0 EBP: f68b7ee0 ESP: f68b7ec8
>  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> Process events/0 (pid: 7, ti=f68b6000 task=f689c338 task.ti=f68b6000)
> Stack: c015b62e f68b7ed4 c015b66d f68b7efc 00000046 f54f1f20 f68b7f1c c01b4054
>       00000000 f689c338 f54e315c 00000001 f54f1f20 f54a5380 00000000 f54e3128
>       f55b3348 00000000 f54f1f20 f54f1ef0 00000001 f68b7f3c c01b41ba 00000000
> Call Trace:
>  [<c015b62e>] ? get_lock_stats+0x1e/0x50
>  [<c015b66d>] ? put_lock_stats+0xd/0x30

Looks like slabp->list is corrupted when we do list_del() in
free_block(). Why is the stack trace so unreliable, though?

>  [<c01b4054>] ? free_block+0xa4/0x1b0
>  [<c01b41ba>] ? drain_array+0x5a/0xb0
>  [<c01b4f82>] ? cache_reap+0x72/0x230
>  [<c014ada7>] ? run_workqueue+0x107/0x200
>  [<c014ae0a>] ? run_workqueue+0x16a/0x200
>  [<c014ada7>] ? run_workqueue+0x107/0x200
>  [<c01b4f10>] ? cache_reap+0x0/0x230
>  [<c014b94d>] ? worker_thread+0x7d/0xe0
>  [<c014e4f0>] ? autoremove_wake_function+0x0/0x50
>  [<c014b8d0>] ? worker_thread+0x0/0xe0
>  [<c014e1e2>] ? kthread+0x42/0x70
>  [<c014e1a0>] ? kthread+0x0/0x70
>  [<c0105cf3>] ? kernel_thread_helper+0x7/0x14
>  =======================
> Code: e8 01 89 44 24 04 e8 d4 1d db ff 8b 55 04 b8 48 2f 7a c0 e8 47 d8 dd ff e8
>  c2 dc d7 ff eb a9 55 89 e5 53 89 c3 83 ec 14 8b 40 04 <8b> 00 39 d8 75 24 8b 13
>  8b 42 04 39 d8 75 41 8b 43 04 89 42 04
> EIP: [<c03897ac>] list_del+0xc/0x90 SS:ESP 0068:f68b7ec8
> ---[ end trace 958cea1a710a109a ]---
> note: events/0[7] exited with preempt_count 1
> uhci_hcd 0000:00:1d.2: setting latency timer to 64
> uhci_hcd 0000:00:1d.3: setting latency timer to 64
> ehci_hcd 0000:00:1d.7: setting latency timer to 64
> pci 0000:00:1e.0: setting latency timer to 64
> ------------[ cut here ]------------
> WARNING: at /uio/arkimedes/s29/vegardno/git-working/linux-2.6/lib/list_debug.c:26
> __list_add+0x61/0x90()
> list_add corruption. next->prev should be prev (f6c00490), but was
> f73ecea8. (next=f73ecea8).
> Pid: 201, comm: rcu_torture_rea Tainted: G      D
> 2.6.27-rc4-00003-gef9b1bc #33
>  [<c013a58e>] warn_slowpath+0x5e/0x80
>  [<c015b60b>] ? trace_hardirqs_off+0xb/0x10
>  [<c010b325>] ? native_sched_clock+0xb5/0x110
>  [<c0125c36>] ? kernel_map_pages+0xa6/0x130

This looks most interesting. The free_list in zone->free_area is corrupted.

>  [<c0389891>] __list_add+0x61/0x90
>  [<c018c978>] __free_pages_ok+0x368/0x410
>  [<c018ccf2>] __free_pages+0x22/0x40
>  [<c018cd58>] free_pages+0x48/0x50
>  [<c010be69>] free_thread_info+0x19/0x20
>  [<c0138929>] free_task+0x19/0x30
>  [<c0139e31>] __put_task_struct+0x51/0xa0
>  [<c013c0b7>] delayed_put_task_struct+0x27/0x30
>  [<c018206c>] rcu_process_callbacks+0x6c/0xb0
>  [<c013f5b3>] __do_softirq+0x83/0x100
>  [<c013f6d5>] do_softirq+0xa5/0xb0
>  [<c013f975>] irq_exit+0x95/0xa0
>  [<c010813d>] do_IRQ+0x4d/0xa0
>  [<c037a264>] ? trace_hardirqs_off_thunk+0xc/0x18
>  [<c0105a3c>] common_interrupt+0x28/0x30
>  [<c0683b44>] ? schedule+0x734/0x8f0
>  [<c068007b>] ? start_secondary+0x19b/0x1c0
>  [<c06800d8>] ? check_tsc_warp+0x38/0x1e0
>  [<c068695b>] ? _spin_unlock_irq+0x2b/0x60
>  [<c0683b44>] schedule+0x734/0x8f0
>  [<c010509e>] ? restore_nocheck_notrace+0x0/
>  [<c015e93b>] ? trace_hardirqs_on+0xb/0x10
>  [<c0180c88>] rcu_torture_reader+0x148/0x230
>  [<c0181460>] ? rcu_torture_timer+0x0/0x100
>  [<c0180d19>] ? rcu_torture_reader+0x1d9/0x230
>  [<c0180b40>] ? rcu_torture_reader+0x0/0x230
>  [<c014e1e2>] kthread+0x42/0x70
>  [<c014e1a0>] ? kthread+0x0/0x70
>  [<c0105cf3>] kernel_thread_helper+0x7/0x14
>  =======================
> ---[ end trace 958cea1a710a109a ]---
> eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
> eth1: link down
> hda: host max PIO4 wanted PIO255(auto-tune) selected PIO4
> hda: UDMA/100 mode selected
> ------------[ cut here ]------------
> kernel BUG at /uio/arkimedes/s29/vegardno/git-working/linux-2.6/mm/slab.c:590!
> invalid opcode: 0000 [#2] PREEMPT SMP DEBUG_PAGEALLOC
> Pid: 3257, comm: S99local Tainted: G      D W (2.6.27-rc4-00003-gef9b1bc #33)
> EIP: 0060:[<c01b24c0>] EFLAGS: 00210002 CPU: 0
> EIP is at cache_free_debugcheck+0x2e0/0x310
> EAX: 00000000 EBX: f4c1f688 ECX: c018976e EDX: f785f744
> ESI: f6864180 EDI: f4c1f688 EBP: f25b1de8 ESP: f25b1d94
>  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> Process S99local (pid: 3257, ti=f25b0000 task=f253f338 task.ti=f25b0000)
> Stack: 00200200 f4c1f688 00000000 f25b1e20 f25b1dac c0683402 f25b1e58 c018976e
>       00000001 c036a7b0 00000002 00000000 00000000 c088c190 f253f338 f4c1f688
>       f253f338 f25b1df8 f6864180 f6960ef0 f4c1f688 f25b1e0c c01b3c9c f253f7f4
> Call Trace:
>  [<c0683402>] ? wait_for_completion+0x12/0x20

Here the block layer is passing a pointer to a non-slab page to
kmem_cache_free(). Assuming we can rely on the stack trace, of
course...

>  [<c018976e>] ? mempool_free_slab+0xe/0x10
>  [<c036a7b0>] ? blk_end_sync_rq+0x0/0x30
>  [<c01b3c9c>] ? kmem_cache_free+0x5c/0x200
>  [<c018976e>] ? mempool_free_slab+0xe/0x10
>  [<c018979c>] ? mempool_free+0x2c/0x90
>  [<c0367252>] ? __blk_put_request+0x62/0x90
>  [<c036764c>] ? blk_put_request+0x2c/0x50
>  [<c0470732>] ? generic_ide_resume+0xa2/0xf0
>  [<c03f767e>] ? device_resume+0x32e/0x380
>  [<c01687d1>] ? hibernation_snapshot+0xa1/0x220
>  [<c013b57b>] ? printk+0x1b/0x20
>  [<c0168a30>] ? hibernate+0xe0/0x180
>  [<c01674e0>] ? state_store+0x0/0xd0
>  [<c016759f>] ? state_store+0xbf/0xd0
>  [<c01674e0>] ? state_store+0x0/0xd0
>  [<c03749d4>] ? kobj_attr_store+0x24/0x30
>  [<c01f8ef2>] ? sysfs_write_file+0xa2/0x100
>  [<c01ba946>] ? vfs_write+0x96/0x130
>  [<c01f8e50>] ? sysfs_write_file+0x0/0x100
>  [<c01bae8d>] ? sys_write+0x3d/0x70
>  [<c0104f3b>] ? sysenter_do_call+0x12/0x3f
>  =======================
> Code: 00 00 75 cd c7 03 21 43 65 87 8b 5e 28 e9 78 ff ff ff 0f 0b eb fe 90 8d 74
>  26 00 8b 52 0c e9 ca fd ff ff 0f 0b eb fe 8d 74 26 00 <0f> 0b eb fe 0f 0b eb fe
>  0f 0b eb fe 8d 74 26 00 8b 52 0c 8b 02
> EIP: [<c01b24c0>] cache_free_debugcheck+0x2e0/0x310 SS:ESP 0068:f25b1d94
> ---[ end trace 958cea1a710a109a ]---
> note: S99local[3257] exited with preempt_count 1

> BUG: unable to handle kernel paging request at 00100104
> IP: [<c0389845>] __list_add+0x15/0x90
> *pdpt = 0000000031480001 *pde = 0000000000000000
> Oops: 0000 [#3] PREEMPT SMP DEBUG_PAGEALLOC
> Pid: 3257, comm: S99local Tainted: G      D W (2.6.27-rc4-00003-gef9b1bc #33)
> EIP: 0060:[<c0389845>] EFLAGS: 00210086 CPU: 0
> EIP is at __list_add+0x15/0x90
> EAX: f73ece6c EBX: 00100100 ECX: 00100100 EDX: f73ece6c
> ESI: f73ece6c EDI: f73ece6c EBP: f25b16dc ESP: f25b16b8
>  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> Process S99local (pid: 3257, ti=f25b0000 task=f253f338 task.ti=f25b0000)
> Stack: f6814e44 f6c00400 c06860b3 00000000 00000002 00000000 f73ece3c f73ece6c
>       f73ece6c f25b1704 c018b9e4 0000001f 00000000 f6c00400 f6c00444 00000001
>       f6814e44 f6814e38 00200002 f25b1778 c018d3a7 f6814e44 00000000 00000000
> Call Trace:
>  [<c06860b3>] ? _spin_lock+0x63/0x70

And now we have the per-cpu page lists corrupted as well. Note that
even though we have tty showing up in the stack traces, the list has
already been corrupted by someone else. So I don't think tty is at
fault here.

>  [<c018b9e4>] ? rmqueue_bulk+0x54/0x80
>  [<c018d3a7>] ? get_page_from_freelist+0x5a7/0x720
>  [<c018dde0>] ? __alloc_pages_internal+0xa0/0x450
>  [<c01b25a2>] ? kmem_getpages+0x62/0x110
>  [<c01b2f4c>] ? cache_grow+0x39c/0x3b0
>  [<c03890b6>] ? _raw_spin_unlock+0x46/0x80
>  [<c01b3503>] ? cache_alloc_refill+0x1f3/0x230
>  [<c01b36e5>] ? __kmalloc+0x1a5/0x1e0
>  [<c03cd9b2>] ? tty_buffer_request_room+0xe2/0x130
>  [<c03cd9b2>] ? tty_buffer_request_room+0xe2/0x130
>  [<c03cfe0d>] ? tty_insert_flip_string_flags+0x2d/0xa0
>  [<c03ee7b1>] ? receive_chars+0x161/0x290
>  [<c03ef7e4>] ? serial8250_interrupt+0x134/0x150
>  [<c017dc08>] ? handle_IRQ_event+0x28/0x70
>  [<c017f1cf>] ? handle_edge_irq+0xaf/0x140
>  [<c0108138>] ? do_IRQ+0x48/0xa0
>  [<c037a264>] ? trace_hardirqs_off_thunk+0xc/0x18
>  [<c0105a3c>] ? common_interrupt+0x28/0x30
>  [<c013b2f1>] ? vprintk+0x151/0x3c0
>  [<c0143e5f>] ? run_timer_softirq+0x19f/0x1d0
>  [<c0143e5f>] ? run_timer_softirq+0x19f/0x1d0
>  [<c013b57b>] ? printk+0x1b/0x20
>  [<c016700a>] ? __print_symbol+0x2a/0x40
>  [<c037a254>] ? trace_hardirqs_on_thunk+0xc/0x10
>  [<c010509e>] ? restore_nocheck_notrace+0x0/0xe
>  [<c013b2f1>] ? vprintk+0x151/0x3c0
>  [<c013b47b>] ? vprintk+0x2db/0x3c0
>  [<c013f91a>] ? irq_exit+0x3a/0xa0
>  [<c010509e>] ? restore_nocheck_notrace+0x0/0xe
>  [<c013b2f1>] ? vprintk+0x151/0x3c0
>  [<c0106871>] ? do_trap+0x91/0xc0
>  [<c013b57b>] ? printk+0x1b/0x20
>  [<c0106871>] ? do_trap+0x91/0xc0
>  [<c01061f0>] ? print_trace_address+0x40/0x50
>  [<c0106871>] ? do_trap+0x91/0xc0
>  [<c01062aa>] ? dump_trace+0xaa/0x120
>  [<c0106346>] ? show_trace_log_lvl+0x26/0x40
>  [<c0106b8a>] ? show_trace+0x1a/0x20
>  [<c01074d2>] ? dump_stack+0x72/0x80
>  [<c01312c1>] ? __might_sleep+0xf1/0x140
>  [<c0684fe9>] ? down_read+0x19/0x80
>  [<c03e007b>] ? tty_audit_add_data+0x1bb/0x2e0
>  [<c013c33b>] ? exit_mm+0x2b/0x110
>  [<c013d7c4>] ? do_exit+0x184/0x890
>  [<c013b57b>] ? printk+0x1b/0x20
>  [<c013a52a>] ? print_oops_end_marker+0x2a/0x30
>  [<c01060f1>] ? oops_end+0xb1/0xc0
>  [<c01067c0>] ? die+0x50/0x70
>  [<c0106871>] ? do_trap+0x91/0xc0
>  [<c0106940>] ? do_invalid_op+0x0/0xa0
>  [<c01069c8>] ? do_invalid_op+0x88/0xa0
>  [<c01b24c0>] ? cache_free_debugcheck+0x2e0/0x310
>  [<c015c60a>] ? print_lock_contention_bug+0x1a/0xe0
>  [<c0182527>] ? rcu_irq_exit+0x17/0x90
>  [<c0686cba>] ? error_code+0x72/0x78
>  [<c018976e>] ? mempool_free_slab+0xe/0x10
>  [<c068007b>] ? start_secondary+0x19b/0x1c0
>  [<c06800d8>] ? check_tsc_warp+0x38/0x1e0
>  [<c01b24c0>] ? cache_free_debugcheck+0x2e0/0x310
>  [<c0683402>] ? wait_for_completion+0x12/0x20
>  [<c018976e>] ? mempool_free_slab+0xe/0x10
>  [<c036a7b0>] ? blk_end_sync_rq+0x0/0x30
>  [<c01b3c9c>] ? kmem_cache_free+0x5c/0x200
>  [<c018976e>] ? mempool_free_slab+0xe/0x10
>  [<c018979c>] ? mempool_free+0x2c/0x90
>  [<c0367252>] ? __blk_put_request+0x62/0x90
>  [<c036764c>] ? blk_put_request+0x2c/0x50
>  [<c0470732>] ? generic_ide_resume+0xa2/0xf0
>  [<c03f767e>] ? device_resume+0x32e/0x380
>  [<c01687d1>] ? hibernation_snapshot+0xa1/0x220
>  [<c013b57b>] ? printk+0x1b/0x20
>  [<c0168a30>] ? hibernate+0xe0/0x180
>  [<c01674e0>] ? state_store+0x0/0xd0
>  [<c016759f>] ? state_store+0xbf/0xd0
>  [<c01674e0>] ? state_store+0x0/0xd0
>  [<c03749d4>] ? kobj_attr_store+0x24/0x30
>  [<c01f8ef2>] ? sysfs_write_file+0xa2/0x100
>  [<c01ba946>] ? vfs_write+0x96/0x130
>  [<c01f8e50>] ? sysfs_write_file+0x0/0x100
>  [<c01bae8d>] ? sys_write+0x3d/0x70
>  [<c0104f3b>] ? sysenter_do_call+0x12/0x3f
>  =======================
> Code: c0 e8 10 0d db ff 8b 13 eb 97 8d b6 00 00 00 00 8d bf 00 00 00 00 55 89 e5
>  83 ec 24 89 5d f4 89 cb 89 75 f8 89 d6 89 7d fc 89 c7 <8b> 41 04 39 d0 75 1d 8b
>  06 39 d8 75 41 89 7b 04 89 1f 8b 5d f4
> EIP: [<c0389845>] __list_add+0x15/0x90 SS:ESP 0068:f25b16b8

I don't know the suspend code at all but the bogus pointer coming from
hibernation_snapshot() seems suspicious. Another possibility is that
the block layer is doing something strange here. Dunno.

                           Pekka

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: latest -git: suspend: unable to handle kernel paging request (was Re: no_console_suspend doesn't work?)
  2008-08-21 18:21   ` Vegard Nossum
  2008-08-21 18:45     ` Vegard Nossum
@ 2008-08-21 21:27     ` Pekka Enberg
  2008-08-21 22:16       ` Rafael J. Wysocki
  1 sibling, 1 reply; 21+ messages in thread
From: Pekka Enberg @ 2008-08-21 21:27 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Andrew Morton, Jens Axboe

On Thu, Aug 21, 2008 at 9:21 PM, Vegard Nossum <vegard.nossum@gmail.com> wrote:
> I got a "proper" oops this time (brace yourself, it's long ;-)), and
> it does not have serial on the stack:
>
> BUG: unable to handle kernel paging request at 00100104
> IP: [<c038ad65>] __list_add+0x15/0x90
> *pdpt = 00000000325bb001 *pde = 0000000000000000
> Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> Pid: 3597, comm: bash Not tainted (2.6.27-rc4-00003-ga798564-dirty #30)
> EIP: 0060:[<c038ad65>] EFLAGS: 00210082 CPU: 0
> EIP is at __list_add+0x15/0x90
> EAX: f7455764 EBX: 00100100 ECX: 00100100 EDX: f7455764
> ESI: f7455764 EDI: f7455764 EBP: f25e7c18 ESP: f25e7bf4
>  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> Process bash (pid: 3597, ti=f25e6000 task=f25aa700 task.ti=f25e6000)
> Stack: f68120bc f6c00400 c0687133 00000000 00000002 00000000 f7455734 f7455764
>       f7455764 f25e7c40 c018b954 0000001f 00000000 f6c00400 f6c00444 00000001
>       f68120bc f68120b0 00200002 f25e7cb4 c018d317 f68120bc 00000000 f25aac0c
> Call Trace:
>  [<c0687133>] ? _spin_lock+0x63/0x70

Corruption in the page allocator freelists.

>  [<c018b954>] ? rmqueue_bulk+0x54/0x80
>  [<c018d317>] ? get_page_from_freelist+0x5a7/0x720
>  [<c01600ea>] ? __lock_acquire+0x27a/0xa00
>  [<c018dd50>] ? __alloc_pages_internal+0xa0/0x450
>  [<c01acd4b>] ? alloc_pages_current+0x7b/0xc0
>  [<c01b37fb>] ? new_slab+0x1bb/0x2d0
>  [<c0687877>] ? _spin_unlock+0x27/0x50
>  [<c01b40ca>] ? __slab_alloc+0x32a/0x4e0
>  [<c010b335>] ? native_sched_clock+0xb5/0x110
>  [<c01b4424>] ? kmem_cache_alloc+0xb4/0xe0
>  [<c018969e>] ? mempool_alloc_slab+0xe/0x10
>  [<c018969e>] ? mempool_alloc_slab+0xe/0x10
>  [<c018969e>] ? mempool_alloc_slab+0xe/0x10
>  [<c01897a1>] ? mempool_alloc+0x31/0xf0
>  [<c015e884>] ? trace_hardirqs_on_caller+0xd4/0x160
>  [<c015e91b>] ? trace_hardirqs_on+0xb/0x10
>  [<c0368c7e>] ? get_request+0xae/0x2c0
>  [<c036935c>] ? get_request_wait+0x1c/0xd0
>  [<c0687462>] ? _spin_lock_irq+0x72/0x80
>  [<c0369442>] ? blk_get_request+0x32/0x70
>  [<c0471c1c>] ? generic_ide_resume+0x5c/0xf0
>  [<c03f8bde>] ? device_resume+0x32e/0x380
>  [<c0168791>] ? hibernation_snapshot+0xa1/0x220
>  [<c013b55b>] ? printk+0x1b/0x20
>  [<c01689f0>] ? hibernate+0xe0/0x180
>  [<c01674a0>] ? state_store+0x0/0xd0
>  [<c016755f>] ? state_store+0xbf/0xd0
>  [<c01674a0>] ? state_store+0x0/0xd0
>  [<c0375ef4>] ? kobj_attr_store+0x24/0x30
>  [<c01fa432>] ? sysfs_write_file+0xa2/0x100
>  [<c01bbf06>] ? vfs_write+0x96/0x130
>  [<c01fa390>] ? sysfs_write_file+0x0/0x100
>  [<c01bc44d>] ? sys_write+0x3d/0x70
>  [<c0104f3b>] ? sysenter_do_call+0x12/0x3f
>  =======================
> Code: c0 e8 d0 f7 da ff 8b 13 eb 97 8d b6 00 00 00 00 8d bf 00 00 00
> 00 55 89 e5 83 ec 24 89 5d f4 89 cb 89 75 f8 89 d6 89 7d fc 89 c7 <8b>
> 41 04 39 d0 75 1d 8b 06 39 d8 75 41 89 7b 04 89 1f 8b 5d f4
> EIP: [<c038ad65>] __list_add+0x15/0x90 SS:ESP 0068:f25e7bf4
> ---[ end trace e3ed674f2f20c5d3 ]---
> note: bash[3597] exited with preempt_count 2
> Eeek! page_mapcount(page) went negative! (-1)
>  page pfn = 3e74f
>  page->flags = 210007c
>  page->count = 1
>  page->mapping = f6439028
>  vma->vm_ops = generic_file_vm_ops+0x0/0x20
>  vma->vm_ops->fault = filemap_fault+0x0/0x480
>  vma->vm_file->f_op->mmap = generic_file_mmap+0x0/0x50
> ------------[ cut here ]------------
> kernel BUG at /uio/arkimedes/s29/vegardno/git-working/linux-2.6/mm/rmap.c:662!
> invalid opcode: 0000 [#2] PREEMPT SMP DEBUG_PAGEALLOC
> Pid: 3597, comm: bash Tainted: G      D   (2.6.27-rc4-00003-ga798564-dirty #30)
> EIP: 0060:[<c01a0f39>] EFLAGS: 00210286 CPU: 0
> EIP is at page_remove_rmap+0x109/0x120
> EAX: 0000003b EBX: f7aa5684 ECX: f25e6000 EDX: 00000005
> ESI: f25b94c8 EDI: 3e74f025 EBP: f25e7990 ESP: f25e7980
>  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> Process bash (pid: 3597, ti=f25e6000 task=f25aa700 task.ti=f25e6000)
> Stack: c077ed7c f6439028 00000000 00000025 f25e7a38 c0198721 3e74f025 00000000
>       f25e79c0 c015c5ea 00000025 00000000 325a3067 00000000 00000000 f25b94c8
>       f25e7a50 00000000 00000000 3e74f025 00000000 00007000 0097a000 00000000
> Call Trace:

No idea what happened here but it looks bad :-)

>  [<c0198721>] ? unmap_vmas+0x4b1/0x8b0
>  [<c015c5ea>] ? print_lock_contention_bug+0x1a/0xe0
>  [<c019d504>] ? exit_mmap+0x84/0x120
>  [<c0138538>] ? mmput+0x48/0xa0
>  [<c013c3d7>] ? exit_mm+0xe7/0x110
>  [<c013d7a4>] ? do_exit+0x184/0x890
>  [<c013b55b>] ? printk+0x1b/0x20
>  [<c013a50a>] ? print_oops_end_marker+0x2a/0x30
>  [<c01060f1>] ? oops_end+0xb1/0xc0
>  [<c01067c0>] ? die+0x50/0x70
>  [<c0122b4f>] ? do_page_fault+0x1ef/0xa20
>  [<c010b335>] ? native_sched_clock+0xb5/0x110
>  [<c01600ea>] ? __lock_acquire+0x27a/0xa00
>  [<c0122960>] ? do_page_fault+0x0/0xa20
>  [<c0687d3a>] ? error_code+0x72/0x78
>  [<c038ad65>] ? __list_add+0x15/0x90
>  [<c0687133>] ? _spin_lock+0x63/0x70
>  [<c018b954>] ? rmqueue_bulk+0x54/0x80
>  [<c018d317>] ? get_page_from_freelist+0x5a7/0x720
>  [<c01600ea>] ? __lock_acquire+0x27a/0xa00
>  [<c018dd50>] ? __alloc_pages_internal+0xa0/0x450
>  [<c01acd4b>] ? alloc_pages_current+0x7b/0xc0
>  [<c01b37fb>] ? new_slab+0x1bb/0x2d0
>  [<c0687877>] ? _spin_unlock+0x27/0x50
>  [<c01b40ca>] ? __slab_alloc+0x32a/0x4e0
>  [<c010b335>] ? native_sched_clock+0xb5/0x110
>  [<c01b4424>] ? kmem_cache_alloc+0xb4/0xe0
>  [<c018969e>] ? mempool_alloc_slab+0xe/0x10
>  [<c018969e>] ? mempool_alloc_slab+0xe/0x10
>  [<c018969e>] ? mempool_alloc_slab+0xe/0x10
>  [<c01897a1>] ? mempool_alloc+0x31/0xf0
>  [<c015e884>] ? trace_hardirqs_on_caller+0xd4/0x160
>  [<c015e91b>] ? trace_hardirqs_on+0xb/0x10
>  [<c0368c7e>] ? get_request+0xae/0x2c0
>  [<c036935c>] ? get_request_wait+0x1c/0xd0
>  [<c0687462>] ? _spin_lock_irq+0x72/0x80
>  [<c0369442>] ? blk_get_request+0x32/0x70
>  [<c0471c1c>] ? generic_ide_resume+0x5c/0xf0
>  [<c03f8bde>] ? device_resume+0x32e/0x380
>  [<c0168791>] ? hibernation_snapshot+0xa1/0x220
>  [<c013b55b>] ? printk+0x1b/0x20
>  [<c01689f0>] ? hibernate+0xe0/0x180
>  [<c01674a0>] ? state_store+0x0/0xd0
>  [<c016755f>] ? state_store+0xbf/0xd0
>  [<c01674a0>] ? state_store+0x0/0xd0
>  [<c0375ef4>] ? kobj_attr_store+0x24/0x30
>  [<c01fa432>] ? sysfs_write_file+0xa2/0x100
>  [<c01bbf06>] ? vfs_write+0x96/0x130
>  [<c01fa390>] ? sysfs_write_file+0x0/0x100
>  [<c01bc44d>] ? sys_write+0x3d/0x70
>  [<c0104f3b>] ? sysenter_do_call+0x12/0x3f
>  =======================
> Code: c0 74 0d 8b 50 08 b8 ac ed 77 c0 e8 82 60 fc ff 8b 46 4c 85 c0
> 74 14 8b 40 10 85 c0 74 0d 8b 50 2c b8 d8 d1 77 c0 e8 67 60 fc ff <0f>
> 0b eb fe 8b 53 0c eb 95 8d b4 26 00 00 00 00 8d bc 27 00 00
> EIP: [<c01a0f39>] page_remove_rmap+0x109/0x120 SS:ESP 0068:f25e7980
> ---[ end trace e3ed674f2f20c5d3 ]---
> Fixing recursive fault but reboot is needed!
> =============================================================================
> BUG blkdev_ioc: Invalid object pointer 0xf5cdaca8
> -----------------------------------------------------------------------------

Ok, here we have the block layer passing a bad pointer to SLUB this
time. And it's also from the suspend code (although it's the resume
path this time). As we never see an oops from the block layer first,
it's possible that someone else corrupted everything and it just shows
up in the block layer. Maybe something worth investigating, though.

> INFO: Slab 0xf789e318 objects=14 used=14 fp=0x00000000 flags=0x2082083
> Pid: 3597, comm: bash Tainted: G      D   2.6.27-rc4-00003-ga798564-dirty #30
>  [<c01b2576>] slab_err+0x46/0x50
>  [<c01b2766>] ? check_slab+0xd6/0xf0
>  [<c0181aef>] ? call_rcu+0x6f/0x80
>  [<c015e91b>] ? trace_hardirqs_on+0xb/0x10
>  [<c01b3c78>] __slab_free+0x238/0x360
>  [<c01b4749>] kmem_cache_free+0xa9/0x120
>  [<c036b773>] ? put_io_context+0x53/0x70
>  [<c036b773>] ? put_io_context+0x53/0x70
>  [<c036b773>] put_io_context+0x53/0x70
>  [<c036b82e>] exit_io_context+0x6e/0x80
>  [<c013de6e>] do_exit+0x84e/0x890
>  [<c037b794>] ? trace_hardirqs_on_thunk+0xc/0x10
>  [<c013b55b>] ? printk+0x1b/0x20
>  [<c013a50a>] ? print_oops_end_marker+0x2a/0x30
>  [<c01060f1>] oops_end+0xb1/0xc0
>  [<c01067c0>] die+0x50/0x70
>  [<c0106871>] do_trap+0x91/0xc0
>  [<c0106940>] ? do_invalid_op+0x0/0xa0
>  [<c01069c8>] do_invalid_op+0x88/0xa0
>  [<c01a0f39>] ? page_remove_rmap+0x109/0x120
>  [<c013b2d1>] ? vprintk+0x151/0x3c0
>  [<c013b45b>] ? vprintk+0x2db/0x3c0
>  [<c015c5ea>] ? print_lock_contention_bug+0x1a/0xe0
>  [<c015c5ea>] ? print_lock_contention_bug+0x1a/0xe0
>  [<c0687d3a>] error_code+0x72/0x78
>  [<c013007b>] ? sched_rt_period_timer+0x21b/0x270
>  [<c01a0f39>] ? page_remove_rmap+0x109/0x120
>  [<c0198721>] unmap_vmas+0x4b1/0x8b0
>  [<c015c5ea>] ? print_lock_contention_bug+0x1a/0xe0
>  [<c019d504>] exit_mmap+0x84/0x120
>  [<c0138538>] mmput+0x48/0xa0
>  [<c013c3d7>] exit_mm+0xe7/0x110
>  [<c013d7a4>] do_exit+0x184/0x890
>  [<c013b55b>] ? printk+0x1b/0x20
>  [<c013a50a>] ? print_oops_end_marker+0x2a/0x30
>  [<c01060f1>] oops_end+0xb1/0xc0
>  [<c01067c0>] die+0x50/0x70
>  [<c0122b4f>] do_page_fault+0x1ef/0xa20
>  [<c010b335>] ? native_sched_clock+0xb5/0x110
>  [<c01600ea>] ? __lock_acquire+0x27a/0xa00
>  [<c0122960>] ? do_page_fault+0x0/0xa20
>  [<c0687d3a>] error_code+0x72/0x78
>  [<c038ad65>] ? __list_add+0x15/0x90
>  [<c0687133>] ? _spin_lock+0x63/0x70
>  [<c018b954>] rmqueue_bulk+0x54/0x80
>  [<c018d317>] get_page_from_freelist+0x5a7/0x720
>  [<c01600ea>] ? __lock_acquire+0x27a/0xa00
>  [<c018dd50>] __alloc_pages_internal+0xa0/0x450
>  [<c01acd4b>] alloc_pages_current+0x7b/0xc0
>  [<c01b37fb>] new_slab+0x1bb/0x2d0
>  [<c0687877>] ? _spin_unlock+0x27/0x50
>  [<c01b40ca>] __slab_alloc+0x32a/0x4e0
>  [<c010b335>] ? native_sched_clock+0xb5/0x110
>  [<c01b4424>] kmem_cache_alloc+0xb4/0xe0
>  [<c018969e>] ? mempool_alloc_slab+0xe/0x10
>  [<c018969e>] ? mempool_alloc_slab+0xe/0x10
>  [<c018969e>] mempool_alloc_slab+0xe/0x10
>  [<c01897a1>] mempool_alloc+0x31/0xf0
>  [<c015e884>] ? trace_hardirqs_on_caller+0xd4/0x160
>  [<c015e91b>] ? trace_hardirqs_on+0xb/0x10
>  [<c0368c7e>] get_request+0xae/0x2c0
>  [<c036935c>] get_request_wait+0x1c/0xd0
>  [<c0687462>] ? _spin_lock_irq+0x72/0x80
>  [<c0369442>] blk_get_request+0x32/0x70
>  [<c0471c1c>] generic_ide_resume+0x5c/0xf0
>  [<c03f8bde>] device_resume+0x32e/0x380
>  [<c0168791>] hibernation_snapshot+0xa1/0x220
>  [<c013b55b>] ? printk+0x1b/0x20
>  [<c01689f0>] hibernate+0xe0/0x180
>  [<c01674a0>] ? state_store+0x0/0xd0
>  [<c016755f>] state_store+0xbf/0xd0
>  [<c01674a0>] ? state_store+0x0/0xd0
>  [<c0375ef4>] kobj_attr_store+0x24/0x30
>  [<c01fa432>] sysfs_write_file+0xa2/0x100
>  [<c01bbf06>] vfs_write+0x96/0x130
>  [<c01fa390>] ? sysfs_write_file+0x0/0x100
>  [<c01bc44d>] sys_write+0x3d/0x70
>  [<c0104f3b>] sysenter_do_call+0x12/0x3f
>  =======================
> FIX blkdev_ioc: Object at 0xf5cdaca8 not freed
> BUG: scheduling while atomic: bash/3597/0x00000006
> INFO: lockdep is turned off.
> Pid: 3597, comm: bash Tainted: G      D   2.6.27-rc4-00003-ga798564-dirty #30
>  [<c0135467>] __schedule_bug+0x77/0x80
>  [<c0684ce2>] schedule+0x852/0x8f0
>  [<c010509e>] ? restore_nocheck_notrace+0x0/0xe
>  [<c01b4779>] ? kmem_cache_free+0xd9/0x120
>  [<c036b773>] ? put_io_context+0x53/0x70
>  [<c036b773>] ? put_io_context+0x53/0x70
>  [<c013de81>] do_exit+0x861/0x890
>  [<c037b794>] ? trace_hardirqs_on_thunk+0xc/0x10
>  [<c013b55b>] ? printk+0x1b/0x20
>  [<c013a50a>] ? print_oops_end_marker+0x2a/0x30
>  [<c01060f1>] oops_end+0xb1/0xc0
>  [<c01067c0>] die+0x50/0x70
>  [<c0106871>] do_trap+0x91/0xc0
>  [<c0106940>] ? do_invalid_op+0x0/0xa0
>  [<c01069c8>] do_invalid_op+0x88/0xa0
>  [<c01a0f39>] ? page_remove_rmap+0x109/0x120
>  [<c013b2d1>] ? vprintk+0x151/0x3c0
>  [<c013b45b>] ? vprintk+0x2db/0x3c0
>  [<c015c5ea>] ? print_lock_contention_bug+0x1a/0xe0
>  [<c015c5ea>] ? print_lock_contention_bug+0x1a/0xe0
>  [<c0687d3a>] error_code+0x72/0x78
>  [<c013007b>] ? sched_rt_period_timer+0x21b/0x270
>  [<c01a0f39>] ? page_remove_rmap+0x109/0x120
>  [<c0198721>] unmap_vmas+0x4b1/0x8b0
>  [<c015c5ea>] ? print_lock_contention_bug+0x1a/0xe0
>  [<c019d504>] exit_mmap+0x84/0x120
>  [<c0138538>] mmput+0x48/0xa0
>  [<c013c3d7>] exit_mm+0xe7/0x110
>  [<c013d7a4>] do_exit+0x184/0x890
>  [<c013b55b>] ? printk+0x1b/0x20
>  [<c013a50a>] ? print_oops_end_marker+0x2a/0x30
>  [<c01060f1>] oops_end+0xb1/0xc0
>  [<c01067c0>] die+0x50/0x70
>  [<c0122b4f>] do_page_fault+0x1ef/0xa20
>  [<c010b335>] ? native_sched_clock+0xb5/0x110
>  [<c01600ea>] ? __lock_acquire+0x27a/0xa00
>  [<c0122960>] ? do_page_fault+0x0/0xa20
>  [<c0687d3a>] error_code+0x72/0x78
>  [<c038ad65>] ? __list_add+0x15/0x90
>  [<c0687133>] ? _spin_lock+0x63/0x70
>  [<c018b954>] rmqueue_bulk+0x54/0x80
>  [<c018d317>] get_page_from_freelist+0x5a7/0x720
>  [<c01600ea>] ? __lock_acquire+0x27a/0xa00
>  [<c018dd50>] __alloc_pages_internal+0xa0/0x450
>  [<c01acd4b>] alloc_pages_current+0x7b/0xc0
>  [<c01b37fb>] new_slab+0x1bb/0x2d0
>  [<c0687877>] ? _spin_unlock+0x27/0x50
>  [<c01b40ca>] __slab_alloc+0x32a/0x4e0
>  [<c010b335>] ? native_sched_clock+0xb5/0x110
>  [<c01b4424>] kmem_cache_alloc+0xb4/0xe0
>  [<c018969e>] ? mempool_alloc_slab+0xe/0x10
>  [<c018969e>] ? mempool_alloc_slab+0xe/0x10
>  [<c018969e>] mempool_alloc_slab+0xe/0x10
>  [<c01897a1>] mempool_alloc+0x31/0xf0
>  [<c015e884>] ? trace_hardirqs_on_caller+0xd4/0x160
>  [<c015e91b>] ? trace_hardirqs_on+0xb/0x10
>  [<c0368c7e>] get_request+0xae/0x2c0
>  [<c036935c>] get_request_wait+0x1c/0xd0
>  [<c0687462>] ? _spin_lock_irq+0x72/0x80
>  [<c0369442>] blk_get_request+0x32/0x70
>  [<c0471c1c>] generic_ide_resume+0x5c/0xf0
>  [<c03f8bde>] device_resume+0x32e/0x380
>  [<c0168791>] hibernation_snapshot+0xa1/0x220
>  [<c013b55b>] ? printk+0x1b/0x20
>  [<c01689f0>] hibernate+0xe0/0x180
>  [<c01674a0>] ? state_store+0x0/0xd0
>  [<c016755f>] state_store+0xbf/0xd0
>  [<c01674a0>] ? state_store+0x0/0xd0
>  [<c0375ef4>] kobj_attr_store+0x24/0x30
>  [<c01fa432>] sysfs_write_file+0xa2/0x100
>
> I can look up addresses in the vmlinux for accurate line numbers if needed.
>
>
> Vegard
>
> --
> "The animistic metaphor of the bug that maliciously sneaked in while
> the programmer was not looking is intellectually dishonest as it
> disguises that the error is the programmer's own creation."
>        -- E. W. Dijkstra, EWD1036
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: latest -git: suspend: unable to handle kernel paging request (was Re: no_console_suspend doesn't work?)
  2008-08-21 21:21               ` Pekka Enberg
@ 2008-08-21 22:10                 ` Rafael J. Wysocki
  2008-08-22  7:02                   ` Pekka Enberg
  2008-08-22  9:38                   ` Pavel Machek
  0 siblings, 2 replies; 21+ messages in thread
From: Rafael J. Wysocki @ 2008-08-21 22:10 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Vegard Nossum, Linux Kernel Mailing List, Andrew Morton, Jens Axboe

On Thursday, 21 of August 2008, Pekka Enberg wrote:
> Hi Vegard,
> 
> On Thu, Aug 21, 2008 at 11:08 PM, Vegard Nossum <vegard.nossum@gmail.com> wrote:
> > On Thu, Aug 21, 2008 at 9:37 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> >> Can you switch to SLAB and retest?  I don't really think SLUB is the issue
> >> here, but SLAB may give us additional information.
> >
> > Got this with SLAB:
> >
> > BUG: unable to handle kernel NULL pointer dereference at 00000000
> > IP: [<c03897ac>] list_del+0xc/0x90
> > *pdpt = 0000000031480001 *pde = 0000000000000000
> > Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > Pid: 7, comm: events/0 Not tainted (2.6.27-rc4-00003-gef9b1bc #33)
> > EIP: 0060:[<c03897ac>] EFLAGS: 00010082 CPU: 0
> > EIP is at list_del+0xc/0x90
> > EAX: 00000000 EBX: f556afa0 ECX: f54a5380 EDX: f54e3128
> > ESI: f54f1ef0 EDI: f556afa0 EBP: f68b7ee0 ESP: f68b7ec8
> >  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> > Process events/0 (pid: 7, ti=f68b6000 task=f689c338 task.ti=f68b6000)
> > Stack: c015b62e f68b7ed4 c015b66d f68b7efc 00000046 f54f1f20 f68b7f1c c01b4054
> >       00000000 f689c338 f54e315c 00000001 f54f1f20 f54a5380 00000000 f54e3128
> >       f55b3348 00000000 f54f1f20 f54f1ef0 00000001 f68b7f3c c01b41ba 00000000
> > Call Trace:
> >  [<c015b62e>] ? get_lock_stats+0x1e/0x50
> >  [<c015b66d>] ? put_lock_stats+0xd/0x30
> 
> Looks like slabp->list is corrupted when we do list_del() in
> free_block(). Why is the stack trace so unreliable, though?
> 
> >  [<c01b4054>] ? free_block+0xa4/0x1b0
> >  [<c01b41ba>] ? drain_array+0x5a/0xb0
> >  [<c01b4f82>] ? cache_reap+0x72/0x230
> >  [<c014ada7>] ? run_workqueue+0x107/0x200
> >  [<c014ae0a>] ? run_workqueue+0x16a/0x200
> >  [<c014ada7>] ? run_workqueue+0x107/0x200
> >  [<c01b4f10>] ? cache_reap+0x0/0x230
> >  [<c014b94d>] ? worker_thread+0x7d/0xe0
> >  [<c014e4f0>] ? autoremove_wake_function+0x0/0x50
> >  [<c014b8d0>] ? worker_thread+0x0/0xe0
> >  [<c014e1e2>] ? kthread+0x42/0x70
> >  [<c014e1a0>] ? kthread+0x0/0x70
> >  [<c0105cf3>] ? kernel_thread_helper+0x7/0x14
> >  =======================
> > Code: e8 01 89 44 24 04 e8 d4 1d db ff 8b 55 04 b8 48 2f 7a c0 e8 47 d8 dd ff e8
> >  c2 dc d7 ff eb a9 55 89 e5 53 89 c3 83 ec 14 8b 40 04 <8b> 00 39 d8 75 24 8b 13
> >  8b 42 04 39 d8 75 41 8b 43 04 89 42 04
> > EIP: [<c03897ac>] list_del+0xc/0x90 SS:ESP 0068:f68b7ec8
> > ---[ end trace 958cea1a710a109a ]---
> > note: events/0[7] exited with preempt_count 1
> > uhci_hcd 0000:00:1d.2: setting latency timer to 64
> > uhci_hcd 0000:00:1d.3: setting latency timer to 64
> > ehci_hcd 0000:00:1d.7: setting latency timer to 64
> > pci 0000:00:1e.0: setting latency timer to 64
> > ------------[ cut here ]------------
> > WARNING: at /uio/arkimedes/s29/vegardno/git-working/linux-2.6/lib/list_debug.c:26
> > __list_add+0x61/0x90()
> > list_add corruption. next->prev should be prev (f6c00490), but was
> > f73ecea8. (next=f73ecea8).
> > Pid: 201, comm: rcu_torture_rea Tainted: G      D
> > 2.6.27-rc4-00003-gef9b1bc #33
> >  [<c013a58e>] warn_slowpath+0x5e/0x80
> >  [<c015b60b>] ? trace_hardirqs_off+0xb/0x10
> >  [<c010b325>] ? native_sched_clock+0xb5/0x110
> >  [<c0125c36>] ? kernel_map_pages+0xa6/0x130
> 
> This looks most interesting. The free_list in zone->free_area is corrupted.
> 
> >  [<c0389891>] __list_add+0x61/0x90
> >  [<c018c978>] __free_pages_ok+0x368/0x410
> >  [<c018ccf2>] __free_pages+0x22/0x40
> >  [<c018cd58>] free_pages+0x48/0x50
> >  [<c010be69>] free_thread_info+0x19/0x20
> >  [<c0138929>] free_task+0x19/0x30
> >  [<c0139e31>] __put_task_struct+0x51/0xa0
> >  [<c013c0b7>] delayed_put_task_struct+0x27/0x30
> >  [<c018206c>] rcu_process_callbacks+0x6c/0xb0
> >  [<c013f5b3>] __do_softirq+0x83/0x100
> >  [<c013f6d5>] do_softirq+0xa5/0xb0
> >  [<c013f975>] irq_exit+0x95/0xa0
> >  [<c010813d>] do_IRQ+0x4d/0xa0
> >  [<c037a264>] ? trace_hardirqs_off_thunk+0xc/0x18
> >  [<c0105a3c>] common_interrupt+0x28/0x30
> >  [<c0683b44>] ? schedule+0x734/0x8f0
> >  [<c068007b>] ? start_secondary+0x19b/0x1c0
> >  [<c06800d8>] ? check_tsc_warp+0x38/0x1e0
> >  [<c068695b>] ? _spin_unlock_irq+0x2b/0x60
> >  [<c0683b44>] schedule+0x734/0x8f0
> >  [<c010509e>] ? restore_nocheck_notrace+0x0/
> >  [<c015e93b>] ? trace_hardirqs_on+0xb/0x10
> >  [<c0180c88>] rcu_torture_reader+0x148/0x230
> >  [<c0181460>] ? rcu_torture_timer+0x0/0x100
> >  [<c0180d19>] ? rcu_torture_reader+0x1d9/0x230
> >  [<c0180b40>] ? rcu_torture_reader+0x0/0x230
> >  [<c014e1e2>] kthread+0x42/0x70
> >  [<c014e1a0>] ? kthread+0x0/0x70
> >  [<c0105cf3>] kernel_thread_helper+0x7/0x14
> >  =======================
> > ---[ end trace 958cea1a710a109a ]---
> > eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
> > eth1: link down
> > hda: host max PIO4 wanted PIO255(auto-tune) selected PIO4
> > hda: UDMA/100 mode selected
> > ------------[ cut here ]------------
> > kernel BUG at /uio/arkimedes/s29/vegardno/git-working/linux-2.6/mm/slab.c:590!
> > invalid opcode: 0000 [#2] PREEMPT SMP DEBUG_PAGEALLOC
> > Pid: 3257, comm: S99local Tainted: G      D W (2.6.27-rc4-00003-gef9b1bc #33)
> > EIP: 0060:[<c01b24c0>] EFLAGS: 00210002 CPU: 0
> > EIP is at cache_free_debugcheck+0x2e0/0x310
> > EAX: 00000000 EBX: f4c1f688 ECX: c018976e EDX: f785f744
> > ESI: f6864180 EDI: f4c1f688 EBP: f25b1de8 ESP: f25b1d94
> >  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> > Process S99local (pid: 3257, ti=f25b0000 task=f253f338 task.ti=f25b0000)
> > Stack: 00200200 f4c1f688 00000000 f25b1e20 f25b1dac c0683402 f25b1e58 c018976e
> >       00000001 c036a7b0 00000002 00000000 00000000 c088c190 f253f338 f4c1f688
> >       f253f338 f25b1df8 f6864180 f6960ef0 f4c1f688 f25b1e0c c01b3c9c f253f7f4
> > Call Trace:
> >  [<c0683402>] ? wait_for_completion+0x12/0x20
> 
> Here the block layer is passing a pointer to a non-slab page to
> kmem_cache_free(). Assuming we can rely on the stack trace, of
> course...
> 
> >  [<c018976e>] ? mempool_free_slab+0xe/0x10
> >  [<c036a7b0>] ? blk_end_sync_rq+0x0/0x30
> >  [<c01b3c9c>] ? kmem_cache_free+0x5c/0x200
> >  [<c018976e>] ? mempool_free_slab+0xe/0x10
> >  [<c018979c>] ? mempool_free+0x2c/0x90
> >  [<c0367252>] ? __blk_put_request+0x62/0x90
> >  [<c036764c>] ? blk_put_request+0x2c/0x50
> >  [<c0470732>] ? generic_ide_resume+0xa2/0xf0
> >  [<c03f767e>] ? device_resume+0x32e/0x380
> >  [<c01687d1>] ? hibernation_snapshot+0xa1/0x220
> >  [<c013b57b>] ? printk+0x1b/0x20
> >  [<c0168a30>] ? hibernate+0xe0/0x180
> >  [<c01674e0>] ? state_store+0x0/0xd0
> >  [<c016759f>] ? state_store+0xbf/0xd0
> >  [<c01674e0>] ? state_store+0x0/0xd0
> >  [<c03749d4>] ? kobj_attr_store+0x24/0x30
> >  [<c01f8ef2>] ? sysfs_write_file+0xa2/0x100
> >  [<c01ba946>] ? vfs_write+0x96/0x130
> >  [<c01f8e50>] ? sysfs_write_file+0x0/0x100
> >  [<c01bae8d>] ? sys_write+0x3d/0x70
> >  [<c0104f3b>] ? sysenter_do_call+0x12/0x3f
> >  =======================
> > Code: 00 00 75 cd c7 03 21 43 65 87 8b 5e 28 e9 78 ff ff ff 0f 0b eb fe 90 8d 74
> >  26 00 8b 52 0c e9 ca fd ff ff 0f 0b eb fe 8d 74 26 00 <0f> 0b eb fe 0f 0b eb fe
> >  0f 0b eb fe 8d 74 26 00 8b 52 0c 8b 02
> > EIP: [<c01b24c0>] cache_free_debugcheck+0x2e0/0x310 SS:ESP 0068:f25b1d94
> > ---[ end trace 958cea1a710a109a ]---
> > note: S99local[3257] exited with preempt_count 1
> 
> > BUG: unable to handle kernel paging request at 00100104
> > IP: [<c0389845>] __list_add+0x15/0x90
> > *pdpt = 0000000031480001 *pde = 0000000000000000
> > Oops: 0000 [#3] PREEMPT SMP DEBUG_PAGEALLOC
> > Pid: 3257, comm: S99local Tainted: G      D W (2.6.27-rc4-00003-gef9b1bc #33)
> > EIP: 0060:[<c0389845>] EFLAGS: 00210086 CPU: 0
> > EIP is at __list_add+0x15/0x90
> > EAX: f73ece6c EBX: 00100100 ECX: 00100100 EDX: f73ece6c
> > ESI: f73ece6c EDI: f73ece6c EBP: f25b16dc ESP: f25b16b8
> >  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> > Process S99local (pid: 3257, ti=f25b0000 task=f253f338 task.ti=f25b0000)
> > Stack: f6814e44 f6c00400 c06860b3 00000000 00000002 00000000 f73ece3c f73ece6c
> >       f73ece6c f25b1704 c018b9e4 0000001f 00000000 f6c00400 f6c00444 00000001
> >       f6814e44 f6814e38 00200002 f25b1778 c018d3a7 f6814e44 00000000 00000000
> > Call Trace:
> >  [<c06860b3>] ? _spin_lock+0x63/0x70
> 
> And now we have the per-cpu page lists corrupted as well. Note that
> even though we have tty showing up in the stack traces, the list has
> already been corrupted by someone else. So I don't think tty is at
> fault here.
> 
> >  [<c018b9e4>] ? rmqueue_bulk+0x54/0x80
> >  [<c018d3a7>] ? get_page_from_freelist+0x5a7/0x720
> >  [<c018dde0>] ? __alloc_pages_internal+0xa0/0x450
> >  [<c01b25a2>] ? kmem_getpages+0x62/0x110
> >  [<c01b2f4c>] ? cache_grow+0x39c/0x3b0
> >  [<c03890b6>] ? _raw_spin_unlock+0x46/0x80
> >  [<c01b3503>] ? cache_alloc_refill+0x1f3/0x230
> >  [<c01b36e5>] ? __kmalloc+0x1a5/0x1e0
> >  [<c03cd9b2>] ? tty_buffer_request_room+0xe2/0x130
> >  [<c03cd9b2>] ? tty_buffer_request_room+0xe2/0x130
> >  [<c03cfe0d>] ? tty_insert_flip_string_flags+0x2d/0xa0
> >  [<c03ee7b1>] ? receive_chars+0x161/0x290
> >  [<c03ef7e4>] ? serial8250_interrupt+0x134/0x150
> >  [<c017dc08>] ? handle_IRQ_event+0x28/0x70
> >  [<c017f1cf>] ? handle_edge_irq+0xaf/0x140
> >  [<c0108138>] ? do_IRQ+0x48/0xa0
> >  [<c037a264>] ? trace_hardirqs_off_thunk+0xc/0x18
> >  [<c0105a3c>] ? common_interrupt+0x28/0x30
> >  [<c013b2f1>] ? vprintk+0x151/0x3c0
> >  [<c0143e5f>] ? run_timer_softirq+0x19f/0x1d0
> >  [<c0143e5f>] ? run_timer_softirq+0x19f/0x1d0
> >  [<c013b57b>] ? printk+0x1b/0x20
> >  [<c016700a>] ? __print_symbol+0x2a/0x40
> >  [<c037a254>] ? trace_hardirqs_on_thunk+0xc/0x10
> >  [<c010509e>] ? restore_nocheck_notrace+0x0/0xe
> >  [<c013b2f1>] ? vprintk+0x151/0x3c0
> >  [<c013b47b>] ? vprintk+0x2db/0x3c0
> >  [<c013f91a>] ? irq_exit+0x3a/0xa0
> >  [<c010509e>] ? restore_nocheck_notrace+0x0/0xe
> >  [<c013b2f1>] ? vprintk+0x151/0x3c0
> >  [<c0106871>] ? do_trap+0x91/0xc0
> >  [<c013b57b>] ? printk+0x1b/0x20
> >  [<c0106871>] ? do_trap+0x91/0xc0
> >  [<c01061f0>] ? print_trace_address+0x40/0x50
> >  [<c0106871>] ? do_trap+0x91/0xc0
> >  [<c01062aa>] ? dump_trace+0xaa/0x120
> >  [<c0106346>] ? show_trace_log_lvl+0x26/0x40
> >  [<c0106b8a>] ? show_trace+0x1a/0x20
> >  [<c01074d2>] ? dump_stack+0x72/0x80
> >  [<c01312c1>] ? __might_sleep+0xf1/0x140
> >  [<c0684fe9>] ? down_read+0x19/0x80
> >  [<c03e007b>] ? tty_audit_add_data+0x1bb/0x2e0
> >  [<c013c33b>] ? exit_mm+0x2b/0x110
> >  [<c013d7c4>] ? do_exit+0x184/0x890
> >  [<c013b57b>] ? printk+0x1b/0x20
> >  [<c013a52a>] ? print_oops_end_marker+0x2a/0x30
> >  [<c01060f1>] ? oops_end+0xb1/0xc0
> >  [<c01067c0>] ? die+0x50/0x70
> >  [<c0106871>] ? do_trap+0x91/0xc0
> >  [<c0106940>] ? do_invalid_op+0x0/0xa0
> >  [<c01069c8>] ? do_invalid_op+0x88/0xa0
> >  [<c01b24c0>] ? cache_free_debugcheck+0x2e0/0x310
> >  [<c015c60a>] ? print_lock_contention_bug+0x1a/0xe0
> >  [<c0182527>] ? rcu_irq_exit+0x17/0x90
> >  [<c0686cba>] ? error_code+0x72/0x78
> >  [<c018976e>] ? mempool_free_slab+0xe/0x10
> >  [<c068007b>] ? start_secondary+0x19b/0x1c0
> >  [<c06800d8>] ? check_tsc_warp+0x38/0x1e0
> >  [<c01b24c0>] ? cache_free_debugcheck+0x2e0/0x310
> >  [<c0683402>] ? wait_for_completion+0x12/0x20
> >  [<c018976e>] ? mempool_free_slab+0xe/0x10
> >  [<c036a7b0>] ? blk_end_sync_rq+0x0/0x30
> >  [<c01b3c9c>] ? kmem_cache_free+0x5c/0x200
> >  [<c018976e>] ? mempool_free_slab+0xe/0x10
> >  [<c018979c>] ? mempool_free+0x2c/0x90
> >  [<c0367252>] ? __blk_put_request+0x62/0x90
> >  [<c036764c>] ? blk_put_request+0x2c/0x50
> >  [<c0470732>] ? generic_ide_resume+0xa2/0xf0
> >  [<c03f767e>] ? device_resume+0x32e/0x380
> >  [<c01687d1>] ? hibernation_snapshot+0xa1/0x220
> >  [<c013b57b>] ? printk+0x1b/0x20
> >  [<c0168a30>] ? hibernate+0xe0/0x180
> >  [<c01674e0>] ? state_store+0x0/0xd0
> >  [<c016759f>] ? state_store+0xbf/0xd0
> >  [<c01674e0>] ? state_store+0x0/0xd0
> >  [<c03749d4>] ? kobj_attr_store+0x24/0x30
> >  [<c01f8ef2>] ? sysfs_write_file+0xa2/0x100
> >  [<c01ba946>] ? vfs_write+0x96/0x130
> >  [<c01f8e50>] ? sysfs_write_file+0x0/0x100
> >  [<c01bae8d>] ? sys_write+0x3d/0x70
> >  [<c0104f3b>] ? sysenter_do_call+0x12/0x3f
> >  =======================
> > Code: c0 e8 10 0d db ff 8b 13 eb 97 8d b6 00 00 00 00 8d bf 00 00 00 00 55 89 e5
> >  83 ec 24 89 5d f4 89 cb 89 75 f8 89 d6 89 7d fc 89 c7 <8b> 41 04 39 d0 75 1d 8b
> >  06 39 d8 75 41 89 7b 04 89 1f 8b 5d f4
> > EIP: [<c0389845>] __list_add+0x15/0x90 SS:ESP 0068:f25b16b8
> 
> I don't know the suspend code at all but the bogus pointer coming from
> hibernation_snapshot() seems suspicious.

Which one?

> Another possibility is that the block layer is doing something strange here.
> Dunno.

I bet on the IDE driver.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: latest -git: suspend: unable to handle kernel paging request (was Re: no_console_suspend doesn't work?)
  2008-08-21 21:27     ` Pekka Enberg
@ 2008-08-21 22:16       ` Rafael J. Wysocki
  2008-08-22  4:45         ` Vegard Nossum
  0 siblings, 1 reply; 21+ messages in thread
From: Rafael J. Wysocki @ 2008-08-21 22:16 UTC (permalink / raw)
  To: Pekka Enberg, Vegard Nossum
  Cc: Linux Kernel Mailing List, Andrew Morton, Jens Axboe,
	Bartlomiej Zolnierkiewicz

On Thursday, 21 of August 2008, Pekka Enberg wrote:
> On Thu, Aug 21, 2008 at 9:21 PM, Vegard Nossum <vegard.nossum@gmail.com> wrote:
> > I got a "proper" oops this time (brace yourself, it's long ;-)), and
> > it does not have serial on the stack:
> >
> > BUG: unable to handle kernel paging request at 00100104
> > IP: [<c038ad65>] __list_add+0x15/0x90
> > *pdpt = 00000000325bb001 *pde = 0000000000000000
> > Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > Pid: 3597, comm: bash Not tainted (2.6.27-rc4-00003-ga798564-dirty #30)
> > EIP: 0060:[<c038ad65>] EFLAGS: 00210082 CPU: 0
> > EIP is at __list_add+0x15/0x90
> > EAX: f7455764 EBX: 00100100 ECX: 00100100 EDX: f7455764
> > ESI: f7455764 EDI: f7455764 EBP: f25e7c18 ESP: f25e7bf4
> >  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> > Process bash (pid: 3597, ti=f25e6000 task=f25aa700 task.ti=f25e6000)
> > Stack: f68120bc f6c00400 c0687133 00000000 00000002 00000000 f7455734 f7455764
> >       f7455764 f25e7c40 c018b954 0000001f 00000000 f6c00400 f6c00444 00000001
> >       f68120bc f68120b0 00200002 f25e7cb4 c018d317 f68120bc 00000000 f25aac0c
> > Call Trace:
> >  [<c0687133>] ? _spin_lock+0x63/0x70
> 
> Corruption in the page allocator freelists.
> 
> >  [<c018b954>] ? rmqueue_bulk+0x54/0x80
> >  [<c018d317>] ? get_page_from_freelist+0x5a7/0x720
> >  [<c01600ea>] ? __lock_acquire+0x27a/0xa00
> >  [<c018dd50>] ? __alloc_pages_internal+0xa0/0x450
> >  [<c01acd4b>] ? alloc_pages_current+0x7b/0xc0
> >  [<c01b37fb>] ? new_slab+0x1bb/0x2d0
> >  [<c0687877>] ? _spin_unlock+0x27/0x50
> >  [<c01b40ca>] ? __slab_alloc+0x32a/0x4e0
> >  [<c010b335>] ? native_sched_clock+0xb5/0x110
> >  [<c01b4424>] ? kmem_cache_alloc+0xb4/0xe0
> >  [<c018969e>] ? mempool_alloc_slab+0xe/0x10
> >  [<c018969e>] ? mempool_alloc_slab+0xe/0x10
> >  [<c018969e>] ? mempool_alloc_slab+0xe/0x10
> >  [<c01897a1>] ? mempool_alloc+0x31/0xf0
> >  [<c015e884>] ? trace_hardirqs_on_caller+0xd4/0x160
> >  [<c015e91b>] ? trace_hardirqs_on+0xb/0x10
> >  [<c0368c7e>] ? get_request+0xae/0x2c0
> >  [<c036935c>] ? get_request_wait+0x1c/0xd0
> >  [<c0687462>] ? _spin_lock_irq+0x72/0x80
> >  [<c0369442>] ? blk_get_request+0x32/0x70
> >  [<c0471c1c>] ? generic_ide_resume+0x5c/0xf0
> >  [<c03f8bde>] ? device_resume+0x32e/0x380
> >  [<c0168791>] ? hibernation_snapshot+0xa1/0x220
> >  [<c013b55b>] ? printk+0x1b/0x20
> >  [<c01689f0>] ? hibernate+0xe0/0x180
> >  [<c01674a0>] ? state_store+0x0/0xd0
> >  [<c016755f>] ? state_store+0xbf/0xd0
> >  [<c01674a0>] ? state_store+0x0/0xd0
> >  [<c0375ef4>] ? kobj_attr_store+0x24/0x30
> >  [<c01fa432>] ? sysfs_write_file+0xa2/0x100
> >  [<c01bbf06>] ? vfs_write+0x96/0x130
> >  [<c01fa390>] ? sysfs_write_file+0x0/0x100
> >  [<c01bc44d>] ? sys_write+0x3d/0x70
> >  [<c0104f3b>] ? sysenter_do_call+0x12/0x3f
> >  =======================
> > Code: c0 e8 d0 f7 da ff 8b 13 eb 97 8d b6 00 00 00 00 8d bf 00 00 00
> > 00 55 89 e5 83 ec 24 89 5d f4 89 cb 89 75 f8 89 d6 89 7d fc 89 c7 <8b>
> > 41 04 39 d0 75 1d 8b 06 39 d8 75 41 89 7b 04 89 1f 8b 5d f4
> > EIP: [<c038ad65>] __list_add+0x15/0x90 SS:ESP 0068:f25e7bf4
> > ---[ end trace e3ed674f2f20c5d3 ]---
> > note: bash[3597] exited with preempt_count 2
> > Eeek! page_mapcount(page) went negative! (-1)
> >  page pfn = 3e74f
> >  page->flags = 210007c
> >  page->count = 1
> >  page->mapping = f6439028
> >  vma->vm_ops = generic_file_vm_ops+0x0/0x20
> >  vma->vm_ops->fault = filemap_fault+0x0/0x480
> >  vma->vm_file->f_op->mmap = generic_file_mmap+0x0/0x50
> > ------------[ cut here ]------------
> > kernel BUG at /uio/arkimedes/s29/vegardno/git-working/linux-2.6/mm/rmap.c:662!
> > invalid opcode: 0000 [#2] PREEMPT SMP DEBUG_PAGEALLOC
> > Pid: 3597, comm: bash Tainted: G      D   (2.6.27-rc4-00003-ga798564-dirty #30)
> > EIP: 0060:[<c01a0f39>] EFLAGS: 00210286 CPU: 0
> > EIP is at page_remove_rmap+0x109/0x120
> > EAX: 0000003b EBX: f7aa5684 ECX: f25e6000 EDX: 00000005
> > ESI: f25b94c8 EDI: 3e74f025 EBP: f25e7990 ESP: f25e7980
> >  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> > Process bash (pid: 3597, ti=f25e6000 task=f25aa700 task.ti=f25e6000)
> > Stack: c077ed7c f6439028 00000000 00000025 f25e7a38 c0198721 3e74f025 00000000
> >       f25e79c0 c015c5ea 00000025 00000000 325a3067 00000000 00000000 f25b94c8
> >       f25e7a50 00000000 00000000 3e74f025 00000000 00007000 0097a000 00000000
> > Call Trace:
> 
> No idea what happened here but it looks bad :-)
> 
> >  [<c0198721>] ? unmap_vmas+0x4b1/0x8b0
> >  [<c015c5ea>] ? print_lock_contention_bug+0x1a/0xe0
> >  [<c019d504>] ? exit_mmap+0x84/0x120
> >  [<c0138538>] ? mmput+0x48/0xa0
> >  [<c013c3d7>] ? exit_mm+0xe7/0x110
> >  [<c013d7a4>] ? do_exit+0x184/0x890
> >  [<c013b55b>] ? printk+0x1b/0x20
> >  [<c013a50a>] ? print_oops_end_marker+0x2a/0x30
> >  [<c01060f1>] ? oops_end+0xb1/0xc0
> >  [<c01067c0>] ? die+0x50/0x70
> >  [<c0122b4f>] ? do_page_fault+0x1ef/0xa20
> >  [<c010b335>] ? native_sched_clock+0xb5/0x110
> >  [<c01600ea>] ? __lock_acquire+0x27a/0xa00
> >  [<c0122960>] ? do_page_fault+0x0/0xa20
> >  [<c0687d3a>] ? error_code+0x72/0x78
> >  [<c038ad65>] ? __list_add+0x15/0x90
> >  [<c0687133>] ? _spin_lock+0x63/0x70
> >  [<c018b954>] ? rmqueue_bulk+0x54/0x80
> >  [<c018d317>] ? get_page_from_freelist+0x5a7/0x720
> >  [<c01600ea>] ? __lock_acquire+0x27a/0xa00
> >  [<c018dd50>] ? __alloc_pages_internal+0xa0/0x450
> >  [<c01acd4b>] ? alloc_pages_current+0x7b/0xc0
> >  [<c01b37fb>] ? new_slab+0x1bb/0x2d0
> >  [<c0687877>] ? _spin_unlock+0x27/0x50
> >  [<c01b40ca>] ? __slab_alloc+0x32a/0x4e0
> >  [<c010b335>] ? native_sched_clock+0xb5/0x110
> >  [<c01b4424>] ? kmem_cache_alloc+0xb4/0xe0
> >  [<c018969e>] ? mempool_alloc_slab+0xe/0x10
> >  [<c018969e>] ? mempool_alloc_slab+0xe/0x10
> >  [<c018969e>] ? mempool_alloc_slab+0xe/0x10
> >  [<c01897a1>] ? mempool_alloc+0x31/0xf0
> >  [<c015e884>] ? trace_hardirqs_on_caller+0xd4/0x160
> >  [<c015e91b>] ? trace_hardirqs_on+0xb/0x10
> >  [<c0368c7e>] ? get_request+0xae/0x2c0
> >  [<c036935c>] ? get_request_wait+0x1c/0xd0
> >  [<c0687462>] ? _spin_lock_irq+0x72/0x80
> >  [<c0369442>] ? blk_get_request+0x32/0x70
> >  [<c0471c1c>] ? generic_ide_resume+0x5c/0xf0
> >  [<c03f8bde>] ? device_resume+0x32e/0x380
> >  [<c0168791>] ? hibernation_snapshot+0xa1/0x220
> >  [<c013b55b>] ? printk+0x1b/0x20
> >  [<c01689f0>] ? hibernate+0xe0/0x180
> >  [<c01674a0>] ? state_store+0x0/0xd0
> >  [<c016755f>] ? state_store+0xbf/0xd0
> >  [<c01674a0>] ? state_store+0x0/0xd0
> >  [<c0375ef4>] ? kobj_attr_store+0x24/0x30
> >  [<c01fa432>] ? sysfs_write_file+0xa2/0x100
> >  [<c01bbf06>] ? vfs_write+0x96/0x130
> >  [<c01fa390>] ? sysfs_write_file+0x0/0x100
> >  [<c01bc44d>] ? sys_write+0x3d/0x70
> >  [<c0104f3b>] ? sysenter_do_call+0x12/0x3f
> >  =======================
> > Code: c0 74 0d 8b 50 08 b8 ac ed 77 c0 e8 82 60 fc ff 8b 46 4c 85 c0
> > 74 14 8b 40 10 85 c0 74 0d 8b 50 2c b8 d8 d1 77 c0 e8 67 60 fc ff <0f>
> > 0b eb fe 8b 53 0c eb 95 8d b4 26 00 00 00 00 8d bc 27 00 00
> > EIP: [<c01a0f39>] page_remove_rmap+0x109/0x120 SS:ESP 0068:f25e7980
> > ---[ end trace e3ed674f2f20c5d3 ]---
> > Fixing recursive fault but reboot is needed!
> > =============================================================================
> > BUG blkdev_ioc: Invalid object pointer 0xf5cdaca8
> > -----------------------------------------------------------------------------
> 
> Ok, here we have the block layer passing a bad pointer to SLUB this
> time. And it's also from the suspend code (although it's the resume
> path this time). As we never see an oops from the block layer first,
> it's possible that someone else corrupted everything and it just shows
> up in the block layer. Maybe something worth investigating, though.
> 
> > INFO: Slab 0xf789e318 objects=14 used=14 fp=0x00000000 flags=0x2082083
> > Pid: 3597, comm: bash Tainted: G      D   2.6.27-rc4-00003-ga798564-dirty #30
> >  [<c01b2576>] slab_err+0x46/0x50
> >  [<c01b2766>] ? check_slab+0xd6/0xf0
> >  [<c0181aef>] ? call_rcu+0x6f/0x80
> >  [<c015e91b>] ? trace_hardirqs_on+0xb/0x10
> >  [<c01b3c78>] __slab_free+0x238/0x360
> >  [<c01b4749>] kmem_cache_free+0xa9/0x120
> >  [<c036b773>] ? put_io_context+0x53/0x70
> >  [<c036b773>] ? put_io_context+0x53/0x70
> >  [<c036b773>] put_io_context+0x53/0x70
> >  [<c036b82e>] exit_io_context+0x6e/0x80
> >  [<c013de6e>] do_exit+0x84e/0x890
> >  [<c037b794>] ? trace_hardirqs_on_thunk+0xc/0x10
> >  [<c013b55b>] ? printk+0x1b/0x20
> >  [<c013a50a>] ? print_oops_end_marker+0x2a/0x30
> >  [<c01060f1>] oops_end+0xb1/0xc0
> >  [<c01067c0>] die+0x50/0x70
> >  [<c0106871>] do_trap+0x91/0xc0
> >  [<c0106940>] ? do_invalid_op+0x0/0xa0
> >  [<c01069c8>] do_invalid_op+0x88/0xa0
> >  [<c01a0f39>] ? page_remove_rmap+0x109/0x120
> >  [<c013b2d1>] ? vprintk+0x151/0x3c0
> >  [<c013b45b>] ? vprintk+0x2db/0x3c0
> >  [<c015c5ea>] ? print_lock_contention_bug+0x1a/0xe0
> >  [<c015c5ea>] ? print_lock_contention_bug+0x1a/0xe0
> >  [<c0687d3a>] error_code+0x72/0x78
> >  [<c013007b>] ? sched_rt_period_timer+0x21b/0x270
> >  [<c01a0f39>] ? page_remove_rmap+0x109/0x120
> >  [<c0198721>] unmap_vmas+0x4b1/0x8b0
> >  [<c015c5ea>] ? print_lock_contention_bug+0x1a/0xe0
> >  [<c019d504>] exit_mmap+0x84/0x120
> >  [<c0138538>] mmput+0x48/0xa0
> >  [<c013c3d7>] exit_mm+0xe7/0x110
> >  [<c013d7a4>] do_exit+0x184/0x890
> >  [<c013b55b>] ? printk+0x1b/0x20
> >  [<c013a50a>] ? print_oops_end_marker+0x2a/0x30
> >  [<c01060f1>] oops_end+0xb1/0xc0
> >  [<c01067c0>] die+0x50/0x70
> >  [<c0122b4f>] do_page_fault+0x1ef/0xa20
> >  [<c010b335>] ? native_sched_clock+0xb5/0x110
> >  [<c01600ea>] ? __lock_acquire+0x27a/0xa00
> >  [<c0122960>] ? do_page_fault+0x0/0xa20
> >  [<c0687d3a>] error_code+0x72/0x78
> >  [<c038ad65>] ? __list_add+0x15/0x90
> >  [<c0687133>] ? _spin_lock+0x63/0x70
> >  [<c018b954>] rmqueue_bulk+0x54/0x80
> >  [<c018d317>] get_page_from_freelist+0x5a7/0x720
> >  [<c01600ea>] ? __lock_acquire+0x27a/0xa00
> >  [<c018dd50>] __alloc_pages_internal+0xa0/0x450
> >  [<c01acd4b>] alloc_pages_current+0x7b/0xc0
> >  [<c01b37fb>] new_slab+0x1bb/0x2d0
> >  [<c0687877>] ? _spin_unlock+0x27/0x50
> >  [<c01b40ca>] __slab_alloc+0x32a/0x4e0
> >  [<c010b335>] ? native_sched_clock+0xb5/0x110
> >  [<c01b4424>] kmem_cache_alloc+0xb4/0xe0
> >  [<c018969e>] ? mempool_alloc_slab+0xe/0x10
> >  [<c018969e>] ? mempool_alloc_slab+0xe/0x10
> >  [<c018969e>] mempool_alloc_slab+0xe/0x10
> >  [<c01897a1>] mempool_alloc+0x31/0xf0
> >  [<c015e884>] ? trace_hardirqs_on_caller+0xd4/0x160
> >  [<c015e91b>] ? trace_hardirqs_on+0xb/0x10
> >  [<c0368c7e>] get_request+0xae/0x2c0
> >  [<c036935c>] get_request_wait+0x1c/0xd0
> >  [<c0687462>] ? _spin_lock_irq+0x72/0x80
> >  [<c0369442>] blk_get_request+0x32/0x70
> >  [<c0471c1c>] generic_ide_resume+0x5c/0xf0

IDE again?

Vegard, this is piix, isn't it?

> >  [<c03f8bde>] device_resume+0x32e/0x380
> >  [<c0168791>] hibernation_snapshot+0xa1/0x220
> >  [<c013b55b>] ? printk+0x1b/0x20
> >  [<c01689f0>] hibernate+0xe0/0x180
> >  [<c01674a0>] ? state_store+0x0/0xd0
> >  [<c016755f>] state_store+0xbf/0xd0
> >  [<c01674a0>] ? state_store+0x0/0xd0
> >  [<c0375ef4>] kobj_attr_store+0x24/0x30
> >  [<c01fa432>] sysfs_write_file+0xa2/0x100
> >  [<c01bbf06>] vfs_write+0x96/0x130
> >  [<c01fa390>] ? sysfs_write_file+0x0/0x100
> >  [<c01bc44d>] sys_write+0x3d/0x70
> >  [<c0104f3b>] sysenter_do_call+0x12/0x3f
> >  =======================
> > FIX blkdev_ioc: Object at 0xf5cdaca8 not freed
> > BUG: scheduling while atomic: bash/3597/0x00000006
> > INFO: lockdep is turned off.
> > Pid: 3597, comm: bash Tainted: G      D   2.6.27-rc4-00003-ga798564-dirty #30
> >  [<c0135467>] __schedule_bug+0x77/0x80
> >  [<c0684ce2>] schedule+0x852/0x8f0
> >  [<c010509e>] ? restore_nocheck_notrace+0x0/0xe
> >  [<c01b4779>] ? kmem_cache_free+0xd9/0x120
> >  [<c036b773>] ? put_io_context+0x53/0x70
> >  [<c036b773>] ? put_io_context+0x53/0x70
> >  [<c013de81>] do_exit+0x861/0x890
> >  [<c037b794>] ? trace_hardirqs_on_thunk+0xc/0x10
> >  [<c013b55b>] ? printk+0x1b/0x20
> >  [<c013a50a>] ? print_oops_end_marker+0x2a/0x30
> >  [<c01060f1>] oops_end+0xb1/0xc0
> >  [<c01067c0>] die+0x50/0x70
> >  [<c0106871>] do_trap+0x91/0xc0
> >  [<c0106940>] ? do_invalid_op+0x0/0xa0
> >  [<c01069c8>] do_invalid_op+0x88/0xa0
> >  [<c01a0f39>] ? page_remove_rmap+0x109/0x120
> >  [<c013b2d1>] ? vprintk+0x151/0x3c0
> >  [<c013b45b>] ? vprintk+0x2db/0x3c0
> >  [<c015c5ea>] ? print_lock_contention_bug+0x1a/0xe0
> >  [<c015c5ea>] ? print_lock_contention_bug+0x1a/0xe0
> >  [<c0687d3a>] error_code+0x72/0x78
> >  [<c013007b>] ? sched_rt_period_timer+0x21b/0x270
> >  [<c01a0f39>] ? page_remove_rmap+0x109/0x120
> >  [<c0198721>] unmap_vmas+0x4b1/0x8b0
> >  [<c015c5ea>] ? print_lock_contention_bug+0x1a/0xe0
> >  [<c019d504>] exit_mmap+0x84/0x120
> >  [<c0138538>] mmput+0x48/0xa0
> >  [<c013c3d7>] exit_mm+0xe7/0x110
> >  [<c013d7a4>] do_exit+0x184/0x890
> >  [<c013b55b>] ? printk+0x1b/0x20
> >  [<c013a50a>] ? print_oops_end_marker+0x2a/0x30
> >  [<c01060f1>] oops_end+0xb1/0xc0
> >  [<c01067c0>] die+0x50/0x70
> >  [<c0122b4f>] do_page_fault+0x1ef/0xa20
> >  [<c010b335>] ? native_sched_clock+0xb5/0x110
> >  [<c01600ea>] ? __lock_acquire+0x27a/0xa00
> >  [<c0122960>] ? do_page_fault+0x0/0xa20
> >  [<c0687d3a>] error_code+0x72/0x78
> >  [<c038ad65>] ? __list_add+0x15/0x90
> >  [<c0687133>] ? _spin_lock+0x63/0x70
> >  [<c018b954>] rmqueue_bulk+0x54/0x80
> >  [<c018d317>] get_page_from_freelist+0x5a7/0x720
> >  [<c01600ea>] ? __lock_acquire+0x27a/0xa00
> >  [<c018dd50>] __alloc_pages_internal+0xa0/0x450
> >  [<c01acd4b>] alloc_pages_current+0x7b/0xc0
> >  [<c01b37fb>] new_slab+0x1bb/0x2d0
> >  [<c0687877>] ? _spin_unlock+0x27/0x50
> >  [<c01b40ca>] __slab_alloc+0x32a/0x4e0
> >  [<c010b335>] ? native_sched_clock+0xb5/0x110
> >  [<c01b4424>] kmem_cache_alloc+0xb4/0xe0
> >  [<c018969e>] ? mempool_alloc_slab+0xe/0x10
> >  [<c018969e>] ? mempool_alloc_slab+0xe/0x10
> >  [<c018969e>] mempool_alloc_slab+0xe/0x10
> >  [<c01897a1>] mempool_alloc+0x31/0xf0
> >  [<c015e884>] ? trace_hardirqs_on_caller+0xd4/0x160
> >  [<c015e91b>] ? trace_hardirqs_on+0xb/0x10
> >  [<c0368c7e>] get_request+0xae/0x2c0
> >  [<c036935c>] get_request_wait+0x1c/0xd0
> >  [<c0687462>] ? _spin_lock_irq+0x72/0x80
> >  [<c0369442>] blk_get_request+0x32/0x70
> >  [<c0471c1c>] generic_ide_resume+0x5c/0xf0
> >  [<c03f8bde>] device_resume+0x32e/0x380
> >  [<c0168791>] hibernation_snapshot+0xa1/0x220
> >  [<c013b55b>] ? printk+0x1b/0x20
> >  [<c01689f0>] hibernate+0xe0/0x180
> >  [<c01674a0>] ? state_store+0x0/0xd0
> >  [<c016755f>] state_store+0xbf/0xd0
> >  [<c01674a0>] ? state_store+0x0/0xd0
> >  [<c0375ef4>] kobj_attr_store+0x24/0x30
> >  [<c01fa432>] sysfs_write_file+0xa2/0x100
> >
> > I can look up addresses in the vmlinux for accurate line numbers if needed.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: latest -git: suspend: unable to handle kernel paging request (was Re: no_console_suspend doesn't work?)
  2008-08-21 22:16       ` Rafael J. Wysocki
@ 2008-08-22  4:45         ` Vegard Nossum
  2008-08-22  9:28           ` Pavel Machek
  2008-08-22  9:34           ` Rafael J. Wysocki
  0 siblings, 2 replies; 21+ messages in thread
From: Vegard Nossum @ 2008-08-22  4:45 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Pekka Enberg, Linux Kernel Mailing List, Andrew Morton,
	Jens Axboe, Bartlomiej Zolnierkiewicz

On Fri, Aug 22, 2008 at 12:16 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Thursday, 21 of August 2008, Pekka Enberg wrote:
>> > =============================================================================
>> > BUG blkdev_ioc: Invalid object pointer 0xf5cdaca8
>> > -----------------------------------------------------------------------------
>>
>> Ok, here we have the block layer passing a bad pointer to SLUB this
>> time. And it's also from the suspend code (although it's the resume
>> path this time). As we never see an oops from the block layer first,
>> it's possible that someone else corrupted everything and it just shows
>> up in the block layer. Maybe something worth investigating, though.
>>
>> > INFO: Slab 0xf789e318 objects=14 used=14 fp=0x00000000 flags=0x2082083
>> > Pid: 3597, comm: bash Tainted: G      D   2.6.27-rc4-00003-ga798564-dirty #30
>> >  [<c01b2576>] slab_err+0x46/0x50
>> >  [<c01b2766>] ? check_slab+0xd6/0xf0
>> >  [<c0181aef>] ? call_rcu+0x6f/0x80
>> >  [<c015e91b>] ? trace_hardirqs_on+0xb/0x10
>> >  [<c01b3c78>] __slab_free+0x238/0x360
>> >  [<c01b4749>] kmem_cache_free+0xa9/0x120
>> >  [<c036b773>] ? put_io_context+0x53/0x70
>> >  [<c036b773>] ? put_io_context+0x53/0x70
>> >  [<c036b773>] put_io_context+0x53/0x70
>> >  [<c036b82e>] exit_io_context+0x6e/0x80
>> >  [<c013de6e>] do_exit+0x84e/0x890
>> >  [<c037b794>] ? trace_hardirqs_on_thunk+0xc/0x10
>> >  [<c013b55b>] ? printk+0x1b/0x20
>> >  [<c013a50a>] ? print_oops_end_marker+0x2a/0x30
>> >  [<c01060f1>] oops_end+0xb1/0xc0
>> >  [<c01067c0>] die+0x50/0x70
>> >  [<c0106871>] do_trap+0x91/0xc0
>> >  [<c0106940>] ? do_invalid_op+0x0/0xa0
>> >  [<c01069c8>] do_invalid_op+0x88/0xa0
>> >  [<c01a0f39>] ? page_remove_rmap+0x109/0x120
>> >  [<c013b2d1>] ? vprintk+0x151/0x3c0
>> >  [<c013b45b>] ? vprintk+0x2db/0x3c0
>> >  [<c015c5ea>] ? print_lock_contention_bug+0x1a/0xe0
>> >  [<c015c5ea>] ? print_lock_contention_bug+0x1a/0xe0
>> >  [<c0687d3a>] error_code+0x72/0x78
>> >  [<c013007b>] ? sched_rt_period_timer+0x21b/0x270
>> >  [<c01a0f39>] ? page_remove_rmap+0x109/0x120
>> >  [<c0198721>] unmap_vmas+0x4b1/0x8b0
>> >  [<c015c5ea>] ? print_lock_contention_bug+0x1a/0xe0
>> >  [<c019d504>] exit_mmap+0x84/0x120
>> >  [<c0138538>] mmput+0x48/0xa0
>> >  [<c013c3d7>] exit_mm+0xe7/0x110
>> >  [<c013d7a4>] do_exit+0x184/0x890
>> >  [<c013b55b>] ? printk+0x1b/0x20
>> >  [<c013a50a>] ? print_oops_end_marker+0x2a/0x30
>> >  [<c01060f1>] oops_end+0xb1/0xc0
>> >  [<c01067c0>] die+0x50/0x70
>> >  [<c0122b4f>] do_page_fault+0x1ef/0xa20
>> >  [<c010b335>] ? native_sched_clock+0xb5/0x110
>> >  [<c01600ea>] ? __lock_acquire+0x27a/0xa00
>> >  [<c0122960>] ? do_page_fault+0x0/0xa20
>> >  [<c0687d3a>] error_code+0x72/0x78
>> >  [<c038ad65>] ? __list_add+0x15/0x90
>> >  [<c0687133>] ? _spin_lock+0x63/0x70
>> >  [<c018b954>] rmqueue_bulk+0x54/0x80
>> >  [<c018d317>] get_page_from_freelist+0x5a7/0x720
>> >  [<c01600ea>] ? __lock_acquire+0x27a/0xa00
>> >  [<c018dd50>] __alloc_pages_internal+0xa0/0x450
>> >  [<c01acd4b>] alloc_pages_current+0x7b/0xc0
>> >  [<c01b37fb>] new_slab+0x1bb/0x2d0
>> >  [<c0687877>] ? _spin_unlock+0x27/0x50
>> >  [<c01b40ca>] __slab_alloc+0x32a/0x4e0
>> >  [<c010b335>] ? native_sched_clock+0xb5/0x110
>> >  [<c01b4424>] kmem_cache_alloc+0xb4/0xe0
>> >  [<c018969e>] ? mempool_alloc_slab+0xe/0x10
>> >  [<c018969e>] ? mempool_alloc_slab+0xe/0x10
>> >  [<c018969e>] mempool_alloc_slab+0xe/0x10
>> >  [<c01897a1>] mempool_alloc+0x31/0xf0
>> >  [<c015e884>] ? trace_hardirqs_on_caller+0xd4/0x160
>> >  [<c015e91b>] ? trace_hardirqs_on+0xb/0x10
>> >  [<c0368c7e>] get_request+0xae/0x2c0
>> >  [<c036935c>] get_request_wait+0x1c/0xd0
>> >  [<c0687462>] ? _spin_lock_irq+0x72/0x80
>> >  [<c0369442>] blk_get_request+0x32/0x70
>> >  [<c0471c1c>] generic_ide_resume+0x5c/0xf0
>
> IDE again?
>
> Vegard, this is piix, isn't it?

If this makes it so, then yes:

calling  piix_ide_init+0x0/0xb0
initcall piix_ide_init+0x0/0xb0 returned 0 after 0 msecs
calling  ide_scan_pcibus+0x0/0xf0
piix 0000:00:1f.1: IDE controller (0x8086:0x27df rev 0x01)
piix 0000:00:1f.1: IDE port disabled
piix 0000:00:1f.1: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0xffa0-0xffa7
Probing IDE interface ide0...
hda: WDC WD1600BB-00DAA3, ATA DISK drive
hda: host max PIO4 wanted PIO255(auto-tune) selected PIO4
hda: UDMA/100 mode selected
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14

It is also interesting that you mention that; this is from an earlier
run (before serial console was working properly):

> In my last run, I managed to get a lot of ascii art on the screen, but
> also one line which gave me the EIP of the oops:

> $ addr2line -e vmlinux -i c03724f3
> block/cfq-iosched.c:1190

> /*
>  * Must always be called with the rcu_read_lock() held
>  */
> static void
> __call_for_each_cic(struct io_context *ioc,
>                    void (*func)(struct io_context *, struct cfq_io_context *))
> {
>        struct cfq_io_context *cic;
>        struct hlist_node *n;
>
>        hlist_for_each_entry_rcu(cic, n, &ioc->cic_list, cic_list) <-- here
>                func(ioc, cic);
> }


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: latest -git: suspend: unable to handle kernel paging request (was Re: no_console_suspend doesn't work?)
  2008-08-21 22:10                 ` Rafael J. Wysocki
@ 2008-08-22  7:02                   ` Pekka Enberg
  2008-08-22  9:38                   ` Pavel Machek
  1 sibling, 0 replies; 21+ messages in thread
From: Pekka Enberg @ 2008-08-22  7:02 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Vegard Nossum, Linux Kernel Mailing List, Andrew Morton, Jens Axboe

Hi Rafael,

On Fri, Aug 22, 2008 at 1:10 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>> I don't know the suspend code at all but the bogus pointer coming from
>> hibernation_snapshot() seems suspicious.
>
> Which one?

This trace here (which has IDE too):

> > Call Trace:
> >  [<c0683402>] ? wait_for_completion+0x12/0x20
>
> Here the block layer is passing a pointer to a non-slab page to
> kmem_cache_free(). Assuming we can rely on the stack trace, of
> course...
>
> >  [<c018976e>] ? mempool_free_slab+0xe/0x10
> >  [<c036a7b0>] ? blk_end_sync_rq+0x0/0x30
> >  [<c01b3c9c>] ? kmem_cache_free+0x5c/0x200
> >  [<c018976e>] ? mempool_free_slab+0xe/0x10
> >  [<c018979c>] ? mempool_free+0x2c/0x90
> >  [<c0367252>] ? __blk_put_request+0x62/0x90
> >  [<c036764c>] ? blk_put_request+0x2c/0x50
> >  [<c0470732>] ? generic_ide_resume+0xa2/0xf0     <-- IDE
> >  [<c03f767e>] ? device_resume+0x32e/0x380
> >  [<c01687d1>] ? hibernation_snapshot+0xa1/0x220 <--- Suspend
> >  [<c013b57b>] ? printk+0x1b/0x20
> >  [<c0168a30>] ? hibernate+0xe0/0x180
> >  [<c01674e0>] ? state_store+0x0/0xd0
> >  [<c016759f>] ? state_store+0xbf/0xd0
> >  [<c01674e0>] ? state_store+0x0/0xd0
> >  [<c03749d4>] ? kobj_attr_store+0x24/0x30
> >  [<c01f8ef2>] ? sysfs_write_file+0xa2/0x100
> >  [<c01ba946>] ? vfs_write+0x96/0x130
> >  [<c01f8e50>] ? sysfs_write_file+0x0/0x100
> >  [<c01bae8d>] ? sys_write+0x3d/0x70
> >  [<c0104f3b>] ? sysenter_do_call+0x12/0x3f

But like I said, the stack traces seem more unreliable than usual so I
could be totally wrong here.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: latest -git: suspend: unable to handle kernel paging request (was Re: no_console_suspend doesn't work?)
  2008-08-22  4:45         ` Vegard Nossum
@ 2008-08-22  9:28           ` Pavel Machek
  2008-08-22  9:34           ` Rafael J. Wysocki
  1 sibling, 0 replies; 21+ messages in thread
From: Pavel Machek @ 2008-08-22  9:28 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Rafael J. Wysocki, Pekka Enberg, Linux Kernel Mailing List,
	Andrew Morton, Jens Axboe, Bartlomiej Zolnierkiewicz

Hi!

> >> Ok, here we have the block layer passing a bad pointer to SLUB this
> >> time. And it's also from the suspend code (although it's the resume
> >> path this time). As we never see an oops from the block layer first,
> >> it's possible that someone else corrupted everything and it just shows
> >> up in the block layer. Maybe something worth investigating,
> >> though.

Do you use any unusual drivers? WHat kind of machine is that?

Can you try with minimal drivers?

Can you try with minimum config? Like smp disabled...? Does echo test > /sys/power/state 
work?

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: latest -git: suspend: unable to handle kernel paging request (was Re: no_console_suspend doesn't work?)
  2008-08-22  4:45         ` Vegard Nossum
  2008-08-22  9:28           ` Pavel Machek
@ 2008-08-22  9:34           ` Rafael J. Wysocki
  2008-08-22 10:01             ` Bartlomiej Zolnierkiewicz
  1 sibling, 1 reply; 21+ messages in thread
From: Rafael J. Wysocki @ 2008-08-22  9:34 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Pekka Enberg, Linux Kernel Mailing List, Andrew Morton,
	Jens Axboe, Bartlomiej Zolnierkiewicz

On Friday, 22 of August 2008, Vegard Nossum wrote:
> On Fri, Aug 22, 2008 at 12:16 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > On Thursday, 21 of August 2008, Pekka Enberg wrote:
> >> > =============================================================================
> >> > BUG blkdev_ioc: Invalid object pointer 0xf5cdaca8
> >> > -----------------------------------------------------------------------------
> >>
> >> Ok, here we have the block layer passing a bad pointer to SLUB this
> >> time. And it's also from the suspend code (although it's the resume
> >> path this time). As we never see an oops from the block layer first,
> >> it's possible that someone else corrupted everything and it just shows
> >> up in the block layer. Maybe something worth investigating, though.
> >>
> >> > INFO: Slab 0xf789e318 objects=14 used=14 fp=0x00000000 flags=0x2082083
> >> > Pid: 3597, comm: bash Tainted: G      D   2.6.27-rc4-00003-ga798564-dirty #30
> >> >  [<c01b2576>] slab_err+0x46/0x50
> >> >  [<c01b2766>] ? check_slab+0xd6/0xf0
> >> >  [<c0181aef>] ? call_rcu+0x6f/0x80
> >> >  [<c015e91b>] ? trace_hardirqs_on+0xb/0x10
> >> >  [<c01b3c78>] __slab_free+0x238/0x360
> >> >  [<c01b4749>] kmem_cache_free+0xa9/0x120
> >> >  [<c036b773>] ? put_io_context+0x53/0x70
> >> >  [<c036b773>] ? put_io_context+0x53/0x70
> >> >  [<c036b773>] put_io_context+0x53/0x70
> >> >  [<c036b82e>] exit_io_context+0x6e/0x80
> >> >  [<c013de6e>] do_exit+0x84e/0x890
> >> >  [<c037b794>] ? trace_hardirqs_on_thunk+0xc/0x10
> >> >  [<c013b55b>] ? printk+0x1b/0x20
> >> >  [<c013a50a>] ? print_oops_end_marker+0x2a/0x30
> >> >  [<c01060f1>] oops_end+0xb1/0xc0
> >> >  [<c01067c0>] die+0x50/0x70
> >> >  [<c0106871>] do_trap+0x91/0xc0
> >> >  [<c0106940>] ? do_invalid_op+0x0/0xa0
> >> >  [<c01069c8>] do_invalid_op+0x88/0xa0
> >> >  [<c01a0f39>] ? page_remove_rmap+0x109/0x120
> >> >  [<c013b2d1>] ? vprintk+0x151/0x3c0
> >> >  [<c013b45b>] ? vprintk+0x2db/0x3c0
> >> >  [<c015c5ea>] ? print_lock_contention_bug+0x1a/0xe0
> >> >  [<c015c5ea>] ? print_lock_contention_bug+0x1a/0xe0
> >> >  [<c0687d3a>] error_code+0x72/0x78
> >> >  [<c013007b>] ? sched_rt_period_timer+0x21b/0x270
> >> >  [<c01a0f39>] ? page_remove_rmap+0x109/0x120
> >> >  [<c0198721>] unmap_vmas+0x4b1/0x8b0
> >> >  [<c015c5ea>] ? print_lock_contention_bug+0x1a/0xe0
> >> >  [<c019d504>] exit_mmap+0x84/0x120
> >> >  [<c0138538>] mmput+0x48/0xa0
> >> >  [<c013c3d7>] exit_mm+0xe7/0x110
> >> >  [<c013d7a4>] do_exit+0x184/0x890
> >> >  [<c013b55b>] ? printk+0x1b/0x20
> >> >  [<c013a50a>] ? print_oops_end_marker+0x2a/0x30
> >> >  [<c01060f1>] oops_end+0xb1/0xc0
> >> >  [<c01067c0>] die+0x50/0x70
> >> >  [<c0122b4f>] do_page_fault+0x1ef/0xa20
> >> >  [<c010b335>] ? native_sched_clock+0xb5/0x110
> >> >  [<c01600ea>] ? __lock_acquire+0x27a/0xa00
> >> >  [<c0122960>] ? do_page_fault+0x0/0xa20
> >> >  [<c0687d3a>] error_code+0x72/0x78
> >> >  [<c038ad65>] ? __list_add+0x15/0x90
> >> >  [<c0687133>] ? _spin_lock+0x63/0x70
> >> >  [<c018b954>] rmqueue_bulk+0x54/0x80
> >> >  [<c018d317>] get_page_from_freelist+0x5a7/0x720
> >> >  [<c01600ea>] ? __lock_acquire+0x27a/0xa00
> >> >  [<c018dd50>] __alloc_pages_internal+0xa0/0x450
> >> >  [<c01acd4b>] alloc_pages_current+0x7b/0xc0
> >> >  [<c01b37fb>] new_slab+0x1bb/0x2d0
> >> >  [<c0687877>] ? _spin_unlock+0x27/0x50
> >> >  [<c01b40ca>] __slab_alloc+0x32a/0x4e0
> >> >  [<c010b335>] ? native_sched_clock+0xb5/0x110
> >> >  [<c01b4424>] kmem_cache_alloc+0xb4/0xe0
> >> >  [<c018969e>] ? mempool_alloc_slab+0xe/0x10
> >> >  [<c018969e>] ? mempool_alloc_slab+0xe/0x10
> >> >  [<c018969e>] mempool_alloc_slab+0xe/0x10
> >> >  [<c01897a1>] mempool_alloc+0x31/0xf0
> >> >  [<c015e884>] ? trace_hardirqs_on_caller+0xd4/0x160
> >> >  [<c015e91b>] ? trace_hardirqs_on+0xb/0x10
> >> >  [<c0368c7e>] get_request+0xae/0x2c0
> >> >  [<c036935c>] get_request_wait+0x1c/0xd0
> >> >  [<c0687462>] ? _spin_lock_irq+0x72/0x80
> >> >  [<c0369442>] blk_get_request+0x32/0x70
> >> >  [<c0471c1c>] generic_ide_resume+0x5c/0xf0
> >
> > IDE again?
> >
> > Vegard, this is piix, isn't it?
> 
> If this makes it so, then yes:
> 
> calling  piix_ide_init+0x0/0xb0
> initcall piix_ide_init+0x0/0xb0 returned 0 after 0 msecs
> calling  ide_scan_pcibus+0x0/0xf0
> piix 0000:00:1f.1: IDE controller (0x8086:0x27df rev 0x01)
> piix 0000:00:1f.1: IDE port disabled
> piix 0000:00:1f.1: not 100% native mode: will probe irqs later
>     ide0: BM-DMA at 0xffa0-0xffa7
> Probing IDE interface ide0...
> hda: WDC WD1600BB-00DAA3, ATA DISK drive
> hda: host max PIO4 wanted PIO255(auto-tune) selected PIO4
> hda: UDMA/100 mode selected
> ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
> 
> It is also interesting that you mention that; this is from an earlier
> run (before serial console was working properly):
> 
> > In my last run, I managed to get a lot of ascii art on the screen, but
> > also one line which gave me the EIP of the oops:
> 
> > $ addr2line -e vmlinux -i c03724f3
> > block/cfq-iosched.c:1190
> 
> > /*
> >  * Must always be called with the rcu_read_lock() held
> >  */
> > static void
> > __call_for_each_cic(struct io_context *ioc,
> >                    void (*func)(struct io_context *, struct cfq_io_context *))
> > {
> >        struct cfq_io_context *cic;
> >        struct hlist_node *n;
> >
> >        hlist_for_each_entry_rcu(cic, n, &ioc->cic_list, cic_list) <-- here
> >                func(ioc, cic);
> > }

Hmm.

Would that be possible to switch temporarily to PATA/libata and see if the
problem goes away?  Then, we'd get a strong indication that it really is
related to IDE.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: latest -git: suspend: unable to handle kernel paging request (was Re: no_console_suspend doesn't work?)
  2008-08-21 22:10                 ` Rafael J. Wysocki
  2008-08-22  7:02                   ` Pekka Enberg
@ 2008-08-22  9:38                   ` Pavel Machek
  1 sibling, 0 replies; 21+ messages in thread
From: Pavel Machek @ 2008-08-22  9:38 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Pekka Enberg, Vegard Nossum, Linux Kernel Mailing List,
	Andrew Morton, Jens Axboe

Hi!

> > Another possibility is that the block layer is doing something strange here.
> > Dunno.
> 
> I bet on the IDE driver.

Should be easy enough to switch to libata to verify this, right?
									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: latest -git: suspend: unable to handle kernel paging request (was Re: no_console_suspend doesn't work?)
  2008-08-22  9:34           ` Rafael J. Wysocki
@ 2008-08-22 10:01             ` Bartlomiej Zolnierkiewicz
  2008-08-22 10:15               ` Rafael J. Wysocki
  0 siblings, 1 reply; 21+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2008-08-22 10:01 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Vegard Nossum, Pekka Enberg, Linux Kernel Mailing List,
	Andrew Morton, Jens Axboe


Hi,

On Friday 22 August 2008, Rafael J. Wysocki wrote:
> On Friday, 22 of August 2008, Vegard Nossum wrote:
> > On Fri, Aug 22, 2008 at 12:16 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > > On Thursday, 21 of August 2008, Pekka Enberg wrote:
> > >> > =============================================================================
> > >> > BUG blkdev_ioc: Invalid object pointer 0xf5cdaca8
> > >> > -----------------------------------------------------------------------------
> > >>
> > >> Ok, here we have the block layer passing a bad pointer to SLUB this
> > >> time. And it's also from the suspend code (although it's the resume
> > >> path this time). As we never see an oops from the block layer first,
> > >> it's possible that someone else corrupted everything and it just shows
> > >> up in the block layer. Maybe something worth investigating, though.
> > >>
> > >> > INFO: Slab 0xf789e318 objects=14 used=14 fp=0x00000000 flags=0x2082083
> > >> > Pid: 3597, comm: bash Tainted: G      D   2.6.27-rc4-00003-ga798564-dirty #30
> > >> >  [<c01b2576>] slab_err+0x46/0x50
> > >> >  [<c01b2766>] ? check_slab+0xd6/0xf0
> > >> >  [<c0181aef>] ? call_rcu+0x6f/0x80
> > >> >  [<c015e91b>] ? trace_hardirqs_on+0xb/0x10
> > >> >  [<c01b3c78>] __slab_free+0x238/0x360
> > >> >  [<c01b4749>] kmem_cache_free+0xa9/0x120
> > >> >  [<c036b773>] ? put_io_context+0x53/0x70
> > >> >  [<c036b773>] ? put_io_context+0x53/0x70
> > >> >  [<c036b773>] put_io_context+0x53/0x70
> > >> >  [<c036b82e>] exit_io_context+0x6e/0x80
> > >> >  [<c013de6e>] do_exit+0x84e/0x890
> > >> >  [<c037b794>] ? trace_hardirqs_on_thunk+0xc/0x10
> > >> >  [<c013b55b>] ? printk+0x1b/0x20
> > >> >  [<c013a50a>] ? print_oops_end_marker+0x2a/0x30
> > >> >  [<c01060f1>] oops_end+0xb1/0xc0
> > >> >  [<c01067c0>] die+0x50/0x70
> > >> >  [<c0106871>] do_trap+0x91/0xc0
> > >> >  [<c0106940>] ? do_invalid_op+0x0/0xa0
> > >> >  [<c01069c8>] do_invalid_op+0x88/0xa0
> > >> >  [<c01a0f39>] ? page_remove_rmap+0x109/0x120
> > >> >  [<c013b2d1>] ? vprintk+0x151/0x3c0
> > >> >  [<c013b45b>] ? vprintk+0x2db/0x3c0
> > >> >  [<c015c5ea>] ? print_lock_contention_bug+0x1a/0xe0
> > >> >  [<c015c5ea>] ? print_lock_contention_bug+0x1a/0xe0
> > >> >  [<c0687d3a>] error_code+0x72/0x78
> > >> >  [<c013007b>] ? sched_rt_period_timer+0x21b/0x270
> > >> >  [<c01a0f39>] ? page_remove_rmap+0x109/0x120
> > >> >  [<c0198721>] unmap_vmas+0x4b1/0x8b0
> > >> >  [<c015c5ea>] ? print_lock_contention_bug+0x1a/0xe0
> > >> >  [<c019d504>] exit_mmap+0x84/0x120
> > >> >  [<c0138538>] mmput+0x48/0xa0
> > >> >  [<c013c3d7>] exit_mm+0xe7/0x110
> > >> >  [<c013d7a4>] do_exit+0x184/0x890
> > >> >  [<c013b55b>] ? printk+0x1b/0x20
> > >> >  [<c013a50a>] ? print_oops_end_marker+0x2a/0x30
> > >> >  [<c01060f1>] oops_end+0xb1/0xc0
> > >> >  [<c01067c0>] die+0x50/0x70
> > >> >  [<c0122b4f>] do_page_fault+0x1ef/0xa20
> > >> >  [<c010b335>] ? native_sched_clock+0xb5/0x110
> > >> >  [<c01600ea>] ? __lock_acquire+0x27a/0xa00
> > >> >  [<c0122960>] ? do_page_fault+0x0/0xa20
> > >> >  [<c0687d3a>] error_code+0x72/0x78
> > >> >  [<c038ad65>] ? __list_add+0x15/0x90
> > >> >  [<c0687133>] ? _spin_lock+0x63/0x70
> > >> >  [<c018b954>] rmqueue_bulk+0x54/0x80
> > >> >  [<c018d317>] get_page_from_freelist+0x5a7/0x720
> > >> >  [<c01600ea>] ? __lock_acquire+0x27a/0xa00
> > >> >  [<c018dd50>] __alloc_pages_internal+0xa0/0x450
> > >> >  [<c01acd4b>] alloc_pages_current+0x7b/0xc0
> > >> >  [<c01b37fb>] new_slab+0x1bb/0x2d0
> > >> >  [<c0687877>] ? _spin_unlock+0x27/0x50
> > >> >  [<c01b40ca>] __slab_alloc+0x32a/0x4e0
> > >> >  [<c010b335>] ? native_sched_clock+0xb5/0x110
> > >> >  [<c01b4424>] kmem_cache_alloc+0xb4/0xe0
> > >> >  [<c018969e>] ? mempool_alloc_slab+0xe/0x10
> > >> >  [<c018969e>] ? mempool_alloc_slab+0xe/0x10
> > >> >  [<c018969e>] mempool_alloc_slab+0xe/0x10
> > >> >  [<c01897a1>] mempool_alloc+0x31/0xf0
> > >> >  [<c015e884>] ? trace_hardirqs_on_caller+0xd4/0x160
> > >> >  [<c015e91b>] ? trace_hardirqs_on+0xb/0x10
> > >> >  [<c0368c7e>] get_request+0xae/0x2c0
> > >> >  [<c036935c>] get_request_wait+0x1c/0xd0
> > >> >  [<c0687462>] ? _spin_lock_irq+0x72/0x80
> > >> >  [<c0369442>] blk_get_request+0x32/0x70
> > >> >  [<c0471c1c>] generic_ide_resume+0x5c/0xf0
> > >
> > > IDE again?

"again"?

generic_ide_resume() just tries to get new request from the block
layer which seems to end up being problematic.

> > >
> > > Vegard, this is piix, isn't it?
> > 
> > If this makes it so, then yes:
> > 
> > calling  piix_ide_init+0x0/0xb0
> > initcall piix_ide_init+0x0/0xb0 returned 0 after 0 msecs
> > calling  ide_scan_pcibus+0x0/0xf0
> > piix 0000:00:1f.1: IDE controller (0x8086:0x27df rev 0x01)
> > piix 0000:00:1f.1: IDE port disabled
> > piix 0000:00:1f.1: not 100% native mode: will probe irqs later
> >     ide0: BM-DMA at 0xffa0-0xffa7
> > Probing IDE interface ide0...
> > hda: WDC WD1600BB-00DAA3, ATA DISK drive
> > hda: host max PIO4 wanted PIO255(auto-tune) selected PIO4
> > hda: UDMA/100 mode selected
> > ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
> > 
> > It is also interesting that you mention that; this is from an earlier
> > run (before serial console was working properly):
> > 
> > > In my last run, I managed to get a lot of ascii art on the screen, but
> > > also one line which gave me the EIP of the oops:
> > 
> > > $ addr2line -e vmlinux -i c03724f3
> > > block/cfq-iosched.c:1190
> > 
> > > /*
> > >  * Must always be called with the rcu_read_lock() held
> > >  */
> > > static void
> > > __call_for_each_cic(struct io_context *ioc,
> > >                    void (*func)(struct io_context *, struct cfq_io_context *))
> > > {
> > >        struct cfq_io_context *cic;
> > >        struct hlist_node *n;
> > >
> > >        hlist_for_each_entry_rcu(cic, n, &ioc->cic_list, cic_list) <-- here
> > >                func(ioc, cic);
> > > }
> 
> Hmm.
> 
> Would that be possible to switch temporarily to PATA/libata and see if the
> problem goes away?  Then, we'd get a strong indication that it really is
> related to IDE.

IIRC besides conversion to blk_{get,put}_request() conversion from
15th July (commit 5b114715ed63f3a4fdf790f5df61364fc4adadf1) there
weren't any PM related IDE changes recently.

Vegard, it would be worth to try if the kernels at the above commit
and at commit 9a2d43b7566caeeeb414aa628bc2759028897dbb (one commit
before the above one) are OK.

If it doesn't give the definitive answer then doing git-bisect run
would be of great help in identifying and fixing the source of the
problem quickly.

Thanks,
Bart

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: latest -git: suspend: unable to handle kernel paging request (was Re: no_console_suspend doesn't work?)
  2008-08-22 10:01             ` Bartlomiej Zolnierkiewicz
@ 2008-08-22 10:15               ` Rafael J. Wysocki
  0 siblings, 0 replies; 21+ messages in thread
From: Rafael J. Wysocki @ 2008-08-22 10:15 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: Vegard Nossum, Pekka Enberg, Linux Kernel Mailing List,
	Andrew Morton, Jens Axboe

On Friday, 22 of August 2008, Bartlomiej Zolnierkiewicz wrote:
> 
> Hi,
> 
> On Friday 22 August 2008, Rafael J. Wysocki wrote:
> > On Friday, 22 of August 2008, Vegard Nossum wrote:
> > > On Fri, Aug 22, 2008 at 12:16 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > > > On Thursday, 21 of August 2008, Pekka Enberg wrote:
> > > >> > =============================================================================
> > > >> > BUG blkdev_ioc: Invalid object pointer 0xf5cdaca8
> > > >> > -----------------------------------------------------------------------------
> > > >>
> > > >> Ok, here we have the block layer passing a bad pointer to SLUB this
> > > >> time. And it's also from the suspend code (although it's the resume
> > > >> path this time). As we never see an oops from the block layer first,
> > > >> it's possible that someone else corrupted everything and it just shows
> > > >> up in the block layer. Maybe something worth investigating, though.
> > > >>
> > > >> > INFO: Slab 0xf789e318 objects=14 used=14 fp=0x00000000 flags=0x2082083
> > > >> > Pid: 3597, comm: bash Tainted: G      D   2.6.27-rc4-00003-ga798564-dirty #30
> > > >> >  [<c01b2576>] slab_err+0x46/0x50
> > > >> >  [<c01b2766>] ? check_slab+0xd6/0xf0
> > > >> >  [<c0181aef>] ? call_rcu+0x6f/0x80
> > > >> >  [<c015e91b>] ? trace_hardirqs_on+0xb/0x10
> > > >> >  [<c01b3c78>] __slab_free+0x238/0x360
> > > >> >  [<c01b4749>] kmem_cache_free+0xa9/0x120
> > > >> >  [<c036b773>] ? put_io_context+0x53/0x70
> > > >> >  [<c036b773>] ? put_io_context+0x53/0x70
> > > >> >  [<c036b773>] put_io_context+0x53/0x70
> > > >> >  [<c036b82e>] exit_io_context+0x6e/0x80
> > > >> >  [<c013de6e>] do_exit+0x84e/0x890
> > > >> >  [<c037b794>] ? trace_hardirqs_on_thunk+0xc/0x10
> > > >> >  [<c013b55b>] ? printk+0x1b/0x20
> > > >> >  [<c013a50a>] ? print_oops_end_marker+0x2a/0x30
> > > >> >  [<c01060f1>] oops_end+0xb1/0xc0
> > > >> >  [<c01067c0>] die+0x50/0x70
> > > >> >  [<c0106871>] do_trap+0x91/0xc0
> > > >> >  [<c0106940>] ? do_invalid_op+0x0/0xa0
> > > >> >  [<c01069c8>] do_invalid_op+0x88/0xa0
> > > >> >  [<c01a0f39>] ? page_remove_rmap+0x109/0x120
> > > >> >  [<c013b2d1>] ? vprintk+0x151/0x3c0
> > > >> >  [<c013b45b>] ? vprintk+0x2db/0x3c0
> > > >> >  [<c015c5ea>] ? print_lock_contention_bug+0x1a/0xe0
> > > >> >  [<c015c5ea>] ? print_lock_contention_bug+0x1a/0xe0
> > > >> >  [<c0687d3a>] error_code+0x72/0x78
> > > >> >  [<c013007b>] ? sched_rt_period_timer+0x21b/0x270
> > > >> >  [<c01a0f39>] ? page_remove_rmap+0x109/0x120
> > > >> >  [<c0198721>] unmap_vmas+0x4b1/0x8b0
> > > >> >  [<c015c5ea>] ? print_lock_contention_bug+0x1a/0xe0
> > > >> >  [<c019d504>] exit_mmap+0x84/0x120
> > > >> >  [<c0138538>] mmput+0x48/0xa0
> > > >> >  [<c013c3d7>] exit_mm+0xe7/0x110
> > > >> >  [<c013d7a4>] do_exit+0x184/0x890
> > > >> >  [<c013b55b>] ? printk+0x1b/0x20
> > > >> >  [<c013a50a>] ? print_oops_end_marker+0x2a/0x30
> > > >> >  [<c01060f1>] oops_end+0xb1/0xc0
> > > >> >  [<c01067c0>] die+0x50/0x70
> > > >> >  [<c0122b4f>] do_page_fault+0x1ef/0xa20
> > > >> >  [<c010b335>] ? native_sched_clock+0xb5/0x110
> > > >> >  [<c01600ea>] ? __lock_acquire+0x27a/0xa00
> > > >> >  [<c0122960>] ? do_page_fault+0x0/0xa20
> > > >> >  [<c0687d3a>] error_code+0x72/0x78
> > > >> >  [<c038ad65>] ? __list_add+0x15/0x90
> > > >> >  [<c0687133>] ? _spin_lock+0x63/0x70
> > > >> >  [<c018b954>] rmqueue_bulk+0x54/0x80
> > > >> >  [<c018d317>] get_page_from_freelist+0x5a7/0x720
> > > >> >  [<c01600ea>] ? __lock_acquire+0x27a/0xa00
> > > >> >  [<c018dd50>] __alloc_pages_internal+0xa0/0x450
> > > >> >  [<c01acd4b>] alloc_pages_current+0x7b/0xc0
> > > >> >  [<c01b37fb>] new_slab+0x1bb/0x2d0
> > > >> >  [<c0687877>] ? _spin_unlock+0x27/0x50
> > > >> >  [<c01b40ca>] __slab_alloc+0x32a/0x4e0
> > > >> >  [<c010b335>] ? native_sched_clock+0xb5/0x110
> > > >> >  [<c01b4424>] kmem_cache_alloc+0xb4/0xe0
> > > >> >  [<c018969e>] ? mempool_alloc_slab+0xe/0x10
> > > >> >  [<c018969e>] ? mempool_alloc_slab+0xe/0x10
> > > >> >  [<c018969e>] mempool_alloc_slab+0xe/0x10
> > > >> >  [<c01897a1>] mempool_alloc+0x31/0xf0
> > > >> >  [<c015e884>] ? trace_hardirqs_on_caller+0xd4/0x160
> > > >> >  [<c015e91b>] ? trace_hardirqs_on+0xb/0x10
> > > >> >  [<c0368c7e>] get_request+0xae/0x2c0
> > > >> >  [<c036935c>] get_request_wait+0x1c/0xd0
> > > >> >  [<c0687462>] ? _spin_lock_irq+0x72/0x80
> > > >> >  [<c0369442>] blk_get_request+0x32/0x70
> > > >> >  [<c0471c1c>] generic_ide_resume+0x5c/0xf0
> > > >
> > > > IDE again?
> 
> "again"?

There was another trace in which IDE was also involved.

Sorry for the confusion.

Rafael

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2008-08-22 10:11 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-08-21 17:28 latest -git: suspend: unable to handle kernel paging request (was Re: no_console_suspend doesn't work?) Vegard Nossum
2008-08-21 18:10 ` Rafael J. Wysocki
2008-08-21 18:21   ` Vegard Nossum
2008-08-21 18:45     ` Vegard Nossum
2008-08-21 19:13       ` Rafael J. Wysocki
2008-08-21 19:16         ` Vegard Nossum
2008-08-21 19:37           ` Rafael J. Wysocki
2008-08-21 20:08             ` Vegard Nossum
2008-08-21 20:22               ` Rafael J. Wysocki
2008-08-21 20:26                 ` Andrew Morton
2008-08-21 21:21               ` Pekka Enberg
2008-08-21 22:10                 ` Rafael J. Wysocki
2008-08-22  7:02                   ` Pekka Enberg
2008-08-22  9:38                   ` Pavel Machek
2008-08-21 21:27     ` Pekka Enberg
2008-08-21 22:16       ` Rafael J. Wysocki
2008-08-22  4:45         ` Vegard Nossum
2008-08-22  9:28           ` Pavel Machek
2008-08-22  9:34           ` Rafael J. Wysocki
2008-08-22 10:01             ` Bartlomiej Zolnierkiewicz
2008-08-22 10:15               ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).