2.6.21-rc4-rt0 BUG: at kernel/fork.c:1033 copy

All of lore.kernel.org
 help / color / mirror / Atom feed

* 2.6.21-rc4-rt0 BUG: at kernel/fork.c:1033 copy_process()
@ 2007-03-21 20:38 Michal Piotrowski
  2007-03-22  9:31 ` [patch] setup_boot_APIC_clock() irq-enable fix Ingo Molnar
  0 siblings, 1 reply; 12+ messages in thread
From: Michal Piotrowski @ 2007-03-21 20:38 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: LKML

Hi Ingo,

It might be lockdep related

-#ifdef CONFIG_TRACE_IRQFLAGS
+#if defined(CONFIG_TRACE_IRQFLAGS) && defined(CONFIG_LOCKDEP)

BUG: at kernel/fork.c:1033 copy_process()
 [<c0105474>] dump_trace+0x78/0x21a
 [<c010564b>] show_trace_log_lvl+0x35/0x54
 [<c0105ddc>] show_trace+0x2c/0x2e
 [<c0105ea3>] dump_stack+0x29/0x2b
 [<c01234e0>] copy_process+0x1d1/0x1440
 [<c01249fc>] do_fork+0xa8/0x18f
 [<c010239e>] kernel_thread+0x93/0x99
 [<c01393ec>] keventd_create_kthread+0x2f/0x7c
 [<c01394ae>] kthread_create+0x75/0xbb
 [<c011feba>] migration_call+0x5c/0x3cf
 [<c04ddb54>] migration_init+0x2b/0x62
  [<c04d14c5>] init+0x56/0x361
 [<c0104ffb>] kernel_thread_helper+0x7/0x10
 =======================
---------------------------
| preempt count: 00000000 ]
| 0-level deep critical section nesting:
----------------------------------------

http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc4-rt0/rt-config
http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc4-rt0/rt-dmesg

Regards,
Michal

-- 
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [patch] setup_boot_APIC_clock() irq-enable fix
  2007-03-21 20:38 2.6.21-rc4-rt0 BUG: at kernel/fork.c:1033 copy_process() Michal Piotrowski
@ 2007-03-22  9:31 ` Ingo Molnar
  2007-03-22 10:56   ` Thomas Gleixner
  2007-03-22 12:57   ` Michal Piotrowski
  0 siblings, 2 replies; 12+ messages in thread
From: Ingo Molnar @ 2007-03-22  9:31 UTC (permalink / raw)
  To: Michal Piotrowski; +Cc: LKML, Linus Torvalds, Thomas Gleixner, Andrew Morton


* Michal Piotrowski <michal.k.k.piotrowski@gmail.com> wrote:

> Hi Ingo,

> 2.6.21-rc4-rt0

> BUG: at kernel/fork.c:1033 copy_process()

thanks Michal - this is a real bug that affects upstream too. Find the 
fix below - i've test-booted it and it fixes the warning.

Linus, Andrew, this is a must-have for v2.6.21.

	Ingo

--------------->
Subject: [patch] setup_boot_APIC_clock() irq-enable fix
From: Ingo Molnar <mingo@elte.hu>

latest -git triggers an irqtrace/lockdep warning of a leaked
irqs-off condition:

  BUG: at kernel/fork.c:1033 copy_process()

after some debugging it turns out that commit ca1b940c accidentally left 
interrupts disabled - which trickled down all the way to the first time 
we fork a kernel thread and triggered the warning.

the fix is to re-enable interrupts in the 'else' branch of 
setup_boot_APIC_clock()'s pmtimers calibration path.

Reported-by: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/i386/kernel/apic.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux/arch/i386/kernel/apic.c
===================================================================
--- linux.orig/arch/i386/kernel/apic.c
+++ linux/arch/i386/kernel/apic.c
@@ -507,7 +507,8 @@ void __init setup_boot_APIC_clock(void)
 			apic_printk(APIC_VERBOSE, "... jiffies result ok\n");
 		else
 			local_apic_timer_verify_ok = 0;
-	}
+	} else
+		local_irq_enable();
 
 	if (!local_apic_timer_verify_ok) {
 		printk(KERN_WARNING

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [patch] setup_boot_APIC_clock() irq-enable fix
  2007-03-22  9:31 ` [patch] setup_boot_APIC_clock() irq-enable fix Ingo Molnar
@ 2007-03-22 10:56   ` Thomas Gleixner
  2007-03-22 12:57   ` Michal Piotrowski
  1 sibling, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2007-03-22 10:56 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Michal Piotrowski, LKML, Linus Torvalds, Andrew Morton

On Thu, 2007-03-22 at 10:31 +0100, Ingo Molnar wrote:
> Subject: [patch] setup_boot_APIC_clock() irq-enable fix
> From: Ingo Molnar <mingo@elte.hu>
> 
> latest -git triggers an irqtrace/lockdep warning of a leaked
> irqs-off condition:
> 
>   BUG: at kernel/fork.c:1033 copy_process()
> 
> after some debugging it turns out that commit ca1b940c accidentally left 
> interrupts disabled - which trickled down all the way to the first time 
> we fork a kernel thread and triggered the warning.
> 
> the fix is to re-enable interrupts in the 'else' branch of 
> setup_boot_APIC_clock()'s pmtimers calibration path.
> 
> Reported-by: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>

Acked-by: Thomas Gleixner <tglx@brown.paperbag.linutronix.de>




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [patch] setup_boot_APIC_clock() irq-enable fix
  2007-03-22  9:31 ` [patch] setup_boot_APIC_clock() irq-enable fix Ingo Molnar
  2007-03-22 10:56   ` Thomas Gleixner
@ 2007-03-22 12:57   ` Michal Piotrowski
  2007-03-22 13:27     ` 2.6.21-rc4-rt0-kdump (was: Re: [patch] setup_boot_APIC_clock() irq-enable fix) Michal Piotrowski
  1 sibling, 1 reply; 12+ messages in thread
From: Michal Piotrowski @ 2007-03-22 12:57 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: LKML, Linus Torvalds, Thomas Gleixner, Andrew Morton

On 22/03/07, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Michal Piotrowski <michal.k.k.piotrowski@gmail.com> wrote:
>
> > Hi Ingo,
>
> > 2.6.21-rc4-rt0
>
> > BUG: at kernel/fork.c:1033 copy_process()
>
> thanks Michal - this is a real bug that affects upstream too. Find the
> fix below - i've test-booted it and it fixes the warning.

Problem is fixed, thanks.

Regards,
Michal

-- 
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* 2.6.21-rc4-rt0-kdump (was: Re: [patch] setup_boot_APIC_clock() irq-enable fix)
  2007-03-22 12:57   ` Michal Piotrowski
@ 2007-03-22 13:27     ` Michal Piotrowski
  2007-03-23  5:25       ` Vivek Goyal
  2007-03-23  7:15       ` 2.6.21-rc4-rt0-kdump (was: Re: [patch] setup_boot_APIC_clock() irq-enable fix) Ingo Molnar
  0 siblings, 2 replies; 12+ messages in thread
From: Michal Piotrowski @ 2007-03-22 13:27 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Thomas Gleixner, LKML

Michal Piotrowski napisał(a):
> On 22/03/07, Ingo Molnar <mingo@elte.hu> wrote:
>>
>> * Michal Piotrowski <michal.k.k.piotrowski@gmail.com> wrote:
>>
>> > Hi Ingo,
>>
>> > 2.6.21-rc4-rt0
>>
>> > BUG: at kernel/fork.c:1033 copy_process()
>>
>> thanks Michal - this is a real bug that affects upstream too. Find the
>> fix below - i've test-booted it and it fixes the warning.
> 
> Problem is fixed, thanks.

BTW. It seems that nobody uses -rt as a crash dump kernel ;)

BUG: unable to handle kernel paging request at virtual address f7ebf8c4
 printing eip:
c1610192
*pde = 00000000
stopped custom tracer.
Oops: 0000 [#1]
PREEMPT 
Modules linked in:
CPU:    0
EIP:    0060:[<c1610192>]    Not tainted VLI
EFLAGS: 00010206   (2.6.21-rc4-rt0-kdump #3)
EIP is at copy_oldmem_page+0x4a/0xd0
eax: 000008c4   ebx: f7ebf000   ecx: 00000100   edx: 00000246
esi: f7ebf8c4   edi: c4c520fc   ebp: c4d54e30   esp: c4d54e18
ds: 007b   es: 007b   fs: 00d8  gs: 0000  ss: 0068  preempt:00000001
Process swapper (pid: 1, ti=c4d54000 task=c4d52c20 task.ti=c4d54000)
Stack: c17ab7e0 c183f982 c1969658 00000400 00000400 00037ebf c4d54e5c c16af187 
       00037ebf c4c520fc 00000400 000008c4 00000000 00000000 c4c696e0 00000400 
       c4c520fc c4d54f94 c19a9cfd c4c520fc 00000400 c4d54f78 00000000 c1840996 
Call Trace:
 [<c16af187>] read_from_oldmem+0x73/0x98
 [<c19a9cfd>] vmcore_init+0x26c/0xab7
 [<c199979b>] init+0xaa/0x287
 [<c16044eb>] kernel_thread_helper+0x7/0x10
 =======================

l *copy_oldmem_page+0x4a/0xd0
0xc1610148 is in copy_oldmem_page (arch/i386/kernel/crash_dump.c:35).
30       * copying the data to a pre-allocated kernel page and then copying to user
31       * space in non-atomic context.
32       */
33      ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
34                                     size_t csize, unsigned long offset, int userbuf)
35      {
36              void  *vaddr;
37
38              if (!csize)
39                      return 0;


---------------------------
| preempt count: 00000001 ]
| 1-level deep critical section nesting:
----------------------------------------
.. [<c184045a>] .... __spin_lock_irqsave+0x23/0x65
.....[<c1604f8c>] ..   ( <= die+0x44/0x24d)

l *0xc184045a
0xc184045a is in __spin_lock_irqsave (kernel/spinlock.c:122).
117     {
118             unsigned long flags;
119
120             local_irq_save(flags);
121             preempt_disable();
122             spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
123             /*
124              * On lockdep we dont want the hand-coded irq-enable of
125              * _raw_spin_lock_flags() code, because lockdep assumes
126              * that interrupts are not re-enabled during lock-acquire:

l *0xc1604f8c
0xc1604f8c is in die (arch/i386/kernel/traps.c:477).
472
473             oops_enter();
474
475             if (die.lock_owner != raw_smp_processor_id()) {
476                     console_verbose();
477                     spin_lock_irqsave(&die.lock, flags);
478                     die.lock_owner = smp_processor_id();
479                     die.lock_owner_depth = 0;
480                     bust_spinlocks(1);
481             }


Code: 10 05 00 c1 e3 05 03 1d 60 8e d6 c1 89 1c 24 e8 fc 33 00 00 89 c3 83 7d 18 00 75 2a 8b 4d 10 c1 e9 02 8b 45 14 8d 34 03 8b 7d 0c <f3> a5 8b 4d 10 83 e1 03 74 02 f3 a4 e8 cb 10 05 00 89 1c 24 e8 
EIP: [<c1610192>] copy_oldmem_page+0x4a/0xd0 SS:ESP 0068:c4d54e18
Kernel panic - not syncing: Attempted to kill init!
 [<c160496d>] dump_trace+0x78/0x21a
 [<c1604b44>] show_trace_log_lvl+0x35/0x54
 [<c16052c4>] show_trace+0x2c/0x2e
 [<c160538b>] dump_stack+0x29/0x2b
 [<c1618b30>] panic+0x68/0x130
 [<c161b67c>] do_exit+0xa1/0x7e3
 [<c160516a>] die+0x222/0x24d
 [<c1612a3f>] do_page_fault+0x4a1/0x586
 [<c1841044>] error_code+0x74/0x7c
 [<c1610192>] copy_oldmem_page+0x4a/0xd0
 [<c16af187>] read_from_oldmem+0x73/0x98
 [<c19a9cfd>] vmcore_init+0x26c/0xab7
 [<c199979b>] init+0xaa/0x287
 [<c16044eb>] kernel_thread_helper+0x7/0x10
 =======================
---------------------------
| preempt count: 00000001 ]
| 1-level deep critical section nesting:
----------------------------------------
.. [<c1618ae6>] .... panic+0x1e/0x130
.....[<c161b67c>] ..   ( <= do_exit+0xa1/0x7e3)

l *0xc1618ae6
0xc1618ae6 is in panic (kernel/panic.c:85).
80               * have preempt disabled. Some functions called from here want
81               * preempt to be disabled. No point enabling it later though...
82               */
83              preempt_disable();
84
85              bust_spinlocks(1);
86              va_start(args, fmt);
87              vsnprintf(buf, sizeof(buf), fmt, args);
88              va_end(args);
89              printk(KERN_EMERG "Kernel panic - not syncing: %s\n",buf);

l *0xc161b67c
0xc161b67c is in do_exit (include/linux/pid_namespace.h:42).
37              kref_put(&ns->kref, free_pid_ns);
38      }
39
40      static inline struct task_struct *child_reaper(struct task_struct *tsk)
41      {
42              return init_pid_ns.child_reaper;
43      }
44
45      #endif /* _LINUX_PID_NS_H */

http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc4-rt0/kdump-console.log
http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc4-rt0/kdump-config


NOHZ: local_softirq_pending 02 on CPU#1
NOHZ: local_softirq_pending 02 on CPU#0
NOHZ: local_softirq_pending 08 on CPU#0
NOHZ: local_softirq_pending 02 on CPU#1
NOHZ: local_softirq_pending 10 on CPU#0
NOHZ: local_softirq_pending 02 on CPU#0
NOHZ: local_softirq_pending 02 on CPU#1
NOHZ: local_softirq_pending 08 on CPU#0

           CPU0       CPU1
  0:        304          0   IO-APIC-edge      timer
  1:       2319          0   IO-APIC-edge      i8042
  7:          0          0   IO-APIC-edge      parport0
  8:          1          0   IO-APIC-edge      rtc
  9:          1          0   IO-APIC-fasteoi   acpi
 12:          3          0   IO-APIC-edge      i8042
 14:        738          0   IO-APIC-edge      ide0
 15:       3050          0   IO-APIC-edge      ide1
 16:      23802          0   IO-APIC-fasteoi   uhci_hcd:usb2, uhci_hcd:usb5
 17:      33123          0   IO-APIC-fasteoi   eth1
 19:      29280          0   IO-APIC-fasteoi   libata, uhci_hcd:usb4
 20:          2          0   IO-APIC-fasteoi   ehci_hcd:usb1
 21:          0          0   IO-APIC-fasteoi   uhci_hcd:usb3
 22:      38572          0   IO-APIC-fasteoi   Intel ICH5
NMI:          0          0
LOC:     351305     268386
ERR:          0
MIS:          0

Hibernation is still broken.

http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc4-rt0/console.log
http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc4-rt0/rt-config

Regards,
Michal

-- 
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.21-rc4-rt0-kdump (was: Re: [patch] setup_boot_APIC_clock() irq-enable fix)
  2007-03-22 13:27     ` 2.6.21-rc4-rt0-kdump (was: Re: [patch] setup_boot_APIC_clock() irq-enable fix) Michal Piotrowski
@ 2007-03-23  5:25       ` Vivek Goyal
  2007-03-23  8:23         ` 2.6.21-rc4-rt0-kdump Michal Piotrowski
  2007-03-23 12:10         ` 2.6.21-rc4-rt0-kdump Michal Piotrowski
  2007-03-23  7:15       ` 2.6.21-rc4-rt0-kdump (was: Re: [patch] setup_boot_APIC_clock() irq-enable fix) Ingo Molnar
  1 sibling, 2 replies; 12+ messages in thread
From: Vivek Goyal @ 2007-03-23  5:25 UTC (permalink / raw)
  To: Michal Piotrowski; +Cc: Ingo Molnar, Thomas Gleixner, LKML

On Thu, Mar 22, 2007 at 02:27:25PM +0100, Michal Piotrowski wrote:
> Michal Piotrowski napisał(a):
> > On 22/03/07, Ingo Molnar <mingo@elte.hu> wrote:
> >>
> >> * Michal Piotrowski <michal.k.k.piotrowski@gmail.com> wrote:
> >>
> >> > Hi Ingo,
> >>
> >> > 2.6.21-rc4-rt0
> >>
> >> > BUG: at kernel/fork.c:1033 copy_process()
> >>
> >> thanks Michal - this is a real bug that affects upstream too. Find the
> >> fix below - i've test-booted it and it fixes the warning.
> > 
> > Problem is fixed, thanks.
> 
> BTW. It seems that nobody uses -rt as a crash dump kernel ;)
> 
> BUG: unable to handle kernel paging request at virtual address f7ebf8c4
>  printing eip:
> c1610192
> *pde = 00000000
> stopped custom tracer.
> Oops: 0000 [#1]
> PREEMPT 
> Modules linked in:
> CPU:    0
> EIP:    0060:[<c1610192>]    Not tainted VLI
> EFLAGS: 00010206   (2.6.21-rc4-rt0-kdump #3)
> EIP is at copy_oldmem_page+0x4a/0xd0
> eax: 000008c4   ebx: f7ebf000   ecx: 00000100   edx: 00000246
> esi: f7ebf8c4   edi: c4c520fc   ebp: c4d54e30   esp: c4d54e18
> ds: 007b   es: 007b   fs: 00d8  gs: 0000  ss: 0068  preempt:00000001
> Process swapper (pid: 1, ti=c4d54000 task=c4d52c20 task.ti=c4d54000)
> Stack: c17ab7e0 c183f982 c1969658 00000400 00000400 00037ebf c4d54e5c c16af187 
>        00037ebf c4c520fc 00000400 000008c4 00000000 00000000 c4c696e0 00000400 
>        c4c520fc c4d54f94 c19a9cfd c4c520fc 00000400 c4d54f78 00000000 c1840996 
> Call Trace:
>  [<c16af187>] read_from_oldmem+0x73/0x98
>  [<c19a9cfd>] vmcore_init+0x26c/0xab7
>  [<c199979b>] init+0xaa/0x287
>  [<c16044eb>] kernel_thread_helper+0x7/0x10
>  =======================
> 
> l *copy_oldmem_page+0x4a/0xd0
> 0xc1610148 is in copy_oldmem_page (arch/i386/kernel/crash_dump.c:35).
> 30       * copying the data to a pre-allocated kernel page and then copying to user
> 31       * space in non-atomic context.
> 32       */
> 33      ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
> 34                                     size_t csize, unsigned long offset, int userbuf)
> 35      {
> 36              void  *vaddr;
> 37
> 38              if (!csize)
> 39                      return 0;
> 

Can you please paste the disassembly of copy_oldmem_page() on your system.
Not sure from where this faulting address 0xf7ebf8c4 is coming. We are still
in vmcore_init(), so we should be copying the data to kernel buffers only.
This looks like a valid kernel address.

Can you also put some printk() here to find out from where 0xf7ebf8c4 has
come? It does not look like a fixed kernel virutual address returned by
kmap_atomic_pfn(). Then is it passed by kernel as a parameter to
copy_oldmem_page()?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.21-rc4-rt0-kdump (was: Re: [patch] setup_boot_APIC_clock() irq-enable fix)
  2007-03-22 13:27     ` 2.6.21-rc4-rt0-kdump (was: Re: [patch] setup_boot_APIC_clock() irq-enable fix) Michal Piotrowski
  2007-03-23  5:25       ` Vivek Goyal
@ 2007-03-23  7:15       ` Ingo Molnar
  2007-03-23  7:58         ` Michal Piotrowski
  1 sibling, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2007-03-23  7:15 UTC (permalink / raw)
  To: Michal Piotrowski; +Cc: Thomas Gleixner, LKML


* Michal Piotrowski <michal.k.k.piotrowski@gmail.com> wrote:

> >> > BUG: at kernel/fork.c:1033 copy_process()
> >>
> >> thanks Michal - this is a real bug that affects upstream too. Find 
> >> the fix below - i've test-booted it and it fixes the warning.
> > 
> > Problem is fixed, thanks.
> 
> BTW. It seems that nobody uses -rt as a crash dump kernel ;)

it's been tested with v2.6.20-rt8, and it should work as long as you 
enable CONFIG_RELOCATABLE. But i'm not using it myself, and 
v2.6.21-rc4-rt0 isnt a particularly encouraging version string for 
people to try ;)

> Hibernation is still broken.
> 
> http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc4-rt0/console.log
> http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc4-rt0/rt-config

what's the failure mode besides the lockdep + other debug messages - it 
doesnt resume? Your log seems to have at least one sequence of resume 
related messages - those seem to have worked fine.

	Ingo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.21-rc4-rt0-kdump (was: Re: [patch] setup_boot_APIC_clock() irq-enable fix)
  2007-03-23  7:15       ` 2.6.21-rc4-rt0-kdump (was: Re: [patch] setup_boot_APIC_clock() irq-enable fix) Ingo Molnar
@ 2007-03-23  7:58         ` Michal Piotrowski
  2007-03-23  8:02           ` Ingo Molnar
  0 siblings, 1 reply; 12+ messages in thread
From: Michal Piotrowski @ 2007-03-23  7:58 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Thomas Gleixner, LKML

On 23/03/07, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Michal Piotrowski <michal.k.k.piotrowski@gmail.com> wrote:
>
> > >> > BUG: at kernel/fork.c:1033 copy_process()
> > >>
> > >> thanks Michal - this is a real bug that affects upstream too. Find
> > >> the fix below - i've test-booted it and it fixes the warning.
> > >
> > > Problem is fixed, thanks.
> >
> > BTW. It seems that nobody uses -rt as a crash dump kernel ;)
>
> it's been tested with v2.6.20-rt8, and it should work as long as you
> enable CONFIG_RELOCATABLE. But i'm not using it myself, and
> v2.6.21-rc4-rt0 isnt a particularly encouraging version string for
> people to try ;)
>
> > Hibernation is still broken.
> >
> > http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc4-rt0/console.log
> > http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc4-rt0/rt-config
>
> what's the failure mode besides the lockdep + other debug messages - it
> doesnt resume? Your log seems to have at least one sequence of resume
> related messages - those seem to have worked fine.

Kernel has crashed after

Read 497936 kbytes in 23.09 seconds (21.56 MB/s)
swsusp: Reading resume file was successful
PM: Preparing devices for restore.
Suspending console(s)

>
>         Ingo
>

Regards,
Michal

-- 
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.21-rc4-rt0-kdump (was: Re: [patch] setup_boot_APIC_clock() irq-enable fix)
  2007-03-23  7:58         ` Michal Piotrowski
@ 2007-03-23  8:02           ` Ingo Molnar
  2007-03-23  8:17             ` Michal Piotrowski
  0 siblings, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2007-03-23  8:02 UTC (permalink / raw)
  To: Michal Piotrowski; +Cc: Thomas Gleixner, LKML


* Michal Piotrowski <michal.k.k.piotrowski@gmail.com> wrote:

> >> Hibernation is still broken.
> >>
> >> 
> >http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc4-rt0/console.log
> >> 
> >http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc4-rt0/rt-config
> >
> >what's the failure mode besides the lockdep + other debug messages - 
> >it doesnt resume? Your log seems to have at least one sequence of 
> >resume related messages - those seem to have worked fine.
> 
> Kernel has crashed after

crashed == 'hung hard' or 'spontaneous reboot' or 'other'?

> Read 497936 kbytes in 23.09 seconds (21.56 MB/s)
> swsusp: Reading resume file was successful
> PM: Preparing devices for restore.
> Suspending console(s)

i havent used sw-suspend for a while, so here's a stupid question: why 
does it try to suspend the consoles in the resume path? I assume the 
messages above mean that we are already in the resume path?

	Ingo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.21-rc4-rt0-kdump (was: Re: [patch] setup_boot_APIC_clock() irq-enable fix)
  2007-03-23  8:02           ` Ingo Molnar
@ 2007-03-23  8:17             ` Michal Piotrowski
  0 siblings, 0 replies; 12+ messages in thread
From: Michal Piotrowski @ 2007-03-23  8:17 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Thomas Gleixner, LKML

On 23/03/07, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Michal Piotrowski <michal.k.k.piotrowski@gmail.com> wrote:
>
> > >> Hibernation is still broken.
> > >>
> > >>
> > >http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc4-rt0/console.log
> > >>
> > >http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc4-rt0/rt-config
> > >
> > >what's the failure mode besides the lockdep + other debug messages -
> > >it doesnt resume? Your log seems to have at least one sequence of
> > >resume related messages - those seem to have worked fine.
> >
> > Kernel has crashed after
>
> crashed == 'hung hard' or 'spontaneous reboot' or 'other'?

It means 'nothing has happened for a five minutes'.

>
> > Read 497936 kbytes in 23.09 seconds (21.56 MB/s)
> > swsusp: Reading resume file was successful
> > PM: Preparing devices for restore.
> > Suspending console(s)
>
> i havent used sw-suspend for a while, so here's a stupid question: why
> does it try to suspend the consoles in the resume path? I assume the
> messages above mean that we are already in the resume path?

Yes. (I'm not sure, but "Suspending console(s)" might come from PM_DEBUG.)

>
>         Ingo
>

Regards,
Michal

-- 
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.21-rc4-rt0-kdump
  2007-03-23  5:25       ` Vivek Goyal
@ 2007-03-23  8:23         ` Michal Piotrowski
  2007-03-23 12:10         ` 2.6.21-rc4-rt0-kdump Michal Piotrowski
  1 sibling, 0 replies; 12+ messages in thread
From: Michal Piotrowski @ 2007-03-23  8:23 UTC (permalink / raw)
  To: vgoyal; +Cc: Michal Piotrowski, Ingo Molnar, Thomas Gleixner, LKML

Vivek Goyal napisał(a):
> On Thu, Mar 22, 2007 at 02:27:25PM +0100, Michal Piotrowski wrote:
>> Michal Piotrowski napisał(a):
>>> On 22/03/07, Ingo Molnar <mingo@elte.hu> wrote:
>>>> * Michal Piotrowski <michal.k.k.piotrowski@gmail.com> wrote:
>>>>
>>>>> Hi Ingo,
>>>>> 2.6.21-rc4-rt0
>>>>> BUG: at kernel/fork.c:1033 copy_process()
>>>> thanks Michal - this is a real bug that affects upstream too. Find the
>>>> fix below - i've test-booted it and it fixes the warning.
>>> Problem is fixed, thanks.
>> BTW. It seems that nobody uses -rt as a crash dump kernel ;)
>>
>> BUG: unable to handle kernel paging request at virtual address f7ebf8c4
>>  printing eip:
>> c1610192
>> *pde = 00000000
>> stopped custom tracer.
>> Oops: 0000 [#1]
>> PREEMPT 
>> Modules linked in:
>> CPU:    0
>> EIP:    0060:[<c1610192>]    Not tainted VLI
>> EFLAGS: 00010206   (2.6.21-rc4-rt0-kdump #3)
>> EIP is at copy_oldmem_page+0x4a/0xd0
>> eax: 000008c4   ebx: f7ebf000   ecx: 00000100   edx: 00000246
>> esi: f7ebf8c4   edi: c4c520fc   ebp: c4d54e30   esp: c4d54e18
>> ds: 007b   es: 007b   fs: 00d8  gs: 0000  ss: 0068  preempt:00000001
>> Process swapper (pid: 1, ti=c4d54000 task=c4d52c20 task.ti=c4d54000)
>> Stack: c17ab7e0 c183f982 c1969658 00000400 00000400 00037ebf c4d54e5c c16af187 
>>        00037ebf c4c520fc 00000400 000008c4 00000000 00000000 c4c696e0 00000400 
>>        c4c520fc c4d54f94 c19a9cfd c4c520fc 00000400 c4d54f78 00000000 c1840996 
>> Call Trace:
>>  [<c16af187>] read_from_oldmem+0x73/0x98
>>  [<c19a9cfd>] vmcore_init+0x26c/0xab7
>>  [<c199979b>] init+0xaa/0x287
>>  [<c16044eb>] kernel_thread_helper+0x7/0x10
>>  =======================
>>
>> l *copy_oldmem_page+0x4a/0xd0
>> 0xc1610148 is in copy_oldmem_page (arch/i386/kernel/crash_dump.c:35).
>> 30       * copying the data to a pre-allocated kernel page and then copying to user
>> 31       * space in non-atomic context.
>> 32       */
>> 33      ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
>> 34                                     size_t csize, unsigned long offset, int userbuf)
>> 35      {
>> 36              void  *vaddr;
>> 37
>> 38              if (!csize)
>> 39                      return 0;
>>
> 
> Can you please paste the disassembly of copy_oldmem_page() on your system.

disassemble *copy_oldmem_page
Dump of assembler code for function copy_oldmem_page:
0xc1610148 <copy_oldmem_page+0>:        push   %ebp
0xc1610149 <copy_oldmem_page+1>:        mov    %esp,%ebp
0xc161014b <copy_oldmem_page+3>:        push   %edi
0xc161014c <copy_oldmem_page+4>:        push   %esi
0xc161014d <copy_oldmem_page+5>:        push   %ebx
0xc161014e <copy_oldmem_page+6>:        sub    $0xc,%esp
0xc1610151 <copy_oldmem_page+9>:        call   0xc160f5c4 <mcount>
0xc1610156 <copy_oldmem_page+14>:       mov    0x8(%ebp),%ebx
0xc1610159 <copy_oldmem_page+17>:       xor    %edx,%edx
0xc161015b <copy_oldmem_page+19>:       cmpl   $0x0,0x10(%ebp)
0xc161015f <copy_oldmem_page+23>:       je     0xc161020d <copy_oldmem_page+197>
0xc1610165 <copy_oldmem_page+29>:       call   0xc1661258 <pagefault_disable>
0xc161016a <copy_oldmem_page+34>:       shl    $0x5,%ebx
0xc161016d <copy_oldmem_page+37>:       add    0xc1d68e60,%ebx
0xc1610173 <copy_oldmem_page+43>:       mov    %ebx,(%esp)
0xc1610176 <copy_oldmem_page+46>:       call   0xc1613577 <kmap>
0xc161017b <copy_oldmem_page+51>:       mov    %eax,%ebx
0xc161017d <copy_oldmem_page+53>:       cmpl   $0x0,0x18(%ebp)
0xc1610181 <copy_oldmem_page+57>:       jne    0xc16101ad <copy_oldmem_page+101>
0xc1610183 <copy_oldmem_page+59>:       mov    0x10(%ebp),%ecx
0xc1610186 <copy_oldmem_page+62>:       shr    $0x2,%ecx
0xc1610189 <copy_oldmem_page+65>:       mov    0x14(%ebp),%eax
0xc161018c <copy_oldmem_page+68>:       lea    (%ebx,%eax,1),%esi
0xc161018f <copy_oldmem_page+71>:       mov    0xc(%ebp),%edi
0xc1610192 <copy_oldmem_page+74>:       rep movsl %ds:(%esi),%es:(%edi)
0xc1610194 <copy_oldmem_page+76>:       mov    0x10(%ebp),%ecx
0xc1610197 <copy_oldmem_page+79>:       and    $0x3,%ecx
0xc161019a <copy_oldmem_page+82>:       je     0xc161019e <copy_oldmem_page+86>
0xc161019c <copy_oldmem_page+84>:       rep movsb %ds:(%esi),%es:(%edi)
0xc161019e <copy_oldmem_page+86>:       call   0xc166126e <pagefault_enable>
0xc16101a3 <copy_oldmem_page+91>:       mov    %ebx,(%esp)
0xc16101a6 <copy_oldmem_page+94>:       call   0xc1613533 <kunmap_virt>
0xc16101ab <copy_oldmem_page+99>:       jmp    0xc161020a <copy_oldmem_page+194>
0xc16101ad <copy_oldmem_page+101>:      mov    0xc19d4004,%edi
0xc16101b3 <copy_oldmem_page+107>:      test   %edi,%edi
0xc16101b5 <copy_oldmem_page+109>:      jne    0xc16101ca <copy_oldmem_page+130>
0xc16101b7 <copy_oldmem_page+111>:      movl   $0xc18bbc93,(%esp)
0xc16101be <copy_oldmem_page+118>:      call   0xc1619671 <printk>
0xc16101c3 <copy_oldmem_page+123>:      mov    $0xfffffff2,%edx
0xc16101c8 <copy_oldmem_page+128>:      jmp    0xc161020d <copy_oldmem_page+197>
0xc16101ca <copy_oldmem_page+130>:      mov    $0x400,%ecx
0xc16101cf <copy_oldmem_page+135>:      mov    %eax,%esi
0xc16101d1 <copy_oldmem_page+137>:      rep movsl %ds:(%esi),%es:(%edi)
0xc16101d3 <copy_oldmem_page+139>:      call   0xc166126e <pagefault_enable>
0xc16101d8 <copy_oldmem_page+144>:      mov    %ebx,(%esp)
0xc16101db <copy_oldmem_page+147>:      call   0xc1613533 <kunmap_virt>
0xc16101e0 <copy_oldmem_page+152>:      mov    0x10(%ebp),%eax
0xc16101e3 <copy_oldmem_page+155>:      mov    %eax,0x8(%esp)
0xc16101e7 <copy_oldmem_page+159>:      mov    0xc19d4004,%eax
0xc16101ec <copy_oldmem_page+164>:      add    %eax,0x14(%ebp)
0xc16101ef <copy_oldmem_page+167>:      mov    0x14(%ebp),%eax
0xc16101f2 <copy_oldmem_page+170>:      mov    %eax,0x4(%esp)
0xc16101f6 <copy_oldmem_page+174>:      mov    0xc(%ebp),%eax
0xc16101f9 <copy_oldmem_page+177>:      mov    %eax,(%esp)
0xc16101fc <copy_oldmem_page+180>:      call   0xc1700e98 <copy_to_user>
0xc1610201 <copy_oldmem_page+185>:      mov    $0xfffffff2,%edx
0xc1610206 <copy_oldmem_page+190>:      test   %eax,%eax
0xc1610208 <copy_oldmem_page+192>:      jne    0xc161020d <copy_oldmem_page+197>
0xc161020a <copy_oldmem_page+194>:      mov    0x10(%ebp),%edx
0xc161020d <copy_oldmem_page+197>:      mov    %edx,%eax
0xc161020f <copy_oldmem_page+199>:      add    $0xc,%esp
0xc1610212 <copy_oldmem_page+202>:      pop    %ebx
0xc1610213 <copy_oldmem_page+203>:      pop    %esi
0xc1610214 <copy_oldmem_page+204>:      pop    %edi
0xc1610215 <copy_oldmem_page+205>:      pop    %ebp
0xc1610216 <copy_oldmem_page+206>:      ret
End of assembler dump.


> Not sure from where this faulting address 0xf7ebf8c4 is coming. We are still
> in vmcore_init(), so we should be copying the data to kernel buffers only.
> This looks like a valid kernel address.
> 
> Can you also put some printk() here to find out from where 0xf7ebf8c4 has
> come? It does not look like a fixed kernel virutual address returned by
> kmap_atomic_pfn(). Then is it passed by kernel as a parameter to
> copy_oldmem_page()?
> 
> Thanks
> Vivek
> 

Regards,
Michal

-- 
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.21-rc4-rt0-kdump
  2007-03-23  5:25       ` Vivek Goyal
  2007-03-23  8:23         ` 2.6.21-rc4-rt0-kdump Michal Piotrowski
@ 2007-03-23 12:10         ` Michal Piotrowski
  1 sibling, 0 replies; 12+ messages in thread
From: Michal Piotrowski @ 2007-03-23 12:10 UTC (permalink / raw)
  To: vgoyal; +Cc: Michal Piotrowski, Ingo Molnar, Thomas Gleixner, LKML

Vivek Goyal napisał(a):
> On Thu, Mar 22, 2007 at 02:27:25PM +0100, Michal Piotrowski wrote:
>> Michal Piotrowski napisał(a):
>>> On 22/03/07, Ingo Molnar <mingo@elte.hu> wrote:
>>>> * Michal Piotrowski <michal.k.k.piotrowski@gmail.com> wrote:
>>>>
>>>>> Hi Ingo,
>>>>> 2.6.21-rc4-rt0
>>>>> BUG: at kernel/fork.c:1033 copy_process()
>>>> thanks Michal - this is a real bug that affects upstream too. Find the
>>>> fix below - i've test-booted it and it fixes the warning.
>>> Problem is fixed, thanks.
>> BTW. It seems that nobody uses -rt as a crash dump kernel ;)
>>
>> BUG: unable to handle kernel paging request at virtual address f7ebf8c4
>>  printing eip:
>> c1610192
>> *pde = 00000000
>> stopped custom tracer.
>> Oops: 0000 [#1]
>> PREEMPT 
>> Modules linked in:
>> CPU:    0
>> EIP:    0060:[<c1610192>]    Not tainted VLI
>> EFLAGS: 00010206   (2.6.21-rc4-rt0-kdump #3)
>> EIP is at copy_oldmem_page+0x4a/0xd0
>> eax: 000008c4   ebx: f7ebf000   ecx: 00000100   edx: 00000246
>> esi: f7ebf8c4   edi: c4c520fc   ebp: c4d54e30   esp: c4d54e18
>> ds: 007b   es: 007b   fs: 00d8  gs: 0000  ss: 0068  preempt:00000001
>> Process swapper (pid: 1, ti=c4d54000 task=c4d52c20 task.ti=c4d54000)
>> Stack: c17ab7e0 c183f982 c1969658 00000400 00000400 00037ebf c4d54e5c c16af187 
>>        00037ebf c4c520fc 00000400 000008c4 00000000 00000000 c4c696e0 00000400 
>>        c4c520fc c4d54f94 c19a9cfd c4c520fc 00000400 c4d54f78 00000000 c1840996 
>> Call Trace:
>>  [<c16af187>] read_from_oldmem+0x73/0x98
>>  [<c19a9cfd>] vmcore_init+0x26c/0xab7
>>  [<c199979b>] init+0xaa/0x287
>>  [<c16044eb>] kernel_thread_helper+0x7/0x10
>>  =======================
>>
>> l *copy_oldmem_page+0x4a/0xd0
>> 0xc1610148 is in copy_oldmem_page (arch/i386/kernel/crash_dump.c:35).
>> 30       * copying the data to a pre-allocated kernel page and then copying to user
>> 31       * space in non-atomic context.
>> 32       */
>> 33      ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
>> 34                                     size_t csize, unsigned long offset, int userbuf)
>> 35      {
>> 36              void  *vaddr;
>> 37
>> 38              if (!csize)
>> 39                      return 0;
>>
> 
> Can you please paste the disassembly of copy_oldmem_page() on your system.
> Not sure from where this faulting address 0xf7ebf8c4 is coming. We are still
> in vmcore_init(), so we should be copying the data to kernel buffers only.
> This looks like a valid kernel address.
> 
> Can you also put some printk() here to find out from where 0xf7ebf8c4 has
> come? It does not look like a fixed kernel virutual address returned by
> kmap_atomic_pfn(). Then is it passed by kernel as a parameter to
> copy_oldmem_page()?

I added
printk(KERN_WARNING "copy_oldmem_page() pfn=%lu , buf=%s , nr_bytes=%d , offset=%lu , userbuf=%d\n", pfn, buf, nr_bytes, offset, userbuf);

before
tmp = copy_oldmem_page(pfn, buf, nr_bytes, offset, userbuf);

result is here
http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc4-rt0/kdump-console2.log

'buf' might be broken.

> 
> Thanks
> Vivek
> 

Regards,
Michal

-- 
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2007-03-23 12:11 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-03-21 20:38 2.6.21-rc4-rt0 BUG: at kernel/fork.c:1033 copy_process() Michal Piotrowski
2007-03-22  9:31 ` [patch] setup_boot_APIC_clock() irq-enable fix Ingo Molnar
2007-03-22 10:56   ` Thomas Gleixner
2007-03-22 12:57   ` Michal Piotrowski
2007-03-22 13:27     ` 2.6.21-rc4-rt0-kdump (was: Re: [patch] setup_boot_APIC_clock() irq-enable fix) Michal Piotrowski
2007-03-23  5:25       ` Vivek Goyal
2007-03-23  8:23         ` 2.6.21-rc4-rt0-kdump Michal Piotrowski
2007-03-23 12:10         ` 2.6.21-rc4-rt0-kdump Michal Piotrowski
2007-03-23  7:15       ` 2.6.21-rc4-rt0-kdump (was: Re: [patch] setup_boot_APIC_clock() irq-enable fix) Ingo Molnar
2007-03-23  7:58         ` Michal Piotrowski
2007-03-23  8:02           ` Ingo Molnar
2007-03-23  8:17             ` Michal Piotrowski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.