linux-kernel.vger.kernel.org archive mirror
* Re: garbled oopsen
  2003-05-08  1:40 ` Andrew Morton
@ 2003-05-08  0:06   ` Martin J. Bligh
  2003-05-08  4:34   ` Randy.Dunlap
  1 sibling, 0 replies; 10+ messages in thread
From: Martin J. Bligh @ 2003-05-08  0:06 UTC
  To: Andrew Morton, Randy.Dunlap; +Cc: linux-kernel

>> Can these be cleaned up in any reasonable way?
> 
> It needs some additional spinlock in there.  People have moaned for over a
> year, patches have been floating about but nobody has taken the time to
> finish one off and submit it.

I tried it a while back; the obvious lock approach didn't seem to work, and
I can't find the patch right now. IIRC, printk should be atomic,
so changing the oops code to format each line into a buffer, and then printk'ing
the buffer (prefixed with the cpu number), *should* work. Maybe. I think.

M.
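A minimal userspace sketch of the line-buffer idea, assuming a hypothetical helper format_cpu_line() (not a kernel API): the whole line, CPU prefix included, is formatted into a local buffer first, so the console receives it in one call instead of as several fragments that other CPUs can interleave.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>

/* Hypothetical helper: format one complete line, prefixed with the CPU
 * number, into 'buf' so it can be handed to the console in one call.
 * Returns the total length written (or that would have been written),
 * or a negative value on a formatting error. */
static int format_cpu_line(char *buf, size_t size, int cpu,
                           const char *fmt, ...)
{
    int n = snprintf(buf, size, "CPU%d: ", cpu);
    if (n < 0 || (size_t)n >= size)
        return n;                      /* the prefix alone did not fit */

    va_list ap;
    va_start(ap, fmt);
    int m = vsnprintf(buf + n, size - (size_t)n, fmt, ap);
    va_end(ap);

    return m < 0 ? m : n + m;
}
```

A caller would then emit the buffer once per line (for example fputs(line, stderr), or printk("%s", line) in the kernel). This only helps if that single output call is itself atomic with respect to other CPUs, which is the "printk should be atomic" assumption above.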



* garbled oopsen
@ 2003-05-08  1:05 Randy.Dunlap
  2003-05-08  1:40 ` Andrew Morton
  2003-05-19  7:20 ` Keith Owens
  0 siblings, 2 replies; 10+ messages in thread
From: Randy.Dunlap @ 2003-05-08  1:05 UTC
  To: lkml


I have several oopses that are garbled.  Part of the problem is that
page fault code (x86: arch/i386/mm/fault.c) does not attempt to
serialize the "Unable to handle kernel ... at virtual address ..."
messages, since it's considered better to get _some_ messages out
than no messages.  (and serialize it with what?)

However, after untwisting these, I can tell you that unraveling
them is not fun.

Can these be cleaned up in any reasonable way?
Any suggestions?

This is on 2.5.68 and 2.5.69.


(sample 1)
i
de-sUcnsaibl:e  hdtod :h asuncd l1e 80ke22rn0e1l96 p3aging request at virtual address 6b6b6b8b

(sample 2)
i<de1->Usncsaibl:e h dtod: h saunc dl18e 0k2e20r1ne96l3 p

which decodes into:
i de -  s cs i  :  h d  d:   s u c   18  0 2 20 1  96 3
 <  1 >U n  a bl e    to   h  a n  dl  e  k e  r ne  l  p

(sample 3, much longer)
scsi_eh_<pr4>t_hfdadi: lA_TstAaPIts r: e2se:t0: 0co:m0 plcmetdse 
failiedde:- sc0s,i :c anRecaeclh: ed1 
idTeosctsali_ pofc_ 1in ctro mminantdersr oupnt 1  hdanedvliceres
 rePaqcuikrete  ceho mmwoanrdk
 cosmcplsiet_ehe_d,2 : 0 abboyrttesin tgr canmdsf:e0rxrf7eddb
f1didc
e-scidsei-: shcsddi::  achbeorckt  icognndoiretdi
on sfocsr i_1e41h_
2: iabdoe-rstcisngi : chmddd :f aqilueedue: 0xcmf7dd b= f1[d c3 
0 0s cs0i _4e0 h_02 :]
 Seinddie-ngs cBsiD:R  Rsdeaevc:h ed0xf i7ddde5sccs0i0_
pci_dien-trs cisin:te rdrevuiptc e harensdelte ri
gnoidreed-s
csiid: eR-escaschi:ed h iddd:e scqsuei_ 1p4c1_i, nctmrd  i=n te[r 0 ru0p 0t  h0 a0nd 0le r]

/end/


* Re: garbled oopsen
  2003-05-08  1:05 garbled oopsen Randy.Dunlap
@ 2003-05-08  1:40 ` Andrew Morton
  2003-05-08  0:06   ` Martin J. Bligh
  2003-05-08  4:34   ` Randy.Dunlap
  2003-05-19  7:20 ` Keith Owens
  1 sibling, 2 replies; 10+ messages in thread
From: Andrew Morton @ 2003-05-08  1:40 UTC
  To: Randy.Dunlap; +Cc: linux-kernel

"Randy.Dunlap" <rddunlap@osdl.org> wrote:
>
> I have several oopses that are garbled.

Use kgdb.

> Can these be cleaned up in any reasonable way?

It needs some additional spinlock in there.  People have moaned for over a
year, patches have been floating about but nobody has taken the time to
finish one off and submit it.

It's never bothered me, because availability of a serial console equates to
availability of kgdb.

> Any suggestions?

A Greek-to-English dictionary?




* Re: garbled oopsen
  2003-05-08  1:40 ` Andrew Morton
  2003-05-08  0:06   ` Martin J. Bligh
@ 2003-05-08  4:34   ` Randy.Dunlap
  1 sibling, 0 replies; 10+ messages in thread
From: Randy.Dunlap @ 2003-05-08  4:34 UTC
  To: akpm; +Cc: rddunlap, linux-kernel

> "Randy.Dunlap" <rddunlap@osdl.org> wrote:
>>
>> I have several oopses that are garbled.
>
> Use kgdb.
>
>> Can these be cleaned up in any reasonable way?
>
> It needs some additional spinlock in there.  People have moaned for over a
> year, patches have been floating about but nobody has taken the time to
> finish one off and submit it.
>
> It's never bothered me, because availability of a serial console equates to
> availability of kgdb.

I'm more interested in having it clean for people who use 2.6.x.
Yes, I can get by without it or by using kgdb, but that's not the point IMO.

~Randy





* Re: garbled oopsen
  2003-05-08  1:05 garbled oopsen Randy.Dunlap
  2003-05-08  1:40 ` Andrew Morton
@ 2003-05-19  7:20 ` Keith Owens
  1 sibling, 0 replies; 10+ messages in thread
From: Keith Owens @ 2003-05-19  7:20 UTC
  To: Randy.Dunlap; +Cc: lkml

On Wed, 7 May 2003 18:05:30 -0700, 
"Randy.Dunlap" <rddunlap@osdl.org> wrote:
>I have several oopses that are garbled.  Part of the problem is that
>page fault code (x86: arch/i386/mm/fault.c) does not attempt to
>serialize the "Unable to handle kernel ... at virtual address ..."
>messages, since it's considered better to get _some_ messages out
>than no messages.  (and serialize it with what?)
>
>However, after untwisting these, I can tell you that unraveling
>them is not fun.
>
>Can these be cleaned up in any reasonable way?
>Any suggestions?

kdb_printf() has this:

        /* Serialize kdb_printf if multiple cpus try to write at once.
         * But if any cpu goes recursive in kdb, just print the output,
         * even if it is interleaved with any other text.
         */
        if (!KDB_STATE(PRINTF_LOCK)) {
                KDB_STATE_SET(PRINTF_LOCK);
                spin_lock(&kdb_printf_lock);
        }
	....
        if (KDB_STATE(PRINTF_LOCK)) {
                spin_unlock(&kdb_printf_lock);
                KDB_STATE_CLEAR(PRINTF_LOCK);
        }

KDB_STATE() is a per-cpu set of flags; PRINTF_LOCK indicates whether this
cpu holds, or is trying to get, the kdb_printf_lock.  I get no
interleave problems, except when somebody prints a line in multiple
calls to kdb_printf(): each fragment is printed as one chunk, but
fragments from different cpus can be interleaved.
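The kdb_printf() pattern can be sketched in userspace C11, assuming a thread stands in for a CPU, an atomic_flag for kdb_printf_lock, and a _Thread_local bool for the per-cpu PRINTF_LOCK bit. One deliberate difference from the quoted excerpt: a local took_lock records whether this particular call acquired the lock, so a nested call releases nothing and the outer call keeps the lock until it finishes. The in-memory console and the recurse parameter are hypothetical, only there to simulate a nested print in a single-threaded demo.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>
#include <assert.h>

static atomic_flag printf_lock = ATOMIC_FLAG_INIT;
/* Stand-in for the per-cpu PRINTF_LOCK state bit: one flag per thread. */
static _Thread_local bool holding_printf_lock;

/* Hypothetical console: an in-memory log for the demo (not thread-safe). */
static char out[256];
static size_t out_len;

static void emit(const char *s)
{
    size_t room = sizeof out - 1 - out_len;
    size_t n = strlen(s);
    if (n > room)
        n = room;                      /* truncate rather than overflow */
    memcpy(out + out_len, s, n);
    out_len += n;
    out[out_len] = '\0';
}

static void locked_print(const char *msg, int recurse)
{
    bool took_lock = false;

    /* Serialize output between CPUs, but if this CPU re-enters while it
     * already holds (or is acquiring) the lock, print without it rather
     * than deadlocking on itself. */
    if (!holding_printf_lock) {
        holding_printf_lock = true;
        while (atomic_flag_test_and_set_explicit(&printf_lock,
                                                 memory_order_acquire))
            ;                          /* spin until the lock is free */
        took_lock = true;
    }

    emit(msg);
    if (recurse > 0)                   /* demo only: simulate a nested print */
        locked_print("  (nested)\n", recurse - 1);

    if (took_lock) {                   /* only the acquiring call releases */
        atomic_flag_clear_explicit(&printf_lock, memory_order_release);
        holding_printf_lock = false;
    }
}
```

As in the original, this keeps each call's output in one chunk; a line built from several calls can still interleave with other CPUs between calls.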



* Re: garbled oopsen
  2003-05-08  5:53       ` Andrew Morton
@ 2003-05-08  7:29         ` Andi Kleen
  0 siblings, 0 replies; 10+ messages in thread
From: Andi Kleen @ 2003-05-08  7:29 UTC
  To: Andrew Morton; +Cc: Martin J. Bligh, ak, linux-kernel, rddunlap

On Thu, May 08, 2003 at 07:53:10AM +0200, Andrew Morton wrote:
> A recursive oops is easy enough to detect anyway.
> 
> 	preempt_disable();
> 	if (oops_cpu == -1 || oops_cpu != smp_processor_id()) {
> 		_raw_spin_lock(&oops_lock);
> 		oops_cpu = smp_processor_id();
> 	}
> 	<current stuff>
> 	oops_cpu = -1;
> 	spin_lock_init(&oops_lock);
> 	preempt_enable();
> 
> or something like that.

Yes, I did it this way in my old 2.4 x86-64 patch.  But I never
felt comfortable enough about it to commit it.

(the in_interrupt thing was to avoid an interrupt stack problem on 
x86-64, not needed anymore or on i386)

But I would prefer the spinlock timeout, I think. It's a safer and more
obviously correct algorithm.

-Andi

Index: arch/x86_64/mm/fault.c
===================================================================
RCS file: /home/cvs/Repository/linux/arch/x86_64/mm/fault.c,v
retrieving revision 1.33
diff -u -u -r1.33 fault.c
--- arch/x86_64/mm/fault.c	2002/10/02 15:41:14	1.33
+++ arch/x86_64/mm/fault.c	2003/01/13 08:42:35
@@ -30,6 +30,9 @@
 #include <asm/proto.h>
 #include <asm/kdebug.h>
 
+spinlock_t pcrash_lock; 
+int crashing_cpu;
+
 extern spinlock_t console_lock, timerlist_lock;
 
 void bust_spinlocks(int yes)
@@ -251,6 +254,14 @@
 	console_verbose();
 	bust_spinlocks(1); 
 
+	if (!in_interrupt()) { 
+		if (!spin_trylock(&pcrash_lock)) { 
+			if (crashing_cpu != smp_processor_id()) 
+				spin_lock(&pcrash_lock);  		    
+		} 
+		crashing_cpu = smp_processor_id();
+	} 
+
 	if (address < PAGE_SIZE)
 		printk(KERN_ALERT "Unable to handle kernel NULL pointer dereference");
 	else
@@ -259,7 +270,14 @@
 	printk(" printing rip:\n");
 	printk("%016lx\n", regs->rip);
 	dump_pagetable(address);
+
 	die("Oops", regs, error_code);
+
+	if (!in_interrupt()) { 
+		crashing_cpu = -1;  /* small harmless window */ 
+		spin_unlock(&pcrash_lock);
+	}
+
 	bust_spinlocks(0); 
 	do_exit(SIGKILL);
 




* Re: garbled oopsen
  2003-05-08  3:32     ` Martin J. Bligh
  2003-05-08  5:53       ` Andrew Morton
@ 2003-05-08  6:44       ` Andi Kleen
  1 sibling, 0 replies; 10+ messages in thread
From: Andi Kleen @ 2003-05-08  6:44 UTC
  To: Martin J. Bligh; +Cc: Andi Kleen, Andrew Morton, linux-kernel, Randy.Dunlap

On Thu, May 08, 2003 at 05:32:04AM +0200, Martin J. Bligh wrote:
> The trouble is that the subsystems you want may be broken (eg timers).

rdtsc/get_cycles() should still work. If that's broken too you have a really 
serious problem. It's only on the local CPU, so you don't need any complications
for bro^wunsynced SMP systems.

-Andi


* Re: garbled oopsen
  2003-05-08  3:32     ` Martin J. Bligh
@ 2003-05-08  5:53       ` Andrew Morton
  2003-05-08  7:29         ` Andi Kleen
  2003-05-08  6:44       ` Andi Kleen
  1 sibling, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2003-05-08  5:53 UTC
  To: Martin J. Bligh; +Cc: ak, linux-kernel, rddunlap

"Martin J. Bligh" <mbligh@aracnet.com> wrote:
>
> >>> Can these be cleaned up in any reasonable way?
> >> 
> >> It needs some additional spinlock in there.  People have moaned for over a
> >> year, patches have been floating about but nobody has taken the time to
> >> finish one off and submit it.
> > 
> > I considered it for x86-64 and even implemented it, but never submitted
> > in fear of deadlocks e.g. when an oops recurses. For this a 
> > spinlock_timeout() would be useful. Print anyways when you cannot get the
> > lock in a second or two.
> 
> The trouble is that the subsystems you want may be broken (eg timers).
> IMHO it's better to just spew whatever you can (the current crap) ... 
> wait a couple of seconds, then have another go at doing it properly.

A recursive oops is easy enough to detect anyway.

	preempt_disable();
	if (oops_cpu == -1 || oops_cpu != smp_processor_id()) {
		_raw_spin_lock(&oops_lock);
		oops_cpu = smp_processor_id();
	}
	<current stuff>
	oops_cpu = -1;
	spin_lock_init(&oops_lock);
	preempt_enable();

or something like that.

> That way people can't complain it's worse than it is now in any way ;-)

Too many complaints, too few unified diffs on this one.
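The recursive-oops check above can also be written with the lock and the owner folded into one atomic word, which removes the window between taking oops_lock and setting oops_cpu. This is a hedged C11 sketch, not a kernel patch: oops_owner, oops_begin() and oops_end() are hypothetical names, cpu is whatever smp_processor_id() would return (so preemption must be off, as in the original), and the caller releases only if its own oops_begin() acquired.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <assert.h>

/* -1 means "no CPU is currently printing an oops". */
static atomic_int oops_owner = -1;

/* Returns true if this call acquired the oops lock, false if 'cpu'
 * already owns it (a recursive oops), in which case we just print. */
static bool oops_begin(int cpu)
{
    /* Only this CPU ever stores its own id, so seeing it here means
     * we are already inside an oops on this CPU. */
    if (atomic_load_explicit(&oops_owner, memory_order_relaxed) == cpu)
        return false;

    int expected = -1;
    while (!atomic_compare_exchange_weak_explicit(&oops_owner, &expected, cpu,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
        expected = -1;                 /* another CPU owns it: keep spinning */
    return true;
}

static void oops_end(bool acquired)
{
    if (!acquired)
        return;                        /* nested oops: the outer call still owns it */
    assert(atomic_load_explicit(&oops_owner, memory_order_relaxed) != -1);
    atomic_store_explicit(&oops_owner, -1, memory_order_release);
}
```

A recursive oops on the same CPU falls straight through instead of deadlocking, while an oops on another CPU waits its turn. Unlike the original, the end path does not force-reset the lock with spin_lock_init(), so a nested oops that does return cannot release the outer one's lock.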



* Re: garbled oopsen
  2003-05-08  2:49   ` Andi Kleen
@ 2003-05-08  3:32     ` Martin J. Bligh
  2003-05-08  5:53       ` Andrew Morton
  2003-05-08  6:44       ` Andi Kleen
  0 siblings, 2 replies; 10+ messages in thread
From: Martin J. Bligh @ 2003-05-08  3:32 UTC
  To: Andi Kleen, Andrew Morton; +Cc: linux-kernel, Randy.Dunlap

>>> Can these be cleaned up in any reasonable way?
>> 
>> It needs some additional spinlock in there.  People have moaned for over a
>> year, patches have been floating about but nobody has taken the time to
>> finish one off and submit it.
> 
> I considered it for x86-64 and even implemented it, but never submitted
> in fear of deadlocks e.g. when an oops recurses. For this a 
> spinlock_timeout() would be useful. Print anyways when you cannot get the
> lock in a second or two.

The trouble is that the subsystems you want may be broken (eg timers).
IMHO it's better to just spew whatever you can (the current crap) ... 
wait a couple of seconds, then have another go at doing it properly.

That way people can't complain it's worse than it is now in any way ;-)

M.
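The "spew whatever you can, wait, then have another go properly" approach might look roughly like this. report_twice(), console_write() and the in-memory console are hypothetical; the delay and the retry budget are counted in loop iterations rather than timer ticks, since broken timers are the concern raised above. The demo console is not itself thread-safe; in the kernel, printk would take its place.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>
#include <assert.h>

static atomic_flag oops_print_lock = ATOMIC_FLAG_INIT;

/* Hypothetical console: an in-memory log for the demo (not thread-safe). */
static char console[512];
static size_t console_len;

static void console_write(const char *s)
{
    size_t room = sizeof console - 1 - console_len;
    size_t n = strlen(s);
    if (n > room)
        n = room;                      /* truncate rather than overflow */
    memcpy(console + console_len, s, n);
    console_len += n;
    console[console_len] = '\0';
}

/* Pass 1 prints immediately with no locking (may interleave). After a
 * crude delay, pass 2 tries a bounded number of times to take the lock
 * and print a clean copy. Returns false if the clean copy was skipped. */
static bool report_twice(const char *msg, unsigned long delay_loops,
                         unsigned long max_tries)
{
    console_write(msg);

    for (volatile unsigned long i = 0; i < delay_loops; i++)
        ;                              /* busy-wait without relying on timers */

    for (unsigned long t = 0; t < max_tries; t++) {
        if (!atomic_flag_test_and_set_explicit(&oops_print_lock,
                                               memory_order_acquire)) {
            console_write(msg);        /* serialized second copy */
            atomic_flag_clear_explicit(&oops_print_lock, memory_order_release);
            return true;
        }
    }
    return false;                      /* lock still busy: first copy is all we get */
}
```

The worst case is the current behaviour (one possibly garbled copy), which is the "can't complain it's worse" property Martin is after.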



* Re: garbled oopsen
       [not found] ` <20030508015008$481c@gated-at.bofh.it>
@ 2003-05-08  2:49   ` Andi Kleen
  2003-05-08  3:32     ` Martin J. Bligh
  0 siblings, 1 reply; 10+ messages in thread
From: Andi Kleen @ 2003-05-08  2:49 UTC
  To: Andrew Morton; +Cc: linux-kernel, Randy.Dunlap

Andrew Morton <akpm@digeo.com> writes:

>> Can these be cleaned up in any reasonable way?
>
> It needs some additional spinlock in there.  People have moaned for over a
> year, patches have been floating about but nobody has taken the time to
> finish one off and submit it.

I considered it for x86-64 and even implemented it, but never submitted
in fear of deadlocks e.g. when an oops recurses. For this a 
spinlock_timeout() would be useful. Print anyways when you cannot get the
lock in a second or two.

-Andi
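The spinlock_timeout() idea can be sketched in userspace C11 under stated assumptions: spin_lock_timeout() and now_ns() are hypothetical names, an atomic_flag stands in for the spinlock, and CLOCK_MONOTONIC stands in for whatever time source survives the crash (elsewhere in the thread, Martin notes timers may be broken and Andi suggests a CPU-local counter such as get_cycles()).

```c
#define _POSIX_C_SOURCE 199309L
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>
#include <assert.h>

/* Monotonic nanoseconds. In the kernel this role could be played by a
 * CPU-local cycle counter instead. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Spin for at most timeout_ns. Returns true if the lock was taken,
 * false if we gave up; on false the caller prints unserialized
 * output rather than hanging forever. */
static bool spin_lock_timeout(atomic_flag *lock, uint64_t timeout_ns)
{
    uint64_t start = now_ns();
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire)) {
        if (now_ns() - start >= timeout_ns)
            return false;
    }
    return true;
}
```

With a timeout of "a second or two", a recursive oops on the CPU that already holds the lock costs a short stall followed by garbled but present output, instead of a silent deadlock.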
