linux-kernel.vger.kernel.org archive mirror
* Re: Linux v2.5.62 --- spontaneous reboots
       [not found] ` <fa.m7uie32.15048ou@ifi.uio.no>
@ 2003-02-18 13:07   ` Ed Tomlinson
  0 siblings, 0 replies; 18+ messages in thread
From: Ed Tomlinson @ 2003-02-18 13:07 UTC (permalink / raw)
  To: Linus Torvalds, linux-kernel

Linus Torvalds wrote:

> A lot of people seem to be using gcc-3.2 these days, since it's what RH-8
> comes with as standard. I don't think there are any known problems with
> that compiler, at least on x86.

Not so.

See the lkml thread

Re: [BUG] link error in usbserial with gcc3.2

Ed Tomlinson





^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Linux v2.5.62 --- spontaneous reboots
  2003-02-18  2:02       ` Linus Torvalds
  2003-02-18  2:16         ` Chris Wedgwood
  2003-02-18  3:21         ` Martin J. Bligh
@ 2003-02-19 11:02         ` David Ford
  2 siblings, 0 replies; 18+ messages in thread
From: David Ford @ 2003-02-19 11:02 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Chris Wedgwood, Kernel Mailing List, Martin J. Bligh

I have a 2.5.58 box that's a simple firewall/router w/ iptables running 
on it.  It crashes and reboots automatically roughly every other day.  
It's been doing that for a long time and I never had the time to debug
it.  I'll put .62 on it with a serial console and see what it comes up 
with.  It runs two PPPoE channels over ethX.  PPPoE is known to blow up 
(OOPS) on pppd hangup/restarts.

David



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Linux v2.5.62 --- spontaneous reboots
       [not found] ` <fa.d672u14.1gk8ea4@ifi.uio.no>
@ 2003-02-18 23:48   ` walt
  0 siblings, 0 replies; 18+ messages in thread
From: walt @ 2003-02-18 23:48 UTC (permalink / raw)
  To: linux-kernel

Chris Wedgwood wrote:

> ...I'd suspect it was an Athlon or chipset problem if it weren't for the
> fact 2.4.x is stable for 8+ hours doing the same exact thing[1].

Unfortunately this is not proof  :-(    I can tell you from personal
experience that the BSD kernels are much more sensitive to overheating
hardware than linux is, for example -- so one linux kernel could just
as easily be more sensitive to overheating than another linux kernel.

I've never found out why this is, but I know it's true.  When I try
to run a BSD kernel on a dust-covered motherboard I'll get random
crashes all over the place even though a linux kernel will run just
fine on the same machine.  All I do is blow the dust off the motherboard
and both kernels run again without problem.  Absolutely for sure.

I'd love to know what makes the difference.




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Linux v2.5.62 --- spontaneous reboots
  2003-02-18 21:59       ` Chris Wedgwood
  2003-02-18 22:13         ` Linus Torvalds
@ 2003-02-18 23:01         ` Chris Wedgwood
  1 sibling, 0 replies; 18+ messages in thread
From: Chris Wedgwood @ 2003-02-18 23:01 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List, Martin J. Bligh

On Tue, Feb 18, 2003 at 02:13:00PM -0800, Linus Torvalds wrote:

> > I'm back to 2.5.51 and I'll beat it hard and see what happens.  I
> > guess until I (or someone else who sees this) can get some
> > concrete data points you'll have to ignore this.
>
> Ok. Especially if it seems that -mjb4 also potentially does it (just
> harder to trigger), I don't see many other alternatives than just
> going back in time to see when it started.

It seems 2.5.51 *does* also show this... but it took nearly an hour
this time.

> But if it was getting hard to trigger with 2.5.52 too, things might
> be getting hairier and hairier... If it becomes hard enough to
> trigger as to be practically nondeterministic, a better approach
> might be to just go back to -mjb4, and even if it is still there in
> -mjb4 try to see which part of the patch seems to be making it more
> stable.

I may have to do that... it seems older kernels do have this problem,
it's just harder to hit for some reason.

I'd suspect it was an Athlon or chipset problem if it weren't for the
fact 2.4.x is stable for 8+ hours doing the same exact thing[1].

> That might give us more clues, and it's a much smaller problem set
> than going arbitrarily far back in the 2.5.x series.

Sure thing.


  --cw

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Linux v2.5.62 --- spontaneous reboots
  2003-02-18 22:13         ` Linus Torvalds
@ 2003-02-18 22:34           ` Linus Torvalds
  0 siblings, 0 replies; 18+ messages in thread
From: Linus Torvalds @ 2003-02-18 22:34 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Kernel Mailing List, Martin J. Bligh


On Tue, 18 Feb 2003, Linus Torvalds wrote:
> 
> But if it was getting hard to trigger with 2.5.52 too, things might be
> getting hairier and hairier.. If it becomes hard enough to trigger as to
> be practically nondeterministic, a better approach might be to just go
> back to -mjb4, and even if it is still there in -mjb4 try to see which
> part of the patch seems to be making it more stable.

Btw, this is particularly true if it takes you potentially hours to test 
something like 2.5.51 for stability, but you can reboot 2.5.59 at will in 
ten minutes. 

In that case, you can test several versions of "2.5.59 + partial -mjb
patches" much more quickly than you can walk backwards in 2.5.x, and try 
to pinpoint the "this part of -mjb makes it much less likely to reboot".

Also, with the -mjb patch there are some new configuration options. For 
example, CONFIG_100HZ on -mjb has very different behaviour than a plain 
2.5.59 kernel that defaults to 1kHz timer clock, and maybe the reason -mjb 
seems more stable is that you may have selected a configuration option 
that made -mjb act differently.

Regardless, it would be very interesting to hear what the -mjb split-down
results would be. Even if the answer might be "at 1kHz timer it is
unstable, at 100Hz it is stable" (and if that were to be it, then you'd
have to walk backwards to 2.5.24 to find the old 2.5.x kernel that had a
slow tick rate).
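To put numbers on the HZ difference: the timer tick period and the number of timer interrupts over a test run follow directly from HZ. A minimal sketch; `tick_period_us` and `timer_irqs` are hypothetical helpers, not kernel functions, and the 15-minute run length is just an example figure.

```c
#include <assert.h>

/* Length of one timer tick in microseconds for a given HZ. */
static unsigned long tick_period_us(unsigned long hz)
{
    assert(hz > 0);
    return 1000000UL / hz;
}

/* Number of timer interrupts taken over a run of `seconds` seconds. */
static unsigned long timer_irqs(unsigned long hz, unsigned long seconds)
{
    return hz * seconds;
}
```

For a 15-minute (900 s) compile loop, HZ=1000 gives a 1000 us tick and 900000 timer interrupts, while HZ=100 gives a 10000 us tick and 90000 interrupts: ten times fewer chances for a timing-sensitive race in or around the timer path.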

		Linus


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Linux v2.5.62 --- spontaneous reboots
  2003-02-18 21:59       ` Chris Wedgwood
@ 2003-02-18 22:13         ` Linus Torvalds
  2003-02-18 22:34           ` Linus Torvalds
  2003-02-18 23:01         ` Chris Wedgwood
  1 sibling, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2003-02-18 22:13 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Kernel Mailing List, Martin J. Bligh


On Tue, 18 Feb 2003, Chris Wedgwood wrote:
> 
> Of course, Murphy being the optimist he is; about two minutes after I
> make a claim that 2.5.52 does NOT spontaneously reboot --- it *DOES*.
> 
> I'm back to 2.5.51 and I'll beat it hard and see what happens.  I
> guess until I (or someone else who sees this) can get some concrete
> data points you'll have to ignore this.

Ok. Especially if it seems that -mjb4 also potentially does it (just
harder to trigger), I don't see many other alternatives than just going
back in time to see when it started.

But if it was getting hard to trigger with 2.5.52 too, things might be
getting hairier and hairier.. If it becomes hard enough to trigger as to
be practically nondeterministic, a better approach might be to just go
back to -mjb4, and even if it is still there in -mjb4 try to see which
part of the patch seems to be making it more stable. That might give us
more clues, and it's a much smaller problem set than going arbitrarily far
back in the 2.5.x series.
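"Going back in time to see when it started" is a bisection over the release list. A minimal sketch, assuming the releases are ordered oldest to newest, the newest one is known bad, and once a release has the bug every later one has it too. `first_bad`, `reboots_fn` and `simulated_reboots` are hypothetical names; in practice each predicate call is a long compile-loop soak, not a function call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical test hook: true if release `idx` was seen to reboot. */
typedef bool (*reboots_fn)(size_t idx, void *ctx);

/* Binary search for the first bad release in [0, n).
 * Invariant: the first bad release lies in [lo, hi]; hi is known bad. */
static size_t first_bad(size_t n, reboots_fn reboots, void *ctx)
{
    size_t lo = 0, hi;

    assert(n > 0);
    hi = n - 1;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (reboots(mid, ctx))
            hi = mid;          /* bug present at mid: look at or before it */
        else
            lo = mid + 1;      /* mid survived: bug starts after it */
    }
    return lo;
}

/* Stand-in for a real soak test: every index >= *threshold is "bad". */
static bool simulated_reboots(size_t idx, void *ctx)
{
    const size_t *threshold = ctx;
    return idx >= *threshold;
}
```

With 2.5.51 through 2.5.62 as indices 0 to 11, about four soak runs pin down the boundary instead of twelve. The weak point is the monotonic assumption: with an intermittent bug a "good" verdict only means "did not fail yet", and a wrong verdict (as happened with 2.5.52 in this thread) sends the search to the wrong half, which is why splitting -mjb4 can be the more reliable route.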

		Linus


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Linux v2.5.62 --- spontaneous reboots
  2003-02-18 21:44     ` Chris Wedgwood
@ 2003-02-18 21:59       ` Chris Wedgwood
  2003-02-18 22:13         ` Linus Torvalds
  2003-02-18 23:01         ` Chris Wedgwood
  0 siblings, 2 replies; 18+ messages in thread
From: Chris Wedgwood @ 2003-02-18 21:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List, Martin J. Bligh

On Tue, Feb 18, 2003 at 01:44:31PM -0800, Chris Wedgwood wrote:

> I say thus far, because the problem usually appears after about 15
> minutes of compiling, but it sometimes takes a little longer.  I'm
> running 2.5.52 now and after 45 minutes it's still going.

Of course, Murphy being the optimist he is; about two minutes after I
make a claim that 2.5.52 does NOT spontaneously reboot --- it *DOES*.

I'm back to 2.5.51 and I'll beat it hard and see what happens.  I
guess until I (or someone else who sees this) can get some concrete
data points you'll have to ignore this.


  --cw


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Linux v2.5.62 --- spontaneous reboots
  2003-02-18  1:42   ` Linus Torvalds
  2003-02-18  1:53     ` Chris Wedgwood
@ 2003-02-18 21:44     ` Chris Wedgwood
  2003-02-18 21:59       ` Chris Wedgwood
  1 sibling, 1 reply; 18+ messages in thread
From: Chris Wedgwood @ 2003-02-18 21:44 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List, Martin J. Bligh

On Mon, Feb 17, 2003 at 05:42:38PM -0800, Linus Torvalds wrote:

> It would be interesting to hear exactly when the trouble
> started. And if plain 2.5.59 does it (which is unclear from your
> description), but 59-mjb4 doesn't, then that's an interesting data
> point.

After much testing (which is still in progress), it would seem that
*maybe* mjb4 does have the problem too, although it's much harder to
hit.  Please note that this is a single data point where for other
kernels I have two or more occurrences of spontaneous reboots.

I've been checking older kernels...  it would seem the problem first
occurs in 2.5.53 (that is 2.5.53 through 2.5.62-bk all reboot for me).
2.5.51 doesn't appear to and thus far neither does 2.5.52.

I say thus far, because the problem usually appears after about 15
minutes of compiling, but it sometimes takes a little longer.  I'm
running 2.5.52 now and after 45 minutes it's still going.
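How long a clean run has to be before it means anything can be estimated under a simple model. This is an assumption, not something measured here: on a bad kernel, reboots are taken to occur at a constant rate, with a mean time to the first reboot of about 15 minutes (the typical figure reported above).

```latex
% Assumption: exponential time-to-failure with mean \mu on a bad kernel.
\begin{aligned}
P(\text{no reboot within } T) &= e^{-T/\mu} \\
\mu = 15\,\text{min},\; T = 45\,\text{min}: \quad e^{-3} &\approx 0.05 \\
T \text{ such that } P \le 0.01: \quad T = \mu \ln 100 &\approx 15 \times 4.61 \approx 69\,\text{min}
\end{aligned}
```

So under this model a 45-minute clean run still leaves roughly a 5% chance that the kernel is bad and simply has not failed yet; a kernel whose failures are rarer (larger mean) needs proportionally longer soaks.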


As to what difference it might be between '52 and '53 I have no idea.
I had a quick look and the changes there are considerable.

I've tried different compiles, with and without preempt, with and
without IO-APIC and trimming down the kernel...



  --cw

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Linux v2.5.62 --- spontaneous reboots
  2003-02-18  0:03 ` Linux v2.5.62 --- spontaneous reboots Chris Wedgwood
  2003-02-18  0:44   ` Jeff Garzik
  2003-02-18  1:42   ` Linus Torvalds
@ 2003-02-18 12:13   ` Pavel Machek
  2 siblings, 0 replies; 18+ messages in thread
From: Pavel Machek @ 2003-02-18 12:13 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Linus Torvalds, Kernel Mailing List

Hi!

> > Oh, and as a sign that 2.6.x really _is_ approaching, people have
> > started sending me spelling fixes.
> 
> FWIW, I can't get 2.5.59+ (maybe earlier) to run reliably for me
> without spontaneous rebooting under load (kernel compile in a loop).
> 
> I wondered if it was specific to my system here except a few other
> people have reported this on *very* different hardware (I have a UP
> Athlon with IDE, they have 8-way P4 with SCSI).
> 
> Is anyone else seeing this?  Might there be some bogon causing triple
> faults or similar lurking that I'm just unlucky enough to hit often?

I'm seeing loop-related problems around 2.5.60+...
								Pavel

-- 
Casualties in World Trade Center: ~3k dead inside the building,
cryptography in U.S.A. and free speech in Czech Republic.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Linux v2.5.62 --- spontaneous reboots
  2003-02-18  2:02       ` Linus Torvalds
  2003-02-18  2:16         ` Chris Wedgwood
@ 2003-02-18  3:21         ` Martin J. Bligh
  2003-02-19 11:02         ` David Ford
  2 siblings, 0 replies; 18+ messages in thread
From: Martin J. Bligh @ 2003-02-18  3:21 UTC (permalink / raw)
  To: Linus Torvalds, Chris Wedgwood; +Cc: Kernel Mailing List

>>   plain 2.5.59 does
>> 
>>   59-mjb4 does NOT
> 
> Can you check mjb 1-3 too? The better it gets pinpointed, the easier it's 
> going to be to find.

I should note that our performance team also has triple-faults on some 
database app on an 8x machine ... that goes away with mjb4, not sure why
as yet. There's nothing in there that I can think of that would fix
a triple fault, so it may well be something annoyingly subtle.

Try -mjb1 first, if that still fixes it, then I'll start hacking off 
chunks for you to test. Try 62 as well ... that has dcache_rcu merged,
which is another major chunk of the patch. kgdb is also big, and may 
well change timings ...
 
> Also, if you can figure out _which_ part of the patch makes a difference,
> that would obviously be even better.  Part of the stuff in mjb is already
> merged in later kernels (ie things like using sequence locks for xtime is
> already there in 2.5.60, so clearly that doesn't seem to be the thing that
> helps your situation).

Yup, a lot of it is designed to give our performance team a stable base
to work from - so minimal changes to a 59 base.

I use gcc-2.95.4 (Debian) as Chris does and have found that extremely 
stable, not sure what the perf team were using, I'll find out.

> Now, interestingly enough, the mjb patch _does_ contain a change to 
> mm/memory.c that really makes no sense _except_ in the case of a compiler 
> bug. So you could check whether that (small) mm/memory.c patch is the 
> thing that makes a difference for you..

That's the config_page_offset patch, which Dave ported forward from 
Andrea's tree ... I've split that out below:

diff -urpN -X /home/fletch/.diff.exclude 21-config_hz/arch/i386/Kconfig 22-config_page_offset/arch/i386/Kconfig
--- 21-config_hz/arch/i386/Kconfig	Wed Feb  5 22:22:59 2003
+++ 22-config_page_offset/arch/i386/Kconfig	Wed Feb  5 22:23:00 2003
@@ -660,6 +660,44 @@ config HIGHMEM64G
 
 endchoice
 
+choice
+	help
+	  On i386, a process can only virtually address 4GB of memory.  This
+	  lets you select how much of that virtual space you would like to 
+	  devote to userspace, and how much to the kernel.
+
+	  Some userspace programs would like to address as much as possible and 
+	  have few demands of the kernel other than it get out of the way.  These
+	  users may opt to use the 3.5GB option to give their userspace program 
+	  as much room as possible.  Due to alignment issues imposed by PAE, 
+	  the "3.5GB" option is unavailable if "64GB" high memory support is 
+	  enabled.
+
+	  Other users (especially those who use PAE) may be running out of
+	  ZONE_NORMAL memory.  Those users may benefit from increasing the
+	  kernel's virtual address space size by taking it away from userspace, 
+	  which may not need all of its space.  An indicator that this is 
+	  happening is when /proc/meminfo's "LowFree:" is a small percentage of
+	  "LowTotal:" while "HighFree:" is very large.
+
+	  If unsure, say "3GB"
+	prompt "User address space size"
+        default 1GB
+	
+config	05GB
+	bool "3.5 GB"
+	depends on !HIGHMEM64G
+	
+config	1GB
+	bool "3 GB"
+	
+config	2GB
+	bool "2 GB"
+	
+config	3GB
+	bool "1 GB"
+endchoice
+
 config HIGHMEM
 	bool
 	depends on HIGHMEM64G || HIGHMEM4G
diff -urpN -X /home/fletch/.diff.exclude 21-config_hz/arch/i386/Makefile 22-config_page_offset/arch/i386/Makefile
--- 21-config_hz/arch/i386/Makefile	Fri Jan 17 09:18:19 2003
+++ 22-config_page_offset/arch/i386/Makefile	Wed Feb  5 22:23:00 2003
@@ -89,6 +89,7 @@ drivers-$(CONFIG_OPROFILE)		+= arch/i386
 
 CFLAGS += $(mflags-y)
 AFLAGS += $(mflags-y)
+AFLAGS_vmlinux.lds.o += -imacros $(TOPDIR)/include/asm-i386/page.h
 
 boot := arch/i386/boot
 
diff -urpN -X /home/fletch/.diff.exclude 21-config_hz/arch/i386/vmlinux.lds.S 22-config_page_offset/arch/i386/vmlinux.lds.S
--- 21-config_hz/arch/i386/vmlinux.lds.S	Fri Jan 17 09:18:20 2003
+++ 22-config_page_offset/arch/i386/vmlinux.lds.S	Wed Feb  5 22:23:00 2003
@@ -10,7 +10,7 @@ ENTRY(_start)
 jiffies = jiffies_64;
 SECTIONS
 {
-  . = 0xC0000000 + 0x100000;
+  . = __PAGE_OFFSET + 0x100000;
   /* read-only */
   _text = .;			/* Text and read-only data */
   .text : {
diff -urpN -X /home/fletch/.diff.exclude 21-config_hz/include/asm-i386/page.h 22-config_page_offset/include/asm-i386/page.h
--- 21-config_hz/include/asm-i386/page.h	Tue Jan 14 10:06:18 2003
+++ 22-config_page_offset/include/asm-i386/page.h	Wed Feb  5 22:23:00 2003
@@ -89,7 +89,16 @@ typedef struct { unsigned long pgprot; }
  * and CONFIG_HIGHMEM64G options in the kernel configuration.
  */
 
-#define __PAGE_OFFSET		(0xC0000000)
+#include <linux/config.h>
+#ifdef CONFIG_05GB
+#define __PAGE_OFFSET          (0xE0000000)
+#elif defined(CONFIG_1GB)
+#define __PAGE_OFFSET          (0xC0000000)
+#elif defined(CONFIG_2GB)
+#define __PAGE_OFFSET          (0x80000000)
+#elif defined(CONFIG_3GB)
+#define __PAGE_OFFSET          (0x40000000)
+#endif
 
 /*
  * This much address space is reserved for vmalloc() and iomap()
diff -urpN -X /home/fletch/.diff.exclude 21-config_hz/include/asm-i386/processor.h 22-config_page_offset/include/asm-i386/processor.h
--- 21-config_hz/include/asm-i386/processor.h	Thu Jan  2 22:05:15 2003
+++ 22-config_page_offset/include/asm-i386/processor.h	Wed Feb  5 22:23:00 2003
@@ -279,7 +279,11 @@ extern unsigned int mca_pentium_flag;
 /* This decides where the kernel will search for a free chunk of vm
  * space during mmap's.
  */
+#ifdef CONFIG_05GB
+#define TASK_UNMAPPED_BASE	(PAGE_ALIGN(TASK_SIZE / 16))
+#else
 #define TASK_UNMAPPED_BASE	(PAGE_ALIGN(TASK_SIZE / 3))
+#endif
 
 /*
  * Size of io_bitmap in longwords: 32 is ports 0-0x3ff.
diff -urpN -X /home/fletch/.diff.exclude 21-config_hz/mm/memory.c 22-config_page_offset/mm/memory.c
--- 21-config_hz/mm/memory.c	Mon Jan 13 21:09:28 2003
+++ 22-config_page_offset/mm/memory.c	Wed Feb  5 22:23:00 2003
@@ -101,8 +101,7 @@ static inline void free_one_pmd(struct m
 
 static inline void free_one_pgd(struct mmu_gather *tlb, pgd_t * dir)
 {
-	int j;
-	pmd_t * pmd;
+	pmd_t * pmd, * md, * emd;
 
 	if (pgd_none(*dir))
 		return;
@@ -113,8 +112,21 @@ static inline void free_one_pgd(struct m
 	}
 	pmd = pmd_offset(dir, 0);
 	pgd_clear(dir);
-	for (j = 0; j < PTRS_PER_PMD ; j++)
-		free_one_pmd(tlb, pmd+j);
+	/*
+	 * Beware if changing the loop below.  It once used int j,
+	 * 	for (j = 0; j < PTRS_PER_PMD; j++)
+	 * 		free_one_pmd(pmd+j);
+	 * but some older i386 compilers (e.g. egcs-2.91.66, gcc-2.95.3)
+	 * terminated the loop with a _signed_ address comparison
+	 * using "jle", when configured for HIGHMEM64GB (X86_PAE).
+	 * If also configured for 3GB of kernel virtual address space,
+	 * if page at physical 0x3ffff000 virtual 0x7ffff000 is used as
+	 * a pmd, when that mm exits the loop goes on to free "entries"
+	 * found at 0x80000000 onwards.  The loop below compiles instead
+	 * to be terminated by unsigned address comparison using "jb".
+	 */
+	for (md = pmd, emd = pmd + PTRS_PER_PMD; md < emd; md++)
+		free_one_pmd(tlb,md);
 	pmd_free_tlb(tlb, pmd);
 }
 

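The page.h and processor.h hunks above decide the user/kernel split and where mmap() starts searching. A minimal sketch of that arithmetic, assuming 4 KB pages and that TASK_SIZE equals PAGE_OFFSET on i386 (as in kernels of this era); `task_unmapped_base`, `kernel_space` and the `DEMO_` macros are hypothetical helpers, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u
/* Round x up to the next page boundary, like PAGE_ALIGN(). */
#define DEMO_PAGE_ALIGN(x) \
    (((x) + DEMO_PAGE_SIZE - 1) & ~(uint32_t)(DEMO_PAGE_SIZE - 1))

/* Start of the mmap() search area; assumes TASK_SIZE == PAGE_OFFSET. */
static uint32_t task_unmapped_base(uint32_t page_offset, int is_05gb)
{
    uint32_t task_size = page_offset;

    assert(page_offset % DEMO_PAGE_SIZE == 0);
    return is_05gb ? DEMO_PAGE_ALIGN(task_size / 16)
                   : DEMO_PAGE_ALIGN(task_size / 3);
}

/* Kernel share of the 4 GB space: 2^32 - page_offset, in 32-bit math. */
static uint32_t kernel_space(uint32_t page_offset)
{
    return (uint32_t)(0u - page_offset);
}
```

For the default 0xC0000000 the kernel gets 1 GB and the search starts at 0x40000000; with CONFIG_05GB (0xE0000000) the kernel gets 512 MB and the search starts at 0x0E000000, leaving userspace a larger contiguous area above it. Note that the CONFIG_05GB/1GB/2GB/3GB names describe the kernel's share, while the prompt strings describe the user share.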

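The comment in the free_one_pgd() hunk describes a loop whose end test was compiled as a signed comparison ("jle") on addresses. A minimal user-space model of that failure, assuming PAE sizes (512 pmd entries of 8 bytes) and the pmd page at virtual 0x7ffff000 from the comment; `count_signed`, `count_unsigned` and `signed_le` are hypothetical helpers that only count iterations, with a cap so the faulty loop stops.

```c
#include <assert.h>
#include <stdint.h>

#define PTRS_PER_PMD   512u  /* PAE: 512 pmd entries per page */
#define PMD_ENTRY_SIZE 8u    /* PAE: each pmd_t is 8 bytes */
#define ITER_CAP       4096u /* safety cap so the faulty loop terminates */

/* Signed "a <= b" on 32-bit values using only unsigned math:
 * flipping the sign bit maps signed order onto unsigned order. */
static int signed_le(uint32_t a, uint32_t b)
{
    return (a ^ 0x80000000u) <= (b ^ 0x80000000u);
}

/* Models the miscompiled loop: signed compare against the last entry. */
static unsigned count_signed(uint32_t base)
{
    uint32_t last = base + (PTRS_PER_PMD - 1) * PMD_ENTRY_SIZE;
    uint32_t p = base;
    unsigned n = 0;

    assert(base % PMD_ENTRY_SIZE == 0);
    do {
        n++;                     /* would free the entry at p */
        p += PMD_ENTRY_SIZE;
    } while (signed_le(p, last) && n < ITER_CAP);
    return n;
}

/* Models the rewritten loop: unsigned compare against one-past-the-end. */
static unsigned count_unsigned(uint32_t base)
{
    uint32_t end = base + PTRS_PER_PMD * PMD_ENTRY_SIZE;
    unsigned n = 0;

    assert(base % PMD_ENTRY_SIZE == 0);
    for (uint32_t p = base; p < end; p += PMD_ENTRY_SIZE)
        n++;
    return n;
}
```

For a pmd page at 0x10000000 both loops visit 512 entries. At 0x7ffff000 the unsigned loop still stops after 512, but in the signed loop the next address, 0x80000000, reads as negative and so compares "<=" the last entry; it runs on into 0x80000000 and beyond until the cap (4096 here) stops it, matching the runaway free the comment describes.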
^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Linux v2.5.62 --- spontaneous reboots
  2003-02-18  2:16         ` Chris Wedgwood
@ 2003-02-18  2:33           ` Linus Torvalds
  0 siblings, 0 replies; 18+ messages in thread
From: Linus Torvalds @ 2003-02-18  2:33 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Kernel Mailing List, Martin J. Bligh


On Mon, 17 Feb 2003, Chris Wedgwood wrote:
> 
> The only thing I can think of is a triple-fault...  I'm wondering
> about using gcc-3.2 instead of 2.95.4 (Debian blah blort blem) on the
> off chance it's a weird compiler problem.

A lot of people seem to be using gcc-3.2 these days, since it's what RH-8 
comes with as standard. I don't think there are any _known_ problems with 
that compiler, at least on x86.

Now, interestingly enough, the mjb patch _does_ contain a change to 
mm/memory.c that really makes no sense _except_ in the case of a compiler 
bug. So you could check whether that (small) mm/memory.c patch is the 
thing that makes a difference for you..

It would also be interesting to see if you can check just the scheduler 
part of the mjb patch. On the whole the mjb patch looks like it should be 
fairly easy to cut into specific parts, and Martin may actually have it 
somewhere as separate patches.

		Linus


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Linux v2.5.62 --- spontaneous reboots
  2003-02-18  2:02       ` Linus Torvalds
@ 2003-02-18  2:16         ` Chris Wedgwood
  2003-02-18  2:33           ` Linus Torvalds
  2003-02-18  3:21         ` Martin J. Bligh
  2003-02-19 11:02         ` David Ford
  2 siblings, 1 reply; 18+ messages in thread
From: Chris Wedgwood @ 2003-02-18  2:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List, Martin J. Bligh

On Mon, Feb 17, 2003 at 06:02:03PM -0800, Linus Torvalds wrote:

> Can you check mjb 1-3 too? The better it gets pinpointed, the easier
> it's going to be to find.

Sure... I'll test them later on.

> Also, if you can figure out _which_ part of the patch makes a
> difference, that would obviously be even better.

I'll try to narrow this down.

> Part of the stuff in mjb is already merged in later kernels (ie
> things like using sequence locks for xtime is already there in
> 2.5.60, so clearly that doesn't seem to be the thing that helps your
> situation).

I don't think it's anything really obvious.  If the problem I'm seeing
is the same as the one showing up on *some* IBM NUMA-Q (or whatever
they are) boxen then it's probably not a driver or fs thing --- as we
have nothing in common.

Now... it could be two different problems, except the same kernel
which the IBM people found works for them also works for me.

Oddly, wli has not seen this problem and he's using similar hardware
(I think) to the other IBM people and the same compiler as me.

> Do you use the starfire driver?

Nope.

A stripped-down kernel, compiled for a 486 with no IO-APIC support (in
an attempt to slow things down and hopefully avoid possible hardware
problems such as overheating) still reboots on me.

The only thing I can think of is a triple-fault...  I'm wondering
about using gcc-3.2 instead of 2.95.4 (Debian blah blort blem) on the
off chance it's a weird compiler problem.



  --cw

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Linux v2.5.62 --- spontaneous reboots
  2003-02-18  1:53     ` Chris Wedgwood
@ 2003-02-18  2:02       ` Linus Torvalds
  2003-02-18  2:16         ` Chris Wedgwood
                           ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Linus Torvalds @ 2003-02-18  2:02 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Kernel Mailing List, Martin J. Bligh


On Mon, 17 Feb 2003, Chris Wedgwood wrote:
> 
>   plain 2.5.59 does
> 
>   59-mjb4 does NOT

Can you check mjb 1-3 too? The better it gets pinpointed, the easier it's 
going to be to find.

Also, if you can figure out _which_ part of the patch makes a difference,
that would obviously be even better.  Part of the stuff in mjb is already
merged in later kernels (ie things like using sequence locks for xtime is
already there in 2.5.60, so clearly that doesn't seem to be the thing that
helps your situation).

Martin cc'd, in case he has suggestions on how/what to split up the patch.

Do you use the starfire driver? That's a big part of the patch, for
example.. And part of the patch just makes the timer interrupt happen much
less often, if you havn't configured for 1000Hz - and it may well be that
small perturbations like that are the things that matter to you.

		Linus


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Linux v2.5.62 --- spontaneous reboots
  2003-02-18  1:42   ` Linus Torvalds
@ 2003-02-18  1:53     ` Chris Wedgwood
  2003-02-18  2:02       ` Linus Torvalds
  2003-02-18 21:44     ` Chris Wedgwood
  1 sibling, 1 reply; 18+ messages in thread
From: Chris Wedgwood @ 2003-02-18  1:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List

On Mon, Feb 17, 2003 at 05:42:38PM -0800, Linus Torvalds wrote:

> It would be interesting to hear exactly when the trouble
> started. And if plain 2.5.59 does it (which is unclear from your
> description), but 59-mjb4 doesn't, then that's an interesting data
> point.

  plain 2.5.59 does

  59-mjb4 does NOT

I tested 59-mjb4 at the suggestion of mbligh after hearing that other
people had discovered the same bug and were now using 59-mjb4.


  --cw


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Linux v2.5.62 --- spontaneous reboots
  2003-02-18  0:03 ` Linux v2.5.62 --- spontaneous reboots Chris Wedgwood
  2003-02-18  0:44   ` Jeff Garzik
@ 2003-02-18  1:42   ` Linus Torvalds
  2003-02-18  1:53     ` Chris Wedgwood
  2003-02-18 21:44     ` Chris Wedgwood
  2003-02-18 12:13   ` Pavel Machek
  2 siblings, 2 replies; 18+ messages in thread
From: Linus Torvalds @ 2003-02-18  1:42 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Kernel Mailing List


On Mon, 17 Feb 2003, Chris Wedgwood wrote:
> 
> FWIW, I can't get 2.5.59+ (maybe earlier) to run reliably for me
> without spontaneous rebooting under load (kernel compile in a loop).
> 
> I note that 2.5.59-mjb4 seems pretty reliable and doesn't have this
> problem...

It would be interesting to hear exactly when the trouble started. And if
plain 2.5.59 does it (which is unclear from your description), but 59-mjb4
doesn't, then that's an interesting data point.

		Linus


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Linux v2.5.62 --- spontaneous reboots
  2003-02-18  0:44   ` Jeff Garzik
@ 2003-02-18  0:46     ` Chris Wedgwood
  0 siblings, 0 replies; 18+ messages in thread
From: Chris Wedgwood @ 2003-02-18  0:46 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Linus Torvalds, Kernel Mailing List

On Mon, Feb 17, 2003 at 07:44:08PM -0500, Jeff Garzik wrote:

> ACPI, or no?

nope

> highmem, or no?

no for me --- yes for them I assume (8-way P4)

> Are you running your UP Athlon with CONFIG_X86_UP_APIC?

I was... I wondered if that might do it, so I tried without.  Still
reboots.  Built kernel as 486 kernel with no IO-APIC too, still
reboots.

Nothing is logged (serial console).

Tried gcc-2.95 and gcc-3.2.



  --cw

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Linux v2.5.62 --- spontaneous reboots
  2003-02-18  0:03 ` Linux v2.5.62 --- spontaneous reboots Chris Wedgwood
@ 2003-02-18  0:44   ` Jeff Garzik
  2003-02-18  0:46     ` Chris Wedgwood
  2003-02-18  1:42   ` Linus Torvalds
  2003-02-18 12:13   ` Pavel Machek
  2 siblings, 1 reply; 18+ messages in thread
From: Jeff Garzik @ 2003-02-18  0:44 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Linus Torvalds, Kernel Mailing List

Chris Wedgwood wrote:
> On Mon, Feb 17, 2003 at 03:18:43PM -0800, Linus Torvalds wrote:
> 
> 
>>Oh, and as a sign that 2.6.x really _is_ approaching, people have
>>started sending me spelling fixes.
> 
> 
> FWIW, I can't get 2.5.59+ (maybe earlier) to run reliably for me
> without spontaneous rebooting under load (kernel compile in a loop).


ACPI, or no?

highmem, or no?

Are you running your UP Athlon with CONFIG_X86_UP_APIC?

	Jeff




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Linux v2.5.62 --- spontaneous reboots
  2003-02-17 23:18 Linux v2.5.62 Linus Torvalds
@ 2003-02-18  0:03 ` Chris Wedgwood
  2003-02-18  0:44   ` Jeff Garzik
                     ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Chris Wedgwood @ 2003-02-18  0:03 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List

On Mon, Feb 17, 2003 at 03:18:43PM -0800, Linus Torvalds wrote:

> Oh, and as a sign that 2.6.x really _is_ approaching, people have
> started sending me spelling fixes.

FWIW, I can't get 2.5.59+ (maybe earlier) to run reliably for me
without spontaneous rebooting under load (kernel compile in a loop).

I wondered if it was specific to my system here except a few other
people have reported this on *very* different hardware (I have a UP
Athlon with IDE, they have 8-way P4 with SCSI).

Is anyone else seeing this?  Might there be some bogon causing triple
faults or similar lurking that I'm just unlucky enough to hit often?

I note that 2.5.59-mjb4 seems pretty reliable and doesn't have this
problem...




  --cw

^ permalink raw reply	[flat|nested] 18+ messages in thread

Thread overview: 18+ messages
     [not found] <fa.du861p4.qi0a2o@ifi.uio.no>
     [not found] ` <fa.m7uie32.15048ou@ifi.uio.no>
2003-02-18 13:07   ` Linux v2.5.62 --- spontaneous reboots Ed Tomlinson
     [not found] <fa.oa9dc7e.jk65re@ifi.uio.no>
     [not found] ` <fa.d672u14.1gk8ea4@ifi.uio.no>
2003-02-18 23:48   ` walt
2003-02-17 23:18 Linux v2.5.62 Linus Torvalds
2003-02-18  0:03 ` Linux v2.5.62 --- spontaneous reboots Chris Wedgwood
2003-02-18  0:44   ` Jeff Garzik
2003-02-18  0:46     ` Chris Wedgwood
2003-02-18  1:42   ` Linus Torvalds
2003-02-18  1:53     ` Chris Wedgwood
2003-02-18  2:02       ` Linus Torvalds
2003-02-18  2:16         ` Chris Wedgwood
2003-02-18  2:33           ` Linus Torvalds
2003-02-18  3:21         ` Martin J. Bligh
2003-02-19 11:02         ` David Ford
2003-02-18 21:44     ` Chris Wedgwood
2003-02-18 21:59       ` Chris Wedgwood
2003-02-18 22:13         ` Linus Torvalds
2003-02-18 22:34           ` Linus Torvalds
2003-02-18 23:01         ` Chris Wedgwood
2003-02-18 12:13   ` Pavel Machek
