linux-kernel.vger.kernel.org archive mirror
* Re: Signal 11 - the continuing saga
       [not found] <Pine.LNX.4.10.10012130805190.19301-100000@penguin.transmeta.com>
@ 2000-12-13 17:23 ` Linus Torvalds
  2000-12-13 18:02   ` Mike Galbraith
  0 siblings, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2000-12-13 17:23 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Kernel Mailing List



On Wed, 13 Dec 2000, Linus Torvalds wrote:
> 
> Looking at "swapoff()", it could easily be something like
> 
>  - swapoff walks through the processes, marking the pages dirty
>    (correctly)
>  - swapoff goes on to the next swap entry, and because it needs memory for
>    this, the VM layer will swap out old entries by marking them dirty in
>    the "struct page".
>  - final stages of swapoff() removes the swap cache entry, never minding
>    the fact that it is marked dirty again in "struct page", and clean in
>    various VM page tables.
> 
> Ho humm.. I don't think that is it exactly, but something along those
> lines.

Actually, having thought about it for five more minutes, I actually think
that that _is_ it.

If so, the fix looks like it could be really simple. The whole problem
arises from the fact that we remove the page from the swap cache only
_after_ we've walked the page-tables to look at it. It looks like the
fairly trivial fix is simply to remove it from the swap cache before,
getting rid of all such races in swapoff().

Mind trying out this patch?

NOTE! It's untested. It might not work. It might trigger some sanity-test
somewhere else. But it looks like it should do the right thing (the page
might be moved to _another_ swap device early, if there are multiple swap
areas, but even that should be fine - the unuse_process() stuff doesn't
care about what swapcache this actually is any more).

Does this patch make a difference (I moved the delete seven lines upwards,
and removed the test - the test looks extraneous).

		Linus

----
--- v2.4.0-test12/linux/mm/swapfile.c	Tue Oct 31 12:42:27 2000
+++ linux/mm/swapfile.c	Wed Dec 13 09:17:51 2000
@@ -370,6 +370,7 @@
 			swap_free(entry);
   			return -ENOMEM;
 		}
+		delete_from_swap_cache(page);
 		read_lock(&tasklist_lock);
 		for_each_task(p)
 			unuse_process(p->mm, entry, page);
@@ -377,8 +378,6 @@
 		shm_unuse(entry, page);
 		/* Now get rid of the extra reference to the temporary
                    page we've been using. */
-		if (PageSwapCache(page))
-			delete_from_swap_cache(page);
 		page_cache_release(page);
 		/*
 		 * Check for and clear any overflowed swap map counts.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/


* Re: Signal 11 - the continuing saga
  2000-12-13 17:23 ` Signal 11 - the continuing saga Linus Torvalds
@ 2000-12-13 18:02   ` Mike Galbraith
  2000-12-13 19:27     ` Linus Torvalds
  2000-12-13 19:35     ` Linus Torvalds
  0 siblings, 2 replies; 20+ messages in thread
From: Mike Galbraith @ 2000-12-13 18:02 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List

On Wed, 13 Dec 2000, Linus Torvalds wrote:

> On Wed, 13 Dec 2000, Linus Torvalds wrote:
> > 
> > Looking at "swapoff()", it could easily be something like
> > 
> >  - swapoff walks through the processes, marking the pages dirty
> >    (correctly)
> >  - swapoff goes on to the next swap entry, and because it needs memory for
> >    this, the VM layer will swap out old entries by marking them dirty in
> >    the "struct page".
> >  - final stages of swapoff() removes the swap cache entry, never minding
> >    the fact that it is marked dirty again in "struct page", and clean in
> >    various VM page tables.
> > 
> > Ho humm.. I don't think that is it exactly, but something along those
> > lines.
> 
> Actually, having thought about it for five more minutes, I actually think
> that that _is_ it.
> 
> If so, the fix looks like it could be really simple. The whole problem
> arises from the fact that we remove the page from the swap cache only
> _after_ we've walked the page-tables to look at it. It looks like the
> fairly trivial fix is simply to remove it from the swap cache before,
> getting rid of all such races in swapoff().
> 
> Mind trying out this patch?
> 
> NOTE! It's untested. It might not work. It might trigger some sanity-test
> somewhere else. But it looks like it should do the right thing (the page
> might be moved to _another_ swap device early, if there are multiple swap
> areas, but even that should be fine - the unuse_process() stuff doesn't
> care about what swapcache this actually is any more).
> 
> Does this patch make a difference (I moved the delete seven lines upwards,
> and removed the test - the test looks extraneous).

Not in my test tree.  Same fault, and same trace leading up to it.
I'll run virgin source hard tomorrow to be sure. (No message means
no change)

	-Mike


* Re: Signal 11 - the continuing saga
  2000-12-13 18:02   ` Mike Galbraith
@ 2000-12-13 19:27     ` Linus Torvalds
  2000-12-14  3:57       ` Mike Galbraith
  2000-12-13 19:35     ` Linus Torvalds
  1 sibling, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2000-12-13 19:27 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Kernel Mailing List



On Wed, 13 Dec 2000, Mike Galbraith wrote:
> 
> Not in my test tree.  Same fault, and same trace leading up to it. no

Ok.

It definitely looks like a swapoff() problem.

Have you ever seen the behaviour without running swapoff?

Also, can you re-create it without running swapon() (if it's something
like a lost dirty bit, it should be possible to trigger even without the
swapon, and I'd like to hear if that can happen - if it only happens with
swapon() and you can't trigger it with just a swapoff() it might be a
question of re-using some swap file stuff and delaying the writeout or
whatever).

		Linus


* Re: Signal 11 - the continuing saga
  2000-12-13 18:02   ` Mike Galbraith
  2000-12-13 19:27     ` Linus Torvalds
@ 2000-12-13 19:35     ` Linus Torvalds
  2000-12-13 20:00       ` Gérard Roudier
                         ` (2 more replies)
  1 sibling, 3 replies; 20+ messages in thread
From: Linus Torvalds @ 2000-12-13 19:35 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Kernel Mailing List



Ehh, I think I found it.

Hint: "ptep_mkdirty()".

Oops.

I'll bet you $5 USD (and these days, that's about a gadzillion Euros) that
this explains it.

		Linus


* Re: Signal 11 - the continuing saga
  2000-12-13 19:35     ` Linus Torvalds
@ 2000-12-13 20:00       ` Gérard Roudier
  2000-12-13 20:19       ` Linus Torvalds
  2000-12-13 21:36       ` Jeff V. Merkey
  2 siblings, 0 replies; 20+ messages in thread
From: Gérard Roudier @ 2000-12-13 20:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Mike Galbraith, Kernel Mailing List



On Wed, 13 Dec 2000, Linus Torvalds wrote:

> 
> 
> Ehh, I think I found it.
> 
> Hint: "ptep_mkdirty()".
> 
> Oops.
> 
> I'll bet you $5 USD (and these days, that's about a gadzillion Euros) that

Poor European Gérard as slim as 1.84 meter - 78 Kg these days.
What about old days poor European Linus versus these days American Linus
on these points ? ;-)

> this explains it.

Really ? :o)

> 		Linus

  Gérard.


* Re: Signal 11 - the continuing saga
  2000-12-13 19:35     ` Linus Torvalds
  2000-12-13 20:00       ` Gérard Roudier
@ 2000-12-13 20:19       ` Linus Torvalds
  2000-12-13 22:48         ` Rainer Mager
  2000-12-14  7:22         ` Signal 11 - the continuing saga Mike Galbraith
  2000-12-13 21:36       ` Jeff V. Merkey
  2 siblings, 2 replies; 20+ messages in thread
From: Linus Torvalds @ 2000-12-13 20:19 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Kernel Mailing List



On Wed, 13 Dec 2000, Linus Torvalds wrote:
> 
> Hint: "ptep_mkdirty()".

In case you wonder why the bug was so insidious, what this caused was two
separate problems, both of them able to cause SIGSEGVs.

One: we didn't mark the page table entry dirty like we were supposed to.

Two: by making it writable, we also made the page shared, even if it
wasn't supposed to be shared (so when the next process wrote to the page,
if the swap page was shared with somebody else, the changes would show up
even in the process that _didn't_ write to it).

And "ptep_mkdirty()" is only used by swapoff, so nothing else would show
this. Which was why it hadn't been immediately obvious that anything was
broken.

		Linus


* Re: Signal 11 - the continuing saga
  2000-12-13 19:35     ` Linus Torvalds
  2000-12-13 20:00       ` Gérard Roudier
  2000-12-13 20:19       ` Linus Torvalds
@ 2000-12-13 21:36       ` Jeff V. Merkey
  2 siblings, 0 replies; 20+ messages in thread
From: Jeff V. Merkey @ 2000-12-13 21:36 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Mike Galbraith, Kernel Mailing List

On Wed, Dec 13, 2000 at 11:35:57AM -0800, Linus Torvalds wrote:
> 
> 
> Ehh, I think I found it.
> 
> Hint: "ptep_mkdirty()".
> 
> Oops.
> 
> I'll bet you $5 USD (and these days, that's about a gadzillion Euros) that
> this explains it.
> 
> 		Linus

Good.  Sounds like you guys have a handle on it now.

:-)

Jeff


* RE: Signal 11 - the continuing saga
  2000-12-13 20:19       ` Linus Torvalds
@ 2000-12-13 22:48         ` Rainer Mager
  2000-12-17 23:27           ` Signal 11 - revisited Rainer Mager
  2000-12-14  7:22         ` Signal 11 - the continuing saga Mike Galbraith
  1 sibling, 1 reply; 20+ messages in thread
From: Rainer Mager @ 2000-12-13 22:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List

Err, for those of us who aren't up to our elbows in the kernel code, is
there a patch for this? Presumably this will be rolled into 2.4.0test13 but
I'd like to try it out. Also, can someone summarize the fix in English along
with the expected, improved behavior (e.g. Linux will never have a signal 11
again and will never, ever crash ;-)

Finally, as soon as there is a patch, can other people who have seen this
problem test it. My problem is so random that I'd need at least a few days
to gain some confidence this is fixed.


Thanks all.

--Rainer

> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org
> [mailto:linux-kernel-owner@vger.kernel.org]On Behalf Of Linus Torvalds
> Sent: Thursday, December 14, 2000 5:19 AM
> To: Mike Galbraith
> Cc: Kernel Mailing List
> Subject: Re: Signal 11 - the continuing saga
>
>
> On Wed, 13 Dec 2000, Linus Torvalds wrote:
> >
> > Hint: "ptep_mkdirty()".
>
> In case you wonder why the bug was so insidious, what this caused was two
> separate problems, both of them able to cause SIGSEGVs.
>
> One: we didn't mark the page table entry dirty like we were supposed to.
>
> Two: by making it writable, we also made the page shared, even if it
> wasn't supposed to be shared (so when the next process wrote to the page,
> if the swap page was shared with somebody else, the changes would show up
> even in the process that _didn't_ write to it).
>
> And "ptep_mkdirty()" is only used by swapoff, so nothing else would show
> this. Which was why it hadn't been immediately obvious that anything was
> broken.
>
> 		Linus


* Re: Signal 11 - the continuing saga
  2000-12-13 19:27     ` Linus Torvalds
@ 2000-12-14  3:57       ` Mike Galbraith
  0 siblings, 0 replies; 20+ messages in thread
From: Mike Galbraith @ 2000-12-14  3:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List

On Wed, 13 Dec 2000, Linus Torvalds wrote:

> On Wed, 13 Dec 2000, Mike Galbraith wrote:
> > 
> > Not in my test tree.  Same fault, and same trace leading up to it. no
> 
> Ok.
> 
> It definitely looks like a swapoff() problem.
> 
> Have you ever seen the behaviour without running swapoff?

No.

> Also, can you re-create it without running swapon() (if it's something
> like a lost dirty bit, it should be possible to trigger even without the
> swapon, and I'd like to hear if that can happen - if it only happens with
> swapon() and you can't trigger it with just a swapoff() it might be a
> question of re-using some swap file stuff and delaying the writeout or
> whatever).

I'll try loading up swap, swapoff and then doing jobs that fit in ram.

(hmm.. what about inactive_clean list when you do swapoff.. might there
be pages sitting there that are [were] swap cache? reclaim_page=kaboom?)

	-Mike


* Re: Signal 11 - the continuing saga
  2000-12-13 20:19       ` Linus Torvalds
  2000-12-13 22:48         ` Rainer Mager
@ 2000-12-14  7:22         ` Mike Galbraith
  1 sibling, 0 replies; 20+ messages in thread
From: Mike Galbraith @ 2000-12-14  7:22 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List

On Wed, 13 Dec 2000, Linus Torvalds wrote:

> On Wed, 13 Dec 2000, Linus Torvalds wrote:
> > 
> > Hint: "ptep_mkdirty()".

<g> rather obvious oopsie.. once spotted.

> In case you wonder why the bug was so insidious, what this caused was two
> separate problems, both of them able to cause SIGSEGVs.
> 
> One: we didn't mark the page table entry dirty like we were supposed to.
> 
> Two: by making it writable, we also made the page shared, even if it
> wasn't supposed to be shared (so when the next process wrote to the page,
> if the swap page was shared with somebody else, the changes would show up
> even in the process that _didn't_ write to it).
> 
> And "ptep_mkdirty()" is only used by swapoff, so nothing else would show
> this. Which was why it hadn't been immediately obvious that anything was
> broken.

The terminal OOM problem is now gone and I haven't seen a SIGSEGV yet
running virgin source.

	IOU 5 bogo$$

	-Mike

(I still see something with IKD that _could_ be timing related troubles.
There are a couple of grubby fingerprints I need to wipe off, and some
churn/burn hours to be sure)


* Signal 11 - revisited
  2000-12-13 22:48         ` Rainer Mager
@ 2000-12-17 23:27           ` Rainer Mager
  0 siblings, 0 replies; 20+ messages in thread
From: Rainer Mager @ 2000-12-17 23:27 UTC (permalink / raw)
  To: Kernel Mailing List

I was wondering if anyone had any new info/suggestions for the Signal 11
problem.

I think I last reported that I had tried 2.4.0test12 with AGPGart and DRM
turned off. This seemed a bit more stable but I did have X crash with
Signal 11 after about 1.5 days.

I'd really appreciate any advice on how to diagnose this.


Thanks,

--Rainer


* Re: Signal 11 - the continuing saga
  2000-12-13  3:17     ` Linus Torvalds
  2000-12-13  9:34       ` Rainer Mager
@ 2000-12-13 17:43       ` Jeff V. Merkey
  1 sibling, 0 replies; 20+ messages in thread
From: Jeff V. Merkey @ 2000-12-13 17:43 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

On Tue, Dec 12, 2000 at 07:17:41PM -0800, Linus Torvalds wrote:
> In article <20001212191719.A12420@vger.timpanogas.org>,
> Jeff V. Merkey <jmerkey@vger.timpanogas.org> wrote:
> >On Wed, Dec 13, 2000 at 09:22:55AM +0900, Rainer Mager wrote:
> >> 	I have a tiny bash script that launches a Java swing app. If I run my
> >> script from an xterm (or gnome-terminal or whatever) then it starts up fine.
> >> If, however, I try to launch it from my gnome taskbar's menu then it dies
> >> with signal 11 (the Java log is available upon request). This seems to be
> >> 100% consistent, since I noticed it yesterday, even across reboots.
> >> Interestingly, the same behavior occurs if I try to run the program from
> >> within JBuilder 4.
> >> 	So, is this related to the larger signal 11 problems?
> >
> >There's a corruption bug in the page cache somewhere, and it's 100%
> >reproducible.  Finding it will be tough....
> 
> Unlikely. If the actual program data was corrupted, it would SIGSEGV
> regardless of how it's executed.
> 
> I'd guess that the program has a bug, and depending on the arguments and
> environment (especially the latter will be different), it shows up or
> not. Things like not having a LOCALE set in either case or similar.
> 
> 		Linus

Linus,

I agree that there may be some problem in the code above -- the question is
what has changed to make this behavior emerge?  I see it with a host of 
programs (ssh, make, netscape) -- true, all are userspace.  Time permitting,
I may attempt to track this down in ssh and make in jobserver mode.  It
may be related to some interaction that changed underneath.

Jeff



* RE: Signal 11 - the continuing saga
  2000-12-13  9:34         ` Rainer Mager
@ 2000-12-13 15:40           ` Mike Galbraith
  0 siblings, 0 replies; 20+ messages in thread
From: Mike Galbraith @ 2000-12-13 15:40 UTC (permalink / raw)
  To: Rainer Mager; +Cc: linux-kernel, Alan Cox

On Wed, 13 Dec 2000, Rainer Mager wrote:

> Mike et al,
> 
> 	I have no idea what IKD is and I don't know what to do with any results I
> might find BUT I'd be happy to do this if it will help. Please pass on the
> info with the instructions. Who should I report the results to?

IKD is a debugging toolkit.  The trap I have set up freezes the kernel
trace buffer at SIGSEGV time.  From there you have to read it backward
looking for problems. (which isn't particularly easy).  I was thinking
you wanted to roll your shirt sleeves up and maybe this would help ;-)  

If you want it, and do a trace, I'd be very interested in the last
couple of schedules to compare to my traces.  It's not something you
can just run and report though.

	-Mike


* RE: Signal 11 - the continuing saga
  2000-12-13  3:17     ` Linus Torvalds
@ 2000-12-13  9:34       ` Rainer Mager
  2000-12-13 17:43       ` Jeff V. Merkey
  1 sibling, 0 replies; 20+ messages in thread
From: Rainer Mager @ 2000-12-13  9:34 UTC (permalink / raw)
  To: Linus Torvalds, linux-kernel

Give that man a cigar....it was an env var (not LOCALE but LANG). I'd
actually checked this but I didn't think that made a difference in my case.

Thanks Linus, now can you fix the larger signal 11 problem?

--Rainer


> [mailto:linux-kernel-owner@vger.kernel.org]On Behalf Of Linus Torvalds
> I'd guess that the program has a bug, and depending on the arguments and
> environment (especially the latter will be different), it shows up or
> not. Things like not having a LOCALE set in either case or similar.
>
> 		Linus


* RE: Signal 11 - the continuing saga
  2000-12-13  4:29       ` Mike Galbraith
@ 2000-12-13  9:34         ` Rainer Mager
  2000-12-13 15:40           ` Mike Galbraith
  0 siblings, 1 reply; 20+ messages in thread
From: Rainer Mager @ 2000-12-13  9:34 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: linux-kernel, Alan Cox

Mike et al,

	I have no idea what IKD is and I don't know what to do with any results I
might find BUT I'd be happy to do this if it will help. Please pass on the
info with the instructions. Who should I report the results to?



--Rainer

> [mailto:linux-kernel-owner@vger.kernel.org]On Behalf Of Mike Galbraith
> If you want, I can extract IKD.. which happens to have a trap in place
> for this (because I have a 100% reproducible swap related SIGSEGV that
> I'm trying to figure out).
>
> If you're interested, let me know and I'll extract it (quite large) and
> send it along with instructions on how to do the trap.
>
> 	-Mike


* RE: Signal 11 - the continuing saga
  2000-12-13  1:45     ` Rainer Mager
@ 2000-12-13  4:29       ` Mike Galbraith
  2000-12-13  9:34         ` Rainer Mager
  0 siblings, 1 reply; 20+ messages in thread
From: Mike Galbraith @ 2000-12-13  4:29 UTC (permalink / raw)
  To: Rainer Mager; +Cc: Jeff V. Merkey, linux-kernel, Alan Cox

On Wed, 13 Dec 2000, Rainer Mager wrote:

> Thanks for the info...
> 
> > [mailto:linux-kernel-owner@vger.kernel.org]On Behalf Of Jeff V. Merkey
> > > 	So, is this related to the larger signal 11 problems?
> >
> > There's a corruption bug in the page cache somewhere, and it's 100%
> > reproducible.  Finding it will be tough....
> 
> Ok, granted this will be tough but is anyone even actively working on it?
> What can I do to help?

If you want, I can extract IKD.. which happens to have a trap in place
for this (because I have a 100% reproducible swap related SIGSEGV that
I'm trying to figure out). 

If you're interested, let me know and I'll extract it (quite large) and
send it along with instructions on how to do the trap.

	-Mike


* Re: Signal 11 - the continuing saga
  2000-12-13  2:17   ` Jeff V. Merkey
  2000-12-13  1:45     ` Rainer Mager
@ 2000-12-13  3:17     ` Linus Torvalds
  2000-12-13  9:34       ` Rainer Mager
  2000-12-13 17:43       ` Jeff V. Merkey
  1 sibling, 2 replies; 20+ messages in thread
From: Linus Torvalds @ 2000-12-13  3:17 UTC (permalink / raw)
  To: linux-kernel

In article <20001212191719.A12420@vger.timpanogas.org>,
Jeff V. Merkey <jmerkey@vger.timpanogas.org> wrote:
>On Wed, Dec 13, 2000 at 09:22:55AM +0900, Rainer Mager wrote:
>> 	I have a tiny bash script that launches a Java swing app. If I run my
>> script from an xterm (or gnome-terminal or whatever) then it starts up fine.
>> If, however, I try to launch it from my gnome taskbar's menu then it dies
>> with signal 11 (the Java log is available upon request). This seems to be
>> 100% consistent, since I noticed it yesterday, even across reboots.
>> Interestingly, the same behavior occurs if I try to run the program from
>> within JBuilder 4.
>> 	So, is this related to the larger signal 11 problems?
>
>There's a corruption bug in the page cache somewhere, and it's 100%
>reproducible.  Finding it will be tough....

Unlikely. If the actual program data was corrupted, it would SIGSEGV
regardless of how it's executed.

I'd guess that the program has a bug, and depending on the arguments and
environment (especially the latter will be different), it shows up or
not. Things like not having a LOCALE set in either case or similar.

		Linus

* Re: Signal 11 - the continuing saga
  2000-12-13  0:22 ` Signal 11 - the continuing saga Rainer Mager
@ 2000-12-13  2:17   ` Jeff V. Merkey
  2000-12-13  1:45     ` Rainer Mager
  2000-12-13  3:17     ` Linus Torvalds
  0 siblings, 2 replies; 20+ messages in thread
From: Jeff V. Merkey @ 2000-12-13  2:17 UTC (permalink / raw)
  To: Rainer Mager; +Cc: linux-kernel, Alan Cox

On Wed, Dec 13, 2000 at 09:22:55AM +0900, Rainer Mager wrote:
> Hi again,
> 
> 	Ok, I just upgraded to 2.4.0test12 (although I don't think there was any
> work in 12 that directly addresses this signal 11 problem). When compiling
> the new kernel I chose to disable AGPGart and DRM as suggested by
> davej@suse.de. I will report later if this makes any difference.
> 
> 	On another, possibly related note, I'm getting some really weird behavior
> with a Java program. The only reason I mention it here is because it dies
> with our old friend Signal 11. Anyway, please bear with the description
> below.
> 	I have a tiny bash script that launches a Java swing app. If I run my
> script from an xterm (or gnome-terminal or whatever) then it starts up fine.
> If, however, I try to launch it from my gnome taskbar's menu then it dies
> with signal 11 (the Java log is available upon request). This seems to be
> 100% consistent, since I noticed it yesterday, even across reboots.
> Interestingly, the same behavior occurs if I try to run the program from
> within JBuilder 4.
> 	So, is this related to the larger signal 11 problems?

There's a corruption bug in the page cache somewhere, and it's 100%
reproducible.  Finding it will be tough....

> 
> 
> 	What else can I do regarding these issues to help fix it? Would a core dump
> help anyone? I'd really like to contribute somehow but I need some
> direction.
> 
> 
> --Rainer
> 
> > From: CMA [mailto:cma@mclink.it]
> > Did you already try to selectively disable L1 and L2 caches (if
> > your box has both) and see what happens?
> 
> Anyone know how to do this?

Usually this is performed in the BIOS setup.  You can also disable L1 
with a sequence of instructions that write to the CR0 register on intel
and flip a bit, but in doing this you have to execute a WBINVD (write
back invalidate) instruction to flush out the cache.  BIOS setup is
probably simpler.  Disabling L1 will make the machine slower
than molasses, BTW, and if this bug is race related (they always
are) it won't help much in running it down.

Jeff


* RE: Signal 11 - the continuing saga
  2000-12-13  2:17   ` Jeff V. Merkey
@ 2000-12-13  1:45     ` Rainer Mager
  2000-12-13  4:29       ` Mike Galbraith
  2000-12-13  3:17     ` Linus Torvalds
  1 sibling, 1 reply; 20+ messages in thread
From: Rainer Mager @ 2000-12-13  1:45 UTC (permalink / raw)
  To: Jeff V. Merkey; +Cc: linux-kernel, Alan Cox

Thanks for the info...

> [mailto:linux-kernel-owner@vger.kernel.org]On Behalf Of Jeff V. Merkey
> > 	So, is this related to the larger signal 11 problems?
>
> There's a corruption bug in the page cache somewhere, and it's 100%
> reproducible.  Finding it will be tough....

Ok, granted this will be tough but is anyone even actively working on it?
What can I do to help?



> > Anyone know how to do [disable L1 and L2 caches]?
>
> Usually this is performed in the BIOS setup.  You can also disable L1
> with a sequence of instructions that write to the CR0 register on intel
> and flip a bit, but in doing this you have to execute a WBINVD (write
> back invalidate) instruction to flush out the cache.  BIOS setup is
> probably simpler.  Disabling L1 will make the machine slower
> than molasses, BTW, and if this bug is race related (they always
> are) it won't help much in running it down.

Aha, just as I suspected. My BIOS doesn't appear to support this. You seem
to be saying that doing so won't really contribute anything anyway so I will
hold off for now.



--Rainer


* RE: Signal 11 - the continuing saga
  2000-12-11 23:24 Signal 11 Rainer Mager
@ 2000-12-13  0:22 ` Rainer Mager
  2000-12-13  2:17   ` Jeff V. Merkey
  0 siblings, 1 reply; 20+ messages in thread
From: Rainer Mager @ 2000-12-13  0:22 UTC (permalink / raw)
  To: linux-kernel; +Cc: Alan Cox

Hi again,

	Ok, I just upgraded to 2.4.0test12 (although I don't think there was any
work in 12 that directly addresses this signal 11 problem). When compiling
the new kernel I chose to disable AGPGart and DRM as suggested by
davej@suse.de. I will report later if this makes any difference.

	On another, possibly related note, I'm getting some really weird behavior
with a Java program. The only reason I mention it here is because it dies
with our old friend Signal 11. Anyway, please bear with the description
below.
	I have a tiny bash script that launches a Java swing app. If I run my
script from an xterm (or gnome-terminal or whatever) then it starts up fine.
If, however, I try to launch it from my gnome taskbar's menu then it dies
with signal 11 (the Java log is available upon request). This seems to be
100% consistent, since I noticed it yesterday, even across reboots.
Interestingly, the same behavior occurs if I try to run the program from
within JBuilder 4.
	So, is this related to the larger signal 11 problems?


	What else can I do regarding these issues to help fix it? Would a core dump
help anyone? I'd really like to contribute somehow but I need some
direction.


--Rainer

> From: CMA [mailto:cma@mclink.it]
> Did you already try to selectively disable L1 and L2 caches (if
> your box has both) and see what happens?

Anyone know how to do this?

