linux-kernel.vger.kernel.org archive mirror
* Possible freezing bug located after ac13
From: tcm @ 2001-06-24  2:29 UTC
  To: linux-kernel

I've recently been going slightly nuts over the fact that ac15, 16, and
17 all like deadlocking or slowing to a crawl for seconds or minutes on
my K6-III with 64MB of RAM and 128MB of swap space...

Recently I noticed something VERY odd. I'd been keeping an eye on
gkrellm while doing stupid things to trigger the problem (a du of / as
root in X would almost always make it pop up), and swap was doing I/O
*JUST* before each deadlock or slowdown. If the machine recovered, swap
would do more I/O...

So I tried disabling all swap, and suddenly everything worked fine,
although I couldn't exactly do everything I wanted, of course.

I regression tested this: ac16, 15 and even 14 do this; ac13 does *not*.
IMHO the dead swap patches introduced in ac14 may be related to the
problem.

Just my two cents.

Tim
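
The regression test described in this message (ac13 good, ac14 and later bad) amounts to finding the first bad release in an ordered list. A minimal sketch of that search, assuming the bug stays present once introduced; `first_bad` and the `is_bad` callback are hypothetical names, and `is_bad` stands in for booting a kernel and running the du test by hand:

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest version for which is_bad() is true.

    Assumes versions are ordered oldest to newest, the first entry is
    known good, the last entry is known bad, and the bug never
    disappears again once it has been introduced.
    """
    lo, hi = 0, len(versions) - 1  # versions[lo] is good, versions[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid  # the bug was introduced at or before mid
        else:
            lo = mid  # the bug was introduced after mid
    return versions[hi]
```

With the results reported here (ac13 good, ac14 through ac17 bad), the search would settle on ac14 after two test boots instead of four.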

* Re: Possible freezing bug located after ac13
From: Rik van Riel @ 2001-06-24  2:54 UTC
  To: tcm; +Cc: linux-kernel

On Sat, 23 Jun 2001 tcm@nac.net wrote:

> I've recently been going slightly nuts over the fact that ac15, 16, and
> 17 all like deadlocking or slowing to a crawl for seconds or minutes on
> my K6-III with 64MB of RAM and 128MB of swap space...
>
> Recently I noticed something VERY odd. I'd been keeping an eye on
> gkrellm while doing stupid things to trigger the problem (a du of / as
> root in X would almost always make it pop up), and swap was doing I/O
> *JUST* before each deadlock or slowdown. If the machine recovered, swap
> would do more I/O...
>
> So I tried disabling all swap, and suddenly everything worked fine,
> although I couldn't exactly do everything I wanted, of course.
>
> I regression tested this: ac16, 15 and even 14 do this; ac13 does *not*.
> IMHO the dead swap patches introduced in ac14 may be related to the
> problem.

1) the dead swap cache patch should alleviate the problem,
   if anything

2) does this happen with 2.4.6-pre5 too ?

regards,

Rik
--
Executive summary of a recent Microsoft press release:
   "we are concerned about the GNU General Public License (GPL)"


		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com/


* Swap error message I've seen in 2.4.5-ac17
From: tcm @ 2001-06-26 22:38 UTC
  To: linux-kernel

Yep, me again. I've been playing around with ac17 on my old 486 machine
for a few days (it seems strange that the 486 works fine while the K6
doesn't, but I digress) and I noticed today something that made my hair
stand on end:

Jun 26 16:17:27 debian kernel: VM: Bad swap entry 0033da00
Jun 26 16:17:27 debian kernel: Unused swap offset entry in swap_count 0033da00
Jun 26 16:17:27 debian kernel: Unused swap offset entry in swap_count 0033da00
Jun 26 16:38:16 debian -- MARK --
Jun 26 16:53:13 debian kernel: PPP BSD Compression module registered
Jun 26 16:53:14 debian kernel: PPP Deflate Compression module registered
Jun 26 16:53:24 debian kernel: VM: Bad swap entry 0033da00

Rik van Riel has told me that this is a kernel bug. I initially figured
it was a bad disk; thanks to him I can breathe now...

Anyway, when the kernel logged these messages I had just stopped
playing Quake on my K6-III (the 486 handles packets to/from the modem)
and was reloading the compression modules, changing the MTU of my
modem's interface from 576 to 1500, and starting fetchmail. About one
minute later I decided to simply disconnect.

I can't find a way to reproduce this problem reliably, the way I can
with the freezing bug, but I will reply to this thread if I see it
again and/or manage to reproduce it repeatedly.
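
Since the error cannot be triggered on demand, one way to notice a recurrence is to scan the system log for the message. A minimal sketch, assuming the message text matches the "VM: Bad swap entry" lines quoted in this report; `count_bad_swap_entries` is a hypothetical helper name, and the log file location depends on the syslog configuration:

```python
import re
from collections import Counter
from typing import Iterable

# Matches the "VM: Bad swap entry <hex>" message quoted in the report.
BAD_SWAP_RE = re.compile(r"VM: Bad swap entry ([0-9a-fA-F]+)")


def count_bad_swap_entries(lines: Iterable[str]) -> Counter:
    """Count how often each bad swap entry value appears in the log lines."""
    counts: Counter = Counter()
    for line in lines:
        match = BAD_SWAP_RE.search(line)
        if match:
            # Normalise case so 0033DA00 and 0033da00 count as one entry.
            counts[match.group(1).lower()] += 1
    return counts
```

Passing an open log file to the function works, because a file object yields its lines one at a time. A repeated entry value, as in the log excerpt here, would suggest the same swap slot is being hit each time.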

* Re: Freezing bug in all kernels greater than 2.4.5-ac13 *AND* 2.4.6-pre2
From: Marcelo Tosatti @ 2001-06-27 23:11 UTC
  To: tcm; +Cc: linux-kernel



On Wed, 27 Jun 2001 tcm@nac.net wrote:

> I decided, for the hell of it, to test the pre series, as many people
> have nudged me to try it instead of the ac kernel series that I've been
> having problems with. Well, it turns out I have run into exactly the
> same problem I had with the ac kernel series, which quite frankly is
> surprising the hell out of me.
> 
> To make the kernel freeze/slow down to a crawl with affected kernels on
> my machine I do this test:
> 
> Load X (This fills up my RAM and causes me to swap a bit)
> run an rxvt and su to root (probably unnecessary)
> du /
> 
> Now, somewhere in this test I start swapping a little bit, nothing
> big... then BAM. Hard disk, mouse, keyboard, all completely and utterly
> stop. Video continues to work, but my CPU's load goes absolutely INSANE.
> (If it recovers, gkrellm generally says I've hit a loadavg somewhere
> between 3 and 20, depending on how long it was stuck.) This can last for
> seconds (usually) or minutes (once), or it can simply get worse and hang
> the machine (many, many, many times).
> 
> When it recovers from this, I generally see a MASSIVE write to swap
> (I'm using gkrellm to monitor it), and the system continues on as if
> nothing happened - until, of course, this happens again. A kernel
> compile can cause it. An rm -R of a large directory can cause it. Loading
> a large application can cause it.
> 
> On some kernels this is more noticeable than others - ac15 does it the
> worst, although pre3 rivals it. The symptoms are different on ac17/18:
> there it simply freezes randomly with no recovery, instead of either
> slowing to a crawl and recovering or freezing outright. (Which is
> worse? You decide.)
> 
> Now, as before, I tested this with swap and without swap. With swap, I
> get the hangs/freezes in all the affected kernels. Without swap, I
> don't. Nada.
> 
> Now, the big question of the day, folks: What changed between 2.4.6-pre2
> and 2.4.6-pre3 that ALSO changed between 2.4.5-ac13 and 2.4.5-ac14 - and
> which part of those patches touched the VM? Anyone? I don't see what
> changed in 2.4.6-pre3 that was part of the VM... So I am trying to
> narrow it down a bit :)
> 
> This bug is driving me slightly nuts, so I want it dead. Anyone got an
> exterminator handy? =)

Rik's page_launder() changes. 


> 
> Refer to my previous post with the subject below for my original
> description of this problem. It's still there in ac18, though I've not
> tested ac19 (some have said it's not likely to have been fixed, and I've
> been regression testing the 2.4.6-pre kernels today).
> 
> Subject: Possible freezing bug located after ac13
> 
> Let me know if I can provide any additional information that will help
> nail this bug to the wall. (I want to torture it. =)


* Re: Freezing bug in all kernels greater than 2.4.5-ac13 *AND* 2.4.6-pre2
From: Marcelo Tosatti @ 2001-06-27 23:25 UTC
  To: tcm; +Cc: linux-kernel



On Wed, 27 Jun 2001, Marcelo Tosatti wrote:

> Rik's page_launder() changes. 

Eek. I mean Rik's page_launder() changes are _causing_ the problem. (It's
the only VM change between 2.4.6-pre2->pre3/2.4.5-ac13->ac14.)

Question:

What's the size of the inactive dirty and clean lists when you're about to
crash?
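
One way to capture the numbers asked for here is to sample /proc/meminfo while running the test. A minimal sketch, assuming the kernel reports the list sizes as `Inact_dirty` and `Inact_clean` lines in kB (field names as recalled from 2.4-era kernels, not confirmed anywhere in this thread); `parse_meminfo` and `inactive_lists` are hypothetical helper names:

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse 'Name:   value kB' lines into {name: kilobytes}."""
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only lines of the form "Name: <integer> kB"; this skips any
        # multi-column summary table printed at the top of the file.
        if sep and len(parts) == 2 and parts[0].isdigit() and parts[1] == "kB":
            fields[name.strip()] = int(parts[0])
    return fields


def inactive_lists(text: str) -> Dict[str, int]:
    """Return the inactive list sizes in kB, for whichever fields exist."""
    fields = parse_meminfo(text)
    # Assumed field names; adjust if the running kernel names them differently.
    wanted = ("Inact_dirty", "Inact_clean")
    return {key: fields[key] for key in wanted if key in fields}
```

Because the machine may hang outright, it would make sense to append each sample to a file and sync it to disk straight away, so the last values before the freeze survive a reboot.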


* Freezing bug in all kernels greater than 2.4.5-ac13 *AND* 2.4.6-pre2
From: tcm @ 2001-06-28  0:33 UTC
  To: linux-kernel

I decided, for the hell of it, to test the pre series, as many people
have nudged me to try it instead of the ac kernel series that I've been
having problems with. Well, it turns out I have run into exactly the
same problem I had with the ac kernel series, which quite frankly is
surprising the hell out of me.

To make the kernel freeze/slow down to a crawl with affected kernels on
my machine I do this test:

Load X (This fills up my RAM and causes me to swap a bit)
run an rxvt and su to root (probably unnecessary)
du /

Now, somewhere in this test I start swapping a little bit, nothing
big... then BAM. Hard disk, mouse, keyboard, all completely and utterly
stop. Video continues to work, but my CPU's load goes absolutely INSANE.
(If it recovers, gkrellm generally says I've hit a loadavg somewhere
between 3 and 20, depending on how long it was stuck.) This can last for
seconds (usually) or minutes (once), or it can simply get worse and hang
the machine (many, many, many times).

When it recovers from this, I generally see a MASSIVE write to swap
(I'm using gkrellm to monitor it), and the system continues on as if
nothing happened - until, of course, this happens again. A kernel
compile can cause it. An rm -R of a large directory can cause it. Loading
a large application can cause it.

On some kernels this is more noticeable than others - ac15 does it the
worst, although pre3 rivals it. The symptoms are different on ac17/18:
there it simply freezes randomly with no recovery, instead of either
slowing to a crawl and recovering or freezing outright. (Which is
worse? You decide.)

Now, as before, I tested this with swap and without swap. With swap, I
get the hangs/freezes in all the affected kernels. Without swap, I
don't. Nada.

Now, the big question of the day, folks: What changed between 2.4.6-pre2
and 2.4.6-pre3 that ALSO changed between 2.4.5-ac13 and 2.4.5-ac14 - and
which part of those patches touched the VM? Anyone? I don't see what
changed in 2.4.6-pre3 that was part of the VM... So I am trying to
narrow it down a bit :)

This bug is driving me slightly nuts, so I want it dead. Anyone got an
exterminator handy? =)

Refer to my previous post with the subject below for my original
description of this problem. It's still there in ac18, though I've not
tested ac19 (some have said it's not likely to have been fixed, and I've
been regression testing the 2.4.6-pre kernels today).

Subject: Possible freezing bug located after ac13

Let me know if I can provide any additional information that will help
nail this bug to the wall. (I want to torture it. =)

Tim
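
The swap activity watched through gkrellm in this report could also be logged as numbers. A minimal sketch, assuming /proc/stat contains a `swap <in> <out>` line with cumulative page counts (as recalled for 2.4-era kernels; later kernels are understood to expose these as pswpin/pswpout in /proc/vmstat instead); `parse_swap_counters` and `swap_delta` are hypothetical helper names:

```python
from typing import Optional, Tuple


def parse_swap_counters(stat_text: str) -> Optional[Tuple[int, int]]:
    """Return (pages_swapped_in, pages_swapped_out), or None if no swap line."""
    for line in stat_text.splitlines():
        parts = line.split()
        if (
            len(parts) == 3
            and parts[0] == "swap"
            and parts[1].isdigit()
            and parts[2].isdigit()
        ):
            return int(parts[1]), int(parts[2])
    return None


def swap_delta(before: Tuple[int, int], after: Tuple[int, int]) -> Tuple[int, int]:
    """Pages swapped in and out between two samples of the counters."""
    return after[0] - before[0], after[1] - before[1]
```

Sampling once a second during the du test and printing the delta next to the load average (for example from `os.getloadavg()`) should show whether the MASSIVE swap write lines up with the moment the system recovers.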

* Re: Freezing bug in all kernels greater than 2.4.5-ac13 *AND* 2.4.6-pre2
From: tcm @ 2001-07-01  5:08 UTC
  To: linux-kernel; +Cc: Rik van Riel, Marcelo Tosatti, Linus Torvalds

	I'm currently running 2.4.6-pre8 and happy as a clam: the
problem has been found and reverted. From my discussions with Linus,
it looks like the page_launder change introduced in pre3 and also
included in ac14 was causing the hangs/near freezes.

	I'm not really much of a coder, so I can't say what was wrong
with it, only what the symptoms were and how to make it screw up
whenever I wanted to test for it (see previous messages for how to do
this). If Rik van Riel/Marcelo Tosatti/anyone wants me to gather
information on what is going on just before/after the kernel dies, I'll
do it - just tell me how, and I'll push it along :)

Thanks a bunch Linus,
Tim
