linux-kernel.vger.kernel.org archive mirror
From: Andrea Arcangeli <andrea@suse.de>
To: Srihari Vijayaraghavan <harisri@bigpond.com>
Cc: linux-kernel@vger.kernel.org, hpj@urpla.net,
	mcelrath+kernel@draal.physics.wisc.edu, pellegrini@mpcnet.com.br,
	lists@sapience.com, mroos@linux.ee, willi@7val.com
Subject: Re: 2.4.20-pre10aa1 oops report (was Re: Linux-2.4.20-pre8-aa2 oops report. [solved]) [solved2? ac97]
Date: Wed, 16 Oct 2002 07:06:42 +0200
Message-ID: <20021016050642.GB6276@dualathlon.random>
In-Reply-To: <200210160013.02220.harisri@bigpond.com>

On Wed, Oct 16, 2002 at 12:13:02AM +1000, Srihari Vijayaraghavan wrote:
> Hello,
> 
> > That precisely is the reason. The bad news is that system crashes when
> > agpgart and radeon are compiled as modules, and the good news is that I am
> > unable to crash it when they are not.
> 
> My goodness, I have spoken too early I guess. The -aa kernel crashes whether 
> agpgart and radeon are modules or not.

I have been running this kernel for 5 days now, very often under heavy
load (including thousands of tasks with volanomark in the background,
aio, and a flood of writes from /dev/zero), and there is no sign of
instability. The one exception is a rare tcp race that has also been
reported for 2.4.19 on l-k. It is not fatal: it only deadlocks the tcp
connection, and you have to kill the task because readmsg will never
return until it gets a signal. I tried to debug it, with no luck yet,
but it is most certainly a mainline issue too, and it triggers only
during heavy load.

You probably did something incidentally (not part of your regression
test loop) that corrupted memory. The regression test is a workload that
will show you if the corruption has happened in the past or not, but the
regression test loop is not the thing that is generating the corruption.
The regression test loop is what gets _harmed_ by the corruption, it's
not the culprit.

My crystal ball is telling me that you could reproduce it easily on my
tree because, once you finally feel stable and think you can go back to
your usual work without worrying about oopses, you enjoy yourself
playing some music to relax. While you try to reproduce the problem,
instead, you don't play music because you're busy stressing the kernel,
and in turn you can't reproduce the bug. Is she right? ;)

Please try with CONFIG_SOUND=n and make sure to run:

	rm -r /lib/modules/2.4.20-pre10aa1

before "make modules_install" to avoid running stale modules (also enable
modversions just in case).
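
The clean reinstall described above can be sketched as a small script.
This is only an illustration: the KVER and DRY_RUN variables and the
helper names are assumptions made for the example, not part of the
original advice, and the script defaults to printing the commands
instead of running them.

```shell
#!/bin/sh
# Sketch: drop the installed module tree for one kernel version before
# reinstalling, so no stale modules survive the rebuild.
# KVER, DRY_RUN, run and clean_reinstall are hypothetical names.
KVER="${KVER:-2.4.20-pre10aa1}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Must be called from the top of the kernel source tree, as root,
# when DRY_RUN is set to 0.
clean_reinstall() {
    run rm -r "/lib/modules/$KVER"
    run make modules_install
}

clean_reinstall
```

Setting DRY_RUN=0 makes the same two commands actually execute, in the
order given in the text.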

I see a pile of oopses all showing ac97 loaded into the kernel, some
also for 2.4.19, but they may be unrelated problems of course. A number
of reports showing definite random mm corruption like yours on top of
2.4.20-pre vanilla (not -aa) have most certainly been affected by the
ac97 bug too (I'm CC'ing the other affected testers, they can try the
same as you). I never tried ac97 (I have a couple of boxes that could
handle it, but I never attempted to play sound on those yet, and the
chipset may be different, so it may not trigger for me after all even if
I could load that module).

Hint: in the past I found it easier to reproduce various module bugs with a
loop like this:

	while :; do insmod ac97_codec.o; rmmod ac97_codec; done

You can try the above and see if it triggers within seconds.
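
A bounded variant of that loop, which stops at the first failed load or
unload, might look like the sketch below. The function name and the
LOADER/UNLOADER variables are assumptions introduced for the example
(they make it possible to try the loop logic without root); they do not
come from the original mail.

```shell
#!/bin/sh
# Sketch of a bounded insmod/rmmod stress loop.
# LOADER, UNLOADER and stress_module are hypothetical names.
LOADER="${LOADER:-insmod}"
UNLOADER="${UNLOADER:-rmmod}"

# stress_module MODULE COUNT
# Load and unload MODULE up to COUNT times; return 1 on the first failure.
stress_module() {
    mod="$1"
    count="$2"
    i=0
    while [ "$i" -lt "$count" ]; do
        if ! "$LOADER" "$mod"; then
            echo "load of $mod failed on cycle $i"
            return 1
        fi
        if ! "$UNLOADER" "$mod"; then
            echo "unload of $mod failed on cycle $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "completed $count cycles of $mod"
}
```

Run as root, for instance with "stress_module ac97_codec 1000", and
check dmesg afterwards for any oops. A kernel that oopses mid-loop may
of course never get as far as printing the failure message.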

From grepping the l-k db it seems the bug was introduced in 2.4.19.
So I would suggest that you try to reproduce after a:

	rm -r 2.4.20pre10aa1/drivers/sound
	cp -a 2.4.18/drivers/sound 2.4.20pre10aa1/drivers
	cd 2.4.20pre10aa1; make oldconfig ...

(of course you can replace 2.4.20pre10aa1 with 2.4.20pre11 vanilla or
2.4.20pre10ac2)

and see if the instability goes away?
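
The tree swap above can also be wrapped in a small function that checks
both directories before deleting anything. This is a sketch only: the
function name and its two arguments are hypothetical, and whether the
older sound tree builds cleanly inside the newer kernel is not
something the script can tell you.

```shell
#!/bin/sh
# Sketch: replace DST/drivers/sound with the copy from SRC.
# swap_sound_tree and its SRC/DST arguments are hypothetical names.
swap_sound_tree() {
    src="$1"
    dst="$2"
    # Refuse to delete anything unless the replacement tree exists.
    if [ ! -d "$src/drivers/sound" ]; then
        echo "no drivers/sound under $src" >&2
        return 1
    fi
    if [ ! -d "$dst/drivers" ]; then
        echo "no drivers directory under $dst" >&2
        return 1
    fi
    # Same two steps as in the text: remove the new tree, copy the old one.
    rm -rf "$dst/drivers/sound" && cp -a "$src/drivers/sound" "$dst/drivers/"
}
```

With the directory names used in the text this would be called as
"swap_sound_tree 2.4.18 2.4.20pre10aa1", followed by the usual
"make oldconfig" and rebuild in the target tree.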

Marcelo also included some further ac97 patches in pre11, so maybe
2.4.20pre11aa1 will fix it; you may want to give it a try too when I
release it (OTOH, I'm fixing what seems to be a design bug in the O(1)
scheduler that is apparently generating a huge waste of cpu, so I don't
guarantee that the very first release with these changes will be as
solid as 2.4.20pre10aa1 ;)

Thanks for all the reports,

Andrea
