From: Josh Brooks <user@mail.econolodgetulsa.com>
To: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: aacraid (dell PERC) cannot handle a degraded mirror
Date: Tue, 11 Mar 2003 02:18:02 -0800 (PST)
Message-ID: <20030311021253.I5773-100000@mail.econolodgetulsa.com>
In-Reply-To: <1047342169.16969.47.camel@irongate.swansea.linux.org.uk>
Thank you for looking at this - I have seen this happen many times across
many machines, and it is always the disk that is bad. Replace the disk,
and this stops happening - even though the firmware was never replaced.
Relevant details (and these are interesting):
1. Controller BIOS: 2.7-0 (Build #3153)
2. Using Fujitsu drives
3. Previously, these Fujitsu drives, with older firmware on the PERC,
would confuse it so badly that it would drop them and show this same
behavior - now it only happens when a drive actually goes bad.
Thanks!
On 11 Mar 2003, Alan Cox wrote:
> On Mon, 2003-03-10 at 07:44, Josh Brooks wrote:
> > 1. I start getting things like this in /var/log/messages
> >
> > Mar 9 07:12:36 system kernel: aacraid:ID(0:02:0); Error Event
> > [command:0x28]
> > Mar 9 07:12:36 system kernel: aacraid:ID(0:02:0); Medium Error, Block
> > Range 435200 : 435327
> > Mar 9 07:12:36 system kernel: aacraid:ID(0:02:0); Error Too Long To
> > Correct
> > Mar 9 07:12:36 system kernel: aacraid:ID(0:02:0) Medium Error, LBN Range
> > 435200:435327
> > Mar 9 07:12:36 system kernel: aacraid:ID(0:02:0) Starting BBR sequence
> >
>
> These come from the firmware
>
> > Mar 9 07:13:00 system kernel: scsi : aborting command due to timeout :
> > pid
> > 162469964, scsi0, channel 0, id 1, lun 0 Read (10) 00 00 06 a3 ff 00 00 08
> > 00
>
> We start to time out because the firmware isn't responding
>
> > Mar 9 07:13:07 system kernel: aacraid:ID(0:02:0); Error Event
> > [command:0x28]
> > Mar 9 07:13:07 system kernel: aacraid:ID(0:02:0); Medium Error, Block
> > Range 435234 : 435234
> > Mar 9 07:13:07 system kernel: aacraid:ID(0:02:0); Error Too Long To
> > Correct
>
> Firmware finally gives up
>
> >
> > 3. disk 2 on channel 0 fails. No problem, it's a mirror, right ?
>
> > Mar 9 07:13:36 system kernel: aacraid: BBR timed out at Block 0x6a42d
> > Mar 9 07:13:36 system kernel: aacraid:Drive 0:2:0 returning error
> > Mar 9 07:13:36 system kernel: aacraid:ID(0:02:0) - IO failed, Cmd[0x28]
>
> Drive firmware fails the I/O
>
> > So, why does the system run fine on the broken mirror, but panics and
> > crashes when the mirror actually breaks ?
> >
> > This is very frustrating - one of the reasons we spent money to mirror
> > things was to reduce possible downtimes (since a disk failure will not
> > crash the machine) but ... a disk failure does crash the machine.
> > Explanations welcome.
>
> Looking at the trace the driver was thrown by something. I think I know
> what may have occurred in your case but not in the test/qualification
> sets. Somehow the firmware spent so long we aborted/gave up and killed
> off a command - then it completed and we tried to tell the scsi layer.
>
> It'll be a while before I can validate that, you might also want to
> report it to aacraid@adapter.com (I think - see MAINTAINERS file for
> the kernel)
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
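
A quick cross-check of the numbers in the quoted log, as a minimal Python
sketch. It assumes the nine bytes printed after "Read (10)" are the rest of
the CDB following the 0x28 opcode (flags, 4-byte big-endian LBA, group,
2-byte big-endian length, control). The helper names decode_read10 and
overlaps are made up for illustration and are not part of the driver or the
kernel.

```python
def decode_read10(tail_hex):
    """Return (lba, block_count) from the 9 CDB bytes that follow the opcode.

    Assumed layout of the tail: flags, LBA (4 bytes, big-endian),
    group number, transfer length (2 bytes, big-endian), control.
    """
    tail = bytes.fromhex(tail_hex)
    if len(tail) != 9:
        raise ValueError("expected the 9 bytes following the READ(10) opcode")
    lba = int.from_bytes(tail[1:5], "big")
    count = int.from_bytes(tail[6:8], "big")
    return lba, count


def overlaps(first_a, last_a, first_b, last_b):
    """True if two inclusive block ranges share at least one block."""
    return first_a <= last_b and first_b <= last_a


# Values copied from the quoted log lines.
TIMED_OUT_CDB = "00 00 06 a3 ff 00 00 08 00"
BAD_FIRST, BAD_LAST = 435200, 435327   # "Medium Error, Block Range"
BBR_TIMEOUT_BLOCK = 0x6A42D            # "BBR timed out at Block 0x6a42d"

if __name__ == "__main__":
    lba, count = decode_read10(TIMED_OUT_CDB)
    last = lba + count - 1
    print("timed-out read covers blocks", lba, "to", last)
    print("overlaps bad range:", overlaps(lba, last, BAD_FIRST, BAD_LAST))
    print("BBR timeout block:", BBR_TIMEOUT_BLOCK,
          "inside bad range:", BAD_FIRST <= BBR_TIMEOUT_BLOCK <= BAD_LAST)
```

Under that assumption the timed-out read covers blocks 435199 to 435206,
which runs into the reported bad range 435200:435327, and the BBR timeout
block 0x6a42d (435245) lies inside that range as well. This only compares
the numbers as logged: the timeout was reported against id 1 and the medium
errors against 0:02:0, and the log does not say whether those two block
numberings line up.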