linux-kernel.vger.kernel.org archive mirror
From: Josh Brooks <user@mail.econolodgetulsa.com>
To: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: aacraid (dell PERC) cannot handle a degraded mirror
Date: Tue, 11 Mar 2003 02:18:02 -0800 (PST)
Message-ID: <20030311021253.I5773-100000@mail.econolodgetulsa.com>
In-Reply-To: <1047342169.16969.47.camel@irongate.swansea.linux.org.uk>


Thank you for looking at this - I have seen this happen many times across
many machines, and it is always the disk that is bad.  Replace the disk
and it stops happening, even though the firmware was never replaced.

Relevant details (and these are interesting):

1. Controller BIOS: 2.7-0 (Build #3153)
2. Using Fujitsu drives
3. Previously, with older firmware on the PERC, these Fujitsu drives
would confuse the controller so badly that it would drop them and show
this same behavior - now it only happens when a drive actually goes bad.

thanks!

On 11 Mar 2003, Alan Cox wrote:

> On Mon, 2003-03-10 at 07:44, Josh Brooks wrote:
> > 1. I start getting things like this in /var/log/messages
> >
> > Mar  9 07:12:36 system kernel: aacraid:ID(0:02:0); Error Event
> > [command:0x28]
> > Mar  9 07:12:36 system kernel: aacraid:ID(0:02:0); Medium Error, Block
> > Range 435200 : 435327
> > Mar  9 07:12:36 system kernel: aacraid:ID(0:02:0); Error Too Long To
> > Correct
> > Mar  9 07:12:36 system kernel: aacraid:ID(0:02:0) Medium Error, LBN Range
> > 435200:435327
> > Mar  9 07:12:36 system kernel: aacraid:ID(0:02:0) Starting BBR sequence
> >
>
> These come from the firmware
>
> > Mar  9 07:13:00 system kernel: scsi : aborting command due to timeout :
> > pid
> > 162469964, scsi0, channel 0, id 1, lun 0 Read (10) 00 00 06 a3 ff 00 00 08
> > 00
>
> We start to time out because the firmware isn't responding
>
> > Mar  9 07:13:07 system kernel: aacraid:ID(0:02:0); Error Event
> > [command:0x28]
> > Mar  9 07:13:07 system kernel: aacraid:ID(0:02:0); Medium Error, Block
> > Range 435234 : 435234
> > Mar  9 07:13:07 system kernel: aacraid:ID(0:02:0); Error Too Long To
> > Correct
>
> Firmware finally gives up
>
> >
> > 3. disk 2 on channel 0 fails.  No problem, it's a mirror, right ?
>
> > Mar  9 07:13:36 system kernel: aacraid:  BBR timed out at Block 0x6a42d
> > Mar  9 07:13:36 system kernel: aacraid:Drive 0:2:0 returning error
> > Mar  9 07:13:36 system kernel: aacraid:ID(0:02:0) - IO failed, Cmd[0x28]
>
> Drive firmware fails the I/O
>
> > So, why does the system run fine on the broken mirror, but panics and
> > crashes when the mirror actually breaks ?
> >
> > This is very frustrating - one of the reasons we spent money to mirror
> > things was to reduce possible downtimes (since a disk failure will not
> > crash the machine) but ... a disk failure does crash the machine.
> > Explanations welcome.
>
> Looking at the trace, the driver was thrown by something. I think I know
> what may have occurred in your case but not in the test/qualification
> sets. Somehow the firmware spent so long we aborted/gave up and killed
> off a command - then it completed and we tried to tell the scsi layer.
>
> It'll be a while before I can validate that, you might also want to
> report it to aacraid@adapter.com (I think - see MAINTAINERS file for
> the kernel)
>
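
The block numbers in the quoted log can be cross-checked with a little
arithmetic. The sketch below is only an illustration and rests on two
assumptions that the thread does not confirm: that the nine hex bytes
printed after "Read (10)" are the remainder of the 10-byte CDB (flags,
4-byte big-endian LBA, group, 2-byte big-endian transfer length,
control), and that the SCSI-layer LBA and the firmware's block range
refer to the same address space. The helper names decode_read10 and
overlaps are made up for this example.

```python
def decode_read10(tail_hex):
    """Decode the nine bytes that follow the READ(10) opcode (0x28).

    Returns (lba, blocks). Layout assumed: flags, LBA[4], group,
    transfer length[2], control.
    """
    cdb_tail = bytes.fromhex(tail_hex)  # fromhex() skips the spaces
    if len(cdb_tail) != 9:
        raise ValueError("expected 9 bytes after the READ(10) opcode")
    lba = int.from_bytes(cdb_tail[1:5], "big")
    blocks = int.from_bytes(cdb_tail[6:8], "big")
    return lba, blocks


def overlaps(first_a, last_a, first_b, last_b):
    """True if the inclusive ranges [first_a, last_a] and [first_b, last_b] intersect."""
    return first_a <= last_b and first_b <= last_a


# Bytes copied from the "aborting command due to timeout" log line.
lba, blocks = decode_read10("00 00 06 a3 ff 00 00 08 00")
last = lba + blocks - 1
print(lba, last)                              # 435199 435206

# Medium Error range reported by the firmware: 435200 : 435327
print(overlaps(lba, last, 435200, 435327))    # True

# "BBR timed out at Block 0x6a42d"
bbr_block = 0x6a42d
print(bbr_block, 435200 <= bbr_block <= 435327)   # 435245 True
```

Under those assumptions, the read that timed out covers blocks
435199-435206, which runs into the failing range 435200-435327, and the
block where BBR timed out (435245) lies inside that same range.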

