From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dan Stromberg
Subject: Re: timeouts on 3ware 5800.. driver issue?
Date: Fri, 24 Jun 2005 10:44:38 -0700
Message-ID: <1119635078.22853.376.camel@seki.nac.uci.edu>
References: <42BC2A32.4090509@pobox.com>
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <42BC2A32.4090509@pobox.com>
Sender: linux-raid-owner@vger.kernel.org
To: mjstumpf@pobox.com
Cc: linux-raid@vger.kernel.org, strombrg@dcs.nac.uci.edu
List-Id: linux-raid.ids

I'd probably:

1) Fish around for a way of cranking back Linux's expectations of the
"SCSI device", e.g. cranking back the bandwidth, turning off Tagged
Command Queuing, and so on, and see if the errors persist.

2) Test each disk individually, on a non-RAID controller, using UBCD
(the Ultimate Boot CD) or similar. You might also look them over with
SMART (e.g. smartctl from smartmontools).

3) If the disks all check out OK individually, then it may be time to
consider a different RAID card. We've had some problems with 3ware RAID
cards here (or maybe it was a systemic problem with the SATA Maxtor disks
we were using with them). For example, these folks appear to sell RAID
cards that don't require any lock-in, binary-only drivers to use them
under Linux:

http://www.areca.com.tw/index/html/

HTH.

On Fri, 2005-06-24 at 10:43 -0500, Michael Stumpf wrote:
> I've got an old server I'm trying to maintain with two 3ware 5800 8-port
> cards inside, one filled with 80 gig drives, the other with 120 gig. I
> have 4 independent md arrays that are all in one large LVM virtual drive.
>
> Some drives have started to go bad. As I replace them with new
> Seagate 120 gig PATA drives, I get errors in syslog similar to this:
>
> Jun 15 22:45:14 blimp kernel: 3w-xxxx: scsi1: Command failed: status = 0xc7, flags = 0x1b, unit #2.
> Jun 15 22:45:14 blimp kernel: 3w-xxxx: scsi1: AEN: WARNING: ATA port timeout: Port #2.
> Jun 15 22:45:14 blimp kernel: 3w-xxxx: scsi1: Reset succeeded.
> Jun 16 04:40:58 blimp kernel: 3w-xxxx: scsi1: Command failed: status = 0xc7, flags = 0x1b, unit #2.
> Jun 16 04:40:58 blimp kernel: 3w-xxxx: scsi1: AEN: WARNING: ATA port timeout: Port #2.
> Jun 16 04:40:58 blimp kernel: 3w-xxxx: scsi1: Reset succeeded.
> Jun 16 11:24:56 blimp kernel: 3w-xxxx: scsi1: Command failed: status = 0xc7, flags = 0x1b, unit #2.
>
> This manifests in different ways. Usually it starts up fine, but when
> the array is idle and I attempt to access it, I see these entries and a
> brief delay, then the array works fine for a while.
>
> I replaced it with an older 200 gig drive (yes, I know it is limited to
> 137 gig), and the problem shifted to unit #3 (same thing; it is also a
> recently installed new Seagate 120 gig).
>
> I replaced unit #3 with several different 200 gig drives (new Hitachi,
> new Seagate, old WD), and now I always get this on startup:
>
> Jun 23 20:54:27 blimp kernel: 3w-xxxx: scsi1: Command failed: status = 0xc1, flags = 0x11, unit #3.
> Jun 23 20:54:27 blimp kernel: 3w-xxxx: scsi1: AEN: ERROR: Drive error: Port #0.
> Jun 23 20:54:27 blimp kernel: 3w-xxxx: scsi1: Reset succeeded.
> Jun 23 20:54:27 blimp kernel: 3w-xxxx: scsi1: Command failed: status = 0xc1, flags = 0x11, unit #3.
> Jun 23 20:54:27 blimp kernel: SCSI disk error : host 1 channel 0 id 3 lun 0 return code = 2
> Jun 23 20:54:27 blimp kernel: I/O error: dev 08:b1, sector 390716672
> Jun 23 20:54:27 blimp kernel: md: disabled device sdl1, could not read superblock.
> Jun 23 20:54:27 blimp kernel: md: could not read sdl1's sb, not importing!
> Jun 23 20:54:27 blimp kernel: md: could not import sdl1!
> Jun 23 20:54:27 blimp kernel: 3w-xxxx: scsi1: AEN: ERROR: Drive error: Port #0.
> Jun 23 20:54:27 blimp kernel: md3: former device sdl1 is unavailable, removing from array!
>
> Any suggestions? I'm not really sure what to do now.
>
> Regards,
> Michael Stumpf
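Since the errors moved from unit #2 to unit #3 as drives were swapped, it may help to tally the 3w-xxxx messages per unit and per port before and after each swap. What follows is a minimal Python sketch, not an existing tool: `tally_3ware_errors` is a hypothetical helper, and it assumes each syslog entry sits on a single line (the mailer wrapped them in the original post) and that the message text matches the samples quoted above. The log path passed on the command line (e.g. `/var/log/messages`) is also an assumption about where syslog writes on this machine.

```python
import re
import sys
from collections import Counter

# Message shapes copied from the 3w-xxxx lines quoted above; assumes one
# syslog entry per line. Other driver messages are simply ignored.
CMD_FAILED = re.compile(
    r"3w-xxxx: (scsi\d+): Command failed: status = (0x[0-9a-f]+), "
    r"flags = (0x[0-9a-f]+), unit #(\d+)\."
)
AEN = re.compile(r"3w-xxxx: (scsi\d+): AEN: (\w+): (.+?): Port #(\d+)\.")


def tally_3ware_errors(lines):
    """Hypothetical helper: count failed commands per (host, unit, status)
    and controller events (AENs) per (host, port, level, message)."""
    failed = Counter()
    aens = Counter()
    for line in lines:
        m = CMD_FAILED.search(line)
        if m:
            host, status, _flags, unit = m.groups()
            failed[(host, int(unit), status)] += 1
            continue
        m = AEN.search(line)
        if m:
            host, level, message, port = m.groups()
            aens[(host, int(port), level, message)] += 1
    return failed, aens


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python tally_3ware.py /var/log/messages
    with open(sys.argv[1], errors="replace") as log:
        failed, aens = tally_3ware_errors(log)
    for (host, unit, status), n in sorted(failed.items()):
        print(f"{host} unit #{unit}: {n} failed command(s), status {status}")
    for (host, port, level, message), n in sorted(aens.items()):
        print(f"{host} port #{port}: {n} x {level}: {message}")
```

If the counts keep landing on the same unit or port no matter which drive is installed there, that would point more toward the cable, port, or card than toward the disks, which is roughly the distinction steps 2 and 3 above are trying to draw.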