From: bryan.coleman@dart.biz
To: Eric Sandeen <sandeen@redhat.com>
Cc: linux-ext4@vger.kernel.org, linux-ext4-owner@vger.kernel.org,
	"Ted Ts'o" <tytso@mit.edu>
Subject: Re: ext4 problems with external RAID array via SAS connection
Date: Tue, 8 Feb 2011 13:50:32 -0500
Message-ID: <OF0792030A.13D80D43-ON85257831.0066ABF1-85257831.006780FD@dart.biz>
In-Reply-To: <4D515F0D.1030902@redhat.com>

I found that the Promise array had been restarted via watchdog timer.  I 
am investigating that avenue with Promise (albeit slowly).  Note: the 
watchdog reset the controller days after the initial ext4 messages.  I'm 
not saying they are unrelated.  I just want to get all of the facts out 
there.

I suspect the connection between the server and the Promise array got 
hosed when the controller was reset.  After I restarted the server, I was 
able to fsck the drive.

The fsck is currently running (and has been for some time now). 

It is reporting a ton of messages like "Inode ######## ref count is 2, 
should be 1.  Fix? yes", "Unattached inode #########", and "Connect to 
/lost+found? yes".

I am running fsck in a script session; however, there are currently a ton 
of the messages above (current log size: 106M).
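
A transcript of that size is easier to review as counts per message type 
than line by line.  A minimal Python sketch, assuming the log is the 
plain-text output captured by script and that the messages look like the 
ones quoted above.  The function name summarize_fsck_log and the pattern 
names are hypothetical, and real e2fsck output contains many other 
messages, which are simply counted as "other" here.

```python
import re
from collections import Counter

# Patterns cover only the message forms quoted in this thread (assumption);
# anything else that is not blank is counted under "other".
PATTERNS = {
    "ref_count": re.compile(r"Inode \d+ ref count is \d+, should be \d+"),
    "unattached": re.compile(r"Unattached inode \d+"),
    "lost_found": re.compile(r"Connect to /lost\+found"),
}


def summarize_fsck_log(lines):
    """Count how many lines match each known message pattern."""
    counts = Counter()
    for line in lines:
        matched = False
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                matched = True
        if not matched and line.strip():
            counts["other"] += 1
    return counts
```

To use it, open the transcript with open(path, errors="replace") and pass 
the file object to summarize_fsck_log; the file is read one line at a 
time, so a log of 106M does not need to fit in memory.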

Do you think it is still hardware?  If so, is there a command that would 
stress it enough to break quickly?  What is the best way to isolate 
hardware problems?

Bryan



From:   Eric Sandeen <sandeen@redhat.com>
To:     bryan.coleman@dart.biz
Cc:     linux-ext4@vger.kernel.org, "Ted Ts'o" <tytso@mit.edu>
Date:   02/08/2011 10:21 AM
Subject:        Re: ext4 problems with external RAID array via SAS connection
Sent by:        linux-ext4-owner@vger.kernel.org



On 2/8/11 8:50 AM, bryan.coleman@dart.biz wrote:
> Well, I attempted to run fsck on the problem drive using the script 
> command to capture the transcript; however, it failed to read a block from 
> the file system.  The exception was "fsck.ext4: Attempt to read block from 
> filesystem resulted in short read while trying to open 
> /dev/mapper/vg_storage-lv_storage". 
> 
> Other messages that are now in /var/log/messages:
> 
> Buffer I/O error on device dm-2, logical block 0
> lost page write due to I/O error on dm-2
> EXT4-fs (dm-2): previous I/O error to superblock detected
> Buffer I/O error on device dm-2, logical block 0
> lost page write due to I/O error on dm-2
> Buffer I/O error on device dm-2, logical block 0
> Buffer I/O error on device dm-2, logical block 1
> Buffer I/O error on device dm-2, logical block 2
> Buffer I/O error on device dm-2, logical block 3
> Buffer I/O error on device dm-2, logical block 0
> EXT4-fs (dm-2): unable to read superblock
> 
> 
> Since it looks like I need to start the process all over again, is there a 
> good way to quickly determine if the problem is hardware related?  Is 
> there a preferred method that will stress test the drive and shed more 
> light on what might be going wrong?

You have a hardware problem... "Buffer I/O error on device dm-2, logical
block 0" means that you failed to read the first block on that device; not
something e2fsck can fix, I'm afraid; you'll need to sort out what's wrong
with the storage, first.
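
One way to check this below the filesystem layer is to try reading the
first few blocks of the device directly.  A minimal Python sketch, assuming
a Unix system, read access to the device node (usually root), and a block
size of 4096 bytes; the function name probe_blocks is hypothetical.  It
only reads and never writes.  Because these are ordinary buffered reads,
some may be served from the page cache, so a clean result does not prove
the storage is healthy.

```python
import os


def probe_blocks(path, count=4, block_size=4096):
    """Try to read the first `count` blocks of `path`.

    Returns a list of (block_number, message) for every block that raised
    an I/O error or came back shorter than `block_size`.
    """
    failures = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for n in range(count):
            try:
                data = os.pread(fd, block_size, n * block_size)
            except OSError as exc:
                # A failing device typically surfaces here (for example EIO).
                failures.append((n, exc.strerror))
                continue
            if len(data) < block_size:
                failures.append((n, "short read: %d bytes" % len(data)))
    finally:
        os.close(fd)
    return failures
```

Called as probe_blocks("/dev/mapper/vg_storage-lv_storage"), an empty list
means the first four blocks were read in full; any entry points at the
storage stack rather than at ext4.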

-Eric

> Thank you,
> 
> Bryan
> 
> 
> 
> From:   bryan.coleman@dart.biz
> To:     linux-ext4@vger.kernel.org, linux-ext4-owner@vger.kernel.org
> Date:   02/08/2011 08:19 AM
> Subject:        Re: ext4 problems with external RAID array via SAS connection
> Sent by:        linux-ext4-owner@vger.kernel.org
> 
> 
> 
> When I ran fsck after the first bout of failure, it did report a lot of 
> errors.  I do not have a copy of that fsck transcript; however, I have not 
> yet run fsck since my second attempt.  Is there a method of capturing the 
> transcript that is preferred?
> 
> Bryan
> 
> 
> 
> From:   Ted Ts'o <tytso@mit.edu>
> To:     bryan.coleman@dart.biz
> Cc:     linux-ext4@vger.kernel.org
> Date:   02/07/2011 05:55 PM
> Subject:        Re: ext4 problems with external RAID array via SAS connection
> Sent by:        linux-ext4-owner@vger.kernel.org
> 
> 
> 
> On Mon, Feb 07, 2011 at 01:53:18PM -0500, bryan.coleman@dart.biz wrote:
>> I am experiencing problems with an ext4 file system.
>>
>> At first, the drive seemed to work fine.  I was primarily copying things 
>> to the drive migrating data from another server.  After many GBs of data, 
>> that seemingly successfully were done being transferred, I started seeing 
>> ext4 errors in /var/log/messages.  I then unmounted the drive and ran fsck 
>> on it (which took multiple hours to run).  I then ls'ed around and one of 
>> the areas caused the system to again throw ext4 errors.
> 
> Did fsck report any errors?  Do you have a copy of your fsck
> transcript?
> 
> The errors you've reported do make me suspicious that there's
> something unstable with your hardware...
> 
>   - Ted




Thread overview: 10+ messages
2011-02-07 18:53 ext4 problems with external RAID array via SAS connection bryan.coleman
2011-02-07 22:54 ` Ted Ts'o
2011-02-08 13:18   ` bryan.coleman
2011-02-08 14:50     ` bryan.coleman
2011-02-08 15:19       ` Eric Sandeen
2011-02-08 18:50         ` bryan.coleman [this message]
2011-02-08 20:49           ` Eric Sandeen
2011-02-09 13:43             ` bryan.coleman
2011-02-09 18:28               ` Ted Ts'o
2011-02-09 19:46                 ` Ric Wheeler
