From: Barrett Lewis <barrett.lewis.mitsi@gmail.com>
To: "linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>
Subject: Re: Mdadm server eating drives
Date: Mon, 1 Jul 2013 19:17:00 -0500
Message-ID: <CAPSPcXhZosdmKiG-rhQXu+NcNJ2yLjfT1hAor1cCHWi1kM08aA@mail.gmail.com>
In-Reply-To: <51CC72A4.4040508@jungers.net>

I am very sorry to keep bugging this list, but I am really lost.

After learning about ERC and timeouts, the severity of the problem was
reduced to the point that I could at least get my system back to a
raid6.  I ran a repair, which fixed 5477 mismatches, and then a check
showed it clean.  Yet drives continue to give me DRDY statuses.  I
replaced the two that were doing it with WD Reds (which I intend to
buy exclusively from now on).  Then I tried to run a repair again, and
my system crashed, as if the timers were mismatched, even though I had
set the driver timeouts on all drives to 180, including the ones with
ERC, to be safe.  The repair crashed several (3-4) times under these
conditions (usually within a few minutes of starting).  Finally,
instead of a repair, I ran a check, which somehow completed fine and
showed zero mismatches.
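
For reference, the timeout choice described above could be sketched
roughly as follows.  This is only a sketch under assumptions:
pick_timeout and apply_timeouts are hypothetical names, the match on a
numeric "Read:" line assumes the usual `smartctl -l scterc` output
layout, and 30 is the kernel's default command timeout rather than
anything I actually used (I set 180 everywhere).

```shell
# Hypothetical helper (not from the thread): reads `smartctl -l scterc`
# output on stdin and prints a driver timeout in seconds.
pick_timeout() {
    # A numeric "Read:" line such as "Read:     70 (7.0 seconds)" is
    # assumed to mean ERC is enabled; anything else counts as no ERC.
    if grep -Eq 'Read:[[:space:]]+[0-9]+ \('; then
        echo 30     # kernel default SCSI command timeout
    else
        echo 180    # the longer value used in this thread
    fi
}

# Hypothetical wrapper; it needs root and real drives, so it is only
# defined here and never called.
apply_timeouts() {
    for dev in "$@"; do
        t=$(smartctl -l scterc "$dev" | pick_timeout)
        echo "$t" > "/sys/block/${dev##*/}/device/timeout"
    done
}
```

It would be invoked as root with something like
`apply_timeouts /dev/sd[a-f]`.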

I started rsync to verify my data against a backup.  And now 3 drives
are giving me DRDY statuses.  Two of them have REALLY failed out of
the array, giving DRDY DF ERR messages, and no longer even show a
superblock in mdadm --examine, so now I'm back to the bare minimum of
my raid6.  One of the two drives that is so bad it lost its
superblock is one of the WD Reds I just bought and installed 5 days
ago.

Any thoughts on what is going on?  I have to ask again whether my
motherboard could be frying the hardware in these drives.
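
One way to see whether the errors follow particular ports rather than
particular drives might be to tally the "failed command" lines per ATA
link.  A minimal sketch, assuming the kernel log lines look like
"ata3.00: failed command: READ FPDMA QUEUED"; count_failed_cmds is a
hypothetical name, and `grep -o` is assumed to be available.

```shell
# Hypothetical helper: reads kernel log text on stdin and prints
# "<ata link> <count>" for every link that logged a failed command.
count_failed_cmds() {
    grep -Eo 'ata[0-9]+\.[0-9]+: failed command' |
        cut -d: -f1 |                # keep only the "ataN.MM" part
        sort | uniq -c |             # count occurrences per link
        awk '{ print $2, $1 }'       # reorder to "link count"
}
```

It could be fed with `dmesg | count_failed_cmds`.  Errors spread over
every link would hint at something shared (controller, power, cabling),
while errors on a single link would point at that drive or its cable.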



cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]

md0 : active raid6 sdd[6](F) sdc[7] sda[9] sdf[8](F) sdb[0] sde[4]
      7813531648 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/4] [U__UUU]

unused devices: <none>
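
To spell out what "bare minimum" means in that output: the [6/4] field
says 4 of 6 members are active, and raid6 tolerates at most 2 missing.
A minimal sketch that pulls those numbers out of mdstat text;
md_members is a hypothetical name and it assumes the [total/active]
field appears as shown above.

```shell
# Hypothetical helper: reads /proc/mdstat text on stdin and prints
# "total active missing" for each array status line it finds.
md_members() {
    awk 'match($0, /\[[0-9]+\/[0-9]+\]/) {
        s = substr($0, RSTART + 1, RLENGTH - 2)   # e.g. "6/4"
        split(s, n, "/")
        print n[1], n[2], n[1] - n[2]
    }'
}
```

Run as `md_members < /proc/mdstat`; for the array above it would report
two missing members, which leaves a raid6 with no redundancy.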

sudo mdadm -D /dev/md0 | nopaste
http://pastie.org/8101687

sudo mdadm --examine /dev/sd[a-f] 2>&1 | nopaste
http://pastie.org/8101681


sudo smartctl -x /dev/sda | nopaste
http://pastie.org/8101691

sudo smartctl -x /dev/sdb | nopaste
http://pastie.org/8101693

sudo smartctl -x /dev/sdc | nopaste
http://pastie.org/8101694

sudo smartctl -x /dev/sdd | nopaste
http://pastie.org/8101695

sudo smartctl -x /dev/sde | nopaste
http://pastie.org/8101696

sudo smartctl -x /dev/sdf | nopaste
http://pastie.org/8101697

for x in /sys/block/sd[a-f]/device/timeout ; do echo $x $(< $x); done
/sys/block/sda/device/timeout 180
/sys/block/sdb/device/timeout 180
/sys/block/sdc/device/timeout 180
/sys/block/sdd/device/timeout 180
/sys/block/sde/device/timeout 180
/sys/block/sdf/device/timeout 180






On Thu, Jun 27, 2013 at 12:13 PM, Nicolas Jungers <nicolas@jungers.net> wrote:
> On 06/27/2013 02:23 AM, Barrett Lewis wrote:
>>
>> Everything is going well, I am just trying to replace the parts that
>> are on the way out.
>> I ran a 'repair' and it came out with 5477 under
>> /sys/block/md0/md/mismatch_cnt.  Then a 'check' came out with 0.
>>
>> Then I went out and bought a couple WD Reds (I'm done with greens now
>> that I know they lack ERC).  I replaced one of the two drives Phil
>> said was not ok, which had many reallocations (I can personally see
>> those) in the smart status.  I then ran another repair to be safe.  It
>> came up with 0 mismatches, but in the process /dev/sda started giving
>> me tons (and tons and tons, rolled over dmesg) of these "failed
>> command: READ FPDMA QUEUED status: { DRDY ERR } error: { UNC }"
>> errors. sda hadn't been giving me problems before but I'll come back
>> to it.
>>
>> The second disk Phil said was "not ok" was this one which showed
>> "several pending errors".
>> (original smart status) http://pastie.org/8040852
>> I was going to replace it with my second spare Red, but the errors
>> seem to have gone away.
>> (current smart status) http://pastie.org/8084278
>> Or maybe I am looking in the wrong place to find the pending errors
>> (looking at "197 Current_Pending_Sector").  Is the drive currently in
>> need of replacement?  I'm not sure what I'm looking for.
>>
>> What about this one (sda), after it gave all of those errors during a
>> repair?  http://pastie.org/8084292
>> I get the "5 Reallocated_Sector_Ct", but where do you find pending errors?
>>
>> What does it mean to get all these "failed command: READ FPDMA QUEUED
>> status: { DRDY ERR } error: { UNC }" errors and the smart status seems
>> to be fine even after a repair?
>
>
> Have you considered that your SATA cables may be faulty? I have had
> consistently bad experiences with "cheap" SATA cables. I now exclusively
> use cables with latches. I say "cheap" because price is not an absolute
> criterion; quality of sourcing is more important in my experience.
>
> Regards,
> N.
>
>
>>
>> Thanks everyone, I'm learning a lot.
>
