* (unknown), 
@ 2009-04-02  4:16 Lelsie Rhorer
  2009-04-02  4:22 ` David Lethe
                   ` (4 more replies)
  0 siblings, 5 replies; 97+ messages in thread
From: Lelsie Rhorer @ 2009-04-02  4:16 UTC (permalink / raw)
  To: linux-raid

I'm having a severe problem whose root cause I cannot determine.  I have a
RAID 6 array managed by mdadm running on Debian "Lenny" with a 3.2GHz AMD
Athlon 64 x 2 processor and 8G of RAM.  There are ten 1 Terabyte SATA
drives, unpartitioned, fully allocated to the /dev/md0 device.  The
drives are served by 3 Silicon Image SATA port multipliers and a Silicon
Image 4-port eSATA controller.  The /dev/md0 device is also
unpartitioned, and all
8T of active space is formatted as a single Reiserfs file system.  The
entire volume is mounted to /RAID.  Various directories on the volume are
shared using both NFS and SAMBA.
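As an editorial aside, the array layout described above can be confirmed with md's own reporting tools; the device name is the one from the post, so adjust as needed:

```shell
# Quick view of the array: RAID level, member drives, and current state.
cat /proc/mdstat
# Full view: chunk size, layout, and per-member status for /dev/md0.
mdadm --detail /dev/md0
```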

Performance of the RAID system is very good.  The array can read and write
at over 450 Mbps, and I don't know if the limit is the array itself or the
network, but since the performance is more than adequate I really am not
concerned which is the case.

The issue is the entire array will occasionally pause completely for about
40 seconds when a file is created.  This does not always happen, but the
situation is easily reproducible.  The frequency at which the symptom
occurs seems to be related to the transfer load on the array.  If no other
transfers are in process, then the failure seems somewhat more rare,
perhaps accompanying less than 1 file creation in 10.  During heavy file
transfer activity, sometimes the system halts with every other file
creation.  Although I have observed many dozens of these events, I have
never once observed it to happen except when a file creation occurs. 
Reading and writing existing files never triggers the event, although any
read or write occurring during the event is halted for the duration. 
(There is one cron job which runs every half-hour that creates a tiny file;
this is the most common failure vector.)  There are other drives formatted
with other file systems on the machine, but the issue has never been seen
on any of the other drives.  When the array runs its regularly scheduled
health check, the problem is much worse.  Not only does it lock up with
almost every single file creation, but the lock-up time is much longer -
sometimes in excess of 2 minutes.

Transfers via Linux based utilities (ftp, NFS, cp, mv, rsync, etc) all
recover after the event, but SAMBA based transfers frequently fail, both
reads and writes.

How can I troubleshoot and more importantly resolve this issue?


^ permalink raw reply	[flat|nested] 97+ messages in thread
* RE: RAID halting
@ 2009-04-04 17:05 Lelsie Rhorer
  0 siblings, 0 replies; 97+ messages in thread
From: Lelsie Rhorer @ 2009-04-04 17:05 UTC (permalink / raw)
  To: 'Linux RAID'

> One thing that can cause this sort of behaviour is if the filesystem is in
> the middle of a sync and has to complete it before the create can
> complete, and the sync is writing out many megabytes of data.
> 
> You can see if this is happening by running
> 
>      watch 'grep Dirty /proc/meminfo'

OK, I did this.

> if that is large when the hang starts, and drops down to zero, and the
> hang lets go when it hits (close to) zero, then this is the problem.

No, not really.  The value of course rises and falls erratically during
normal operation (anything from a few dozen K to 200 Megs), but it is not
necessarily very high at the event onset.  When the halt occurs it drops
from whatever value it may have (perhaps 256K or so) to 16K, and then slowly
rises to several hundred K until the event terminates.
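For matching the counter against the stalls after the fact, a continuous log with timestamps is easier to work with than watch.  A sketch (/tmp/dirty.log is an arbitrary path):

```shell
# Log the Dirty counter from /proc/meminfo once per second with a
# timestamp, so a stall's onset and end can be lined up with the
# counter's behaviour afterwards.
while sleep 1; do
    printf '%s %s\n' "$(date '+%H:%M:%S')" \
        "$(awk '/^Dirty:/ {print $2, $3}' /proc/meminfo)"
done >> /tmp/dirty.log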

> If that doesn't turn out to be the problem, then knowing how the
> "Dirty" count is behaving might still be useful, and I would probably
> look at what processes are in 'D' state, (ps axgu)

Well, nothing surprising there.  The process(es) involved with the
transfer(s) are in the uninterruptible 'D' state (shown as D+), as is
the trigger process (for testing, I simply copy /etc/hosts over to a
directory on the RAID array), and pdflush is in a D state (no plus), but
that's all.
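A one-liner along these lines narrows the ps output to just the blocked tasks, and adds the kernel function each one is waiting in, which is often the most telling clue during a stall (a sketch; column widths are a matter of taste):

```shell
# Show only processes in uninterruptible sleep (state starting with D),
# keeping the header line, plus the kernel wait channel (wchan) they
# are blocked in.
ps -eo pid,stat,wchan:32,comm | awk 'NR==1 || $2 ~ /^D/'
```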

> and look at their stack (/proc/$PID/stack)..

Um, I thought I knew what you meant by this, but apparently not.  I tried to
`cat /proc/<PID of the process with a D status>/stack`, but the system
returns "cat: /proc/8005/stack: No such file or directory".  What did I do
wrong?
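Most likely nothing: /proc/&lt;pid&gt;/stack only exists on kernels built with CONFIG_STACKTRACE, and it appeared around 2.6.29; Lenny's stock 2.6.26 predates it.  A sketch of the older alternative, which dumps every task's state and stack into the kernel ring buffer (requires root):

```shell
# The magic-SysRq "t" trigger dumps all task states and stacks to the
# kernel log, which works on kernels that lack /proc/<pid>/stack.
echo 1 > /proc/sys/kernel/sysrq     # ensure SysRq is enabled
echo t > /proc/sysrq-trigger        # dump every task's stack
dmesg | less                        # look for the tasks in state D
```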


[parent not found: <49D7C19C.2050308@gmail.com>]
* FW: RAID halting
@ 2009-04-05 14:22 David Lethe
  2009-04-05 14:53 ` David Lethe
  2009-04-05 20:33 ` Leslie Rhorer
  0 siblings, 2 replies; 97+ messages in thread
From: David Lethe @ 2009-04-05 14:22 UTC (permalink / raw)
  To: lrhorer, linux-raid

-----Original Message-----
From: linux-raid-owner@vger.kernel.org
[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Lelsie Rhorer
Sent: Sunday, April 05, 2009 3:14 AM
To: linux-raid@vger.kernel.org
Subject: RE: RAID halting

> All of what you report is still consistent with delays caused by
> having to remap bad blocks

I disagree.  If it happened with some frequency during ordinary reads,
then I would agree.  If it happened without respect to the volume of
reads and writes on the system, then I would be less inclined to
disagree.

> The O/S will not report recovered errors, as this gets done internally
> by the disk drive, and the O/S never learns about it. (Queue depth

SMART is supposed to report this, and rarely the kernel log does report
a block of sectors being marked bad by the controller.  I cannot speak
to the notion that SMART's reporting of relocated sectors and failed
relocations may not be accurate, as I have no means to verify it.

Actually, I should amend the first sentence, because while the ten
drives in the array are almost never reporting any errors, there is
another drive in the chassis which is chunking out error reports like a
farm boy spitting out watermelon seeds.  I had a 320G drive in another
system which was behaving erratically, so I moved it to the array
chassis on this machine to eliminate it being a cable or the drive
controller.  It's reporting blocks being marked bad all over the place.

> Really, if this was my system I would run non-destructive read tests
> on all blocks;

How does one do this?  Or rather, isn't this what the monthly mdadm
resync does?
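For reference, md's own non-destructive whole-array read is its "check" action, which reads every stripe and verifies parity without writing; this is what Debian's monthly cron job triggers through its checkarray script.  A sketch (requires root; /dev/md0 as in the post):

```shell
# Start md's read-only scrub of the whole array.
echo check > /sys/block/md0/md/sync_action
# Watch progress and the estimated finish time.
cat /proc/mdstat
# After it completes: count of stripes whose parity did not verify.
cat /sys/block/md0/md/mismatch_cnt
```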

> along with the embedded self-test on the disk.  It is often

How does one do this?
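The embedded self-test is normally driven with smartctl from smartmontools.  A sketch, with /dev/sdX as a placeholder; behind the SiI port multipliers an explicit device type (e.g. -d sat) may be needed, and some bridge chips swallow these commands entirely:

```shell
# Start the drive's extended (long) self-test; the drive stays online.
smartctl -t long /dev/sdX
# Read the self-test log once the test has had time to finish.
smartctl -l selftest /dev/sdX
# Check the reallocated / pending sector attribute counters.
smartctl -A /dev/sdX
```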

> a lot easier and more productive to eliminate what ISN'T the problem
> rather than chase all of the potential reasons for the problem.

I agree, which is why I am asking for troubleshooting methods and
utilities.

The monthly RAID array resync started a few minutes ago, and it is
providing some interesting results.  The number of blocks read per
second is consistently 13,000 - 24,000 on all ten drives.  There were no
other drive accesses of any sort at the time, so the number of blocks
written was flat zero on all drives in the array.  I copied the
/etc/hosts file to the RAID array, and instantly the file system locked,
but the array resync *DID NOT*.  The number of blocks read and written
per second continued to range from 13,000 to 24,000 blocks/second, with
no apparent halt or slow-down at all, not even for one second.  So if
it's a drive error, why are file system reads halted almost completely,
and writes halted altogether, yet drive reads at the RAID array level
continue unabated at an aggregate of more than 130,000 - 240,000 blocks
(500 - 940 megabits) per second?  I tried a second copy, and again the
file system accesses to the drives halted altogether.  The block reads
(which had been alternating with writes after the new transfer processes
were implemented) again jumped to between 13,000 and 24,000.  This time
I used a stopwatch, and the halt was 18 minutes 21 seconds - I believe
the longest ever.  There is absolutely no way it would take a drive
almost 20 minutes to mark a block bad.  The dirty blocks grew to more
than 78 Megabytes.  I just did a 3rd cp of the /etc/hosts file to the
array, and once again it locked the machine for what is likely to be
another 15 - 20 minutes.  I tried forcing a sync, but it also hung.

<Sigh>  The next three days are going to be Hell, again.  It's going to
be all but impossible to edit a file until the RAID resync completes.
It's often really bad under ordinary loads, but when the resync is
underway, it's beyond absurd.

======
Leslie: 
Respectfully, your statement, "SMART is supposed to report this" shows
you have no understanding of exactly what S.M.A.R.T. is and is not
supposed to report, nor do you know enough about hardware to make an
educated decision about what can and can not be contributing factors.
As such, you are not qualified to dismiss the necessity to run hardware
diagnostics.

A few other things - many SATA controller cards use poorly architected
bridge chips that spoof some of the ATA commands, so even if you *think*
you are kicking off one of the SMART subcommands, like the
SMART_IMMEDIATE_OFFLINE (op code d4h with the extended self test,
subcommand 2h), then it is possible, perhaps probable, they are never
getting run. -- yes, I am giving you the raw opcodes so you can look
them up and learn what they do.

You want to know how it is possible that frequency or size of reads can
be a factor? 
Do the math:
 * Look at the # of ECC bits you have on the disks (read the specs), and
compare that with the trillions of bytes you have.  How frequently can
you expect to have an unrecoverable ECC error based on that?
 * What percentage of your farm are you actually testing with the tests
you have run so far? Is it even close to being statistically
significant?
 * Do you know what physical blocks on each disk are being read/written
with the tests you mention? If you do not know, then how do you know
that the short tests are doing I/O on blocks that need to be repaired,
and subsequent tests run OK because those blocks were just repaired?
 * Did you look into firmware? Are the drives and/or firmware revisions
qualified by your controller vendor?  
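As a back-of-envelope version of the first point above, assuming the common desktop-drive spec of one unrecoverable read error (URE) per 1e14 bits read (an assumption for illustration; the real figure is on each drive's data sheet):

```shell
# Expected unrecoverable read errors for one full pass over ten 1 TB
# drives, at a spec of 1 URE per 1e14 bits read.
awk 'BEGIN {
    bytes    = 10 * 1e12         # ten 1 TB drives
    bits     = bytes * 8
    ure_rate = 1e14              # bits read per expected URE
    printf "expected UREs per full read: %.2f\n", bits / ure_rate
}'
# → expected UREs per full read: 0.80
```

In other words, at that spec a single complete scrub of the array has a non-trivial chance of hitting an unrecoverable sector somewhere.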

I've been in the storage business for over 10 years, writing everything
from RAID firmware and configurators to disk diagnostics and test bench
suites.  I even have my own company that writes storage diagnostics.  I
think I know a little more about diagnostics and what can and cannot
happen.  You said earlier that you do not agree with my statements.  I
doubt that you will find any experienced storage professional who
wouldn't tell you to break it all down and run a full block-level DVT
before going further.  It could all have been done over the weekend if
you had the right setup, and then you would know a lot more than you
know now.

At this point all you have done is tell the people who suggest hardware
is the cause that they are wrong, and then tell us why you think we are
wrong.  Frankly, be lazy and don't run diagnostics, but you had better
not be a government employee, or in charge of a database that contains
financial, medical, or other such information, and you had better be
running hot backups.

If you still refuse to run a full block-level hardware test, then ask
yourself how much longer you will allow this to go on before you run
such a test.  Or are you just going to continue down this path, waiting
for somebody to give you a magic command to type in that will fix
everything?

I am not the one who, at best, is putting my job on the line and, at
worst, is looking at a criminal violation for not taking appropriate
actions to protect certain data.  I make no apology for beating you up
on this.  You need to hear it.




[parent not found: <49D89515.3020800@computer.org>]
[parent not found: <49F21B75.7060705@sauce.co.nz>]
[parent not found: <49F2A193.8080807@sauce.co.nz>]

end of thread, other threads:[~2009-05-03  2:23 UTC | newest]

Thread overview: 97+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-04-02  4:16 (unknown), Lelsie Rhorer
2009-04-02  4:22 ` David Lethe
2009-04-05  0:12   ` RE: Lelsie Rhorer
2009-04-05  0:38     ` Greg Freemyer
2009-04-05  5:05       ` Lelsie Rhorer
2009-04-05 11:42         ` Greg Freemyer
2009-04-05  0:45     ` Re: Roger Heflin
2009-04-05  5:21       ` Lelsie Rhorer
2009-04-05  5:33         ` RE: David Lethe
2009-04-05  8:14           ` RAID halting Lelsie Rhorer
2009-04-02  4:38 ` Strange filesystem slowness with 8TB RAID6 NeilBrown
2009-04-04  7:12   ` RAID halting Lelsie Rhorer
2009-04-04 12:38     ` Roger Heflin
2009-04-02  6:56 ` your mail Luca Berra
2009-04-04  6:44   ` RAID halting Lelsie Rhorer
2009-04-02  7:33 ` Peter Grandi
2009-04-02 23:01   ` RAID halting Lelsie Rhorer
2009-04-02 13:35 ` Andrew Burgess
2009-04-04  5:57   ` RAID halting Lelsie Rhorer
2009-04-04 13:01     ` Andrew Burgess
2009-04-04 14:39       ` Lelsie Rhorer
2009-04-04 15:04         ` Andrew Burgess
2009-04-04 15:15           ` Lelsie Rhorer
2009-04-04 16:39             ` Andrew Burgess
2009-04-04 17:05 Lelsie Rhorer
     [not found] <49D7C19C.2050308@gmail.com>
2009-04-05  0:07 ` Lelsie Rhorer
2009-04-05  0:49   ` Greg Freemyer
2009-04-05  5:34     ` Lelsie Rhorer
2009-04-05  7:16       ` Richard Scobie
2009-04-05  8:22         ` Lelsie Rhorer
2009-04-05 14:05           ` Drew
2009-04-05 18:54             ` Leslie Rhorer
2009-04-05 19:17               ` John Robinson
2009-04-05 20:00                 ` Greg Freemyer
2009-04-05 20:39                   ` Peter Grandi
2009-04-05 23:27                     ` Leslie Rhorer
2009-04-05 22:03                   ` Leslie Rhorer
2009-04-06 22:16                     ` Greg Freemyer
2009-04-07 18:22                       ` Leslie Rhorer
2009-04-24  4:52                   ` Leslie Rhorer
2009-04-24  6:50                     ` Richard Scobie
2009-04-24 10:03                       ` Leslie Rhorer
2009-04-28 19:36                         ` lrhorer
2009-04-24 15:24                     ` Andrew Burgess
2009-04-25  4:26                       ` Leslie Rhorer
2009-04-24 17:03                     ` Doug Ledford
2009-04-24 20:25                       ` Richard Scobie
2009-04-24 20:28                         ` CoolCold
2009-04-24 21:04                           ` Richard Scobie
2009-04-25  7:40                       ` Leslie Rhorer
2009-04-25  8:53                         ` Michał Przyłuski
2009-04-28 19:33                         ` Leslie Rhorer
2009-04-29 11:25                           ` John Robinson
2009-04-30  0:55                             ` Leslie Rhorer
2009-04-30 12:34                               ` John Robinson
2009-05-03  2:16                                 ` Leslie Rhorer
2009-05-03  2:23                           ` Leslie Rhorer
2009-04-24 20:25                     ` Greg Freemyer
2009-04-25  7:24                     ` Leslie Rhorer
2009-04-05 21:02                 ` Leslie Rhorer
2009-04-05 19:26               ` Richard Scobie
2009-04-05 20:40                 ` Leslie Rhorer
2009-04-05 20:57               ` Peter Grandi
2009-04-05 23:55                 ` Leslie Rhorer
2009-04-06 20:35                   ` jim owens
2009-04-07 17:47                     ` Leslie Rhorer
2009-04-07 18:18                       ` David Lethe
2009-04-08 14:17                         ` Leslie Rhorer
2009-04-08 14:30                           ` David Lethe
2009-04-09  4:52                             ` Leslie Rhorer
2009-04-09  6:45                               ` David Lethe
2009-04-08 14:37                           ` Greg Freemyer
2009-04-08 16:29                             ` Andrew Burgess
2009-04-09  3:24                               ` Leslie Rhorer
2009-04-10  3:02                               ` Leslie Rhorer
2009-04-10  4:51                                 ` Leslie Rhorer
2009-04-10 12:50                                   ` jim owens
2009-04-10 15:31                                   ` Bill Davidsen
2009-04-11  1:37                                     ` Leslie Rhorer
2009-04-11 13:02                                       ` Bill Davidsen
2009-04-10  8:53                                 ` David Greaves
2009-04-08 18:04                           ` Corey Hickey
2009-04-07 18:20                       ` Greg Freemyer
2009-04-08  8:45                       ` John Robinson
2009-04-09  3:34                         ` Leslie Rhorer
2009-04-05  7:33       ` Richard Scobie
2009-04-05  0:57   ` Roger Heflin
2009-04-05  6:30     ` Lelsie Rhorer
2009-04-05 14:22 FW: " David Lethe
2009-04-05 14:53 ` David Lethe
2009-04-05 20:33 ` Leslie Rhorer
2009-04-05 22:20   ` Peter Grandi
2009-04-06  0:31   ` Doug Ledford
2009-04-06  1:53     ` Leslie Rhorer
2009-04-06 12:37       ` Doug Ledford
     [not found] <49D89515.3020800@computer.org>
2009-04-05 18:40 ` Leslie Rhorer
     [not found] <49F21B75.7060705@sauce.co.nz>
2009-04-25  4:32 ` Leslie Rhorer
     [not found] <49F2A193.8080807@sauce.co.nz>
2009-04-25  7:03 ` Leslie Rhorer
