All of lore.kernel.org
* Nvidia Raid5 Failure
@ 2014-04-10  5:00 peter davidson
  2014-04-10  8:46 ` David Brown
  2014-04-10 14:36 ` Scott D'Vileskis
  0 siblings, 2 replies; 12+ messages in thread
From: peter davidson @ 2014-04-10  5:00 UTC (permalink / raw)
  To: linux-raid

Hi Folks,

My computer suddenly shut down due to a failed memory module - damaging the 1.8TB RAID5 array of three disks.

The computer was able to boot with a degraded array (Windows 7 OS was on the array) but I was unable to get the array to rebuild using the Nvidia toolset - either at BIOS level or in Windows 7. Now the computer will not boot from the array.

I had something very similar to this happen a few weeks ago when the motherboard failed - I was able to limp things along to get a backup of all important data.

I am interested to know if LINUX will be able to recover the array for me this time. Having got part way through this process before on the previous failure (which led me to this forum), I am keen to follow this through as an exercise knowing I have a backup of the really important stuff. 

I intend to build LINUX onto a new disk and work through this in the coming days - what would be my best choice of distro for this exercise? I am hoping to find something that has all the relevant tools and is relatively simple to get up and running with a friendly GUI to help me navigate round.

I used to work on various databases running on UNIX servers so I hope I can still find my way round a terminal window.

Thanks in advance for any support anyone can offer me!

Regards,

Peter.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Nvidia Raid5 Failure
  2014-04-10  5:00 Nvidia Raid5 Failure peter davidson
@ 2014-04-10  8:46 ` David Brown
  2014-04-10 14:36 ` Scott D'Vileskis
  1 sibling, 0 replies; 12+ messages in thread
From: David Brown @ 2014-04-10  8:46 UTC (permalink / raw)
  To: peter davidson, linux-raid

On 10/04/14 07:00, peter davidson wrote:
> Hi Folks,
> 
> My computer suddenly shut down due to a failed memory module -
> damaging the 1.8TB RAID5 array of three disks.
> 
> The computer was able to boot with a degraded array (Windows 7 OS was
> on the array) but I was unable to get the array to rebuild using the
> Nvidia toolset - either at BIOS level or in Windows 7. Now the
> computer will not boot from the array.
> 
> I had something very similar to this happen a few weeks ago when the
> motherboard failed - I was able to limp things along to get a backup
> of all important data.
> 
> I am interested to know if LINUX will be able to recover the array
> for me this time. Having got part way through this process before on
> the previous failure (which led me to this forum), I am keen to
> follow this through as an exercise knowing I have a backup of the
> really important stuff.
> 
> I intend to build LINUX onto a new disk and work through this in the
> coming days - what would be my best choice of distro for this
> exercise? I am hoping to find something that has all the relevant
> tools and is relatively simple to get up and running with a friendly
> GUI to help me navigate round.
> 
> I used to work on various databases running on UNIX servers so I hope
> I can still find my way round a terminal window.
> 
> Thanks in advance for any support anyone can offer me!
> 
> Regards,
> 

As a general point, don't do /anything/ to write to the disks or attempt
to recover or rebuild them until you have copied off /all/ important
data to safe backups.  If you have booted Windows from the array, then
step one is to shut it down and do not even consider booting it until
backups are all in place and double-checked.

You want to use a live CD for recovery here - it will let you play
around with the disks without risking further damage.  My usual
choice of Linux live CD for recovery purposes is System Rescue CD - I
can't tell you if it is the best choice here, and I haven't needed to
recover raid arrays using it.  But I find it very useful for testing and
configuration, and have used it to recover data or fix up broken Windows
systems.
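
A cautious way to do that - just a sketch, with placeholder device names
and a destination directory on a separate disk with enough free space - is
to take raw images of each member disk first and then experiment on the
copies rather than the originals:

ddrescue -d -r3 /dev/sda /mnt/spare/sda.img /mnt/spare/sda.map
ddrescue -d -r3 /dev/sdb /mnt/spare/sdb.img /mnt/spare/sdb.map
ddrescue -d -r3 /dev/sdc /mnt/spare/sdc.img /mnt/spare/sdc.map

GNU ddrescue is included on System Rescue CD, reads the source devices
without writing to them, and records progress in the .map files so an
interrupted copy can be resumed.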

Another option you should consider is a Windows live CD.  You can't
legally download and burn one, AFAIK, but there are plenty available if
you are happy to look about.  There are also several Windows live CD
generators that will make a bootable Windows CD from another Windows
machine, and can include utility programs.  They are particularly
popular for malware recovery, but I expect you can put your Nvidia raid
software on them.

As for how well you can access the data and/or recover and/or rebuild
your array from Linux, it all depends on the support for your Nvidia
raid.  Someone here might have experience and can give you information,
but your best starting point would be Nvidia's website.  There are Linux
drivers and utilities for most proper hardware raid systems, but if this
is an Nvidia-specific fake raid, it might not be supported.  Fake raid is
not very popular in the Linux world - it combines the disadvantages of
software raid with the disadvantages of hardware raid, and the benefits
of neither.  Its only real advantage is if you want to use Windows with
raid and don't want to pay for proper hardware raid.  Intel's "Matrix RAID"
is far and away the most popular fake raid, and has good support in
Linux, but I cannot say the same about Nvidia's raid.
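
If you do get a live Linux booted on that machine, a harmless first step
is to see what the standard tools make of the on-disk metadata (both
commands are read-only; the device names are only examples):

dmraid -r                      # list BIOS/fake-raid sets dmraid recognises (it has an nvidia handler)
mdadm --examine /dev/sd[abc]   # report any RAID superblocks mdadm can identify

Neither command writes to the disks; they only report what they find.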


If you want to set up a new system with Linux raid, then you will be
able to get pointers and help in this list - but it's not really
suitable for "how to get started with Linux" information.  And if you
want to mix Windows and Linux on the same system, be aware that Windows
can't work with Linux software raid, and can't understand Linux
filesystems (at least, not easily).  It is often much easier to keep
them on separate machines and share files over the network.
Alternatively, consider using VirtualBox to let you run one system as a
virtual machine inside the other.

mvh.,

David

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Nvidia Raid5 Failure
  2014-04-10  5:00 Nvidia Raid5 Failure peter davidson
  2014-04-10  8:46 ` David Brown
@ 2014-04-10 14:36 ` Scott D'Vileskis
  2014-04-11  4:15   ` Scott D'Vileskis
  2014-04-13 16:42   ` Drew
  1 sibling, 2 replies; 12+ messages in thread
From: Scott D'Vileskis @ 2014-04-10 14:36 UTC (permalink / raw)
  To: peter davidson; +Cc: linux-raid

I would advise keeping operating systems off RAID arrays in general,
Windows or Linux, because most bootloaders are loaded to a single
disk. If *that* disk fails, you may not be able to boot, even if your
RAID is in degraded mode. Having your data on the RAID and a separate
OS disk allows troubleshooting with OS tools (NVIDIA's toolkit or
Microsoft's disk manager in Windows, mdadm in Linux).  I
would also advise against what are known as 'fake raid' controllers,
which your NVIDIA hardware likely is (as are Promise, HighPoint, Intel,
etc.), because it can be difficult to recover data if you have a
controller/mobo failure without the exact same hardware.

For setting up Linux, I would advise picking up a 64 or 120GB SSD
(even a 16/32GB one would be enough). For your first steps in Linux, I
would go with a flavor of Ubuntu Linux. (Xubuntu is really nice, and
doesn't have the bastardized Unity desktop environment.) In most
modern Linux distros, you can set up RAID arrays at install time, or
wait until your desktop is up and running and do it from GUI tools.

Another idea is to grab a diskless NAS appliance like a Lenovo/Iomega
IX4 300D or a Synology for $200-400 and move your disks over. (You'll
likely have to back up all your data and wipe your disks, though.) I
like the Lenovo/Iomega product because it uses a custom build of
Debian Linux and Linux software RAID, which I could always recover on
my Linux desktop if I had a NAS hardware failure.

Good luck!

On Thu, Apr 10, 2014 at 1:00 AM, peter davidson
<merrymeetpete@hotmail.com> wrote:
> Hi Folks,
>
> My computer suddenly shut down due to a failed memory module - damaging the 1.8TB RAID5 array of three disks.
>
> The computer was able to boot with a degraded array (Windows 7 OS was on the array) but I was unable to get the array to rebuild using the Nvidia toolset - either at BIOS level or in Windows 7. Now the computer will not boot from the array.
>
> I had something very similar to this happen a few weeks ago when the motherboard failed - I was able to limp things along to get a backup of all important data.
>
> I am interested to know if LINUX will be able to recover the array for me this time. Having got part way through this process before on the previous failure (which led me to this forum), I am keen to follow this through as an exercise knowing I have a backup of the really important stuff.
>
> I intend to build LINUX onto a new disk and work through this in the coming days - what would be my best choice of distro for this exercise? I am hoping to find something that has all the relevant tools and is relatively simple to get up and running with a friendly GUI to help me navigate round.
>
> I used to work on various databases running on UNIX servers so I hope I can still find my way round a terminal window.
>
> Thanks in advance for any support anyone can offer me!
>
> Regards,
>
> Peter.
>                                           --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Nvidia Raid5 Failure
  2014-04-10 14:36 ` Scott D'Vileskis
@ 2014-04-11  4:15   ` Scott D'Vileskis
  2014-04-11  7:45     ` David Brown
  2014-04-13 16:42   ` Drew
  1 sibling, 1 reply; 12+ messages in thread
From: Scott D'Vileskis @ 2014-04-11  4:15 UTC (permalink / raw)
  To: linux-raid; +Cc: peter davidson

OP Peter and I exchanged a few emails and I recommended he start with
a flavor of Ubuntu on a spare hard drive, and loop devices to learn
mdadm. He found it helpful and thought it might help someone else, so
despite this mailing list being "not really suitable for 'how to get
started with Linux' information", the following is our email exchange:

I would advise setting up Xubuntu on your spare drive, and leaving
your RAID disks completely disconnected while you learn mdadm.

On that drive, create a few blank 1GB files and loop devices (run these as root, or prefix with sudo):
fallocate -l 1G file1.img
losetup /dev/loop0 file1.img
fallocate -l 1G file2.img
losetup /dev/loop1 file2.img
fallocate -l 1G file3.img
losetup /dev/loop2 file3.img
fallocate -l 1G file4.img
losetup /dev/loop3 file4.img

Then you can create a raid array with these fake hard drives
(/dev/loop0, /dev/loop1, etc...)
mdadm --create -n4 -l5 /dev/md0 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3

Check rebuild status with:
cat /proc/mdstat

Create a file system:
mkfs.ext4 /dev/md0

Mount the filesystem
mkdir /mnt/testraid
mount /dev/md0 /mnt/testraid

Copy some files to it, maybe a movie, episode of SVU, etc, then:
mplayer somemovie.mkv

Then, while watching the movie, fail a disk
mdadm --fail /dev/md0 /dev/loop3
mdadm --remove /dev/md0  /dev/loop3

Check status, delete the loop device, delete the file:
cat /proc/mdstat
losetup -d /dev/loop3
rm file4.img

And I'll leave it to you to figure out how to create a new loop disk,
re-add it to the raid, and resync before your movie completes.

Once you are familiar and want to tackle your real drives: from the
command line you can use mdadm commands to attempt to --assemble the
array in degraded mode. When using the mdadm commands I believe there
are some special options for running in readonly mode, and/or not starting
it unless all devices are available. You may even need to use the
--force option if your drives are out of sync but you trust the data
on them.
When you start deleting superblocks and using the --create flag is
when you have to be careful.
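
For reference, those options look roughly like this (the device names are
placeholders for your real array members; keep everything read-only until
you trust what you see):

mdadm --examine /dev/sdb /dev/sdc /dev/sdd         # inspect each member's superblock first
mdadm --assemble --readonly /dev/md0 /dev/sdb /dev/sdc /dev/sdd
mdadm --assemble --readonly --run --force /dev/md0 /dev/sdb /dev/sdc   # degraded start, only if you trust the data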

-----------------------------------------------------------------------------------------

Hi Scott - excellent email - thanks a million...

Various hiccups getting the hardware ready but those were not
insurmountable. Install went OK and I remember how excited I used to
get about using the UNIX OS - various things are coming back to me.

I considered writing up the various tweaks to your instructions for the
mailing list - do you think that would be a valid exercise that someone
else might gain from?

In case you're feeling sceptical...

unused devices: <none>
peter@peter-MS-7374:~$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 loop3[4] loop2[2] loop1[1] loop0[0]
      3142656 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
      [================>....]  recovery = 82.4% (864464/1047552)
finish=0.1min speed=27014K/sec

unused devices: <none>
peter@peter-MS-7374:~$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 loop3[4] loop2[2] loop1[1] loop0[0]
      3142656 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]

The film was an episode of Homelands - it didn't even hiccup!

If you have any other exercise suggestions to help me get up to speed
then I am all ears - off to bed now - somehow it got late!

Thanks for your assistance.

Peter.


---------------------------------------------------------

Peter,
For further exercises (to familiarize yourself with creating,
breaking, and rebuilding a raid), I recommend the following additional
scenarios:

a) With a working raid up and running, unmount the filesystem, stop
the array, then stop one of your loop devices. Try to assemble the
array with that disk missing, start and stop the array a few times, and
also familiarize yourself with the --run and --no-degraded options, as
well as the --examine features for understanding superblocks (a command
sketch for this scenario follows after item c). Remember that
just mounting filesystems may change metadata on the raid disks, so
even this will impact the data integrity on the raid, even if you
don't manipulate any files.
b) After you have messed around a bit, maybe even changed some data in
degraded mode, stop the array, restart your 'missing' loop device and
attempt to restart the raid array with all the devices. After the
array starts degraded, you'll likely have to --add the disk again for
the rebuild to start.
c) Try to --create an array with your existing loop devices and check
out all the warnings you'll get about existing memberships in raid
arrays. You'll find that, with the exception of the --zero-superblock
command, it is usually pretty difficult to break things. If you
somehow convince mdadm to start or recreate an array with questionable
disks (like with the --assume-clean option), familiarize yourself with
the various filesystem check tools.
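
Roughly, scenario (a) could look like this with the loop devices from the
earlier recipe (the device numbers are just examples):

umount /mnt/testraid
mdadm --stop /dev/md0
losetup -d /dev/loop2                                             # simulate a missing disk
mdadm --assemble /dev/md0 /dev/loop0 /dev/loop1 /dev/loop3        # should refuse to start one device short
mdadm --assemble --run /dev/md0 /dev/loop0 /dev/loop1 /dev/loop3  # start it degraded anyway
mdadm --examine /dev/loop0                                        # inspect a member's superblock
cat /proc/mdstat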

--Scott

That leads me to the following general questions about mdadm and linux raid...
I have certainly RTFM and learned many things in the past dozen years
or so from internet examples, broken arrays, kernel panics on suspend,
bad drive cabling, mistypes using dd, blowing away the first gig of a
partition, growing, shrinking, migrating, etc. Are there formal test
cases and scenarios for mdadm and linux-raid?

Also many of the emails I have seen pass through this mailing list
involve some interesting combinations of raid device superblock
mismatches that beg the question.. How could you have possibly gotten
your raid components into *that* state...

In addition to the typical use cases covered in the manual (creating
an array, growing, shrinking, replacing disks, etc) it might be
interesting to have a list of misuse cases for folks to try and work
out..  (Ooops, I accidentally blew away my superblock, what can I do
without a full rebuild)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Nvidia Raid5 Failure
  2014-04-11  4:15   ` Scott D'Vileskis
@ 2014-04-11  7:45     ` David Brown
  0 siblings, 0 replies; 12+ messages in thread
From: David Brown @ 2014-04-11  7:45 UTC (permalink / raw)
  To: Scott D'Vileskis, linux-raid; +Cc: peter davidson


Hi Scott,

I did not mean to imply that people here could not, should not or would
not help someone getting started with Linux - merely that discussions
like that are off-topic on this mailing list, and can quickly get out of
hand ("You recommended Ubuntu?  No, he should be using....").  It's
great that you had the time to help him here.

It's good that you posted your recipe here for loopback device raids for
testing.  I made a similar post a good while back, and have seen a few
others over the years.  But it is good to get it repeated, especially
for newer followers of the list.  Loopback md raid is a fantastic tool
for learning, and for practising risky operations such as resizes,
recovery, etc., and is something all md raid users should try on occasion.

mvh.,

David


On 11/04/14 06:15, Scott D'Vileskis wrote:
> OP Peter and I exchanged a few emails and I recommended he start with
> a flavor of Ubuntu on a spare hard drive, and loop devices to learn
> mdadm. He found it helpful and thought it might help someone else, so
> despite this mailing list being "not really suitable for 'how to get
> started with Linux' information", the following is our email exchange:
> 
> I would advise setting up Xubuntu on your spare drive, and leaving
> your RAID disks completely disconnected while you learn mdadm.
> 
> On that drive, create a few blank 1GB files, and loop devices:
> fallocate -l 1G file1.img
> losetup /dev/loop0 file1.img
> fallocate -l 1G file2.img
> losetup /dev/loop1 file2.img
> fallocate -l 1G file3.img
> losetup /dev/loop2 file3.img
> fallocate -l 1G file4.img
> losetup /dev/loop3 file4.img
> 
> Then you can create a raid array with these fake hard drives
> (/dev/loop0, /dev/loop1, etc...)
> mdadm --create -n4 -l5 /dev/md0 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
> 
> Check rebuild status with:
> cat /proc/mdstat
> 
> Create a file system:
> mkfs.ext4 /dev/md0
> 
> Mount the filesystem
> mkdir /mnt/testraid
> mount /dev/md0 /mnt/testraid
> 
> Copy some files to it, maybe a movie, episode of SVU, etc, then:
> mplayer somemovie.mkv
> 
> Then, while watching the movie, fail a disk
> mdadm --fail /dev/md0 /dev/loop3
> mdadm --remove /dev/md0  /dev/loop3
> 
> Check status, delete the loop device, delete the file:
> cat /proc/mdstat
> losetup -d /dev/loop3
> rm file4.img
> 
> And I'll leave it to you to figure out how to create a new loop disk,
> re-add it to the raid, and resync before your movie completes.
> 
> Once you are familiar and want to tackle your real drives.. From the
> command line you can use mdadm commands to attempt to --assemble the
> array in degraded mode. When using the mdadm commands I believe there
> are some special options for running in readonly mode, and/or not starting
> it unless all devices are available. You may even need to use the
> --force command if your drives are out of sync but you trust the data
> on them.
> When you start deleting superblocks and using the --create flag is
> when you have to be careful.
> 
> -----------------------------------------------------------------------------------------
> 
> Hi Scott - excellent email - thanks a million...
> 
> Various hiccups getting the hardware ready but those were not
> insurmountable. Install went OK and I remember how excited I used to
> get about using the UNIX OS - various things are coming back to me.
> 
> I considered writing up the various tweaks to your instructions on the
> mail server - do you think that would be a valid exercise that someone
> else might gain from?
> 
> In case you're feeling sceptical...
> 
> unused devices: <none>
> peter@peter-MS-7374:~$ cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid5 loop3[4] loop2[2] loop1[1] loop0[0]
>       3142656 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
>       [================>....]  recovery = 82.4% (864464/1047552)
> finish=0.1min speed=27014K/sec
> 
> unused devices: <none>
> peter@peter-MS-7374:~$ cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid5 loop3[4] loop2[2] loop1[1] loop0[0]
>       3142656 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
> 
> The film was an episode of Homelands - it didn't even hiccup!
> 
> If you have any other exercise suggestions to help me get up to speed
> then I am all ears - off to bed now - somehow it got late!
> 
> Thanks for your assistance.
> 
> Peter.
> 
> 
> ---------------------------------------------------------
> 
> Peter,
> For further exercises (to familiarize yourself with creating,
> breaking, and rebuilding a raid), I recommend the following additional
> scenarios:
> 
> a) With a working raid up and running, unmount the filesystem, stop
> the array, then stop one of your loop devices. Try to assemble the
> array with the missing disk, start and stop the array a few times, and
> also familiarize yourself with the --run and --no-degraded options, as
> well as the --examine features for understanding superblocks. Remember
> just mounting filesystems may change metadata on the raid disks, so
> even this will impact the data integrity on the raid, even if you
> don't manipulate any files.
> b) After you have messed around a bit, maybe even changed some data in
> degraded mode, stop the array, restart your 'missing' loop device and
> attempt to restart the raid array with all the devices. After the
> array starts degraded, you'll likely have to --add the disk again for
> the rebuild to start.
> c) Try to --create an array with your existing loop devices and check
> out all the warnings you'll get about existing memberships in raid
> arrays. You'll find that, with the exception of the --zero-superblock
> command, it is usually pretty difficult to break things. If you
> somehow convince mdadm to start or recreate an array with questionable
> disks (like with the --assume-clean) option, familiarize yourself with
> the various filesystem check tools.
> 
> --Scott
> 
> That leads me to the following general questions about mdadm and linux raid...
> I have certainly RTFM and learned many things in the past dozen years
> or so from internet examples, broken arrays, kernel panics on suspend,
> bad drive cabling, mistypes using dd, blowing away the first gig of a
> partition, growing, shrinking, migrating, etc. Are there formal test
> cases and scenarios for mdadm and linux-raid?
> 
> Also many of the emails I have seen pass through this mailing list
> involve some interesting combinations of raid device superblock
> mismatches that beg the question.. How could you have possibly gotten
> your raid components into *that* state...
> 
> In addition to the typical use cases covered in the manual (creating
> an array, growing, shrinking, replacing disks, etc) it might be
> interesting to have a list of misuse cases for folks to try and work
> out..  (Ooops, I accidentally blew away my superblock, what can I do
> without a full rebuild)
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Nvidia Raid5 Failure
  2014-04-10 14:36 ` Scott D'Vileskis
  2014-04-11  4:15   ` Scott D'Vileskis
@ 2014-04-13 16:42   ` Drew
  2014-04-14  6:14     ` Stan Hoeppner
  1 sibling, 1 reply; 12+ messages in thread
From: Drew @ 2014-04-13 16:42 UTC (permalink / raw)
  To: linux-raid

On Thu, Apr 10, 2014 at 7:36 AM, Scott D'Vileskis <sdvileskis@gmail.com> wrote:
> <snip>  I
> would also advise against what are known as 'fake raid' controllers,
> which your NVIDIA hardware likely is (as are Promise, HighPoint, Intel,
> etc.), because it can be difficult to recover data if you have a
> controller/mobo failure without the exact same hardware.

Agree on staying away from fake-RAID. One thing I will point out
for reference, though, is that not *all* Intel RAID is fakeraid. The
onboard RAID built into Intel's ICH family certainly is. However Intel
does make a line of RAID controller daughter cards which are rebadged
LSI RAID controllers and are in fact true H/W RAID. Easiest way to
know is to see if the card supports SAS. If it does, chances are it's
a H/W RAID card.


-- 
Drew

"Nothing in life is to be feared. It is only to be understood."
--Marie Curie

"This started out as a hobby and spun horribly out of control."
-Unknown

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Nvidia Raid5 Failure
  2014-04-13 16:42   ` Drew
@ 2014-04-14  6:14     ` Stan Hoeppner
  2014-04-14  9:50       ` NeilBrown
  0 siblings, 1 reply; 12+ messages in thread
From: Stan Hoeppner @ 2014-04-14  6:14 UTC (permalink / raw)
  To: Drew, linux-raid

On 4/13/2014 11:42 AM, Drew wrote:
> On Thu, Apr 10, 2014 at 7:36 AM, Scott D'Vileskis <sdvileskis@gmail.com> wrote:
>> <snip>  I
>> would also advise against what are known as 'fake raid' controllers,
>> which your NVIDIA hardware likely is (as are Promise, HighPoint, Intel,
>> etc.), because it can be difficult to recover data if you have a
>> controller/mobo failure without the exact same hardware.
> 
> Agree on staying away from fake-RAID. One thing I will point out
> for reference, though, is that not *all* Intel RAID is fakeraid. The
> onboard RAID built into Intel's ICH family certainly is. However Intel
> does make a line of RAID controller daughter cards which are rebadged
> LSI RAID controllers and are in fact true H/W RAID. Easiest way to
> know is to see if the card supports SAS. If it does, chances are it's
> a H/W RAID card.

The term "hardware RAID" is no longer appropriate as a means of
classifying or describing the capability or performance of an HBA, and
ceased to be quite a few years ago.

All of the Intel mezzanine cards and PCIe HBAs use LSI SAS ASICs and LSI
RAID firmware.  In that sense they are "hardware RAID" controllers, as the
RAID software executes on the ASIC, not the host.  However, more than
half of them lack DRAM.  Those without DRAM do not and cannot support
FBWC/BBWC (flash- or battery-backed write cache).  Without BBWC you lose
two features that are really the defining characteristics of what we used
to call a "hardware RAID" controller.

1.  Early ACK.  Without BBWC the ASIC firmware cannot buffer small
random IOs, and it cannot ACK command completion for sync, fsync,
O_DIRECT, etc. writes until they reach the disks.  Additionally, one
cannot safely disable barriers in filesystems.  BBWC enhances the
performance of such workloads dramatically by reducing latency.

2.  Writeback.  Some of Intel's DRAM-less RAID solutions, just like
their LSI counterparts, support RAID5.  Without onboard DRAM these
controllers cannot perform efficient writeback of read-modify-write (RMW)
operations because there is no read cache.  This would be roughly
equivalent to hacking md/RAID5 to use a stripe_cache_size of 0.  Using
RAID5 with one of these controllers most often yields lower
IOPS/throughput than a single disk.
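
For comparison, md's RAID5/6 stripe cache is visible and tunable through
sysfs; md0 and the value below are only examples:

cat /sys/block/md0/md/stripe_cache_size            # default is 256 entries (one page per member per entry)
echo 4096 > /sys/block/md0/md/stripe_cache_size    # enlarge it for RMW-heavy workloads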

Better classification for the current era:

1.  RAID controller - ASIC firmware, BBWC
2.  HBA w/RAID      - ASIC firmware, cache less
3.  Fake-RAID       - host software


Cheers,

Stan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Nvidia Raid5 Failure
  2014-04-14  6:14     ` Stan Hoeppner
@ 2014-04-14  9:50       ` NeilBrown
  2014-04-14 10:55         ` Stan Hoeppner
  0 siblings, 1 reply; 12+ messages in thread
From: NeilBrown @ 2014-04-14  9:50 UTC (permalink / raw)
  To: stan; +Cc: Drew, linux-raid


On Mon, 14 Apr 2014 01:14:05 -0500 Stan Hoeppner <stan@hardwarefreak.com>
wrote:

> Better classification for the current era:
> 
> 1.  RAID controller - ASIC firmware, BBWC
> 2.  HBA w/RAID      - ASIC firmware, cache less
> 3.  Fake-RAID       - host software

Can we come up with a better adjective than "fake"?
It makes sense if you say "fake RAID controller", but people don't.  They
safe "fake RAID", which sounds like the RAID is fake, which it isn't.

How about "BIOS RAID" ??  or "host RAID" ??

NeilBrown


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Nvidia Raid5 Failure
  2014-04-14  9:50       ` NeilBrown
@ 2014-04-14 10:55         ` Stan Hoeppner
       [not found]           ` <CAK_KU4aRbK-sD6h7xqieW_D9FhBYBAy799wZHXq222DAMLjRng@mail.gmail.com>
  0 siblings, 1 reply; 12+ messages in thread
From: Stan Hoeppner @ 2014-04-14 10:55 UTC (permalink / raw)
  To: NeilBrown; +Cc: Drew, linux-raid

On 4/14/2014 4:50 AM, NeilBrown wrote:
> On Mon, 14 Apr 2014 01:14:05 -0500 Stan Hoeppner <stan@hardwarefreak.com>
> wrote:
> 
>> Better classification for the current era:
>>
>> 1.  RAID controller - ASIC firmware, BBWC
>> 2.  HBA w/RAID      - ASIC firmware, cache less
>> 3.  Fake-RAID       - host software

To be clear, above I am differentiating between the various flavors of
"hardware RAID" devices, and part of the classification is based on
where the RAID binary is executed.  I do not address software only RAID
above.

> Can we come up with a better adjective than "fake"?

Many already have, but the terms were not adopted en masse.

> It makes sense if you say "fake RAID controller", but people don't.  They
> safe "fake RAID", which sounds like the RAID is fake, which it isn't.

I'm not attempting to reinvent the wheel above.  "FakeRAID", and various
spellings of it, has been the term in common use for a decade-plus and is
widely recognized and understood.  It is even used in official Linux distro
documentation:

https://help.ubuntu.com/community/FakeRaidHowto
https://wiki.archlinux.org/index.php/Installing_with_Fake_RAID

> How about "BIOS RAID" ??  or "host RAID" ??

I agree that a better descriptive short label would be preferable.
However I don't see either of these working.  "BIOS RAID" will be
confusing to some as many folks

A.  don't understand the difference between BIOS and firmware
B.  have a BIOS config setup utility on their RAID controller or HBA
w/RAID card, and both devices "boot from the card's BIOS"

"Host RAID" has been used extensively over the years in various circles
to describe host software only RAID solutions.  Additionally this
wouldn't be an accurate description because there have been many add-in
IDE/SATA "RAID" cards that split RAID duty between card BIOS/firmware
and host OS driver in this manner.  HighPoint currently has such a product:

http://www.highpoint-tech.com/USA_new/series_rr272x.htm

They describe this as "Hardware-Assisted RAID", which is a pretty good
description IMO.

Any effort or campaign to supplant "fakeRAID" with another term I think
will be extremely difficult and prone to fail as "fakeRAID" is already
so entrenched in the lexicon, and has been used in official distro
documentation.

Just my 2¢

Cheers,

Stan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Nvidia Raid5 Failure
       [not found]           ` <CAK_KU4aRbK-sD6h7xqieW_D9FhBYBAy799wZHXq222DAMLjRng@mail.gmail.com>
@ 2014-04-15  3:18             ` Stan Hoeppner
  0 siblings, 0 replies; 12+ messages in thread
From: Stan Hoeppner @ 2014-04-15  3:18 UTC (permalink / raw)
  To: Scott D'Vileskis; +Cc: NeilBrown, Linux RAID

On 4/14/2014 8:51 AM, Scott D'Vileskis wrote:
> I'd be curious to see the benchmarks of some of these, specifically
> how properly-tuned software raid in Linux (with ample memory and CPU
> bandwidth) compares against the "hardware" solutions.
>
> I'm already sold on software raid for ease of use and recovery,
> maturity, knowledge base, etc.. But the numbers would be fun. I know I
> can simply google it, but surely one of you two has a great bookmark!

With the emergence of bcache and cousins, their equivalent in RAID card
firmware, and the low cost of SSDs, RAID implementation, software or
hardware, is no longer a real issue WRT performance.  Streaming non RMW
throughput is usually identical between the two, limited by the drives,
not the RAID.  Random IO will be very similar as well, with a slight
edge to RAID cards with DRAM cache in front of the SSD cache.
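
If you want numbers of your own, fio is the usual tool.  The job below is
only an illustrative sketch - point --directory at a filesystem on the
array under test, not at a raw device holding data you care about:

fio --name=randrw --directory=/mnt/testraid --size=2G \
    --rw=randrw --bs=4k --iodepth=32 --ioengine=libaio \
    --direct=1 --runtime=60 --time_based --group_reporting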

Thus it is the other qualities and deficiencies of each, and the intended
use case, that drive the decision.  The needs of the personal or SOHO
server, a university department on a tight budget, etc., are often quite
different from those of the enterprise customer with deeper pockets.
For the former, cost is usually a significant factor, while for the
latter the cost of the RAID card is inconsequential given the high cost
of the enterprise drives attached to it, where only one or two drives
equal the price of the RAID card.

In an enterprise environment, light path management is a must: an LED is
required for drive failure identification and easy replacement.
Linux/md does not yet provide this functionality (though efforts are
being made) and typically forces users to carefully document which
drives are in which chassis/backplane slots, and maintain those records
every time a drive is swapped.  Most RAID cards have provided failure
LED support for ~20 years.

md has the same management interface on any Linux host.  If one buys
RAID cards from multiple vendors they must learn multiple interfaces.

md is more flexible WRT mixing different drive types within an array.
Hardware RAID controllers are typically more finicky here, requiring
drives with a low and uniform ERC/TLER value, and of matching firmware
revs across the drives in an array.
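
For reference, a drive's error-recovery timeout (SCT ERC) can be inspected,
and on many drives adjusted, with smartctl; /dev/sda is a placeholder and
not every consumer drive supports the feature:

smartctl -l scterc /dev/sda          # show current read/write ERC timeouts
smartctl -l scterc,70,70 /dev/sda    # set both to 7.0 seconds (units of 100 ms)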

And of course md can do the one thing hardware RAID cards cannot:
stitch arrays on multiple RAID cards together into a single disk device
using a nested stripe or a linear array depending on the workload.  This
gives you a bit of the best features of both technologies, and allows
you to scale to a level not easily achievable using only one of either
technology.
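
A sketch of that idea, where /dev/sdb and /dev/sdc stand in for the logical
drives exported by two separate RAID cards (a concatenation would use
--level=linear instead of the stripe shown here):

mdadm --create /dev/md10 --level=0 --raid-devices=2 /dev/sdb /dev/sdc
mkfs.xfs /dev/md10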

Cheers,

Stan





> Thanks!
> --Scott
> 
> On Mon, Apr 14, 2014 at 6:55 AM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
>> On 4/14/2014 4:50 AM, NeilBrown wrote:
>>> On Mon, 14 Apr 2014 01:14:05 -0500 Stan Hoeppner <stan@hardwarefreak.com>
>>> wrote:
>>>
>>>> Better classification for the current era:
>>>>
>>>> 1.  RAID controller - ASIC firmware, BBWC
>>>> 2.  HBA w/RAID      - ASIC firmware, cache less
>>>> 3.  Fake-RAID       - host software
>>
>> To be clear, above I am differentiating between the various flavors of
>> "hardware RAID" devices, and part of the classification is based on
>> where the RAID binary is executed.  I do not address software only RAID
>> above.
>>
>>> Can we come up with a better adjective than "fake"?
>>
>> Many already have, but the terms were not adopted en masse.
>>
>>> It makes sense if you say "fake RAID controller", but people don't.  They
>>> safe "fake RAID", which sounds like the RAID is fake, which it isn't.
>>
>> I'm not attempting to reinvent the wheel above.  "FakeRAID", and various
>> spellings of it, is the term in common use for a decade+, is widely
>> recognized and understood.  It is even used in official Linux distro
>> documentation:
>>
>> https://help.ubuntu.com/community/FakeRaidHowto
>> https://wiki.archlinux.org/index.php/Installing_with_Fake_RAID
>>
>>> How about "BIOS RAID" ??  or "host RAID" ??
>>
>> I agree that a better descriptive short label would be preferable.
>> However I don't see either of these working.  "BIOS RAID" will be
>> confusing to some as many folks
>>
>> A.  don't understand the difference between BIOS and firmware
>> B.  have a BIOS config setup utility on their RAID controller or HBA
>> w/RAID card, and both devices "boot from the card's BIOS"
>>
>> "Host RAID" has been used extensively over the years in various circles
>> to describe host software only RAID solutions.  Additionally this
>> wouldn't be an accurate description because there have been many add-in
>> IDE/SATA "RAID" cards that split RAID duty between card BIOS/firmware
>> and host OS driver in this manner.  HighPoint has such current product:
>>
>> http://www.highpoint-tech.com/USA_new/series_rr272x.htm
>>
>> They describe this as "Hardware-Assisted RAID", which is a pretty good
>> description IMO.
>>
>> Any effort or campaign to supplant "fakeRAID" with another term I think
>> will be extremely difficult and prone to fail as "fakeRAID" is already
>> so entrenched in the lexicon, and has been used in official distro
>> documentation.
>>
>> Just my 2¢
>>
>> Cheers,
>>
>> Stan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Nvidia Raid5 Failure
@ 2014-04-13 10:59 peter davidson
  0 siblings, 0 replies; 12+ messages in thread
From: peter davidson @ 2014-04-13 10:59 UTC (permalink / raw)
  To: linux-raid

Hi David,

For some reason this never made it through to my email inbox - I hope I haven't messed up the format, as I have pulled the text in from a browser. Thanks (belatedly) for your response. I put some comments below.
 
>> Hi Folks,
>>
>> My computer suddenly shut down due to a failed memory module -
>> damaging the 1.8TB RAID5 array of three disks.
>>
>> The computer was able to boot with a degraded array (Windows 7 OS was
>> on the array) but I was unable to get the array to rebuild using the
>> Nvidia toolset - either at BIOS level or in Windows 7. Now the
>> computer will not boot from the array.
>>
>> I had something very similar to this happen a few weeks ago when the
>> motherboard failed - I was able to limp things along to get a backup
>> of all important data.
>>
>> I am interested to know if LINUX will be able to recover the array
>> for me this time. Having got part way through this process before on
>> the previous failure (which led me to this forum), I am keen to
>> follow this through as an exercise knowing I have a backup of the
>> really important stuff.
>>
>> I intend to build LINUX onto a new disk and work through this in the
>> coming days - what would be my best choice of distro for this
>> exercise? I am hoping to find something that has all the relevant
>> tools and is relatively simple to get up and running with a friendly
>> GUI to help me navigate round.
>>
>> I used to work on various databases running on UNIX servers so I hope
>> I can still find my way round a terminal window.
>>
>> Thanks in advance for any support anyone can offer me!
>>
>> Regards,
>>

>As a general point, don't do /anything/ to write to the disks or attempt
>to recover or rebuild them until you have copied off /all/ important
>data to safe backups.  If you have booted Windows from the array, then
>step one is to shut it down and do not even consider booting it until
>backups are all in place and double-checked.

On this note, I was able to get all the useful data out to a couple of old disks. I have decided to hold off trying to reconstruct the array until I get a second copy of the useful stuff on a new disk that I have ordered. Once everything is back together, this new disk will be for my backup images to land on - strangely, this whole thing started on the weekend I decided to get a proper backup of my data - what a coincidence!

>
>You want to use a live CD for recovery here - it will let you play
>around with the disks without risking causing more damage.  My usual
>choice of Linux live CD for recovery purposes is System Rescue CD - I
>can't tell you if it is the best choice here, and I haven't needed to
>recover raid arrays using it.  But I find it very useful for testing and
>configuration, and have used it to recover data or fix up broken Windows
>systems.

I tried the Fedora live CD at first - it has dmraid, but something was missing (the dm-raid45 module) which meant I could not use the command with my particular flavour of NVIDIA fake raid, and the CD also didn't have mdadm. I then went on to download the Ubuntu live disk, which had a crack at starting the array but didn't manage it - I really didn't like that it tried to load up the array without my being able to do anything about it - it also marked the array degraded and finally failed on a second start-up that caused a kernel panic. I thought the idea of a live CD was that it should be passive in terms of fiddling round with your existing disks and data at start-up.

In my original question I was searching for a suitable LINUX install or live CD that would have everything on it I might need to get to work on the array. I went with the Xubuntu full installation on a separate disk. It also didn't have mdadm, but the OS was polite enough to tell me the exact command to issue to get it installed. This route therefore depends on an internet connection, which proved tricky as the wireless adapter does not seem usable in any 64-bit OS. I got there in the end - good old-fashioned network cables are very reliable!
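
(For anyone following along: on an Ubuntu-family install that command is
most likely just the standard package install, e.g.

sudo apt-get install mdadm

and dmraid can be pulled in the same way if you want to compare how it
interprets the Nvidia metadata.)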

>
>Another option you should consider is a Windows live CD.  You can't
>legally download and burn one, AFAIK, but there are plenty available if
>you are happy to look about.  There are also several Windows live CD
>generators that will make a bootable Windows CD from another windows
>machine, and can include utility programs.  They are particularly
>popular for malware recovery, but I expect you can put your Nvidia raid
>software on them.

None of the Windows OSs can do anything to put the array together without the Nvidia support packages. Unfortunately the Nvidia software now won't rebuild the degraded array. The Nvidia tools provide no feedback on what is going on - you either succeed or fail in what you are trying to do - no tweaking allowed. So that's why LINUX looks like a good option for this exercise.

>As for how well you can access the data and/or recover and/or rebuild
>your array from Linux, it all depends on the support for your Nvidia
>raid.  Someone here might have experience and can give you information,
>but your best starting point would be Nvidia's website.

Nvidia document their Windows GUI tools and give a few words on their BIOS utility - neither of these tools is able to put things straight from where things lie at the moment.

> There are Linux
>drivers and utilities for most proper hardware raid systems, but if this
>is a Nvidia-specific fake raid, it might not be supported.  Fake raid is
>not very popular in the Linux world - it combines the disadvantages of
>software raid with the disadvantages of hardware raid, and the benefits
>of neither.  Its only real advantage is if you want to use Windows with
>raid and don't want to pay for proper hardware raid.  Intel's "matrix"
>is far and away the most popular fake raid, and has good support in
>Linux, but I cannot say about Nvidia's raid.

I have read elsewhere that mdadm>=3 will cope with NVRAID - the fact that the Ubuntu live CD had a good go at starting the array with mdadm makes me think it is worth pursuing LINUX.

>
>
>If you want to set up a new system with Linux raid, then you will be
>able to get pointers and help in this list - but it's not really
>suitable for "how to get started with Linux" information.

I saw the further mails with Scott on this - I thought it would be on topic to ask for a Linux distro that would have everything I needed in place to get working with the array - as you can see from above, I have been fumbling round with a few versions of LINUX and am dismayed at the sheer number available. Apologies if I led the forum away from its intended purpose.

> And if you
>want to mix Windows and Linux on the same system, be aware that Windows
>can't work with Linux software raid, and can't understand Linux
>filesystems (at least, not easily).  It is often much easier to keep
>them on separate machines and share files over the network.
>Alternatively, consider using VirtualBox to let you run one system as a
>virtual machine inside the other.

I like the idea of a NAS (or even an old machine on the network) running LINUX with the RAID on it. I have seen enough of nvraid fake raiding to say I want to get off it asap - or at the very least have a full backup and recovery procedure with redundancy in place before continuing with it any further.

>
>mvh.,
>
>David

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Nvidia Raid5 Failure
@ 2014-04-10 11:13 peter davidson
  0 siblings, 0 replies; 12+ messages in thread
From: peter davidson @ 2014-04-10 11:13 UTC (permalink / raw)
  To: linux-raid

Hi Folks,

My computer suddenly shut down due to a failed memory module - damaging the 1.8TB RAID5 array of three disks.

The computer was able to boot with a degraded array (Windows 7 OS was on the array) but I was unable to get the array to rebuild using the Nvidia toolset - either at BIOS level or in Windows 7. Now the computer will not boot from the array.

I had something very similar to this happen a few weeks ago when the motherboard failed - I was able to limp things along to get a backup of all important data.

I am interested to know if LINUX will be able to recover the array for me this time. Having got part way through this process before on the previous failure (which led me to this forum), I am keen to follow this through as an exercise knowing I have a backup of the really important stuff. 

I intend to build LINUX onto a new disk and work through this in the coming days - what would be my best choice of distro for this exercise? I am hoping to find something that has all the relevant tools and is relatively simple to get up and running hopefully with a friendly GUI to help me navigate round as well.

I used to work on various databases running on UNIX servers - so I hope I can still find my way round a terminal window.

Thanks in advance for any support anyone can offer me!

Regards,

Peter.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2014-04-15  3:18 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-10  5:00 Nvidia Raid5 Failure peter davidson
2014-04-10  8:46 ` David Brown
2014-04-10 14:36 ` Scott D'Vileskis
2014-04-11  4:15   ` Scott D'Vileskis
2014-04-11  7:45     ` David Brown
2014-04-13 16:42   ` Drew
2014-04-14  6:14     ` Stan Hoeppner
2014-04-14  9:50       ` NeilBrown
2014-04-14 10:55         ` Stan Hoeppner
     [not found]           ` <CAK_KU4aRbK-sD6h7xqieW_D9FhBYBAy799wZHXq222DAMLjRng@mail.gmail.com>
2014-04-15  3:18             ` Stan Hoeppner
2014-04-10 11:13 peter davidson
2014-04-13 10:59 peter davidson
