* Re: Nvidia Raid5 Failure
@ 2014-04-13 10:59 peter davidson
  0 siblings, 0 replies; 12+ messages in thread

From: peter davidson @ 2014-04-13 10:59 UTC (permalink / raw)
To: linux-raid

Hi David,

For some reason this never made it through to my email inbox - I hope I
haven't messed up the format as I have pulled the text in from a browser.
Thanks (belatedly) for your response. I put some comments below.

>> Hi Folks,
>>
>> My computer suddenly shut down due to a failed memory module -
>> damaging the 1.8TB RAID5 array of three disks.
>>
>> The computer was able to boot with a degraded array (Windows 7 OS was
>> on the array) but I was unable to get the array to rebuild using the
>> Nvidia toolset - either at BIOS level or in Windows 7. Now the
>> computer will not boot from the array.
>>
>> I had something very similar to this happen a few weeks ago when the
>> motherboard failed - I was able to limp things along to get a backup
>> of all important data.
>>
>> I am interested to know if Linux will be able to recover the array
>> for me this time. Having got part way through this process before on
>> the previous failure (which led me to this forum), I am keen to
>> follow this through as an exercise, knowing I have a backup of the
>> really important stuff.
>>
>> I intend to build Linux onto a new disk and work through this in the
>> coming days - what would be my best choice of distro for this
>> exercise? I am hoping to find something that has all the relevant
>> tools and is relatively simple to get up and running, with a friendly
>> GUI to help me navigate round.
>>
>> I used to work on various databases running on UNIX servers so I hope
>> I can still find my way round a terminal window.
>>
>> Thanks in advance for any support anyone can offer me!
>>
>> Regards,
>>
> As a general point, don't do /anything/ to write to the disks or
> attempt to recover or rebuild them until you have copied off /all/
> important data to safe backups. If you have booted Windows from the
> array, then step one is to shut it down and do not even consider
> booting it until backups are all in place and double-checked.

On this note I was able to get all the useful data out to a couple of
old disks. I have decided to hold off trying to reconstruct the array
until I get a second copy of the useful stuff onto a new disk that I
have ordered. Once everything is back together this new disk will be
where my backup images land - strangely, this whole thing started on the
weekend I decided to get a proper backup of my data - what a
coincidence!

> You want to use a live CD for recovery here - it will let you play
> around with the disks without risking causing more damage. My usual
> choice of Linux live CD for recovery purposes is System Rescue CD - I
> can't tell you if it is the best choice here, and I haven't needed to
> recover raid arrays using it. But I find it very useful for testing
> and configuration, and have used it to recover data or fix up broken
> Windows systems.

I tried the Fedora live CD at first - it has dmraid, but something was
missing (the dm-raid45 module) that meant I could not use the command
with my particular flavour of Nvidia fake raid, and the CD also didn't
have mdadm. I then downloaded the Ubuntu live disk, which had a crack at
starting the array but didn't manage it - I really didn't like that it
tried to load the array without my being able to do anything about it -
it also marked the array degraded, and finally failed on a second
start-up that caused a kernel panic. I thought the idea of a live CD was
that it should be passive - not fiddling round with your existing disks
and data at start-up.

In my original question I was searching for a suitable Linux install or
live CD that would have everything on it I might need to work on the
array. I went with the Xubuntu full installation on a separate disk. It
also didn't have mdadm, but the OS was polite enough to tell me the
exact command to issue to get it installed. This route therefore depends
on an internet connection, which proved tricky as the wireless adapter
does not seem usable in any 64-bit OS. I got there in the end - good
old-fashioned network cables are very reliable!

> Another option you should consider is a Windows live CD. You can't
> legally download and burn one, AFAIK, but there are plenty available
> if you are happy to look about. There are also several Windows live CD
> generators that will make a bootable Windows CD from another Windows
> machine, and can include utility programs. They are particularly
> popular for malware recovery, but I expect you can put your Nvidia
> raid software on them.

None of the Windows OSes can do anything to put the array together
without the Nvidia support packages, and unfortunately the Nvidia
software won't rebuild the degraded array. The Nvidia tools provide no
feedback on what is going on - you either succeed or fail in what you
are trying to do - no tweaking allowed. So that's why Linux looks a good
option for this exercise.

> As for how well you can access the data and/or recover and/or rebuild
> your array from Linux, it all depends on the support for your Nvidia
> raid. Someone here might have experience and can give you information,
> but your best starting point would be Nvidia's website.

Nvidia document their Windows GUI tools and give a few words on their
BIOS utility - neither of these tools is able to put things straight
from where things lie at the moment.

> There are Linux drivers and utilities for most proper hardware raid
> systems, but if this is an Nvidia-specific fake raid, it might not be
> supported. Fake raid is not very popular in the Linux world - it
> combines the disadvantages of software raid with the disadvantages of
> hardware raid, and the benefits of neither. Its only real advantage is
> if you want to use Windows with raid and don't want to pay for proper
> hardware raid. Intel's "matrix" is far and away the most popular fake
> raid, and has good support in Linux, but I cannot say about Nvidia's
> raid.

I have read elsewhere that mdadm >= 3 will cope with NVRAID - the fact
that the Ubuntu live CD had a good go at starting the array with mdadm
makes me think Linux is worth pursuing.

> If you want to set up a new system with Linux raid, then you will be
> able to get pointers and help in this list - but it's not really
> suitable for "how to get started with Linux" information.

I saw the further mails with Scott on this - I thought it would be on
topic to ask for a Linux distro that would have everything I needed in
place to get working with the array. As you can see from the above, I
have been fumbling round with a few versions of Linux and am dismayed at
the sheer number available. Apologies if I led the forum away from its
intended purpose.

> And if you want to mix Windows and Linux on the same system, be aware
> that Windows can't work with Linux software raid, and can't understand
> Linux filesystems (at least, not easily). It is often much easier to
> keep them on separate machines and share files over the network.
> Alternatively, consider using VirtualBox to let you run one system as
> a virtual machine inside the other.

I like the idea of a NAS (or even an old machine on the network) running
Linux with the RAID on it. I have seen enough of NVRAID fake raiding to
say I want to get off there asap - or at the very least have a full
backup and recovery procedure with redundancy in place before continuing
with it any further.

> mvh.,
>
> David

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 12+ messages in thread
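Peter's plan above (mdadm and/or dmraid against NVRAID metadata) starts
with finding out what metadata is actually on the member disks, without
writing anything. A minimal sketch under stated assumptions: the device
names are hypothetical placeholders for the real member disks, and both
dmraid -r and mdadm --examine are read-only inspection commands - this
is a generic survey, not a procedure taken from the thread.

```shell
#!/bin/sh
# Read-only survey of suspected RAID member disks before any assembly.
# Pass the real member disks as arguments, e.g.:
#   sh inspect-members.sh /dev/sda /dev/sdb /dev/sdc
inspect_members() {
    for d in "$@"; do
        if [ ! -b "$d" ]; then
            echo "skipping $d: not a block device"
            continue
        fi
        # dmraid -r only *reports* vendor (e.g. NVRAID) metadata blocks.
        if command -v dmraid >/dev/null 2>&1; then
            dmraid -r "$d"
        fi
        # mdadm --examine is likewise read-only; on vendor fake raid it
        # may find no md superblock at all - useful information itself.
        if command -v mdadm >/dev/null 2>&1; then
            mdadm --examine "$d" || true
        fi
    done
}

inspect_members "$@"
```

Nothing here changes the disks, so it is safe to run before deciding
whether dmraid or mdadm is the right tool for the rest of the recovery.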
* Nvidia Raid5 Failure
@ 2014-04-10 11:13 peter davidson
  0 siblings, 0 replies; 12+ messages in thread

From: peter davidson @ 2014-04-10 11:13 UTC (permalink / raw)
To: linux-raid

Hi Folks,

My computer suddenly shut down due to a failed memory module - damaging
the 1.8TB RAID5 array of three disks.

The computer was able to boot with a degraded array (Windows 7 OS was on
the array) but I was unable to get the array to rebuild using the Nvidia
toolset - either at BIOS level or in Windows 7. Now the computer will
not boot from the array.

I had something very similar to this happen a few weeks ago when the
motherboard failed - I was able to limp things along to get a backup of
all important data.

I am interested to know if Linux will be able to recover the array for
me this time. Having got part way through this process before on the
previous failure (which led me to this forum), I am keen to follow this
through as an exercise, knowing I have a backup of the really important
stuff.

I intend to build Linux onto a new disk and work through this in the
coming days - what would be my best choice of distro for this exercise?
I am hoping to find something that has all the relevant tools and is
relatively simple to get up and running, hopefully with a friendly GUI
to help me navigate round as well.

I used to work on various databases running on UNIX servers - so I hope
I can still find my way round a terminal window.

Thanks in advance for any support anyone can offer me!

Regards,

Peter.
* Nvidia Raid5 Failure
@ 2014-04-10  5:00 peter davidson
  2014-04-10  8:46 ` David Brown
  2014-04-10 14:36 ` Scott D'Vileskis
  0 siblings, 2 replies; 12+ messages in thread

From: peter davidson @ 2014-04-10 5:00 UTC (permalink / raw)
To: linux-raid

Hi Folks,

My computer suddenly shut down due to a failed memory module - damaging
the 1.8TB RAID5 array of three disks.

The computer was able to boot with a degraded array (Windows 7 OS was on
the array) but I was unable to get the array to rebuild using the Nvidia
toolset - either at BIOS level or in Windows 7. Now the computer will
not boot from the array.

I had something very similar to this happen a few weeks ago when the
motherboard failed - I was able to limp things along to get a backup of
all important data.

I am interested to know if Linux will be able to recover the array for
me this time. Having got part way through this process before on the
previous failure (which led me to this forum), I am keen to follow this
through as an exercise, knowing I have a backup of the really important
stuff.

I intend to build Linux onto a new disk and work through this in the
coming days - what would be my best choice of distro for this exercise?
I am hoping to find something that has all the relevant tools and is
relatively simple to get up and running, with a friendly GUI to help me
navigate round.

I used to work on various databases running on UNIX servers so I hope I
can still find my way round a terminal window.

Thanks in advance for any support anyone can offer me!

Regards,

Peter.
* Re: Nvidia Raid5 Failure
  2014-04-10  5:00 peter davidson
@ 2014-04-10  8:46 ` David Brown
  2014-04-10 14:36 ` Scott D'Vileskis
  1 sibling, 0 replies; 12+ messages in thread

From: David Brown @ 2014-04-10 8:46 UTC (permalink / raw)
To: peter davidson, linux-raid

On 10/04/14 07:00, peter davidson wrote:
> Hi Folks,
>
> My computer suddenly shut down due to a failed memory module -
> damaging the 1.8TB RAID5 array of three disks.
>
> The computer was able to boot with a degraded array (Windows 7 OS was
> on the array) but I was unable to get the array to rebuild using the
> Nvidia toolset - either at BIOS level or in Windows 7. Now the
> computer will not boot from the array.
>
> I had something very similar to this happen a few weeks ago when the
> motherboard failed - I was able to limp things along to get a backup
> of all important data.
>
> I am interested to know if Linux will be able to recover the array
> for me this time. Having got part way through this process before on
> the previous failure (which led me to this forum), I am keen to
> follow this through as an exercise, knowing I have a backup of the
> really important stuff.
>
> I intend to build Linux onto a new disk and work through this in the
> coming days - what would be my best choice of distro for this
> exercise? I am hoping to find something that has all the relevant
> tools and is relatively simple to get up and running, with a friendly
> GUI to help me navigate round.
>
> I used to work on various databases running on UNIX servers so I hope
> I can still find my way round a terminal window.
>
> Thanks in advance for any support anyone can offer me!
>
> Regards,
>

As a general point, don't do /anything/ to write to the disks or attempt
to recover or rebuild them until you have copied off /all/ important
data to safe backups. If you have booted Windows from the array, then
step one is to shut it down and do not even consider booting it until
backups are all in place and double-checked.

You want to use a live CD for recovery here - it will let you play
around with the disks without risking further damage. My usual choice of
Linux live CD for recovery purposes is System Rescue CD - I can't tell
you if it is the best choice here, and I haven't needed to recover raid
arrays using it. But I find it very useful for testing and
configuration, and have used it to recover data or fix up broken Windows
systems.

Another option you should consider is a Windows live CD. You can't
legally download and burn one, AFAIK, but there are plenty available if
you are happy to look about. There are also several Windows live CD
generators that will make a bootable Windows CD from another Windows
machine, and can include utility programs. They are particularly popular
for malware recovery, but I expect you can put your Nvidia raid software
on them.

As for how well you can access the data and/or recover and/or rebuild
your array from Linux, it all depends on the support for your Nvidia
raid. Someone here might have experience and can give you information,
but your best starting point would be Nvidia's website. There are Linux
drivers and utilities for most proper hardware raid systems, but if this
is an Nvidia-specific fake raid, it might not be supported. Fake raid is
not very popular in the Linux world - it combines the disadvantages of
software raid with the disadvantages of hardware raid, and the benefits
of neither. Its only real advantage is if you want to use Windows with
raid and don't want to pay for proper hardware raid. Intel's "matrix" is
far and away the most popular fake raid, and has good support in Linux,
but I cannot say about Nvidia's raid.

If you want to set up a new system with Linux raid, then you will be
able to get pointers and help in this list - but it's not really
suitable for "how to get started with Linux" information. And if you
want to mix Windows and Linux on the same system, be aware that Windows
can't work with Linux software raid, and can't understand Linux
filesystems (at least, not easily). It is often much easier to keep them
on separate machines and share files over the network. Alternatively,
consider using VirtualBox to let you run one system as a virtual machine
inside the other.

mvh.,

David
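A footnote on the live-CD approach discussed above: a rescue environment
that auto-assembles a damaged array on boot can make things worse before
you have logged in. mdadm's configuration file supports an AUTO keyword
(mdadm v3.0 and later, per mdadm.conf(5)) that forbids auto-assembly - a
sketch, assuming an mdadm-based rescue system:

```
# /etc/mdadm/mdadm.conf on the rescue/recovery system:
# refuse to auto-assemble any array that is not explicitly
# listed in an ARRAY line below
AUTO -all
```

With this in place, arrays are only started when you run
mdadm --assemble by hand, which is what you want while investigating.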
* Re: Nvidia Raid5 Failure 2014-04-10 5:00 peter davidson 2014-04-10 8:46 ` David Brown @ 2014-04-10 14:36 ` Scott D'Vileskis 2014-04-11 4:15 ` Scott D'Vileskis 2014-04-13 16:42 ` Drew 1 sibling, 2 replies; 12+ messages in thread From: Scott D'Vileskis @ 2014-04-10 14:36 UTC (permalink / raw) To: peter davidson; +Cc: linux-raid I would advise keeping operating systems off RAID arrays in general, Windows or Linux, because most bootloaders are loaded to a single disk. If *that disk fails, you may not be able to boot, even if your RAID is in degraded mode. Having your data on the RAID and a separate OS disk allows troubleshooting with OS tools (NVIDIA's toolkit in Windows, MDADM in Linux, or microsoft's disk manager in Windows). I would also advise against what is known as 'fake raid' controllers like your NVIDIA hardware likely is, (or Promise, highpoint, Intel, etc) because it can be difficult to recover data if you have a controller/mobo failure without exact hardware. For Setting up Linux, I would advise picking up a 64 or 120GB SSD, (even a 16/32GB would be enough). For your first steps in Linux, I would go with a flavor of Ubuntu Linux. (XUbuntu is really nice, and doen't have the bastardized Unity desktop environment). From most modern Linux distros, you can setup RAID arrays at install time, or wait until your desktop is up and running and do it from GUI tools Another idea is to grab a diskless NAS appliance like a Lenovo/Iomega IX4 300D or a Synology for $200-400 and move your disks over. (You'll likely have to back up all your data and wipe your disks though). I like the Lenovo/Iomega product because is uses a custom build of Debian Linux and linux software RAID, which I could always recover in my linux Desktop if I had a NAS hardware failure. Good luck! 
On Thu, Apr 10, 2014 at 1:00 AM, peter davidson <merrymeetpete@hotmail.com> wrote: > Hi Folks, > > My computer suddenly shut down due to a failed memory module - damaging the 1.8TB RAID5 array of three disks. > > The computer was able to boot with a degraded array (Windows 7 OS was on the array) but I was unable to get the array to rebuild using the Nvidia toolset - either at BIOS level or in Windows 7. Now the computer will not boot from the array. > > I had something very similar to this happen a few weeks ago when the mother board failed - I was able to limp things along to get a backup of all important data. > > I am interested to know if LINUX will be able to recover the array for me this time. Having got part way through this process before on the previous failure (which led me to this forum), I am keen to follow this through as an exercise knowing I have a backup of the really important stuff. > > I intend to build LINUX onto a new disk and work through this in the coming days - what would be my best choice of distro for this exercise? I am hoping to find something that has all the relevant tools and is relatively simple to get up and running with a friendly GUI to help me navigate round. > > I used to work on various databases running on UNIX servers so I hope I can still can find my way round a terminal window. > > Thanks in advance for any support anyone can offer me! > > Regards, > > Peter. > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 12+ messages in thread
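Scott's point about bootloaders living on a single disk has a common
mitigation when /boot does sit on an md RAID1: install GRUB to every
member, so any surviving disk can still boot the machine. A hedged
sketch - the device names are hypothetical, and the echo keeps it a dry
run that only prints the grub-install commands until you remove it (and
run as root):

```shell
#!/bin/sh
# Dry-run sketch: put GRUB on each array member so the machine can
# still boot from any surviving disk.  Replace the hypothetical device
# names with your real members, then drop the "echo" to actually run
# grub-install (as root).
for d in /dev/sda /dev/sdb /dev/sdc; do
    echo grub-install "$d"
done
```

This is the standard belt-and-braces step for Linux software RAID boot
disks; it does not apply to the Nvidia fake raid case, where the BIOS
handles the array before any bootloader runs.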
* Re: Nvidia Raid5 Failure
  2014-04-10 14:36 ` Scott D'Vileskis
@ 2014-04-11  4:15 ` Scott D'Vileskis
  2014-04-11  7:45 ` David Brown
  2014-04-13 16:42 ` Drew
  1 sibling, 1 reply; 12+ messages in thread

From: Scott D'Vileskis @ 2014-04-11 4:15 UTC (permalink / raw)
To: linux-raid; +Cc: peter davidson

OP Peter and I exchanged a few emails and I recommended he start with a
flavor of Ubuntu on a spare hard drive, and loop devices to learn mdadm.
He found it helpful and thought it might help someone else, so despite
this mailing list being "not really suitable for 'how to get started
with Linux' information", the following is our email exchange:

I would advise setting up Xubuntu on your spare drive, and leaving your
RAID disks completely disconnected while you learn mdadm.

On that drive, create a few blank 1GB files, and loop devices:

fallocate -l 1G file1.img
losetup /dev/loop0 file1.img
fallocate -l 1G file2.img
losetup /dev/loop1 file2.img
fallocate -l 1G file3.img
losetup /dev/loop2 file3.img
fallocate -l 1G file4.img
losetup /dev/loop3 file4.img

Then you can create a raid array with these fake hard drives
(/dev/loop0, /dev/loop1, etc...):

mdadm --create -n4 -l5 /dev/md0 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3

Check rebuild status with:

cat /proc/mdstat

Create a file system:

mkfs.ext4 /dev/md0

Mount the filesystem:

mkdir /mnt/testraid
mount /dev/md0 /mnt/testraid

Copy some files to it, maybe a movie, episode of SVU, etc, then:

mplayer somemovie.mkv

Then, while watching the movie, fail a disk:

mdadm --fail /dev/md0 /dev/loop3
mdadm --remove /dev/md0 /dev/loop3

Check status, delete the loop device, delete the file:

cat /proc/mdstat
losetup -d /dev/loop3
rm file4.img

And I'll leave it to you to figure out how to create a new loop disk,
re-add it to the raid, and resync before your movie completes...

Once you are familiar and want to tackle your real drives: from the
command line you can use mdadm commands to attempt to --assemble the
array in degraded mode. When using the mdadm commands I believe there
are some special options for running in read-only mode, and/or for not
starting the array unless all devices are available. You may even need
to use the --force option if your drives are out of sync but you trust
the data on them. When you start deleting superblocks and using the
--create flag is when you have to be careful.

-----------------------------------------------------------------------

Hi Scott - excellent email - thanks a million...

Various hiccups getting the hardware ready, but those were not
insurmountable. The install went OK and I remember how excited I used to
get about using the UNIX OS - various things are coming back to me.

I considered writing up the various tweaks to your instructions on the
mail server - do you think that would be a valid exercise that someone
else might gain from?

In case you're feeling sceptical...

peter@peter-MS-7374:~$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 loop3[4] loop2[2] loop1[1] loop0[0]
      3142656 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
      [================>....]  recovery = 82.4% (864464/1047552) finish=0.1min speed=27014K/sec

unused devices: <none>
peter@peter-MS-7374:~$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 loop3[4] loop2[2] loop1[1] loop0[0]
      3142656 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]

unused devices: <none>

The film was an episode of Homeland - it didn't even hiccup!

If you have any other exercise suggestions to help me get up to speed
then I am all ears - off to bed now - somehow it got late!

Thanks for your assistance.

Peter.

-----------------------------------------------------------------------

Peter,

For further exercises (to familiarize yourself with creating, breaking,
and rebuilding a raid), I recommend the following additional scenarios:

a) With a working raid up and running, unmount the filesystem, stop the
array, then stop one of your loop devices. Try to assemble the array
with the missing disk, start and stop the array a few times, and also
familiarize yourself with the --run and --no-degraded options, as well
as the --examine features for understanding superblocks. Remember that
just mounting filesystems may change metadata on the raid disks, so this
will impact the data integrity on the raid even if you don't manipulate
any files.

b) After you have messed around a bit, maybe even changed some data in
degraded mode, stop the array, restart your 'missing' loop device and
attempt to restart the raid array with all the devices. After the array
starts degraded, you'll likely have to --add the disk again for the
rebuild to start.

c) Try to --create an array with your existing loop devices and check
out all the warnings you'll get about existing memberships in raid
arrays. You'll find that, with the exception of the --zero-superblock
command, it is usually pretty difficult to break things. If you somehow
convince mdadm to start or recreate an array with questionable disks
(like with the --assume-clean option), familiarize yourself with the
various filesystem check tools.

--Scott

That leads me to the following general questions about mdadm and Linux
raid... I have certainly RTFM and learned many things in the past dozen
years or so from internet examples, broken arrays, kernel panics on
suspend, bad drive cabling, mistypes using dd, blowing away the first
gig of a partition, growing, shrinking, migrating, etc. Are there formal
test cases and scenarios for mdadm and linux-raid?

Also, many of the emails I have seen pass through this mailing list
involve some interesting combinations of raid device superblock
mismatches that beg the question: how could you have possibly gotten
your raid components into *that* state?

In addition to the typical use cases covered in the manual (creating an
array, growing, shrinking, replacing disks, etc) it might be interesting
to have a list of misuse cases for folks to try and work out... (Oops, I
accidentally blew away my superblock - what can I do without a full
rebuild?)
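The eight fallocate/losetup lines in Scott's recipe collapse naturally
into a loop. A sketch under stated assumptions: truncate -s stands in
for fallocate (an equally sparse result, fine for loop-device practice),
the backing files land in a scratch directory of my choosing, and the
losetup step - which needs root and a free /dev/loopN node - is skipped
when either is missing, so the script is safe to run anywhere:

```shell
#!/bin/sh
# Create four 1 GiB sparse backing files for mdadm practice and, where
# possible (root + existing /dev/loopN node), attach each as a loop
# device.  Scratch directory defaults to /tmp/raidlab.
dir=${1:-/tmp/raidlab}
mkdir -p "$dir"
for i in 0 1 2 3; do
    # Sparse file: 1 GiB apparent size, no real disk space used yet.
    truncate -s 1G "$dir/file$i.img"
    if [ "$(id -u)" -eq 0 ] && [ -b "/dev/loop$i" ]; then
        # May fail harmlessly if the loop device is already in use.
        losetup "/dev/loop$i" "$dir/file$i.img" 2>/dev/null || true
    fi
done
ls "$dir"
```

Once the loop devices are attached, Scott's mdadm --create / --fail /
--remove sequence applies unchanged, and `losetup -d` plus `rm -r` of
the scratch directory cleans everything up.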
* Re: Nvidia Raid5 Failure
  2014-04-11  4:15 ` Scott D'Vileskis
@ 2014-04-11  7:45 ` David Brown
  0 siblings, 0 replies; 12+ messages in thread

From: David Brown @ 2014-04-11 7:45 UTC (permalink / raw)
To: Scott D'Vileskis, linux-raid; +Cc: peter davidson

Hi Scott,

I did not mean to imply that people here could not, should not or would
not help someone getting started with Linux - merely that discussions
like that are off-topic on this mailing list, and can quickly get out of
hand ("You recommended Ubuntu? No, he should be using...."). It's great
that you had the time to help him here.

It's good that you posted your recipe here for loopback device raids for
testing. I made a similar post a good while back, and have seen a few
others over the years. But it is good to get it repeated, especially for
newer followers of the list. Loopback md raid is a fantastic tool for
learning, and for practising risky operations such as resizes, recovery,
etc., and is something all md raid users should try on occasion.

mvh.,

David

On 11/04/14 06:15, Scott D'Vileskis wrote:
> OP Peter and I exchanged a few emails and I recommended he start with
> a flavor of Ubuntu on a spare hard drive, and loop devices to learn
> mdadm. He found it helpful and thought it might help someone else, so
> despite this mailing list being "not really suitable for 'how to get
> started with Linux' information", the following is our email exchange:
>
> I would advise setting up Xubuntu on your spare drive, and leaving
> your RAID disks completely disconnected while you learn mdadm.
>
> On that drive, create a few blank 1GB files, and loop devices:
> fallocate -l 1G file1.img
> losetup /dev/loop0 file1.img
> fallocate -l 1G file2.img
> losetup /dev/loop1 file2.img
> fallocate -l 1G file3.img
> losetup /dev/loop2 file3.img
> fallocate -l 1G file4.img
> losetup /dev/loop3 file4.img
>
> Then you can create a raid array with these fake hard drives
> (/dev/loop0, /dev/loop1, etc...):
> mdadm --create -n4 -l5 /dev/md0 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
>
> Check rebuild status with:
> cat /proc/mdstat
>
> Create a file system:
> mkfs.ext4 /dev/md0
>
> Mount the filesystem:
> mkdir /mnt/testraid
> mount /dev/md0 /mnt/testraid
>
> Copy some files to it, maybe a movie, episode of SVU, etc, then:
> mplayer somemovie.mkv
>
> Then, while watching the movie, fail a disk:
> mdadm --fail /dev/md0 /dev/loop3
> mdadm --remove /dev/md0 /dev/loop3
>
> Check status, delete the loop device, delete the file:
> cat /proc/mdstat
> losetup -d /dev/loop3
> rm file4.img
>
> And I'll leave it to you to figure out how to create a new loop disk,
> re-add it to the raid, and resync before your movie completes...
>
> Once you are familiar and want to tackle your real drives: from the
> command line you can use mdadm commands to attempt to --assemble the
> array in degraded mode. When using the mdadm commands I believe there
> are some special options for running in read-only mode, and/or for
> not starting the array unless all devices are available. You may even
> need to use the --force option if your drives are out of sync but you
> trust the data on them. When you start deleting superblocks and using
> the --create flag is when you have to be careful.
>
> ---------------------------------------------------------------------
>
> Hi Scott - excellent email - thanks a million...
>
> Various hiccups getting the hardware ready, but those were not
> insurmountable. The install went OK and I remember how excited I used
> to get about using the UNIX OS - various things are coming back to me.
>
> I considered writing up the various tweaks to your instructions on the
> mail server - do you think that would be a valid exercise that someone
> else might gain from?
>
> In case you're feeling sceptical...
>
> peter@peter-MS-7374:~$ cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid5 loop3[4] loop2[2] loop1[1] loop0[0]
>       3142656 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
>       [================>....]  recovery = 82.4% (864464/1047552)
> finish=0.1min speed=27014K/sec
>
> unused devices: <none>
> peter@peter-MS-7374:~$ cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid5 loop3[4] loop2[2] loop1[1] loop0[0]
>       3142656 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
>
> unused devices: <none>
>
> The film was an episode of Homeland - it didn't even hiccup!
>
> If you have any other exercise suggestions to help me get up to speed
> then I am all ears - off to bed now - somehow it got late!
>
> Thanks for your assistance.
>
> Peter.
>
> ---------------------------------------------------------------------
>
> Peter,
> For further exercises (to familiarize yourself with creating,
> breaking, and rebuilding a raid), I recommend the following additional
> scenarios:
>
> a) With a working raid up and running, unmount the filesystem, stop
> the array, then stop one of your loop devices. Try to assemble the
> array with the missing disk, start and stop the array a few times, and
> also familiarize yourself with the --run and --no-degraded options, as
> well as the --examine features for understanding superblocks. Remember
> that just mounting filesystems may change metadata on the raid disks,
> so this will impact the data integrity on the raid even if you don't
> manipulate any files.
>
> b) After you have messed around a bit, maybe even changed some data in
> degraded mode, stop the array, restart your 'missing' loop device and
> attempt to restart the raid array with all the devices. After the
> array starts degraded, you'll likely have to --add the disk again for
> the rebuild to start.
>
> c) Try to --create an array with your existing loop devices and check
> out all the warnings you'll get about existing memberships in raid
> arrays. You'll find that, with the exception of the --zero-superblock
> command, it is usually pretty difficult to break things. If you
> somehow convince mdadm to start or recreate an array with questionable
> disks (like with the --assume-clean option), familiarize yourself with
> the various filesystem check tools.
>
> --Scott
>
> That leads me to the following general questions about mdadm and
> Linux raid... I have certainly RTFM and learned many things in the
> past dozen years or so from internet examples, broken arrays, kernel
> panics on suspend, bad drive cabling, mistypes using dd, blowing away
> the first gig of a partition, growing, shrinking, migrating, etc. Are
> there formal test cases and scenarios for mdadm and linux-raid?
>
> Also, many of the emails I have seen pass through this mailing list
> involve some interesting combinations of raid device superblock
> mismatches that beg the question: how could you have possibly gotten
> your raid components into *that* state?
>
> In addition to the typical use cases covered in the manual (creating
> an array, growing, shrinking, replacing disks, etc) it might be
> interesting to have a list of misuse cases for folks to try and work
> out... (Oops, I accidentally blew away my superblock - what can I do
> without a full rebuild?)
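The [4/3] [UUU_] versus [4/4] [UUUU] fields in the /proc/mdstat output
quoted above are easy to check mechanically. A sketch - pure awk over a
captured sample (inlined here so it runs without a real array; on a live
system you would point check_mdstat at /proc/mdstat itself):

```shell
#!/bin/sh
# Report whether each md array in an mdstat dump is degraded, based on
# the "[active/total]" devices field.  The per-device "[N]" role tags
# (e.g. loop3[4]) contain no slash, so the regex skips them.
check_mdstat() {
    awk '
        /^md/ { name = $1 }
        /\[[0-9]+\/[0-9]+\]/ {
            # pull "total/active" out of the [total/active] field
            match($0, /\[[0-9]+\/[0-9]+\]/)
            split(substr($0, RSTART + 1, RLENGTH - 2), a, "/")
            printf "%s: %s\n", name, (a[2] + 0 < a[1] + 0) ? "degraded" : "clean"
        }
    ' "$@"
}

# Sample input mirroring the output pasted in this thread:
cat > /tmp/mdstat.sample <<'EOF'
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 loop3[4] loop2[2] loop1[1] loop0[0]
      3142656 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]

unused devices: <none>
EOF

check_mdstat /tmp/mdstat.sample   # -> md0: degraded
```

Numeric coercion (`a[2] + 0`) matters once arrays grow past nine
devices, where a string comparison of "10" and "9" would go wrong.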
* Re: Nvidia Raid5 Failure
From: Drew @ 2014-04-13 16:42 UTC
To: linux-raid

On Thu, Apr 10, 2014 at 7:36 AM, Scott D'Vileskis <sdvileskis@gmail.com> wrote:
> <snip> I would also advise against what is known as 'fake raid'
> controllers, like your NVIDIA hardware likely is (or Promise,
> HighPoint, Intel, etc.), because it can be difficult to recover data
> after a controller/mobo failure without exact replacement hardware.

Agreed on staying away from fake-RAID. One thing I will point out for
reference, though, is that not *all* Intel RAID is fake-RAID. The
onboard RAID built into Intel's ICH family certainly is. However, Intel
does make a line of RAID controller daughter cards which are rebadged
LSI RAID controllers and are in fact true H/W RAID. The easiest way to
know is to see whether the card supports SAS. If it does, chances are
it's a H/W RAID card.

--
Drew

"Nothing in life is to be feared. It is only to be understood." --Marie Curie
"This started out as a hobby and spun horribly out of control." --Unknown
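Besides checking for SAS support, a fake-RAID set can usually be identified from Linux itself, since these controllers leave vendor metadata on the member disks. A sketch, assuming root and that the `dmraid` tool is installed (it ships on many distros' rescue media; mdadm's external-metadata support covers some formats too):

```shell
#!/bin/sh
# List any BIOS/fake-RAID metadata dmraid finds on attached disks
# (it recognizes nvidia, isw/Intel, pdc/Promise, hpt/HighPoint formats,
# among others). Needs root to read the raw devices.
dmraid -r

# mdadm can also detect and report some external-metadata formats
# (e.g. Intel IMSM, DDF) when scanning for array superblocks.
mdadm --examine --scan
```

If either command reports metadata on a bare disk, the "RAID" is being assembled by BIOS/driver software rather than by a dedicated controller.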
* Re: Nvidia Raid5 Failure
From: Stan Hoeppner @ 2014-04-14 6:14 UTC
To: Drew, linux-raid

On 4/13/2014 11:42 AM, Drew wrote:
> On Thu, Apr 10, 2014 at 7:36 AM, Scott D'Vileskis <sdvileskis@gmail.com> wrote:
>> <snip> I would also advise against what is known as 'fake raid'
>> controllers like your NVIDIA hardware likely is, (or Promise,
>> highpoint, Intel, etc) because it can be difficult to recover data if
>> you have a controller/mobo failure without exact hardware.
>
> Agree on the staying away from fake-RAID. One thing I will point out
> for reference tho, is that not *all* Intel RAID is fakeraid. The
> onboard RAID built into Intel's ICH family certainly is. However Intel
> does make a line of RAID controller daughter cards which are rebadged
> LSI RAID controllers and are in fact true H/W RAID. Easiest way to
> know is to see if the card supports SAS. If it does, chances are it's
> a H/W RAID card.

The term "hardware RAID" is no longer appropriate as a means of
classifying or describing the capability or performance of an HBA, and
ceased to be quite a few years ago.

All of the Intel mezzanine cards and PCIe HBAs use LSI SAS ASICs and LSI
RAID firmware. In that sense they are "hardware RAID" controllers, as
the RAID software executes on the ASIC, not the host. However, more than
half of them lack DRAM. Those without DRAM do not and cannot support
[F|B|]BWC. Without BBWC you lose two features that are really the
defining characteristics of what we used to call a "hardware RAID"
controller:

1. Early ACK. Without BBWC the ASIC firmware cannot buffer small random
IOs and it cannot ACK command completion for sync, fsync, O_DIRECT, etc.
writes. Additionally, one cannot disable barriers in filesystems. BBWC
enhances the performance of such workloads dramatically by reducing
latency.

2. Writeback.
Some of Intel's DRAM-less RAID solutions, just like their LSI
counterparts, support RAID5. Without on-board DRAM these controllers
cannot perform efficient writeback of RMW operations because there is no
read cache. This would be roughly equivalent to hacking md/RAID5 to use
a stripe_cache_size of 0. Using RAID5 with one of these controllers most
often yields lower IOPS/throughput than a single disk.

Better classification for the current era:

1. RAID controller - ASIC firmware, BBWC
2. HBA w/RAID - ASIC firmware, cache less
3. Fake-RAID - host software

Cheers,

Stan
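Stan's stripe_cache_size analogy refers to a real md tunable: RAID4/5/6 arrays expose their stripe cache (in pages per device) through sysfs. A sketch of inspecting it on any Linux host - the sysfs paths follow the kernel's md interface, and the write is shown commented out because it needs root and a live parity array:

```shell
#!/bin/sh
# Report stripe_cache_size for every md array that has one. Only
# RAID4/5/6 arrays expose this file; the script simply skips others,
# so it exits cleanly even on a machine with no arrays at all.
for md in /sys/block/md*/md; do
    [ -f "$md/stripe_cache_size" ] || continue
    printf '%s: stripe_cache_size = %s\n' \
        "$(dirname "$md")" "$(cat "$md/stripe_cache_size")"
    # Raising it can help RMW-heavy RAID5/6 write workloads at the cost
    # of kernel memory, e.g. (as root):
    # echo 4096 > "$md/stripe_cache_size"
done
```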
* Re: Nvidia Raid5 Failure
From: NeilBrown @ 2014-04-14 9:50 UTC
To: stan; +Cc: Drew, linux-raid

On Mon, 14 Apr 2014 01:14:05 -0500 Stan Hoeppner <stan@hardwarefreak.com> wrote:

> Better classification for the current era:
>
> 1. RAID controller - ASIC firmware, BBWC
> 2. HBA w/RAID - ASIC firmware, cache less
> 3. Fake-RAID - host software

Can we come up with a better adjective than "fake"?

It makes sense if you say "fake RAID controller", but people don't.
They say "fake RAID", which sounds like the RAID is fake, which it isn't.

How about "BIOS RAID"?? Or "host RAID"??

NeilBrown
* Re: Nvidia Raid5 Failure
From: Stan Hoeppner @ 2014-04-14 10:55 UTC
To: NeilBrown; +Cc: Drew, linux-raid

On 4/14/2014 4:50 AM, NeilBrown wrote:
> On Mon, 14 Apr 2014 01:14:05 -0500 Stan Hoeppner <stan@hardwarefreak.com>
> wrote:
>
>> Better classification for the current era:
>>
>> 1. RAID controller - ASIC firmware, BBWC
>> 2. HBA w/RAID - ASIC firmware, cache less
>> 3. Fake-RAID - host software

To be clear, above I am differentiating between the various flavors of
"hardware RAID" devices, and part of the classification is based on
where the RAID binary is executed. I do not address software-only RAID
above.

> Can we come up with a better adjective than "fake"?

Many already have, but the terms were not adopted en masse.

> It makes sense if you say "fake RAID controller", but people don't.
> They say "fake RAID", which sounds like the RAID is fake, which it isn't.

I'm not attempting to reinvent the wheel above. "FakeRAID", in various
spellings, has been the term in common use for a decade+, is widely
recognized and understood, and is even used in official Linux distro
documentation:

https://help.ubuntu.com/community/FakeRaidHowto
https://wiki.archlinux.org/index.php/Installing_with_Fake_RAID

> How about "BIOS RAID"?? Or "host RAID"??

I agree that a better descriptive short label would be preferable.
However, I don't see either of these working. "BIOS RAID" will be
confusing to some, as many folks

A. don't understand the difference between BIOS and firmware
B. have a BIOS config setup utility on their RAID controller or HBA
   w/RAID card, and both devices "boot from the card's BIOS"

"Host RAID" has been used extensively over the years in various circles
to describe host software-only RAID solutions.
Additionally, this wouldn't be an accurate description because there
have been many add-in IDE/SATA "RAID" cards that split RAID duty between
card BIOS/firmware and a host OS driver in this manner. HighPoint has
such a product currently:

http://www.highpoint-tech.com/USA_new/series_rr272x.htm

They describe this as "Hardware-Assisted RAID", which is a pretty good
description IMO.

Any effort or campaign to supplant "fakeRAID" with another term will, I
think, be extremely difficult and prone to fail, as "fakeRAID" is
already so entrenched in the lexicon and has been used in official
distro documentation.

Just my 2¢

Cheers,

Stan
* Re: Nvidia Raid5 Failure
From: Stan Hoeppner @ 2014-04-15 3:18 UTC
To: Scott D'Vileskis; +Cc: NeilBrown, Linux RAID

On 4/14/2014 8:51 AM, Scott D'Vileskis wrote:
> I'd be curious to see the benchmarks of some of these, specifically
> how properly-tuned software raid in Linux (with ample memory and CPU
> bandwidth) compares against the "hardware" solutions.
>
> I'm already sold on software raid for ease of use and recovery,
> maturity, knowledge base, etc. But the numbers would be fun. I know I
> can simply google it, but surely one of you two has a great bookmark!

With the emergence of bcache and cousins, their equivalent in RAID card
firmware, and the low cost of SSDs, the RAID implementation, software or
hardware, is no longer a real issue WRT performance. Streaming non-RMW
throughput is usually identical between the two, limited by the drives,
not the RAID. Random IO will be very similar as well, with a slight edge
to RAID cards with DRAM cache in front of the SSD cache.

Thus it is the other qualities and deficiencies of each, and the
intended use case, that drive the decision. The needs of a personal or
SOHO server, a university department on a tight budget, etc. are often
quite different from those of the enterprise customer with deeper
pockets. For the former, cost is usually a significant factor, while for
the latter the cost of the RAID card is inconsequential given the high
cost of the enterprise drives attached to it, where only one or two
drives equal the price of the RAID card.

In an enterprise environment, light path management is a must--an LED is
required for drive failure identification and easy replacement.
Linux/md does not yet provide this functionality (though efforts are
being made) and typically forces users to carefully document which
drives are in which chassis/backplane slots, and to maintain those
records every time a drive is swapped. Most RAID cards have provided
failure-LED support for ~20 years.

md has the same management interface on any Linux host. If one buys RAID
cards from multiple vendors, one must learn multiple interfaces. md is
also more flexible WRT mixing different drive types within an array.
Hardware RAID controllers are typically more finicky here, requiring
drives with a low and uniform ERC/TLER value, and matching firmware revs
across the drives in an array.

And of course md can do the one thing hardware RAID cards cannot: stitch
arrays on multiple RAID cards together into a single disk device using a
nested stripe or a linear array, depending on the workload. This gives
you a bit of the best features of both technologies, and allows you to
scale to a level not easily achievable using only one of either
technology.

Cheers,

Stan

> Thanks!
> --Scott
>
> On Mon, Apr 14, 2014 at 6:55 AM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
> <snip>
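For anyone wanting to run the comparison Scott asks about, the usual approach is to point a benchmark tool such as fio (an assumption here - the thread names no tool) at whichever block device backs each RAID flavour. Device names below are placeholders, direct I/O keeps the page cache out of the measurement, and writing to a raw device is destructive to its data:

```shell
#!/bin/sh
# Random-write IOPS probe against an md array; repeat with the hardware
# RAID controller's block device in place of /dev/md0 to compare.
# WARNING: destroys any data on the target device.
fio --name=randwrite --filename=/dev/md0 --direct=1 \
    --rw=randwrite --bs=4k --iodepth=32 --numjobs=4 \
    --runtime=60 --time_based --group_reporting

# Streaming-write throughput probe on the same device.
fio --name=seqwrite --filename=/dev/md0 --direct=1 \
    --rw=write --bs=1M --iodepth=8 \
    --runtime=60 --time_based --group_reporting
```

Per Stan's point about early ACK, the 4k random-write case with `--direct=1` is exactly where a BBWC-equipped controller shows its latency advantage over both DRAM-less cards and plain md.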
end of thread, other threads: [~2014-04-15 3:18 UTC | newest]

Thread overview: 12+ messages
2014-04-13 10:59 Nvidia Raid5 Failure  peter davidson
-- strict thread matches above, loose matches on Subject: below --
2014-04-10 11:13 peter davidson
2014-04-10  5:00 peter davidson
2014-04-10  8:46 ` David Brown
2014-04-10 14:36   ` Scott D'Vileskis
2014-04-11  4:15     ` Scott D'Vileskis
2014-04-11  7:45   ` David Brown
2014-04-13 16:42     ` Drew
2014-04-14  6:14       ` Stan Hoeppner
2014-04-14  9:50         ` NeilBrown
2014-04-14 10:55           ` Stan Hoeppner
     [not found]             ` <CAK_KU4aRbK-sD6h7xqieW_D9FhBYBAy799wZHXq222DAMLjRng@mail.gmail.com>
2014-04-15  3:18               ` Stan Hoeppner