All of lore.kernel.org
* Raid failing, which command to remove the bad drive?
@ 2011-08-26 20:13 Timothy D. Lenz
  2011-08-26 21:25 ` Mathias Burén
  2011-08-26 22:45 ` NeilBrown
  0 siblings, 2 replies; 18+ messages in thread
From: Timothy D. Lenz @ 2011-08-26 20:13 UTC (permalink / raw)
  To: linux-raid

I have 4 drives set up as 2 pairs.  The first pair has 3 partitions on 
it, and it seems one of those drives is failing (I'm going to have to 
figure out which physical drive it is, too, so I don't pull the wrong 
one out of the case).

It's been a while since I had to replace a drive in the array, and my 
notes are a bit confusing. I'm not sure which of these I need to use to 
remove the drive:


	sudo mdadm --manage /dev/md0 --fail /dev/sdb
	sudo mdadm --manage /dev/md0 --remove /dev/sdb
	sudo mdadm --manage /dev/md1 --fail /dev/sdb
	sudo mdadm --manage /dev/md1 --remove /dev/sdb
	sudo mdadm --manage /dev/md2 --fail /dev/sdb
	sudo mdadm --manage /dev/md2 --remove /dev/sdb

or

sudo mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
sudo mdadm /dev/md1 --fail /dev/sdb2 --remove /dev/sdb2
sudo mdadm /dev/md2 --fail /dev/sdb3 --remove /dev/sdb3

I'm not sure whether I should fail the individual partitions or the whole drive for each array.

-------------------------------------
The mails I got are:
-------------------------------------
A Fail event had been detected on md device /dev/md0.

It could be related to component device /dev/sdb1.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid1] [raid6] [raid5] [raid4] [multipath]
md1 : active raid1 sdb2[2](F) sda2[0]
       4891712 blocks [2/1] [U_]

md2 : active raid1 sdb3[1] sda3[0]
       459073344 blocks [2/2] [UU]

md3 : active raid1 sdd1[1] sdc1[0]
       488383936 blocks [2/2] [UU]

md0 : active raid1 sdb1[2](F) sda1[0]
       24418688 blocks [2/1] [U_]

unused devices: <none>
-------------------------------------
A Fail event had been detected on md device /dev/md1.

It could be related to component device /dev/sdb2.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid1] [raid6] [raid5] [raid4] [multipath]
md1 : active raid1 sdb2[2](F) sda2[0]
       4891712 blocks [2/1] [U_]

md2 : active raid1 sdb3[1] sda3[0]
       459073344 blocks [2/2] [UU]

md3 : active raid1 sdd1[1] sdc1[0]
       488383936 blocks [2/2] [UU]

md0 : active raid1 sdb1[2](F) sda1[0]
       24418688 blocks [2/1] [U_]

unused devices: <none>
-------------------------------------
A Fail event had been detected on md device /dev/md2.

It could be related to component device /dev/sdb3.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid1] [raid6] [raid5] [raid4] [multipath]
md1 : active raid1 sdb2[2](F) sda2[0]
       4891712 blocks [2/1] [U_]

md2 : active raid1 sdb3[2](F) sda3[0]
       459073344 blocks [2/1] [U_]

md3 : active raid1 sdd1[1] sdc1[0]
       488383936 blocks [2/2] [UU]

md0 : active raid1 sdb1[2](F) sda1[0]
       24418688 blocks [2/1] [U_]

unused devices: <none>
-------------------------------------

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Raid failing, which command to remove the bad drive?
  2011-08-26 20:13 Raid failing, which command to remove the bad drive? Timothy D. Lenz
@ 2011-08-26 21:25 ` Mathias Burén
  2011-08-26 22:26   ` Timothy D. Lenz
  2011-08-26 22:45 ` NeilBrown
  1 sibling, 1 reply; 18+ messages in thread
From: Mathias Burén @ 2011-08-26 21:25 UTC (permalink / raw)
  To: Timothy D. Lenz; +Cc: linux-raid

On 26 August 2011 21:13, Timothy D. Lenz <tlenz@vorgon.com> wrote:
> I have 4 drives set up as 2 pairs.  The first part has 3 partitions on it
> and it seems 1 of those drives is failing (going to have to figure out which
> drive it is too so I don't pull the wrong one out of the case)
>
> It's been awhile since I had to replace a drive in the array and my notes
> are a bit confusing. I'm not sure which I need to use to remove the drive:
>
>
>        sudo mdadm --manage /dev/md0 --fail /dev/sdb
>        sudo mdadm --manage /dev/md0 --remove /dev/sdb
>        sudo mdadm --manage /dev/md1 --fail /dev/sdb
>        sudo mdadm --manage /dev/md1 --remove /dev/sdb
>        sudo mdadm --manage /dev/md2 --fail /dev/sdb
>        sudo mdadm --manage /dev/md2 --remove /dev/sdb
>
> or
>
> sudo mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
> sudo mdadm /dev/md1 --fail /dev/sdb2 --remove /dev/sdb2
> sudo mdadm /dev/md2 --fail /dev/sdb3 --remove /dev/sdb3
>
> I'm not sure if I fail the drive partition or whole drive for each.
>
> -------------------------------------
> The mails I got are:
> -------------------------------------
> A Fail event had been detected on md device /dev/md0.
>
> It could be related to component device /dev/sdb1.
>
> Faithfully yours, etc.
>
> P.S. The /proc/mdstat file currently contains the following:
>
> Personalities : [raid1] [raid6] [raid5] [raid4] [multipath]
> md1 : active raid1 sdb2[2](F) sda2[0]
>      4891712 blocks [2/1] [U_]
>
> md2 : active raid1 sdb3[1] sda3[0]
>      459073344 blocks [2/2] [UU]
>
> md3 : active raid1 sdd1[1] sdc1[0]
>      488383936 blocks [2/2] [UU]
>
> md0 : active raid1 sdb1[2](F) sda1[0]
>      24418688 blocks [2/1] [U_]
>
> unused devices: <none>
> -------------------------------------
> A Fail event had been detected on md device /dev/md1.
>
> It could be related to component device /dev/sdb2.
>
> Faithfully yours, etc.
>
> P.S. The /proc/mdstat file currently contains the following:
>
> Personalities : [raid1] [raid6] [raid5] [raid4] [multipath]
> md1 : active raid1 sdb2[2](F) sda2[0]
>      4891712 blocks [2/1] [U_]
>
> md2 : active raid1 sdb3[1] sda3[0]
>      459073344 blocks [2/2] [UU]
>
> md3 : active raid1 sdd1[1] sdc1[0]
>      488383936 blocks [2/2] [UU]
>
> md0 : active raid1 sdb1[2](F) sda1[0]
>      24418688 blocks [2/1] [U_]
>
> unused devices: <none>
> -------------------------------------
> A Fail event had been detected on md device /dev/md2.
>
> It could be related to component device /dev/sdb3.
>
> Faithfully yours, etc.
>
> P.S. The /proc/mdstat file currently contains the following:
>
> Personalities : [raid1] [raid6] [raid5] [raid4] [multipath]
> md1 : active raid1 sdb2[2](F) sda2[0]
>      4891712 blocks [2/1] [U_]
>
> md2 : active raid1 sdb3[2](F) sda3[0]
>      459073344 blocks [2/1] [U_]
>
> md3 : active raid1 sdd1[1] sdc1[0]
>      488383936 blocks [2/2] [UU]
>
> md0 : active raid1 sdb1[2](F) sda1[0]
>      24418688 blocks [2/1] [U_]
>
> unused devices: <none>
> -------------------------------------

Looks like your sda is failing. What's the smartctl -a /dev/sda output?

/Mathias

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Raid failing, which command to remove the bad drive?
  2011-08-26 21:25 ` Mathias Burén
@ 2011-08-26 22:26   ` Timothy D. Lenz
  2011-08-26 22:45     ` Mathias Burén
  0 siblings, 1 reply; 18+ messages in thread
From: Timothy D. Lenz @ 2011-08-26 22:26 UTC (permalink / raw)
  To: Mathias Burén; +Cc: linux-raid

Um, no, that was the email that mdadm sends, I thought. And it says the 
problem is sdb in each case, though I was wondering why each one said 
[U_] instead of [_U]. Here is the smartctl output for sda; below that 
is the output for sdb.

======================================================================
vorg@x64VDR:~$ sudo smartctl -a /dev/sda
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-2.6.34.20100610.1] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.11
Device Model:     ST3500320AS
Serial Number:    9QM7M86S
LU WWN Device Id: 5 000c50 01059c636
Firmware Version: SD1A
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Fri Aug 26 15:23:41 2011 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                         was completed without error.
                                         Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                         without error or no self-test has ever
                                         been run.
Total time to complete Offline
data collection:                (  650) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                         Auto Offline data collection on/off support.
                                         Suspend Offline collection upon new
                                         command.
                                         Offline surface scan supported.
                                         Self-test supported.
                                         Conveyance Self-test supported.
                                         Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                         power-saving mode.
                                         Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                         General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 119) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x103b) SCT Status supported.
                                         SCT Error Recovery Control supported.
                                         SCT Feature Control supported.
                                         SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   114   100   006    Pre-fail  Always       -       83309768
  3 Spin_Up_Time            0x0003   094   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       13
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   071   060   030    Pre-fail  Always       -       13556066
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       5406
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       13
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   067   065   045    Old_age   Always       -       33 (Min/Max 30/35)
194 Temperature_Celsius     0x0022   033   040   000    Old_age   Always       -       33 (0 21 0 0)
195 Hardware_ECC_Recovered  0x001a   058   033   000    Old_age   Always       -       83309768
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
     1        0        0  Not_testing
     2        0        0  Not_testing
     3        0        0  Not_testing
     4        0        0  Not_testing
     5        0        0  Not_testing
Selective self-test flags (0x0):
   After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay
On 8/26/2011 2:25 PM, Mathias Burén wrote:
> smartctl -a /dev/sda
======================================================================

vorg@x64VDR:~$ sudo smartctl -a /dev/sdb
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-2.6.34.20100610.1] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

Vendor:               /1:0:0:0
Product:
User Capacity:        600,332,565,813,390,450 bytes [600 PB]
Logical block size:   774843950 bytes
scsiModePageOffset: response length too short, resp_len=47 offset=50 bd_len=46
 >> Terminate command early due to bad response to IEC mode page
A mandatory SMART command failed: exiting. To continue, add one or more 
'-T permissive' options.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Raid failing, which command to remove the bad drive?
  2011-08-26 20:13 Raid failing, which command to remove the bad drive? Timothy D. Lenz
  2011-08-26 21:25 ` Mathias Burén
@ 2011-08-26 22:45 ` NeilBrown
  2011-09-01 17:51   ` Timothy D. Lenz
  1 sibling, 1 reply; 18+ messages in thread
From: NeilBrown @ 2011-08-26 22:45 UTC (permalink / raw)
  To: Timothy D. Lenz; +Cc: linux-raid

On Fri, 26 Aug 2011 13:13:01 -0700 "Timothy D. Lenz" <tlenz@vorgon.com> wrote:

> I have 4 drives set up as 2 pairs.  The first part has 3 partitions on 
> it and it seems 1 of those drives is failing (going to have to figure 
> out which drive it is too so I don't pull the wrong one out of the case)
> 
> It's been awhile since I had to replace a drive in the array and my 
> notes are a bit confusing. I'm not sure which I need to use to remove 
> the drive:
> 
> 
> 	sudo mdadm --manage /dev/md0 --fail /dev/sdb
> 	sudo mdadm --manage /dev/md0 --remove /dev/sdb
> 	sudo mdadm --manage /dev/md1 --fail /dev/sdb
> 	sudo mdadm --manage /dev/md1 --remove /dev/sdb
> 	sudo mdadm --manage /dev/md2 --fail /dev/sdb
> 	sudo mdadm --manage /dev/md2 --remove /dev/sdb

sdb is not a member of any of these arrays so all of these commands will fail.

The partitions are members of the arrays.
> 
> or
> 
> sudo mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
> sudo mdadm /dev/md1 --fail /dev/sdb2 --remove /dev/sdb2

sdb1 and sdb2 have already been marked as failed so there is little point in
marking them as failed again.  Removing them makes sense though.


> sudo mdadm /dev/md2 --fail /dev/sdb3 --remove /dev/sdb3

sdb3 hasn't been marked as failed yet - maybe it will be soon if sdb is a bit
marginal.
So if you want to remove sdb from the machine, this is the correct thing to do:
mark sdb3 as failed, then remove it from the array.
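
(For this particular layout that works out to something like the following
- just a sketch, assuming sdb really is the disk being pulled and after
double-checking the member names in /proc/mdstat:)

	sudo mdadm /dev/md2 --fail /dev/sdb3      # only needed if sdb3 isn't already marked (F)
	sudo mdadm /dev/md0 --remove /dev/sdb1
	sudo mdadm /dev/md1 --remove /dev/sdb2
	sudo mdadm /dev/md2 --remove /dev/sdb3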

> 
> I'm not sure if I fail the drive partition or whole drive for each.

You only fail things that aren't failed already, and you fail the thing that
mdstat or mdadm -D tells you is a member of the array.
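
For example, to double-check what the members actually are before failing
or removing anything (a quick sketch using the names from this thread):

	cat /proc/mdstat
	sudo mdadm --detail /dev/md2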

NeilBrown



> 
> -------------------------------------
> The mails I got are:
> -------------------------------------
> A Fail event had been detected on md device /dev/md0.
> 
> It could be related to component device /dev/sdb1.
> 
> Faithfully yours, etc.
> 
> P.S. The /proc/mdstat file currently contains the following:
> 
> Personalities : [raid1] [raid6] [raid5] [raid4] [multipath]
> md1 : active raid1 sdb2[2](F) sda2[0]
>        4891712 blocks [2/1] [U_]
> 
> md2 : active raid1 sdb3[1] sda3[0]
>        459073344 blocks [2/2] [UU]
> 
> md3 : active raid1 sdd1[1] sdc1[0]
>        488383936 blocks [2/2] [UU]
> 
> md0 : active raid1 sdb1[2](F) sda1[0]
>        24418688 blocks [2/1] [U_]
> 
> unused devices: <none>
> -------------------------------------
> A Fail event had been detected on md device /dev/md1.
> 
> It could be related to component device /dev/sdb2.
> 
> Faithfully yours, etc.
> 
> P.S. The /proc/mdstat file currently contains the following:
> 
> Personalities : [raid1] [raid6] [raid5] [raid4] [multipath]
> md1 : active raid1 sdb2[2](F) sda2[0]
>        4891712 blocks [2/1] [U_]
> 
> md2 : active raid1 sdb3[1] sda3[0]
>        459073344 blocks [2/2] [UU]
> 
> md3 : active raid1 sdd1[1] sdc1[0]
>        488383936 blocks [2/2] [UU]
> 
> md0 : active raid1 sdb1[2](F) sda1[0]
>        24418688 blocks [2/1] [U_]
> 
> unused devices: <none>
> -------------------------------------
> A Fail event had been detected on md device /dev/md2.
> 
> It could be related to component device /dev/sdb3.
> 
> Faithfully yours, etc.
> 
> P.S. The /proc/mdstat file currently contains the following:
> 
> Personalities : [raid1] [raid6] [raid5] [raid4] [multipath]
> md1 : active raid1 sdb2[2](F) sda2[0]
>        4891712 blocks [2/1] [U_]
> 
> md2 : active raid1 sdb3[2](F) sda3[0]
>        459073344 blocks [2/1] [U_]
> 
> md3 : active raid1 sdd1[1] sdc1[0]
>        488383936 blocks [2/2] [UU]
> 
> md0 : active raid1 sdb1[2](F) sda1[0]
>        24418688 blocks [2/1] [U_]
> 
> unused devices: <none>
> -------------------------------------


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Raid failing, which command to remove the bad drive?
  2011-08-26 22:26   ` Timothy D. Lenz
@ 2011-08-26 22:45     ` Mathias Burén
  2011-08-26 23:14       ` Timothy D. Lenz
  0 siblings, 1 reply; 18+ messages in thread
From: Mathias Burén @ 2011-08-26 22:45 UTC (permalink / raw)
  To: Timothy D. Lenz; +Cc: linux-raid

On 26 August 2011 23:26, Timothy D. Lenz <tlenz@vorgon.com> wrote:
> um, no, that was the email that mdadm sends I thought. And it says problem
> is sdb in each case. Though I was wondering why each one said [U_] instead
> of [_U]. Here is the smartctl for sda and below that will be for sdb
>
> ======================================================================
> vorg@x64VDR:~$ sudo smartctl -a /dev/sda
> smartctl 5.41 2011-06-09 r3365 [x86_64-linux-2.6.34.20100610.1] (local
> build)
> Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
>
> === START OF INFORMATION SECTION ===
> Model Family:     Seagate Barracuda 7200.11
> Device Model:     ST3500320AS
> Serial Number:    9QM7M86S
> LU WWN Device Id: 5 000c50 01059c636
> Firmware Version: SD1A
> User Capacity:    500,107,862,016 bytes [500 GB]
> Sector Size:      512 bytes logical/physical
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   8
> ATA Standard is:  ATA-8-ACS revision 4
> Local Time is:    Fri Aug 26 15:23:41 2011 MST
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> General SMART Values:
> Offline data collection status:  (0x82) Offline data collection activity
>                                        was completed without error.
>                                        Auto Offline Data Collection:
> Enabled.
> Self-test execution status:      (   0) The previous self-test routine
> completed
>                                        without error or no self-test has
> ever
>                                        been run.
> Total time to complete Offline
> data collection:                (  650) seconds.
> Offline data collection
> capabilities:                    (0x7b) SMART execute Offline immediate.
>                                        Auto Offline data collection on/off
> support.
>                                        Suspend Offline collection upon new
>                                        command.
>                                        Offline surface scan supported.
>                                        Self-test supported.
>                                        Conveyance Self-test supported.
>                                        Selective Self-test supported.
> SMART capabilities:            (0x0003) Saves SMART data before entering
>                                        power-saving mode.
>                                        Supports SMART auto save timer.
> Error logging capability:        (0x01) Error logging supported.
>                                        General Purpose Logging supported.
> Short self-test routine
> recommended polling time:        (   1) minutes.
> Extended self-test routine
> recommended polling time:        ( 119) minutes.
> Conveyance self-test routine
> recommended polling time:        (   2) minutes.
> SCT capabilities:              (0x103b) SCT Status supported.
>                                        SCT Error Recovery Control supported.
>                                        SCT Feature Control supported.
>                                        SCT Data Table supported.
>
> SMART Attributes Data Structure revision number: 10
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE UPDATED
>  WHEN_FAILED RAW_VALUE
>  1 Raw_Read_Error_Rate     0x000f   114   100   006    Pre-fail Always
> -       83309768
>  3 Spin_Up_Time            0x0003   094   094   000    Pre-fail Always
> -       0
>  4 Start_Stop_Count        0x0032   100   100   020    Old_age Always
> -       13
>  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail Always
> -       0
>  7 Seek_Error_Rate         0x000f   071   060   030    Pre-fail Always
> -       13556066
>  9 Power_On_Hours          0x0032   094   094   000    Old_age Always
> -       5406
>  10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail Always
>   -       0
>  12 Power_Cycle_Count       0x0032   100   100   020    Old_age Always
> -       13
> 184 End-to-End_Error        0x0032   100   100   099    Old_age   Always
>   -       0
> 187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always
>   -       0
> 188 Command_Timeout         0x0032   100   100   000    Old_age   Always
>   -       0
> 189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always
>   -       0
> 190 Airflow_Temperature_Cel 0x0022   067   065   045    Old_age   Always
>   -       33 (Min/Max 30/35)
> 194 Temperature_Celsius     0x0022   033   040   000    Old_age   Always
>   -       33 (0 21 0 0)
> 195 Hardware_ECC_Recovered  0x001a   058   033   000    Old_age   Always
>   -       83309768
> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always
>   -       0
> 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age Offline
>  -       0
> 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always
>   -       0
>
> SMART Error Log Version: 1
> No Errors Logged
>
> SMART Self-test log structure revision number 1
> No self-tests have been logged.  [To run self-tests, use: smartctl -t]
>
>
> SMART Selective self-test log data structure revision number 1
>  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
>    1        0        0  Not_testing
>    2        0        0  Not_testing
>    3        0        0  Not_testing
>    4        0        0  Not_testing
>    5        0        0  Not_testing
> Selective self-test flags (0x0):
>  After scanning selected spans, do NOT read-scan remainder of disk.
> If Selective self-test is pending on power-up, resume after 0 minute delay
> On 8/26/2011 2:25 PM, Mathias Burén wrote:
>>
>> smartctl -a /dev/sda
>
> ======================================================================
>
> vorg@x64VDR:~$ sudo smartctl -a /dev/sdb
> smartctl 5.41 2011-06-09 r3365 [x86_64-linux-2.6.34.20100610.1] (local
> build)
> Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
>
> Vendor:               /1:0:0:0
> Product:
> User Capacity:        600,332,565,813,390,450 bytes [600 PB]
> Logical block size:   774843950 bytes
> scsiModePageOffset: response length too short, resp_len=47 offset=50
> bd_len=46
>>> Terminate command early due to bad response to IEC mode page
> A mandatory SMART command failed: exiting. To continue, add one or more '-T
> permissive' options.
>


Indeed, sorry. 600 PB... where did you get that drive? ;)

/M

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Raid failing, which command to remove the bad drive?
  2011-08-26 22:45     ` Mathias Burén
@ 2011-08-26 23:14       ` Timothy D. Lenz
  0 siblings, 0 replies; 18+ messages in thread
From: Timothy D. Lenz @ 2011-08-26 23:14 UTC (permalink / raw)
  To: Mathias Burén; +Cc: linux-raid



On 8/26/2011 3:45 PM, Mathias Burén wrote:
> On 26 August 2011 23:26, Timothy D. Lenz<tlenz@vorgon.com>  wrote:
>> um, no, that was the email that mdadm sends I thought. And it says problem
>> is sdb in each case. Though I was wondering why each one said [U_] instead
>> of [_U]. Here is the smartctl for sda and below that will be for sdb
>>
>> ======================================================================
>> vorg@x64VDR:~$ sudo smartctl -a /dev/sda
>> smartctl 5.41 2011-06-09 r3365 [x86_64-linux-2.6.34.20100610.1] (local
>> build)
>> Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
>>
>> === START OF INFORMATION SECTION ===
>> Model Family:     Seagate Barracuda 7200.11
>> Device Model:     ST3500320AS
>> Serial Number:    9QM7M86S
>> LU WWN Device Id: 5 000c50 01059c636
>> Firmware Version: SD1A
>> User Capacity:    500,107,862,016 bytes [500 GB]
>> Sector Size:      512 bytes logical/physical
>> Device is:        In smartctl database [for details use: -P show]
>> ATA Version is:   8
>> ATA Standard is:  ATA-8-ACS revision 4
>> Local Time is:    Fri Aug 26 15:23:41 2011 MST
>> SMART support is: Available - device has SMART capability.
>> SMART support is: Enabled
>>
>> === START OF READ SMART DATA SECTION ===
>> SMART overall-health self-assessment test result: PASSED
>>
>> General SMART Values:
>> Offline data collection status:  (0x82) Offline data collection activity
>>                                         was completed without error.
>>                                         Auto Offline Data Collection:
>> Enabled.
>> Self-test execution status:      (   0) The previous self-test routine
>> completed
>>                                         without error or no self-test has
>> ever
>>                                         been run.
>> Total time to complete Offline
>> data collection:                (  650) seconds.
>> Offline data collection
>> capabilities:                    (0x7b) SMART execute Offline immediate.
>>                                         Auto Offline data collection on/off
>> support.
>>                                         Suspend Offline collection upon new
>>                                         command.
>>                                         Offline surface scan supported.
>>                                         Self-test supported.
>>                                         Conveyance Self-test supported.
>>                                         Selective Self-test supported.
>> SMART capabilities:            (0x0003) Saves SMART data before entering
>>                                         power-saving mode.
>>                                         Supports SMART auto save timer.
>> Error logging capability:        (0x01) Error logging supported.
>>                                         General Purpose Logging supported.
>> Short self-test routine
>> recommended polling time:        (   1) minutes.
>> Extended self-test routine
>> recommended polling time:        ( 119) minutes.
>> Conveyance self-test routine
>> recommended polling time:        (   2) minutes.
>> SCT capabilities:              (0x103b) SCT Status supported.
>>                                         SCT Error Recovery Control supported.
>>                                         SCT Feature Control supported.
>>                                         SCT Data Table supported.
>>
>> SMART Attributes Data Structure revision number: 10
>> Vendor Specific SMART Attributes with Thresholds:
>> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE UPDATED
>>   WHEN_FAILED RAW_VALUE
>>   1 Raw_Read_Error_Rate     0x000f   114   100   006    Pre-fail Always
>> -       83309768
>>   3 Spin_Up_Time            0x0003   094   094   000    Pre-fail Always
>> -       0
>>   4 Start_Stop_Count        0x0032   100   100   020    Old_age Always
>> -       13
>>   5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail Always
>> -       0
>>   7 Seek_Error_Rate         0x000f   071   060   030    Pre-fail Always
>> -       13556066
>>   9 Power_On_Hours          0x0032   094   094   000    Old_age Always
>> -       5406
>>   10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail Always
>>    -       0
>>   12 Power_Cycle_Count       0x0032   100   100   020    Old_age Always
>> -       13
>> 184 End-to-End_Error        0x0032   100   100   099    Old_age   Always
>>    -       0
>> 187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always
>>    -       0
>> 188 Command_Timeout         0x0032   100   100   000    Old_age   Always
>>    -       0
>> 189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always
>>    -       0
>> 190 Airflow_Temperature_Cel 0x0022   067   065   045    Old_age   Always
>>    -       33 (Min/Max 30/35)
>> 194 Temperature_Celsius     0x0022   033   040   000    Old_age   Always
>>    -       33 (0 21 0 0)
>> 195 Hardware_ECC_Recovered  0x001a   058   033   000    Old_age   Always
>>    -       83309768
>> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always
>>    -       0
>> 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age Offline
>>   -       0
>> 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always
>>    -       0
>>
>> SMART Error Log Version: 1
>> No Errors Logged
>>
>> SMART Self-test log structure revision number 1
>> No self-tests have been logged.  [To run self-tests, use: smartctl -t]
>>
>>
>> SMART Selective self-test log data structure revision number 1
>>   SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
>>     1        0        0  Not_testing
>>     2        0        0  Not_testing
>>     3        0        0  Not_testing
>>     4        0        0  Not_testing
>>     5        0        0  Not_testing
>> Selective self-test flags (0x0):
>>   After scanning selected spans, do NOT read-scan remainder of disk.
>> If Selective self-test is pending on power-up, resume after 0 minute delay
>> On 8/26/2011 2:25 PM, Mathias Burén wrote:
>>>
>>> smartctl -a /dev/sda
>>
>> ======================================================================
>>
>> vorg@x64VDR:~$ sudo smartctl -a /dev/sdb
>> smartctl 5.41 2011-06-09 r3365 [x86_64-linux-2.6.34.20100610.1] (local
>> build)
>> Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
>>
>> Vendor:               /1:0:0:0
>> Product:
>> User Capacity:        600,332,565,813,390,450 bytes [600 PB]
>> Logical block size:   774843950 bytes
>> scsiModePageOffset: response length too short, resp_len=47 offset=50
>> bd_len=46
>>>> Terminate command early due to bad response to IEC mode page
>> A mandatory SMART command failed: exiting. To continue, add one or more '-T
>> permissive' options.
>>
>
>
> Indeed, sorry. 600 PB... where did you get that drive? ;)
>
> /M

What about those pre-fail messages on the other drive? Are they 
something to worry about now?

Also, I ran the same thing on the 2 drives for md3 and got the same 
pre-fail messages for both of those, plus one had this nice little note:

==> WARNING: There are known problems with these drives,
AND THIS FIRMWARE VERSION IS AFFECTED,
see the following Seagate web pages:
http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207931
http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207951

4 Seagate drives in this computer; this will make 3 failures since I put 
them in. I think the drives are still under warranty. Last time I replaced 
one it was good until something like 2012 or 2013.  But any new drives 
will be WD.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Raid failing, which command to remove the bad drive?
  2011-08-26 22:45 ` NeilBrown
@ 2011-09-01 17:51   ` Timothy D. Lenz
  2011-09-02  5:24     ` Simon Matthews
  2011-09-09 21:54     ` Bill Davidsen
  0 siblings, 2 replies; 18+ messages in thread
From: Timothy D. Lenz @ 2011-09-01 17:51 UTC (permalink / raw)
  To: linux-raid



On 8/26/2011 3:45 PM, NeilBrown wrote:
> On Fri, 26 Aug 2011 13:13:01 -0700 "Timothy D. Lenz"<tlenz@vorgon.com>  wrote:
>
>> I have 4 drives set up as 2 pairs.  The first part has 3 partitions on
>> it and it seems 1 of those drives is failing (going to have to figure
>> out which drive it is too so I don't pull the wrong one out of the case)
>>
>> It's been awhile since I had to replace a drive in the array and my
>> notes are a bit confusing. I'm not sure which I need to use to remove
>> the drive:
>>
>>
>> 	sudo mdadm --manage /dev/md0 --fail /dev/sdb
>> 	sudo mdadm --manage /dev/md0 --remove /dev/sdb
>> 	sudo mdadm --manage /dev/md1 --fail /dev/sdb
>> 	sudo mdadm --manage /dev/md1 --remove /dev/sdb
>> 	sudo mdadm --manage /dev/md2 --fail /dev/sdb
>> 	sudo mdadm --manage /dev/md2 --remove /dev/sdb
>
> sdb is not a member of any of these arrays so all of these commands will fail.
>
> The partitions are members of the arrays.
>>
>> or
>>
>> sudo mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
>> sudo mdadm /dev/md1 --fail /dev/sdb2 --remove /dev/sdb2
>
> sd1 and sdb2 have already been marked as failed so there is little point in
> marking them as failed again.  Removing them makes sense though.
>
>
>> sudo mdadm /dev/md2 --fail /dev/sdb3 --remove /dev/sdb3
>
> sdb3 hasn't been marked as failed yet - maybe it will soon if sdb is a bit
> marginal.
> So if you want to remove sdb from the machine this the correct thing to do.
> Mark sdb3 as failed, then remove it from the array.
>
>>
>> I'm not sure if I fail the drive partition or whole drive for each.
>
> You only fail things that aren't failed already, and you fail the thing that
> mdstat or mdadm -D tells you is a member of the array.
>
> NeilBrown
>
>
>
>>
>> -------------------------------------
>> The mails I got are:
>> -------------------------------------
>> A Fail event had been detected on md device /dev/md0.
>>
>> It could be related to component device /dev/sdb1.
>>
>> Faithfully yours, etc.
>>
>> P.S. The /proc/mdstat file currently contains the following:
>>
>> Personalities : [raid1] [raid6] [raid5] [raid4] [multipath]
>> md1 : active raid1 sdb2[2](F) sda2[0]
>>         4891712 blocks [2/1] [U_]
>>
>> md2 : active raid1 sdb3[1] sda3[0]
>>         459073344 blocks [2/2] [UU]
>>
>> md3 : active raid1 sdd1[1] sdc1[0]
>>         488383936 blocks [2/2] [UU]
>>
>> md0 : active raid1 sdb1[2](F) sda1[0]
>>         24418688 blocks [2/1] [U_]
>>
>> unused devices:<none>
>> -------------------------------------
>> A Fail event had been detected on md device /dev/md1.
>>
>> It could be related to component device /dev/sdb2.
>>
>> Faithfully yours, etc.
>>
>> P.S. The /proc/mdstat file currently contains the following:
>>
>> Personalities : [raid1] [raid6] [raid5] [raid4] [multipath]
>> md1 : active raid1 sdb2[2](F) sda2[0]
>>         4891712 blocks [2/1] [U_]
>>
>> md2 : active raid1 sdb3[1] sda3[0]
>>         459073344 blocks [2/2] [UU]
>>
>> md3 : active raid1 sdd1[1] sdc1[0]
>>         488383936 blocks [2/2] [UU]
>>
>> md0 : active raid1 sdb1[2](F) sda1[0]
>>         24418688 blocks [2/1] [U_]
>>
>> unused devices:<none>
>> -------------------------------------
>> A Fail event had been detected on md device /dev/md2.
>>
>> It could be related to component device /dev/sdb3.
>>
>> Faithfully yours, etc.
>>
>> P.S. The /proc/mdstat file currently contains the following:
>>
>> Personalities : [raid1] [raid6] [raid5] [raid4] [multipath]
>> md1 : active raid1 sdb2[2](F) sda2[0]
>>         4891712 blocks [2/1] [U_]
>>
>> md2 : active raid1 sdb3[2](F) sda3[0]
>>         459073344 blocks [2/1] [U_]
>>
>> md3 : active raid1 sdd1[1] sdc1[0]
>>         488383936 blocks [2/2] [UU]
>>
>> md0 : active raid1 sdb1[2](F) sda1[0]
>>         24418688 blocks [2/1] [U_]
>>
>> unused devices:<none>
>> -------------------------------------


Got another problem. Removed the drive and tried to start it back up, and 
now I get Grub Error 2. I'm not sure if something went wrong with 
installing grub on the second drive when I set up the mirrors, or if it 
has to do with [U_], which points to sda in that report, instead of [_U].

I know I pulled the correct drive. I had it labeled sdb, it's the second 
drive in the BIOS bootup drive check, and it's the second connector on 
the board. And when I put just it in instead of the other, I got the 
noise again.  I think the last time a drive failed it was one of these 
two drives, because I remember recopying grub.

I do have another computer set up the same way that I could put this 
remaining drive on to get grub fixed, but it's a bit of a pain to get 
the other computer hooked back up, and I will have to dig through my 
notes about getting grub set up without messing up the array and stuff. 
I do know that both computers have been updated to grub 2.



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Raid failing, which command to remove the bad drive?
  2011-09-01 17:51   ` Timothy D. Lenz
@ 2011-09-02  5:24     ` Simon Matthews
  2011-09-02 15:42       ` Timothy D. Lenz
  2011-09-09 21:54     ` Bill Davidsen
  1 sibling, 1 reply; 18+ messages in thread
From: Simon Matthews @ 2011-09-02  5:24 UTC (permalink / raw)
  To: Timothy D. Lenz; +Cc: linux-raid

On Thu, Sep 1, 2011 at 10:51 AM, Timothy D. Lenz <tlenz@vorgon.com> wrote:
>
>
> On 8/26/2011 3:45 PM, NeilBrown wrote:
>>
>> On Fri, 26 Aug 2011 13:13:01 -0700 "Timothy D. Lenz"<tlenz@vorgon.com>
>>  wrote:
>>
>>> I have 4 drives set up as 2 pairs.  The first part has 3 partitions on
>>> it and it seems 1 of those drives is failing (going to have to figure
>>> out which drive it is too so I don't pull the wrong one out of the case)
>>>
>>> It's been awhile since I had to replace a drive in the array and my
>>> notes are a bit confusing. I'm not sure which I need to use to remove
>>> the drive:
>>>
>>>
>>>        sudo mdadm --manage /dev/md0 --fail /dev/sdb
>>>        sudo mdadm --manage /dev/md0 --remove /dev/sdb
>>>        sudo mdadm --manage /dev/md1 --fail /dev/sdb
>>>        sudo mdadm --manage /dev/md1 --remove /dev/sdb
>>>        sudo mdadm --manage /dev/md2 --fail /dev/sdb
>>>        sudo mdadm --manage /dev/md2 --remove /dev/sdb
>>
>> sdb is not a member of any of these arrays so all of these commands will
>> fail.
>>
>> The partitions are members of the arrays.
>>>
>>> or
>>>
>>> sudo mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
>>> sudo mdadm /dev/md1 --fail /dev/sdb2 --remove /dev/sdb2
>>
>> sd1 and sdb2 have already been marked as failed so there is little point
>> in
>> marking them as failed again.  Removing them makes sense though.
>>
>>
>>> sudo mdadm /dev/md2 --fail /dev/sdb3 --remove /dev/sdb3
>>
>> sdb3 hasn't been marked as failed yet - maybe it will soon if sdb is a bit
>> marginal.
>> So if you want to remove sdb from the machine this the correct thing to
>> do.
>> Mark sdb3 as failed, then remove it from the array.
>>
>>>
>>> I'm not sure if I fail the drive partition or whole drive for each.
>>
>> You only fail things that aren't failed already, and you fail the thing
>> that
>> mdstat or mdadm -D tells you is a member of the array.
>>
>> NeilBrown
>>
>>
>>
>>>
>>> -------------------------------------
>>> The mails I got are:
>>> -------------------------------------
>>> A Fail event had been detected on md device /dev/md0.
>>>
>>> It could be related to component device /dev/sdb1.
>>>
>>> Faithfully yours, etc.
>>>
>>> P.S. The /proc/mdstat file currently contains the following:
>>>
>>> Personalities : [raid1] [raid6] [raid5] [raid4] [multipath]
>>> md1 : active raid1 sdb2[2](F) sda2[0]
>>>        4891712 blocks [2/1] [U_]
>>>
>>> md2 : active raid1 sdb3[1] sda3[0]
>>>        459073344 blocks [2/2] [UU]
>>>
>>> md3 : active raid1 sdd1[1] sdc1[0]
>>>        488383936 blocks [2/2] [UU]
>>>
>>> md0 : active raid1 sdb1[2](F) sda1[0]
>>>        24418688 blocks [2/1] [U_]
>>>
>>> unused devices:<none>
>>> -------------------------------------
>>> A Fail event had been detected on md device /dev/md1.
>>>
>>> It could be related to component device /dev/sdb2.
>>>
>>> Faithfully yours, etc.
>>>
>>> P.S. The /proc/mdstat file currently contains the following:
>>>
>>> Personalities : [raid1] [raid6] [raid5] [raid4] [multipath]
>>> md1 : active raid1 sdb2[2](F) sda2[0]
>>>        4891712 blocks [2/1] [U_]
>>>
>>> md2 : active raid1 sdb3[1] sda3[0]
>>>        459073344 blocks [2/2] [UU]
>>>
>>> md3 : active raid1 sdd1[1] sdc1[0]
>>>        488383936 blocks [2/2] [UU]
>>>
>>> md0 : active raid1 sdb1[2](F) sda1[0]
>>>        24418688 blocks [2/1] [U_]
>>>
>>> unused devices:<none>
>>> -------------------------------------
>>> A Fail event had been detected on md device /dev/md2.
>>>
>>> It could be related to component device /dev/sdb3.
>>>
>>> Faithfully yours, etc.
>>>
>>> P.S. The /proc/mdstat file currently contains the following:
>>>
>>> Personalities : [raid1] [raid6] [raid5] [raid4] [multipath]
>>> md1 : active raid1 sdb2[2](F) sda2[0]
>>>        4891712 blocks [2/1] [U_]
>>>
>>> md2 : active raid1 sdb3[2](F) sda3[0]
>>>        459073344 blocks [2/1] [U_]
>>>
>>> md3 : active raid1 sdd1[1] sdc1[0]
>>>        488383936 blocks [2/2] [UU]
>>>
>>> md0 : active raid1 sdb1[2](F) sda1[0]
>>>        24418688 blocks [2/1] [U_]
>>>
>>> unused devices:<none>
>>> -------------------------------------
>
>
> Got another problem. Removed the drive and tried to start it back up and now
> get Grub Error 2. I'm not sure if when I did the mirrors if something when
> wrong with installing grub on the second drive< or if is has to do with [U_]
> which points to sda in that report instead of [_U].
>
> I know I pulled the correct drive. I had it labled sdb, it's the second
> drive in the bios bootup drive check and it's the second connector on the
> board. And when I put just it in instead of the other, I got the noise
> again.  I think last time a drive failed it was one of these two drives
> because I remember recopying grub.
>
> I do have another computer setup the same way, that I could put this
> remaining drive on to get grub fixed, but it's a bit of a pain to get the
> other computer hooked back up and I will have to dig through my notes about
> getting grub setup without messing up the array and stuff. I do know that
> both computers have been updated to grub 2


How did you install Grub on the second drive? I have seen some
instructions on the web that would not allow the system to boot if the
first drive failed or was removed.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Raid failing, which command to remove the bad drive?
  2011-09-02  5:24     ` Simon Matthews
@ 2011-09-02 15:42       ` Timothy D. Lenz
  2011-09-03 11:35         ` Simon Matthews
  0 siblings, 1 reply; 18+ messages in thread
From: Timothy D. Lenz @ 2011-09-02 15:42 UTC (permalink / raw)
  Cc: Simon Matthews, linux-raid



On 9/1/2011 10:24 PM, Simon Matthews wrote:
> On Thu, Sep 1, 2011 at 10:51 AM, Timothy D. Lenz<tlenz@vorgon.com>  wrote:
>>
>>
>> On 8/26/2011 3:45 PM, NeilBrown wrote:
>>>
>>> On Fri, 26 Aug 2011 13:13:01 -0700 "Timothy D. Lenz"<tlenz@vorgon.com>
>>>   wrote:
>>>
>>>> I have 4 drives set up as 2 pairs.  The first part has 3 partitions on
>>>> it and it seems 1 of those drives is failing (going to have to figure
>>>> out which drive it is too so I don't pull the wrong one out of the case)
>>>>
>>>> It's been awhile since I had to replace a drive in the array and my
>>>> notes are a bit confusing. I'm not sure which I need to use to remove
>>>> the drive:
>>>>
>>>>
>>>>         sudo mdadm --manage /dev/md0 --fail /dev/sdb
>>>>         sudo mdadm --manage /dev/md0 --remove /dev/sdb
>>>>         sudo mdadm --manage /dev/md1 --fail /dev/sdb
>>>>         sudo mdadm --manage /dev/md1 --remove /dev/sdb
>>>>         sudo mdadm --manage /dev/md2 --fail /dev/sdb
>>>>         sudo mdadm --manage /dev/md2 --remove /dev/sdb
>>>
>>> sdb is not a member of any of these arrays so all of these commands will
>>> fail.
>>>
>>> The partitions are members of the arrays.
>>>>
>>>> or
>>>>
>>>> sudo mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
>>>> sudo mdadm /dev/md1 --fail /dev/sdb2 --remove /dev/sdb2
>>>
>>> sd1 and sdb2 have already been marked as failed so there is little point
>>> in
>>> marking them as failed again.  Removing them makes sense though.
>>>
>>>
>>>> sudo mdadm /dev/md2 --fail /dev/sdb3 --remove /dev/sdb3
>>>
>>> sdb3 hasn't been marked as failed yet - maybe it will soon if sdb is a bit
>>> marginal.
>>> So if you want to remove sdb from the machine this the correct thing to
>>> do.
>>> Mark sdb3 as failed, then remove it from the array.
>>>
>>>>
>>>> I'm not sure if I fail the drive partition or whole drive for each.
>>>
>>> You only fail things that aren't failed already, and you fail the thing
>>> that
>>> mdstat or mdadm -D tells you is a member of the array.
>>>
>>> NeilBrown
>>>
>>>
>>>
>>>>
>>>> -------------------------------------
>>>> The mails I got are:
>>>> -------------------------------------
>>>> A Fail event had been detected on md device /dev/md0.
>>>>
>>>> It could be related to component device /dev/sdb1.
>>>>
>>>> Faithfully yours, etc.
>>>>
>>>> P.S. The /proc/mdstat file currently contains the following:
>>>>
>>>> Personalities : [raid1] [raid6] [raid5] [raid4] [multipath]
>>>> md1 : active raid1 sdb2[2](F) sda2[0]
>>>>         4891712 blocks [2/1] [U_]
>>>>
>>>> md2 : active raid1 sdb3[1] sda3[0]
>>>>         459073344 blocks [2/2] [UU]
>>>>
>>>> md3 : active raid1 sdd1[1] sdc1[0]
>>>>         488383936 blocks [2/2] [UU]
>>>>
>>>> md0 : active raid1 sdb1[2](F) sda1[0]
>>>>         24418688 blocks [2/1] [U_]
>>>>
>>>> unused devices:<none>
>>>> -------------------------------------
>>>> A Fail event had been detected on md device /dev/md1.
>>>>
>>>> It could be related to component device /dev/sdb2.
>>>>
>>>> Faithfully yours, etc.
>>>>
>>>> P.S. The /proc/mdstat file currently contains the following:
>>>>
>>>> Personalities : [raid1] [raid6] [raid5] [raid4] [multipath]
>>>> md1 : active raid1 sdb2[2](F) sda2[0]
>>>>         4891712 blocks [2/1] [U_]
>>>>
>>>> md2 : active raid1 sdb3[1] sda3[0]
>>>>         459073344 blocks [2/2] [UU]
>>>>
>>>> md3 : active raid1 sdd1[1] sdc1[0]
>>>>         488383936 blocks [2/2] [UU]
>>>>
>>>> md0 : active raid1 sdb1[2](F) sda1[0]
>>>>         24418688 blocks [2/1] [U_]
>>>>
>>>> unused devices:<none>
>>>> -------------------------------------
>>>> A Fail event had been detected on md device /dev/md2.
>>>>
>>>> It could be related to component device /dev/sdb3.
>>>>
>>>> Faithfully yours, etc.
>>>>
>>>> P.S. The /proc/mdstat file currently contains the following:
>>>>
>>>> Personalities : [raid1] [raid6] [raid5] [raid4] [multipath]
>>>> md1 : active raid1 sdb2[2](F) sda2[0]
>>>>         4891712 blocks [2/1] [U_]
>>>>
>>>> md2 : active raid1 sdb3[2](F) sda3[0]
>>>>         459073344 blocks [2/1] [U_]
>>>>
>>>> md3 : active raid1 sdd1[1] sdc1[0]
>>>>         488383936 blocks [2/2] [UU]
>>>>
>>>> md0 : active raid1 sdb1[2](F) sda1[0]
>>>>         24418688 blocks [2/1] [U_]
>>>>
>>>> unused devices:<none>
>>>> -------------------------------------
>>
>>
>> Got another problem. Removed the drive and tried to start it back up and now
>> get Grub Error 2. I'm not sure if when I did the mirrors if something when
>> wrong with installing grub on the second drive<  or if is has to do with [U_]
>> which points to sda in that report instead of [_U].
>>
>> I know I pulled the correct drive. I had it labled sdb, it's the second
>> drive in the bios bootup drive check and it's the second connector on the
>> board. And when I put just it in instead of the other, I got the noise
>> again.  I think last time a drive failed it was one of these two drives
>> because I remember recopying grub.
>>
>> I do have another computer setup the same way, that I could put this
>> remaining drive on to get grub fixed, but it's a bit of a pain to get the
>> other computer hooked back up and I will have to dig through my notes about
>> getting grub setup without messing up the array and stuff. I do know that
>> both computers have been updated to grub 2
>
>
> How did you install Grub on the second drive? I have seen some
> instructions on the web that would not allow the system to boot if the
> first drive failed or was removed.
>


I think this is how I did it, at least it is what I had in my notes:

grub-install /dev/sda && grub-install /dev/sdb

And this is from my notes also. It was from an IRC chat. Don't know if 
it was the raid channel or the grub channel:

[14:02] <Jordan_U> Vorg: No. First, what is the output of grub-install 
--version?
[14:02] <Vorg>  (GNU GRUB 1.98~20100115-1)
[14:04] <Jordan_U> Vorg: Ok, then run "grub-install /dev/sda && 
grub-install /dev/sdb" (where sda and sdb are the members of the array)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Raid failing, which command to remove the bad drive?
  2011-09-02 15:42       ` Timothy D. Lenz
@ 2011-09-03 11:35         ` Simon Matthews
  2011-09-03 12:17           ` Robin Hill
  0 siblings, 1 reply; 18+ messages in thread
From: Simon Matthews @ 2011-09-03 11:35 UTC (permalink / raw)
  To: Timothy D. Lenz; +Cc: linux-raid

On Fri, Sep 2, 2011 at 8:42 AM, Timothy D. Lenz <tlenz@vorgon.com> wrote:
>
>>
>> How did you install Grub on the second drive? I have seen some
>> instructions on the web that would not allow the system to boot if the
>> first drive failed or was removed.
>>
>
>
> I think this is how I did it, at least it is what I had in my notes:
>
> grub-install /dev/sda && grub-install /dev/sdb
>
> And this is from my notes also. It was from an IRC chat. Don't know if it
> was the raid channel or the grub channel:
>
> [14:02] <Jordan_U> Vorg: No. First, what is the output of grub-install
> --version?
> [14:02] <Vorg>  (GNU GRUB 1.98~20100115-1)
> [14:04] <Jordan_U> Vorg: Ok, then run "grub-install /dev/sda && grub-install
> /dev/sdb" (where sda and sdb are the members of the array)
>

Which is exactly my point. You installed grub on /dev/sdb such that it
would boot off /dev/sdb. But if /dev/sda has failed, then on reboot the
hard drive that was /dev/sdb is now /dev/sda, but Grub is still
looking for its files on the now non-existent /dev/sdb.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Raid failing, which command to remove the bad drive?
  2011-09-03 11:35         ` Simon Matthews
@ 2011-09-03 12:17           ` Robin Hill
  2011-09-03 17:03             ` Simon Matthews
                               ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Robin Hill @ 2011-09-03 12:17 UTC (permalink / raw)
  To: Simon Matthews; +Cc: Timothy D. Lenz, linux-raid

[-- Attachment #1: Type: text/plain, Size: 1618 bytes --]

On Sat Sep 03, 2011 at 04:35:39 -0700, Simon Matthews wrote:

> On Fri, Sep 2, 2011 at 8:42 AM, Timothy D. Lenz <tlenz@vorgon.com> wrote:
> >
> >>
> >> How did you install Grub on the second drive? I have seen some
> >> instructions on the web that would not allow the system to boot if the
> >> first drive failed or was removed.
> >>
> >
> >
> > I think this is how I did it, at least it is what I had in my notes:
> >
> > grub-install /dev/sda && grub-install /dev/sdb
> >
> > And this is from my notes also. It was from an IRC chat. Don't know if it
> > was the raid channel or the grub channel:
> >
> > [14:02] <Jordan_U> Vorg: No. First, what is the output of grub-install
> > --version?
> > [14:02] <Vorg>  (GNU GRUB 1.98~20100115-1)
> > [14:04] <Jordan_U> Vorg: Ok, then run "grub-install /dev/sda && grub-install
> > /dev/sdb" (where sda and sdb are the members of the array)
> >
> 
> Which is exactly my point. You installed grub on /dev/sdb such that it
> would  boot off /dev/sdb. But if /dev/sda has failed, on reboot, the
> hard drive that was /dev/sdb is now /dev/sda, but Grub is still
> looking for its files on the non-existent /dev/sdb.
> 
The way I do it is to run grub, then for each drive do:
    device (hd0) /dev/sdX
    root (hd0,0)
    setup (hd0)

That should set up each drive to boot up as the first drive.
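
(Note those are grub-legacy shell commands; GRUB 2, which Timothy says
both machines now run, has no interactive setup shell. The rough
equivalent there - a sketch with assumed device names - is simply to
reinstall to each member disk so each gets its own embedded core image:)

	sudo grub-install /dev/sda
	sudo grub-install /dev/sdb
	sudo update-grub            # Debian-style wrapper that regenerates grub.cfg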

Cheers,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Raid failing, which command to remove the bad drive?
  2011-09-03 12:17           ` Robin Hill
@ 2011-09-03 17:03             ` Simon Matthews
  2011-09-03 17:04               ` Simon Matthews
  2011-09-03 18:45             ` Timothy D. Lenz
  2011-09-05  8:57             ` CoolCold
  2 siblings, 1 reply; 18+ messages in thread
From: Simon Matthews @ 2011-09-03 17:03 UTC (permalink / raw)
  To: Simon Matthews, Timothy D. Lenz, linux-raid

On Sat, Sep 3, 2011 at 5:17 AM, Robin Hill <robin@robinhill.me.uk> wrote:
> On Sat Sep 03, 2011 at 04:35:39 -0700, Simon Matthews wrote:
>
>> On Fri, Sep 2, 2011 at 8:42 AM, Timothy D. Lenz <tlenz@vorgon.com> wrote:
>> >
>> >>
>> >> How did you install Grub on the second drive? I have seen some
>> >> instructions on the web that would not allow the system to boot if the
>> >> first drive failed or was removed.
>> >>
>> >
>> >
>> > I think this is how I did it, at least it is what I had in my notes:
>> >
>> > grub-install /dev/sda && grub-install /dev/sdb
>> >
>> > And this is from my notes also. It was from an IRC chat. Don't know if it
>> > was the raid channel or the grub channel:
>> >
>> > [14:02] <Jordan_U> Vorg: No. First, what is the output of grub-install
>> > --version?
>> > [14:02] <Vorg>  (GNU GRUB 1.98~20100115-1)
>> > [14:04] <Jordan_U> Vorg: Ok, then run "grub-install /dev/sda && grub-install
>> > /dev/sdb" (where sda and sdb are the members of the array)
>> >
>>
>> Which is exactly my point. You installed grub on /dev/sdb such that it
>> would  boot off /dev/sdb. But if /dev/sda has failed, on reboot, the
>> hard drive that was /dev/sdb is now /dev/sda, but Grub is still
>> looking for its files on the non-existent /dev/sdb.
>>
> The way I do it is to run grub, then for each drive do:
>    device (hd0) /dev/sdX
>    root (hd0,0)
>    setup (hd0)
>
> That should set up each drive to boot up as the first drive.
>

How about (after installing grub on /dev/sda):
dd if=/dev/sda of=/dev/sdb bs=466 count=1

Simon

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Raid failing, which command to remove the bad drive?
  2011-09-03 17:03             ` Simon Matthews
@ 2011-09-03 17:04               ` Simon Matthews
  2011-09-09 22:01                 ` Bill Davidsen
  0 siblings, 1 reply; 18+ messages in thread
From: Simon Matthews @ 2011-09-03 17:04 UTC (permalink / raw)
  To: Simon Matthews, Timothy D. Lenz, linux-raid

On Sat, Sep 3, 2011 at 10:03 AM, Simon Matthews
<simon.d.matthews@gmail.com> wrote:
> On Sat, Sep 3, 2011 at 5:17 AM, Robin Hill <robin@robinhill.me.uk> wrote:
>> On Sat Sep 03, 2011 at 04:35:39 -0700, Simon Matthews wrote:
>>
>>> On Fri, Sep 2, 2011 at 8:42 AM, Timothy D. Lenz <tlenz@vorgon.com> wrote:
>>> >
>>> >>
>>> >> How did you install Grub on the second drive? I have seen some
>>> >> instructions on the web that would not allow the system to boot if the
>>> >> first drive failed or was removed.
>>> >>
>>> >
>>> >
>>> > I think this is how I did it, at least it is what I had in my notes:
>>> >
>>> > grub-install /dev/sda && grub-install /dev/sdb
>>> >
>>> > And this is from my notes also. It was from an IRC chat. Don't know if it
>>> > was the raid channel or the grub channel:
>>> >
>>> > [14:02] <Jordan_U> Vorg: No. First, what is the output of grub-install
>>> > --version?
>>> > [14:02] <Vorg>  (GNU GRUB 1.98~20100115-1)
>>> > [14:04] <Jordan_U> Vorg: Ok, then run "grub-install /dev/sda && grub-install
>>> > /dev/sdb" (where sda and sdb are the members of the array)
>>> >
>>>
>>> Which is exactly my point. You installed grub on /dev/sdb such that it
>>> would  boot off /dev/sdb. But if /dev/sda has failed, on reboot, the
>>> hard drive that was /dev/sdb is now /dev/sda, but Grub is still
>>> looking for its files on the non-existent /dev/sdb.
>>>
>> The way I do it is to run grub, then for each drive do:
>>    device (hd0) /dev/sdX
>>    root (hd0,0)
>>    setup (hd0)
>>
>> That should set up each drive to boot up as the first drive.
>>
>
> How about (after installing grub on /dev/sda):
> dd if=/dev/sda of=/dev/sdb bs=466 count=1

ooops, that should be bs=446, NOT bs=466
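
The reason for 446: it copies only the MBR boot-code area and leaves
bytes 446-511 (the partition table and boot signature) on /dev/sdb
untouched. A minimal sketch, assuming /dev/sda is the disk grub was
just installed on and /dev/sdb is the other mirror member:

    # clone just the 446-byte boot code, not sdb's partition table
    dd if=/dev/sda of=/dev/sdb bs=446 count=1

And obviously double-check if= and of= before hitting enter.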

Simon

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Raid failing, which command to remove the bad drive?
  2011-09-03 12:17           ` Robin Hill
  2011-09-03 17:03             ` Simon Matthews
@ 2011-09-03 18:45             ` Timothy D. Lenz
  2011-09-05  8:57             ` CoolCold
  2 siblings, 0 replies; 18+ messages in thread
From: Timothy D. Lenz @ 2011-09-03 18:45 UTC (permalink / raw)
  To: linux-raid



On 9/3/2011 5:17 AM, Robin Hill wrote:
> On Sat Sep 03, 2011 at 04:35:39 -0700, Simon Matthews wrote:
>
>> On Fri, Sep 2, 2011 at 8:42 AM, Timothy D. Lenz<tlenz@vorgon.com>  wrote:
>>>
>>>>
>>>> How did you install Grub on the second drive? I have seen some
>>>> instructions on the web that would not allow the system to boot if the
>>>> first drive failed or was removed.
>>>>
>>>
>>>
>>> I think this is how I did it, at least it is what I had in my notes:
>>>
>>> grub-install /dev/sda&&  grub-install /dev/sdb
>>>
>>> And this is from my notes also. It was from an IRC chat. Don't know if it
>>> was the raid channel or the grub channel:
>>>
>>> [14:02]<Jordan_U>  Vorg: No. First, what is the output of grub-install
>>> --version?
>>> [14:02]<Vorg>    (GNU GRUB 1.98~20100115-1)
>>> [14:04]<Jordan_U>  Vorg: Ok, then run "grub-install /dev/sda&&  grub-install
>>> /dev/sdb" (where sda and sdb are the members of the array)
>>>
>>
>> Which is exactly my point. You installed grub on /dev/sdb such that it
>> would  boot off /dev/sdb. But if /dev/sda has failed, on reboot, the
>> hard drive that was /dev/sdb is now /dev/sda, but Grub is still
>> looking for its files on the non-existent /dev/sdb.
>>
> The way I do it is to run grub, then for each drive do:
>      device (hd0) /dev/sdX
>      root (hd0,0)
>      setup (hd0)
>
> That should set up each drive to boot up as the first drive.
>
> Cheers,
>      Robin


That is how I was trying to do it when I first set it up, but I was 
having problems getting it to work. The grub people said not to do it 
that way because of a greater potential for problems.

The way I read the line I think I used, "&&" just chains the two 
commands (running the second only if the first succeeds), so it should 
have done both. But if I did that as user vorg instead of root, I would 
have needed sudo before both grub-install commands. I can't remember 
now what I did.
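
If it did need sudo, it would have to go in front of each command,
something like:

    sudo grub-install /dev/sda && sudo grub-install /dev/sdb

since sudo only applies to the command it precedes.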

The second drive is the one that died and was removed, but I guess if 
sda wasn't bootable, it could have been booting off sdb the whole time.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Raid failing, which command to remove the bad drive?
  2011-09-03 12:17           ` Robin Hill
  2011-09-03 17:03             ` Simon Matthews
  2011-09-03 18:45             ` Timothy D. Lenz
@ 2011-09-05  8:57             ` CoolCold
  2 siblings, 0 replies; 18+ messages in thread
From: CoolCold @ 2011-09-05  8:57 UTC (permalink / raw)
  To: Simon Matthews, Timothy D. Lenz, linux-raid

On Sat, Sep 3, 2011 at 4:17 PM, Robin Hill <robin@robinhill.me.uk> wrote:
> The way I do it is to run grub, then for each drive do:
>    device (hd0) /dev/sdX
>    root (hd0,0)
>    setup (hd0)
>
> That should set up each drive to boot up as the first drive.
I do it the same way; it works.

>
> Cheers,
>    Robin
> --
>     ___
>    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
>   / / )      | Little Jim says ....                            |
>  // !!       |      "He fallen in de water !!"                 |
>



-- 
Best regards,
[COOLCOLD-RIPN]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Raid failing, which command to remove the bad drive?
  2011-09-01 17:51   ` Timothy D. Lenz
  2011-09-02  5:24     ` Simon Matthews
@ 2011-09-09 21:54     ` Bill Davidsen
  1 sibling, 0 replies; 18+ messages in thread
From: Bill Davidsen @ 2011-09-09 21:54 UTC (permalink / raw)
  To: Timothy D. Lenz; +Cc: linux-raid

Timothy D. Lenz wrote:
>
>
> On 8/26/2011 3:45 PM, NeilBrown wrote:
>
> Got another problem. Removed the drive and tried to start it back up 
> and now get Grub Error 2. I'm not sure if when I did the mirrors if 
> something when wrong with installing grub on the second drive< or if 
> is has to do with [U_] which points to sda in that report instead of 
> [_U].
>
> I know I pulled the correct drive. I had it labled sdb, it's the 
> second drive in the bios bootup drive check and it's the second 
> connector on the board. And when I put just it in instead of the 
> other, I got the noise again.  I think last time a drive failed it was 
> one of these two drives because I remember recopying grub.
>
> I do have another computer setup the same way, that I could put this 
> remaining drive on to get grub fixed, but it's a bit of a pain to get 
> the other computer hooked back up and I will have to dig through my 
> notes about getting grub setup without messing up the array and stuff. 
> I do know that both computers have been updated to grub 2
>
I like to check the table of device names vs. make/model/serial number 
before going too far. I get the output of blkdevtrk to be sure.
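
Without that tool, a minimal way to get the same mapping (assuming udev
and smartmontools are available) is:

    # persistent names list model and serial next to each sdX
    ls -l /dev/disk/by-id/ | grep -v part

    # or ask one drive directly
    smartctl -i /dev/sdb | grep -i serial

so the failing kernel name can be matched to a physical drive before
anything gets pulled.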


-- 
Bill Davidsen<davidsen@tmr.com>
   We are not out of the woods yet, but we know the direction and have
taken the first step. The steps are many, but finite in number, and if
we persevere we will reach our destination.  -me, 2010




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Raid failing, which command to remove the bad drive?
  2011-09-03 17:04               ` Simon Matthews
@ 2011-09-09 22:01                 ` Bill Davidsen
  2011-09-12 20:56                   ` Timothy D. Lenz
  0 siblings, 1 reply; 18+ messages in thread
From: Bill Davidsen @ 2011-09-09 22:01 UTC (permalink / raw)
  To: Simon Matthews; +Cc: Timothy D. Lenz, linux-raid

Simon Matthews wrote:
> On Sat, Sep 3, 2011 at 10:03 AM, Simon Matthews
> <simon.d.matthews@gmail.com>  wrote:
>    
>> On Sat, Sep 3, 2011 at 5:17 AM, Robin Hill<robin@robinhill.me.uk>  wrote:
>>      
>>> On Sat Sep 03, 2011 at 04:35:39 -0700, Simon Matthews wrote:
>>>
>>>        
>>>> On Fri, Sep 2, 2011 at 8:42 AM, Timothy D. Lenz<tlenz@vorgon.com>  wrote:
>>>>          
>>>>>            
>>>>>> How did you install Grub on the second drive? I have seen some
>>>>>> instructions on the web that would not allow the system to boot if the
>>>>>> first drive failed or was removed.
>>>>>>
>>>>>>              
>>>>>
>>>>> I think this is how I did it, at least it is what I had in my notes:
>>>>>
>>>>> grub-install /dev/sda&&  grub-install /dev/sdb
>>>>>
>>>>> And this is from my notes also. It was from an IRC chat. Don't know if it
>>>>> was the raid channel or the grub channel:
>>>>>
>>>>> [14:02]<Jordan_U>  Vorg: No. First, what is the output of grub-install
>>>>> --version?
>>>>> [14:02]<Vorg>    (GNU GRUB 1.98~20100115-1)
>>>>> [14:04]<Jordan_U>  Vorg: Ok, then run "grub-install /dev/sda&&  grub-install
>>>>> /dev/sdb" (where sda and sdb are the members of the array)
>>>>>
>>>>>            
>>>> Which is exactly my point. You installed grub on /dev/sdb such that it
>>>> would  boot off /dev/sdb. But if /dev/sda has failed, on reboot, the
>>>> hard drive that was /dev/sdb is now /dev/sda, but Grub is still
>>>> looking for its files on the non-existent /dev/sdb.
>>>>
>>>>          
>>> The way I do it is to run grub, then for each drive do:
>>>     device (hd0) /dev/sdX
>>>     root (hd0,0)
>>>     setup (hd0)
>>>
>>> That should set up each drive to boot up as the first drive.
>>>
>>>        
>> How about (after installing grub on /dev/sda):
>> dd if=/dev/sda of=/dev/sdb bs=466 count=1
>>      
> ooops, that should be bs=446, NOT bs=466
>    

Which is why you use the grub commands: a typo can wipe out your 
drive. It may or may not have in this case, but there's no reason to do 
stuff like that.

-- 
Bill Davidsen<davidsen@tmr.com>
   We are not out of the woods yet, but we know the direction and have
taken the first step. The steps are many, but finite in number, and if
we persevere we will reach our destination.  -me, 2010




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Raid failing, which command to remove the bad drive?
  2011-09-09 22:01                 ` Bill Davidsen
@ 2011-09-12 20:56                   ` Timothy D. Lenz
  0 siblings, 0 replies; 18+ messages in thread
From: Timothy D. Lenz @ 2011-09-12 20:56 UTC (permalink / raw)
  To: linux-raid



On 9/9/2011 3:01 PM, Bill Davidsen wrote:
> Simon Matthews wrote:
>> On Sat, Sep 3, 2011 at 10:03 AM, Simon Matthews
>> <simon.d.matthews@gmail.com> wrote:
>>> On Sat, Sep 3, 2011 at 5:17 AM, Robin Hill<robin@robinhill.me.uk> wrote:
>>>> On Sat Sep 03, 2011 at 04:35:39 -0700, Simon Matthews wrote:
>>>>
>>>>> On Fri, Sep 2, 2011 at 8:42 AM, Timothy D. Lenz<tlenz@vorgon.com>
>>>>> wrote:
>>>>>>> How did you install Grub on the second drive? I have seen some
>>>>>>> instructions on the web that would not allow the system to boot
>>>>>>> if the
>>>>>>> first drive failed or was removed.
>>>>>>>
>>>>>>
>>>>>> I think this is how I did it, at least it is what I had in my notes:
>>>>>>
>>>>>> grub-install /dev/sda&& grub-install /dev/sdb
>>>>>>
>>>>>> And this is from my notes also. It was from an IRC chat. Don't
>>>>>> know if it
>>>>>> was the raid channel or the grub channel:
>>>>>>
>>>>>> [14:02]<Jordan_U> Vorg: No. First, what is the output of grub-install
>>>>>> --version?
>>>>>> [14:02]<Vorg> (GNU GRUB 1.98~20100115-1)
>>>>>> [14:04]<Jordan_U> Vorg: Ok, then run "grub-install /dev/sda&&
>>>>>> grub-install
>>>>>> /dev/sdb" (where sda and sdb are the members of the array)
>>>>>>
>>>>> Which is exactly my point. You installed grub on /dev/sdb such that it
>>>>> would boot off /dev/sdb. But if /dev/sda has failed, on reboot, the
>>>>> hard drive that was /dev/sdb is now /dev/sda, but Grub is still
>>>>> looking for its files on the non-existent /dev/sdb.
>>>>>
>>>> The way I do it is to run grub, then for each drive do:
>>>> device (hd0) /dev/sdX
>>>> root (hd0,0)
>>>> setup (hd0)
>>>>
>>>> That should set up each drive to boot up as the first drive.
>>>>
>>> How about (after installing grub on /dev/sda):
>>> dd if=/dev/sda of=/dev/sdb bs=466 count=1
>> ooops, that should be bs=446, NOT bs=466
>
> Which is why you use grub commands, because a typo can wipe out your
> drive. May or may not have in this case, but there's no reason to do
> stuff like that.
>

Found the problem:

[13:06] <Jordan_U> Vorg: That error is from grub legacy.
[13:08] <Jordan_U> Vorg: Grub2 doesn't use error numbers. "grub error 2" 
is from grub legacy.

I had updated the boot drives to the new GRUB. Checked in the BIOS and 
it was set to boot from SATA 3, then SATA 4, AND THEN SATA 1 :(. The 
second pair are data drives and were never meant to have grub.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2011-09-12 20:56 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-26 20:13 Raid failing, which command to remove the bad drive? Timothy D. Lenz
2011-08-26 21:25 ` Mathias Burén
2011-08-26 22:26   ` Timothy D. Lenz
2011-08-26 22:45     ` Mathias Burén
2011-08-26 23:14       ` Timothy D. Lenz
2011-08-26 22:45 ` NeilBrown
2011-09-01 17:51   ` Timothy D. Lenz
2011-09-02  5:24     ` Simon Matthews
2011-09-02 15:42       ` Timothy D. Lenz
2011-09-03 11:35         ` Simon Matthews
2011-09-03 12:17           ` Robin Hill
2011-09-03 17:03             ` Simon Matthews
2011-09-03 17:04               ` Simon Matthews
2011-09-09 22:01                 ` Bill Davidsen
2011-09-12 20:56                   ` Timothy D. Lenz
2011-09-03 18:45             ` Timothy D. Lenz
2011-09-05  8:57             ` CoolCold
2011-09-09 21:54     ` Bill Davidsen
