* Linux raid wiki
@ 2016-09-22 23:31 Wols Lists
  2016-09-23 17:35 ` WNSDEV
  2016-09-24 13:18 ` Wols Lists
  0 siblings, 2 replies; 10+ messages in thread
From: Wols Lists @ 2016-09-22 23:31 UTC (permalink / raw)
  To: linux-raid

I've finally got myself access to the wiki :-) and have been working on 
bringing it a bit more up-to-date.

I've somewhat updated the overview section, so by all means be critical, 
but most of it isn't my work.

I've added the "When Things Go Wrogn" section, but so far only the first 
two pages - "Asking for help" and "Timeout Mismatch" - are all my work. 
The other three pages were already there, but I moved them here because 
I felt they belonged here.

Please feel free to criticize it (or offer bouquets :-), and give advice 
as to how to improve things, either in private email or on the list.

I know we're a friendly bunch, but so many people seem to keep coming 
for the same problems over and over again, and I want this to be an 
up-to-date resource we can point people at to try and help them help 
themselves.

(And if you think any stuff is out-of-date and misleading, tell me. I'm 
planning to create a "software archaeology" section for all the stuff 
that is no longer relevant, but might be wanted by people looking after 
old systems.)

Cheers,
Wol

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: Linux raid wiki
  2016-09-22 23:31 Linux raid wiki Wols Lists
@ 2016-09-23 17:35 ` WNSDEV
  2016-09-23 18:43   ` Wols Lists
  2016-09-24 13:18 ` Wols Lists
  1 sibling, 1 reply; 10+ messages in thread
From: WNSDEV @ 2016-09-23 17:35 UTC (permalink / raw)
  To: 'Wols Lists', linux-raid

Hi Wol,

Thanks for your work on the wiki... I'm sending a virtual bouquet.  I've been tracking down an issue regarding role numbers, which I've posted to the list: http://marc.info/?l=linux-raid&m=147439325405842&w=2.  In that process I've come across other postings saying that information in the wiki is incorrect, so maybe this is an opportunity to update that part of the wiki.

According to  https://raid.wiki.kernel.org/index.php/Mdstat  we find:

"The raid role numbers [#] following each device indicate its role, or function, within the raid set. Any device with "n" or higher are spare disks. 0,1,..,n-1 are for the working array. Notice that there is no device 3."

But according to the following two postings, http://www.spinics.net/lists/raid/msg44491.html and http://www.spinics.net/lists/raid/msg49766.html
the information in the wiki about the numbers in brackets (role numbers) is wrong.

Thanks,
Peter Sangas


-----Original Message-----
From: Wols Lists [mailto:antlists@youngman.org.uk] 
Sent: Thursday, September 22, 2016 4:32 PM
To: linux-raid@vger.kernel.org
Subject: Linux raid wiki

I've finally got myself access to the wiki :-) and have been working on bringing it a bit more up-to-date.

I've somewhat updated the overview section, so by all means be critical, but most of it isn't my work.

I've added the "When Things Go Wrogn" section, but so far only the first two pages - "Asking for help" and "Timeout Mismatch" - are all my work. 
The other three pages were already there, but I moved them here because I felt they belonged here.

Please feel free to criticize it (or offer bouquets :-), and give advice as to how to improve things, either in private email or on the list.

I know we're a friendly bunch, but so many people seem to keep coming for the same problems over and over again, and I want this to be an up-to-date resource we can point people at to try and help them help themselves.

(And if you think any stuff is out-of-date and misleading, tell me. I'm planning to create a "software archaeology" section for all the stuff that is no longer relevant, but might be wanted by people looking after old systems.)

Cheers,
Wol



* Re: Linux raid wiki
  2016-09-23 17:35 ` WNSDEV
@ 2016-09-23 18:43   ` Wols Lists
  2016-09-23 19:02     ` Peter Sangas
  2016-09-24  3:38     ` Phil Turmel
  0 siblings, 2 replies; 10+ messages in thread
From: Wols Lists @ 2016-09-23 18:43 UTC (permalink / raw)
  To: WNSDEV, linux-raid

Thanks.

My understanding of role numbers is that they are the order of the disks
in which stripes are written, so role 0 has the first stripe, role 1 the
second, etc etc.

This is *normally* irrelevant, but should the array ever get trashed,
the disks need to be listed *in* *that* *order* in a new --create statement.
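
In concrete terms, a sketch of how that order would be recovered (device
names are hypothetical, and the -E excerpt is a canned sample rather than
output from a real array):

```shell
# For each member, `mdadm -E /dev/sdX1` prints a line like the canned
# sample below; collecting the roles gives the --create ordering.
sample='   Device Role : Active device 1'
role=${sample##*Active device }    # strip everything up to the number
echo "role=$role"
# Having mapped roles 0..n-1 to devices, the last-resort re-create would
# list them in that order (sketch only -- don't run without expert help):
#   mdadm --create /dev/md0 --level=5 --raid-devices=3 \
#         --assume-clean /dev/sdb1 /dev/sdc1 /dev/sdd1
```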

Can someone - Neil? Phil? - please confirm I've understood that
correctly before I update anything ...

(and I will be posting updates to the list - anything I'm not sure of I
will be asking here to check I get it right :-)

Cheers,
Wol

On 23/09/16 18:35, WNSDEV wrote:
> Hi Wol,
> 
> Thanks for your work on the wiki... I'm sending a virtual bouquet.   I've been tracking down an issue regarding role numbers which I've posted to the list http://marc.info/?l=linux-raid&m=147439325405842&w=2.  In that process I've come across other postings that say information in the wiki is incorrect so maybe this is an opportunity to update this part of the wiki.
> 
> According to  https://raid.wiki.kernel.org/index.php/Mdstat  we find:
> 
> "The raid role numbers [#] following each device indicate its role, or function, within the raid set. Any device with "n" or higher are spare disks. 0,1,..,n-1 are for the working array. Notice that there is no device 3."
> 
> But according to the following two postings, http://www.spinics.net/lists/raid/msg44491.html and http://www.spinics.net/lists/raid/msg49766.html
> the information in the wiki about the numbers in brackets (role numbers ) is wrong.
> 
> Thanks,
> Peter Sangas
> 



* RE: Linux raid wiki
  2016-09-23 18:43   ` Wols Lists
@ 2016-09-23 19:02     ` Peter Sangas
  2016-09-24  3:38     ` Phil Turmel
  1 sibling, 0 replies; 10+ messages in thread
From: Peter Sangas @ 2016-09-23 19:02 UTC (permalink / raw)
  To: 'Wols Lists', linux-raid

Interesting: what about a RAID1 array?  My configuration is a 2-disk, 3-partition RAID1.  After installing Ubuntu 16.04, the role numbers are 0 and 1.


Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4][raid10] 
md2 : active raid1 sda3[0] sdb3[1]
      634795008 blocks super 1.2 [2/2] [UU]
      bitmap: 0/5 pages [0KB], 65536KB chunk

md1 : active raid1 sda2[0] sdb2[1]
      126887936 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md0 : active raid1 sda1[0] sdb1[1]
      19514368 blocks super 1.2 [2/2] [UU]


But after I replace one of the drives (sdb) with a new drive and sync it, the role number of md0's sdb1 changes from 1 to 2.  Is that supposed to happen?

Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4][raid10] 
md2 : active raid1 sdb3[1] sda3[0]
      634795008 blocks super 1.2 [2/2] [UU]
      bitmap: 0/5 pages [0KB], 65536KB chunk

md1 : active raid1 sdb2[1] sda2[0]
      126887936 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md0 : active raid1 sdb1[2] sda1[0]        
      19514368 blocks super 1.2 [2/2] [UU]


Thank you.

Pete
-----Original Message-----
From: Wols Lists [mailto:antlists@youngman.org.uk] 
Sent: Friday, September 23, 2016 11:43 AM
To: WNSDEV; linux-raid@vger.kernel.org
Subject: Re: Linux raid wiki

Thanks.

My understanding of role numbers is that they are the order of the disks in which stripes are written, so role 0 has the first stripe, role 1 the second, etc etc.

This is *normally* irrelevant, but should the array ever get trashed, the disks need to be listed *in* *that* *order* in a new --create statement.

Can someone - Neil? Phil? - please confirm I've understood that correctly before I update anything ...

(and I will be posting updates to the list - anything I'm not sure of I will be asking here to check I get it right :-)

Cheers,
Wol

On 23/09/16 18:35, WNSDEV wrote:
> Hi Wol,
> 
> Thanks for your work on the wiki... I'm sending a virtual bouquet.   I've been tracking down an issue regarding role numbers which I've posted to the list http://marc.info/?l=linux-raid&m=147439325405842&w=2.  In that process I've come across other postings that say information in the wiki is incorrect so maybe this is an opportunity to update this part of the wiki.
> 
> According to  https://raid.wiki.kernel.org/index.php/Mdstat  we find:
> 
> "The raid role numbers [#] following each device indicate its role, or function, within the raid set. Any device with "n" or higher are spare disks. 0,1,..,n-1 are for the working array. Notice that there is no device 3."
> 
> But according to the following two postings, 
> http://www.spinics.net/lists/raid/msg44491.html and 
> http://www.spinics.net/lists/raid/msg49766.html
> the information in the wiki about the numbers in brackets (role numbers ) is wrong.
> 
> Thanks,
> Peter Sangas
> 



* Re: Linux raid wiki
  2016-09-23 18:43   ` Wols Lists
  2016-09-23 19:02     ` Peter Sangas
@ 2016-09-24  3:38     ` Phil Turmel
  1 sibling, 0 replies; 10+ messages in thread
From: Phil Turmel @ 2016-09-24  3:38 UTC (permalink / raw)
  To: Wols Lists, WNSDEV, linux-raid

On 09/23/2016 02:43 PM, Wols Lists wrote:
> Thanks.
> 
> My understanding of role numbers is that they are the order of the disks
> in which stripes are written, so role 0 has the first stripe, role 1 the
> second, etc etc.

Yes, this is correct.  But the numbers in brackets in mdstat aren't the
role number; they're the slot number in the superblock member list,
which only matches the role number for the active devices at first
creation.  Spares occupy additional slots, and keep their slot number
for their life in the array, including after assuming an active role.
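
That difference is visible in the mdstat lines quoted earlier in this
thread; a small sketch using a canned sample line:

```shell
# The [n] after each device in /proc/mdstat is the superblock slot
# number.  Canned sample from earlier in this thread:
line='md0 : active raid1 sdb1[2] sda1[0]'
slots=$(echo "$line" | grep -o '\[[0-9]*\]' | tr -d '[]' | tr '\n' ' ')
echo "slots: $slots"
# sdb1 shows slot 2 in a 2-device array: it joined as a spare and kept
# its slot number after taking over the active role.
```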

> This is *normally* irrelevant, but should the array ever get trashed,
> the disks need to be listed *in* *that* *order* in a new --create statement.

You must use the "this device" role numbers in an mdadm -E report to be
sure, or use lsdrv.

mdstat is only useful for human review of array status.

Phil



* Re: Linux raid wiki
  2016-09-22 23:31 Linux raid wiki Wols Lists
  2016-09-23 17:35 ` WNSDEV
@ 2016-09-24 13:18 ` Wols Lists
  2016-09-26 14:01   ` Phil Turmel
  1 sibling, 1 reply; 10+ messages in thread
From: Wols Lists @ 2016-09-24 13:18 UTC (permalink / raw)
  To: linux-raid

On 23/09/16 00:31, Wols Lists wrote:
> I've added the "When Things Go Wrogn" section, but so far only the first
> two pages - "Asking for help" and "Timeout Mismatch" - are all my work.
> The other three pages were already there, but I moved them here because
> I felt they belonged here.
> 
> Please feel free to criticize it (or offer bouquets :-), and give advice
> as how to improve things, either in private email or on the list.

Replying to myself, but I'm reasonably happy with the first three
sections in "When Things Go Wrogn". But it's important that they're
correct! Would a couple of experts mind looking them over and sending a
critique to the list? Just a simple "Looks good" would be great and set
my mind at rest that I have understood things properly and I'm not
giving out bad advice.

Note that the next section is going to be along the lines of "My array
won't assemble / run"

https://raid.wiki.kernel.org/index.php/Asking_for_help
https://raid.wiki.kernel.org/index.php/Timeout_Mismatch
https://raid.wiki.kernel.org/index.php/Replacing_a_failed_drive

Cheers,
Wol


* Re: Linux raid wiki
  2016-09-24 13:18 ` Wols Lists
@ 2016-09-26 14:01   ` Phil Turmel
  2016-09-26 16:44     ` Wols Lists
  0 siblings, 1 reply; 10+ messages in thread
From: Phil Turmel @ 2016-09-26 14:01 UTC (permalink / raw)
  To: Wols Lists, linux-raid

Hi Wol,

A few comments below.

On 09/24/2016 09:18 AM, Wols Lists wrote:
> On 23/09/16 00:31, Wols Lists wrote:
>> I've added the "When Things Go Wrogn" section, but so far only the first
>> two pages - "Asking for help" and "Timeout Mismatch" - are all my work.
>> The other three pages were already there, but I moved them here because
>> I felt they belonged here.
>>
>> Please feel free to criticize it (or offer bouquets :-), and give advice
>> as how to improve things, either in private email or on the list.
> 
> Replying to myself, but I'm reasonably happy with the first three
> sections in "When Things Go Wrogn". But it's important that they're
> correct! Would a couple of experts mind looking them over and sending a
> critique to the list? Just a simple "Looks good" would be great and set
> my mind at rest that I have understood things properly and I'm not
> giving out bad advice.
> 
> Note that the next section is going to be along the lines of "My array
> won't assemble / run"
> 
> https://raid.wiki.kernel.org/index.php/Asking_for_help

"smartctl --all" doesn't report ERC settings.  --xall is required, or
for a somewhat shorter report, I find "smartctl -H -i -l scterc" ideal.
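
For anyone reading along, a sketch of what the scterc report contains
(the sample text below is an assumption from memory of smartmontools
output, not captured from a real drive):

```shell
# `smartctl -l scterc /dev/sdX` reports the drive's error-recovery
# timeouts.  Canned sample (assumed format):
sample='SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)'
n=$(echo "$sample" | grep -c 'seconds')
echo "ERC timeouts reported: $n"
# A drive answering with ~7 seconds is RAID-friendly; "Disabled" here is
# exactly the condition the Timeout Mismatch page warns about.
```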

> https://raid.wiki.kernel.org/index.php/Timeout_Mismatch

Very good.

> https://raid.wiki.kernel.org/index.php/Replacing_a_failed_drive

You should note that USB connections are not suitable for permanent use.
 Copying a drive or doing a --replace, fine, but don't leave it set up
that way.  USB disconnects, even if only for sleep, will scramble the MD
code.

Also, any time ddrescue is used, the unreadable sectors are replaced
with zeros and there is no longer any indication that that sector is
bad.  That means assembling an array from ddrescued components will
certainly have some corrupt spots.  fsck is mandatory, and there may be
corrupt file content.  ddrescue is only appropriate if there's no
redundancy left in the array to use to fix UREs.
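
The silent-corruption point can be demonstrated without touching a real
disk (the ddrescue invocation in the comment is a sketch; the demo only
fakes a rescued image under /tmp):

```shell
# A real invocation would be something like:
#   ddrescue -f /dev/old_disk /dev/new_disk rescue.map
# Unreadable sectors come out as zeros in the copy.  Fake that here:
printf 'AAAA' >  /tmp/rescued.img
dd if=/dev/zero bs=4 count=1 >> /tmp/rescued.img 2>/dev/null  # "lost" sector
printf 'BBBB' >> /tmp/rescued.img
wc -c < /tmp/rescued.img   # reads back cleanly; nothing marks the hole,
                           # which is why fsck afterwards is mandatory
```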

Overall, very good.

Phil




* Re: Linux raid wiki
  2016-09-26 14:01   ` Phil Turmel
@ 2016-09-26 16:44     ` Wols Lists
  2016-09-26 21:19       ` Phil Turmel
  0 siblings, 1 reply; 10+ messages in thread
From: Wols Lists @ 2016-09-26 16:44 UTC (permalink / raw)
  To: Phil Turmel, linux-raid

On 26/09/16 15:01, Phil Turmel wrote:
> Hi Wol,
> 
> A few comments below.

Thank you very much.
> 
> On 09/24/2016 09:18 AM, Wols Lists wrote:
>> On 23/09/16 00:31, Wols Lists wrote:
>>> I've added the "When Things Go Wrogn" section, but so far only the first
>>> two pages - "Asking for help" and "Timeout Mismatch" - are all my work.
>>> The other three pages were already there, but I moved them here because
>>> I felt they belonged here.
>>>
>>> Please feel free to criticize it (or offer bouquets :-), and give advice
>>> as how to improve things, either in private email or on the list.
>>
>> Replying to myself, but I'm reasonably happy with the first three
>> sections in "When Things Go Wrogn". But it's important that they're
>> correct! Would a couple of experts mind looking them over and sending a
>> critique to the list? Just a simple "Looks good" would be great and set
>> my mind at rest that I have understood things properly and I'm not
>> giving out bad advice.
>>
>> Note that the next section is going to be along the lines of "My array
>> won't assemble / run"
>>
>> https://raid.wiki.kernel.org/index.php/Asking_for_help
> 
> "smartctl --all" doesn't report ERC settings.  --xall is required, or
> for a somewhat shorter report, I find "smartctl -H -i -l scterc" ideal.
> 
Thanks. Noted and updated.

>> https://raid.wiki.kernel.org/index.php/Timeout_Mismatch
> 
> Very good.
> 
>> https://raid.wiki.kernel.org/index.php/Replacing_a_failed_drive
> 
> You should note that USB connections are not suitable for permanent use.
>  Copying a drive or doing a --replace, fine, but don't leave it set up
> that way.  USB disconnects, even if only for sleep, will scramble the MD
> code.

Noted. I've added a bit to say don't use USB for raid but it's okay for
salvaging a drive.
> 
> Also, any time ddrescue is used, the unreadable sectors are replaced
> with zeros and there is no longer any indication that that sector is
> bad.  That means assembling an array from ddrescued components will
> certainly have some corrupt spots.  fsck is mandatory, and there may be
> corrupt file content.  ddrescue is only appropriate if there's no
> redundancy left in the array to use to fix UREs.
> 
Or if there are no errors in the copy ...

That section tries to stress that it only applies if there are no
errors. And if you complete it successfully, you won't lose any data.

> Overall, very good.
> 
The next section -

https://raid.wiki.kernel.org/index.php/Assemble_Run

addresses what to do if the array is messed up in some way. Would you
mind taking a look at that now too :-)

Cheers,
Wol



* Re: Linux raid wiki
  2016-09-26 16:44     ` Wols Lists
@ 2016-09-26 21:19       ` Phil Turmel
  2016-09-26 21:37         ` Wols Lists
  0 siblings, 1 reply; 10+ messages in thread
From: Phil Turmel @ 2016-09-26 21:19 UTC (permalink / raw)
  To: Wols Lists, linux-raid

On 09/26/2016 12:44 PM, Wols Lists wrote:
> The next section -
> 
> https://raid.wiki.kernel.org/index.php/Assemble_Run
> 
> addresses what to do if the array is messed up in some way. Would you
> mind taking a look at that now too :-)

Hmmm.  The last bit is less than ideal.  If all drives are faulty, but
runnable in the array with at least one drive of redundancy, the best
way to put good drives in service is one-by-one mdadm --replace.  That
lets the redundancy fix any errors, and doesn't load down the problem
drive any more than ddrescue would.  And it has the benefit of
increasing reliability as you go.
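
A sketch of that sequence (array and device names are hypothetical; the
guard means nothing is touched unless the example array actually exists):

```shell
MD=/dev/md_example      # hypothetical array; sdb1 failing, sde1 new
if [ -b "$MD" ]; then
    mdadm "$MD" --add /dev/sde1                        # joins as a spare
    mdadm "$MD" --replace /dev/sdb1 --with /dev/sde1   # rebuild onto it
                                                       # while redundant
else
    echo "demo: $MD not present; commands shown only"
fi
```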

If you don't have any redundancy left, then ddrescue of all readable
drives is reasonable.

Phil



* Re: Linux raid wiki
  2016-09-26 21:19       ` Phil Turmel
@ 2016-09-26 21:37         ` Wols Lists
  0 siblings, 0 replies; 10+ messages in thread
From: Wols Lists @ 2016-09-26 21:37 UTC (permalink / raw)
  To: Phil Turmel, linux-raid

On 26/09/16 22:19, Phil Turmel wrote:
> On 09/26/2016 12:44 PM, Wols Lists wrote:
>> The next section -
>>
>> https://raid.wiki.kernel.org/index.php/Assemble_Run
>>
>> addresses what to do if the array is messed up in some way. Would you
>> mind taking a look at that now too :-)
> 
> Hmmm.  The last bit is less than ideal.  If all drives are faulty, but
> runnable in the array with at least one drive of redundancy, the best
> way to put good drives in service is one-by-one mdadm --replace.  That
> lets the redundancy fix any errors, and doesn't load down the problem
> drive any more than ddrescue would.  And it has the benefit of
> increasing reliability as you go.
> 
> If you don't have any redundancy left, then ddrescue of all readable
> drives is reasonable.
> 
Which is the state this page is meant to cover. I'm assuming that if you
get this far, you have, at an absolute minimum, had to do a
"--force --assemble" just to get a working array.

Each page has been intended to successively cover the state of the array
getting worse. The next page is "my metadata is trashed - mdadm says the
drive doesn't exist". I'm really not sure how to tackle that, but I know
there are several threads which cover damaged or trashed superblocks. I
think I'll just have to say "this is how you track down a GPT. This is
how you track down a superblock. This is how you interpret it. Get the
experts to help you put the array together again."

Cheers,
Wol



