* MD RAID6 corrupted by Avago 9260-4i controller
From: Wolfgang Denk @ 2016-05-15 12:45 UTC
  To: linux-raid

Hi,

I managed to kill a RAID6... My old server mainboard died, the new one
did not have PCI-X any more, so I bought it with an Avago 9260-4i, of
course after asking (but not verifying on the net, sic) that I could
export the disks as plain JBOD.  Well, you cannot.  So I played around
with a set of spare disks and realized that you can configure a RAID0
consisting of a single disk drive, and when you skip the initialization
of the array, it basically does what I need.  OK, so I added my real
disks, and things looked fine.  Then I added some new disks and
decided to set them up as a HW RAID6 to compare performance.  But
when I went to start the initialization of this new RAID6, the Avago
firmware silently also initialized all my previously untouched old
disks, and boom!

The original array was created this way:

# mdadm --create --verbose /dev/md2 --metadata=1.2 --level=6 \
	--raid-devices=6 --chunk=16 --assume-clean /dev/sd[abefgh]
mdadm: layout defaults to left-symmetric
mdadm: size set to 976762448K
mdadm: array /dev/md2 started.

# mdadm -Q --detail /dev/md2
/dev/md2:
        Version : 1.02
  Creation Time : Tue Jan 18 12:38:15 2011
     Raid Level : raid6
     Array Size : 3907049792 (3726.05 GiB 4000.82 GB)
  Used Dev Size : 1953524896 (1863.03 GiB 2000.41 GB)
   Raid Devices : 6
  Total Devices : 6
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Tue Jan 18 12:38:15 2011
          State : clean
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 16K

           Name : 2
           UUID : 7ae2c7ac:74b4b307:69c2de0e:a2735e73
         Events : 0

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/sda
       1       8       16        1      active sync   /dev/sdb
       2       8       64        2      active sync   /dev/sde
       3       8       80        3      active sync   /dev/sdf
       4       8       96        4      active sync   /dev/sdg
       5       8      112        5      active sync   /dev/sdh


Now, I see this instead:

# cat /proc/mdstat 
Personalities : [raid0] [raid1] 
md120 : active raid0 sdf[0]
      976224256 blocks super external:/md0/5 256k chunks
      
md121 : active raid0 sde[0]
      976224256 blocks super external:/md0/4 256k chunks
      
md122 : active raid0 sdd[0]
      976224256 blocks super external:/md0/3 256k chunks
      
md123 : active raid0 sdc[0]
      976224256 blocks super external:/md0/2 256k chunks
      
md124 : active raid0 sdb[0]
      976224256 blocks super external:/md0/1 256k chunks
      
md125 : active raid0 sda[0]
      976224256 blocks super external:/md0/0 256k chunks
      
md126 : inactive sda[5](S) sdf[4](S) sde[3](S) sdd[2](S) sdc[1](S) sdb[0](S)
      3229968 blocks super external:ddf

# mdadm -Q --detail /dev/md126
/dev/md126:
        Version : ddf
     Raid Level : container
  Total Devices : 6

Working Devices : 6

 Container GUID : 4C534920:20202020:10000079:10009260:446872B3:105E9355
                  (LSI      05/14/16 08:23:15)
            Seq : 00000019
  Virtual Disks : 11

  Member Arrays : /dev/md120 /dev/md121 /dev/md122 /dev/md123 /dev/md124 /dev/md125

    Number   Major   Minor   RaidDevice

       0       8       16        -        /dev/sdb
       1       8       32        -        /dev/sdc
       2       8       48        -        /dev/sdd
       3       8       64        -        /dev/sde
       4       8       80        -        /dev/sdf
       5       8        0        -        /dev/sda

# mdadm -Q --detail /dev/md120
/dev/md120:
      Container : /dev/md0, member 5
     Raid Level : raid0
     Array Size : 976224256 (931.00 GiB 999.65 GB)
   Raid Devices : 1
  Total Devices : 1

          State : clean 
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 256K

 Container GUID : 4C534920:20202020:10000079:10009260:446872B3:105E9355
                  (LSI      05/14/16 08:23:15)
            Seq : 00000019
  Virtual Disks : 11

    Number   Major   Minor   RaidDevice State
       0       8       80        0      active sync   /dev/sdf

# mdadm -Q --detail /dev/md121
/dev/md121:
      Container : /dev/md0, member 4
     Raid Level : raid0
     Array Size : 976224256 (931.00 GiB 999.65 GB)
   Raid Devices : 1
  Total Devices : 1

          State : clean 
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 256K

 Container GUID : 4C534920:20202020:10000079:10009260:446872B3:105E9355
                  (LSI      05/14/16 08:23:15)
            Seq : 00000019
  Virtual Disks : 11

    Number   Major   Minor   RaidDevice State
       0       8       64        0      active sync   /dev/sde

# mdadm -Q --detail /dev/md122
/dev/md122:
      Container : /dev/md0, member 3
     Raid Level : raid0
     Array Size : 976224256 (931.00 GiB 999.65 GB)
   Raid Devices : 1
  Total Devices : 1

          State : clean 
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 256K

 Container GUID : 4C534920:20202020:10000079:10009260:446872B3:105E9355
                  (LSI      05/14/16 08:23:15)
            Seq : 00000019
  Virtual Disks : 11

    Number   Major   Minor   RaidDevice State
       0       8       48        0      active sync   /dev/sdd

# mdadm -Q --detail /dev/md123
/dev/md123:
      Container : /dev/md0, member 2
     Raid Level : raid0
     Array Size : 976224256 (931.00 GiB 999.65 GB)
   Raid Devices : 1
  Total Devices : 1

          State : clean 
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 256K

 Container GUID : 4C534920:20202020:10000079:10009260:446872B3:105E9355
                  (LSI      05/14/16 08:23:15)
            Seq : 00000019
  Virtual Disks : 11

    Number   Major   Minor   RaidDevice State
       0       8       32        0      active sync   /dev/sdc

# mdadm -Q --detail /dev/md124
/dev/md124:
      Container : /dev/md0, member 1
     Raid Level : raid0
     Array Size : 976224256 (931.00 GiB 999.65 GB)
   Raid Devices : 1
  Total Devices : 1

          State : clean 
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 256K

 Container GUID : 4C534920:20202020:10000079:10009260:446872B3:105E9355
                  (LSI      05/14/16 08:23:15)
            Seq : 00000019
  Virtual Disks : 11

    Number   Major   Minor   RaidDevice State
       0       8       16        0      active sync   /dev/sdb

# mdadm -Q --detail /dev/md125
/dev/md125:
      Container : /dev/md0, member 0
     Raid Level : raid0
     Array Size : 976224256 (931.00 GiB 999.65 GB)
   Raid Devices : 1
  Total Devices : 1

          State : clean 
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 256K

 Container GUID : 4C534920:20202020:10000079:10009260:446872B3:105E9355
                  (LSI      05/14/16 08:23:15)
            Seq : 00000019
  Virtual Disks : 11

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/sda


Yes, I know I was stupid, but can anybody help?  Is there a way to get
the old RAID6 setup running again, just to recover the data?  (We have
backups on tape, but I figure the restore would take a long time...)

For the record: there were also two disks with partitions used for
2 x RAID1 arrays; these survived the Avago firmware's initialization:

# mdadm -Q --detail /dev/md126
/dev/md126:
        Version : 1.0
  Creation Time : Fri Jan 21 11:34:46 2011
     Raid Level : raid1
     Array Size : 262132 (256.03 MiB 268.42 MB)
  Used Dev Size : 262132 (256.03 MiB 268.42 MB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Sun May 15 08:10:48 2016
          State : clean 
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : localhost.localdomain:3
           UUID : 28815077:9fe434a1:7fbd6fbb:46816ee0
         Events : 847

    Number   Major   Minor   RaidDevice State
       0       8       97        0      active sync   /dev/sdg1
       1       8      113        1      active sync   /dev/sdh1
# mdadm -Q --detail /dev/md127
/dev/md127:
        Version : 1.2
  Creation Time : Wed Jan 19 07:28:49 2011
     Raid Level : raid1
     Array Size : 970206800 (925.26 GiB 993.49 GB)
  Used Dev Size : 970206800 (925.26 GiB 993.49 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Sun May 15 08:41:23 2016
          State : active 
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : castor.denx.de:4
           UUID : 0551c50c:30e757d4:83368de2:9a8ff1e1
         Events : 38662

    Number   Major   Minor   RaidDevice State
       2       8       99        0      active sync   /dev/sdg3
       3       8      115        1      active sync   /dev/sdh3


Thanks in advance!

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
I am more bored than you could ever possibly be.  Go back to work.


* Re: MD RAID6 corrupted by Avago 9260-4i controller
From: Wolfgang Denk @ 2016-05-15 13:37 UTC
  To: linux-raid

Hi again,

In message <20160515124534.A42D0100879@atlas.denx.de> I wrote:
> 
> I managed to kill a RAID6...
...

Trying to follow the overlay method in [1], I run into errors; guess I
must be missing something:


[1] https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file


# DEVICES="/dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf"

# ls /dev/loop*
/dev/loop-control  /dev/loop0  /dev/loop1  /dev/loop2  /dev/loop3  /dev/loop4
# parallel 'test -e /dev/loop{#} || mknod -m 660 /dev/loop{#} b 7 {#}' ::: $DEVICES
# ls /dev/loop*
/dev/loop-control  /dev/loop0  /dev/loop1  /dev/loop2  /dev/loop3  /dev/loop4  /dev/loop5  /dev/loop6

# parallel truncate -s4000G overlay-{/} ::: $DEVICES

# parallel 'size=$(blockdev --getsize {}); loop=$(losetup -f --show -- overlay-{/}); echo 0 $size snapshot {} $loop P 8 | dmsetup create {/}' ::: $DEVICES

# OVERLAYS=$(parallel echo /dev/mapper/{/} ::: $DEVICES)
# echo $OVERLAYS 
/dev/mapper/sda /dev/mapper/sdb /dev/mapper/sdc /dev/mapper/sdd /dev/mapper/sde /dev/mapper/sdf

# dmsetup status
castor2-git_backup: 0 67108864 linear 
live-base: 0 12582912 linear 
castor2-f22: 0 67108864 linear 
castor2-root: 0 134217728 linear 
castor2-f19: 0 100663296 linear 
castor2-f19: 100663296 33554432 linear 
castor2-f21: 0 134217728 linear 
castor2-f18: 0 67108864 linear 
castor2-f20: 0 67108864 linear 
sdf: 0 1953525168 snapshot 16/8388608000 16
sde: 0 1953525168 snapshot 16/8388608000 16
sdd: 0 1953525168 snapshot 16/8388608000 16
live-osimg-min: 0 12582912 snapshot 3688/3688 24
sdc: 0 1953525168 snapshot 16/8388608000 16
live-rw: 0 12582912 snapshot 462664/1048576 1816
sdb: 0 1953525168 snapshot 16/8388608000 16
sda: 0 1953525168 snapshot 16/8388608000 16

# devices="/dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf"
# overlay_create()
> {
>         free=$((`stat -c '%a*%S/1024/1024' -f .`))
>         echo free ${free}M
>         overlays=""
>         overlay_remove
>         for d in $devices; do
>                 b=$(basename $d)
>                 size_bkl=$(blockdev --getsz $d) # in 512 blocks/sectors
>                 # reserve 1M space for snapshot header
>                 # ext3 max file length is 2TB   
>                 truncate -s$((((size_bkl+1)/2)+1024))K $b.ovr || (echo "Do you use ext4?"; return 1)
>                 loop=$(losetup -f --show -- $b.ovr)
>                 # https://www.kernel.org/doc/Documentation/device-mapper/snapshot.txt
>                 dmsetup create $b --table "0 $size_bkl snapshot $d $loop P 8"
>                 echo $d $((size_bkl/2048))M $loop /dev/mapper/$b
>                 overlays="$overlays /dev/mapper/$b"
>         done
>         overlays=${overlays# }
> }
# overlay_remove()
> {
>         for d in $devices; do
>                 b=$(basename $d)
>                 [ -e /dev/mapper/$b ] && dmsetup remove $b && echo /dev/mapper/$b 
>                 if [ -e $b.ovr ]; then
>                         echo $b.ovr
>                         l=$(losetup -j $b.ovr | cut -d : -f1)
>                         echo $l
>                         [ -n "$l" ] && losetup -d $(losetup -j $b.ovr | cut -d : -f1)
>                         rm -f $b.ovr &> /dev/null
>                 fi
>         done
> }

# echo $OVERLAYS
/dev/mapper/sda /dev/mapper/sdb /dev/mapper/sdc /dev/mapper/sdd /dev/mapper/sde /dev/mapper/sdf
#  mdadm --create --force --verbose /dev/md2 --metadata=1.2 --level=6 --raid-devices=6 --chunk=16 --assume-clean  $OVERLAYS
mdadm: layout defaults to left-symmetric
mdadm: super1.x cannot open /dev/mapper/sda: Device or resource busy
mdadm: /dev/mapper/sda is not suitable for this array.
mdadm: super1.x cannot open /dev/mapper/sdb: Device or resource busy
mdadm: /dev/mapper/sdb is not suitable for this array.
mdadm: super1.x cannot open /dev/mapper/sdc: Device or resource busy
mdadm: /dev/mapper/sdc is not suitable for this array.
mdadm: super1.x cannot open /dev/mapper/sdd: Device or resource busy
mdadm: /dev/mapper/sdd is not suitable for this array.
mdadm: super1.x cannot open /dev/mapper/sde: Device or resource busy
mdadm: /dev/mapper/sde is not suitable for this array.
mdadm: super1.x cannot open /dev/mapper/sdf: Device or resource busy
mdadm: /dev/mapper/sdf is not suitable for this array.
mdadm: create aborted

# mdadm --assemble --force /dev/md2  $OVERLAYS
mdadm: /dev/mapper/sda is busy - skipping
mdadm: /dev/mapper/sdb is busy - skipping
mdadm: /dev/mapper/sdc is busy - skipping
mdadm: /dev/mapper/sdd is busy - skipping
mdadm: /dev/mapper/sde is busy - skipping
mdadm: /dev/mapper/sdf is busy - skipping

# overlay_create
free 1843M
device-mapper: remove ioctl on sda failed: Device or resource busy
Command failed
device-mapper: remove ioctl on sdb failed: Device or resource busy
Command failed
device-mapper: remove ioctl on sdc failed: Device or resource busy
Command failed
device-mapper: remove ioctl on sdd failed: Device or resource busy
Command failed
device-mapper: remove ioctl on sde failed: Device or resource busy
Command failed
device-mapper: remove ioctl on sdf failed: Device or resource busy
Command failed
device-mapper: create ioctl on sda failed: Device or resource busy
Command failed
/dev/sda 953869M /dev/loop11 /dev/mapper/sda
device-mapper: create ioctl on sdb failed: Device or resource busy
Command failed
/dev/sdb 953869M /dev/loop12 /dev/mapper/sdb
device-mapper: create ioctl on sdc failed: Device or resource busy
Command failed
/dev/sdc 953869M /dev/loop13 /dev/mapper/sdc
device-mapper: create ioctl on sdd failed: Device or resource busy
Command failed
/dev/sdd 953869M /dev/loop14 /dev/mapper/sdd
device-mapper: create ioctl on sde failed: Device or resource busy
Command failed
/dev/sde 953869M /dev/loop15 /dev/mapper/sde
device-mapper: create ioctl on sdf failed: Device or resource busy
Command failed
/dev/sdf 953869M /dev/loop16 /dev/mapper/sdf


What am I doing wrong?

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
"Do we define evil as the absence of goodness? It seems only  logical
that shit happens--we discover this by the process of elimination."
                                                        -- Larry Wall


* Re: MD RAID6 corrupted by Avago 9260-4i controller
From: Andreas Klauer @ 2016-05-15 15:31 UTC
  To: Wolfgang Denk; +Cc: linux-raid

On Sun, May 15, 2016 at 03:37:40PM +0200, Wolfgang Denk wrote:
> Trying to follow the overlay method in [1], I run into errors; guess I
> must be missing something:
>
> [1] https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file

I think you mixed two approaches to the same thing: the wiki a) shows
how to create overlays manually and b) offers some convenience
functions that do the same thing (the overlay create/remove functions;
you define those functions once and can then call them repeatedly,
basically giving you two commands, overlay_create and overlay_remove).

It should work if you use only this part:

> # devices="/dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf"
> # overlay_create()
> > {
> >         free=$((`stat -c '%a*%S/1024/1024' -f .`))
> >         echo free ${free}M
> >         overlays=""
> >         overlay_remove
> >         for d in $devices; do
> >                 b=$(basename $d)
> >                 size_bkl=$(blockdev --getsz $d) # in 512 blocks/sectors
> >                 # reserve 1M space for snapshot header
> >                 # ext3 max file length is 2TB   
> >                 truncate -s$((((size_bkl+1)/2)+1024))K $b.ovr || (echo "Do you use ext4?"; return 1)
> >                 loop=$(losetup -f --show -- $b.ovr)
> >                 # https://www.kernel.org/doc/Documentation/device-mapper/snapshot.txt
> >                 dmsetup create $b --table "0 $size_bkl snapshot $d $loop P 8"
> >                 echo $d $((size_bkl/2048))M $loop /dev/mapper/$b
> >                 overlays="$overlays /dev/mapper/$b"
> >         done
> >         overlays=${overlays# }
> > }
> # overlay_remove()
> > {
> >         for d in $devices; do
> >                 b=$(basename $d)
> >                 [ -e /dev/mapper/$b ] && dmsetup remove $b && echo /dev/mapper/$b 
> >                 if [ -e $b.ovr ]; then
> >                         echo $b.ovr
> >                         l=$(losetup -j $b.ovr | cut -d : -f1)
> >                         echo $l
> >                         [ -n "$l" ] && losetup -d $(losetup -j $b.ovr | cut -d : -f1)
> >                         rm -f $b.ovr &> /dev/null
> >                 fi
> >         done
> > }

And then call 'overlay_create' when you want your overlays, 
and 'overlay_remove; overlay_create' when an experiment 
failed and you want to reset them to their original state.

At the time you remove the overlays, all things using them 
must also be gone, so mdadm --stop before overlay_remove. 
(And make sure no raid is running for the disks you're 
overlaying...)
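
In case it helps, the whole cycle would look roughly like this (just an
outline, using the function and variable names from the wiki snippet
quoted above):

    overlay_create                                # sets up /dev/mapper/sdX snapshots, fills $overlays
    mdadm --assemble --force /dev/md2 $overlays   # experiment on the overlays only
    # ... inspect, mount read-only, copy data ...
    mdadm --stop /dev/md2                         # nothing may still be using the overlays
    overlay_remove                                # drops the snapshots, overlay writes are discarded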

As for your controller, I don't know this controller. If it's a HW-RAID 
that passes individual disks through as RAID-0, usually some sectors of 
the disk are missing (controller has to keep RAID-0 metadata somewhere) 
and that alone might be enough to damage your old setup in some way.

I prefer "dumb" controllers that pass through disks the way they are.

You showed --detail output of your old RAID; that's already very good, 
is there --examine output by any chance? --detail doesn't contain some 
things such as data offsets, and the ones mdadm picks by default have 
changed a lot, so the same --create command won't actually produce 
the same RAID. If your old RAID metadata is actually lost, if you wish 
to experiment with --create on the overlay, you'll have to specify all 
variables you know and guess the variables you don't know...
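
(If the 1.2 superblocks turn out to still be readable, it costs nothing
to save the full --examine output somewhere safe before experimenting,
roughly:

    for d in /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf; do
        echo "=== $d ==="; mdadm --examine "$d"
    done > md2-examine.txt    # file name is just an example

so you have all offsets and device roles on record.)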

Regards
Andreas Klauer


* Re: MD RAID6 corrupted by Avago 9260-4i controller [SOLVED]
From: Wolfgang Denk @ 2016-05-15 18:25 UTC
  To: Andreas Klauer; +Cc: linux-raid

Dear Andreas,

thanks for the quick reply, and on a Sunday...

In message <20160515153121.GA11365@EIS.leimen.priv> you wrote:
> On Sun, May 15, 2016 at 03:37:40PM +0200, Wolfgang Denk wrote:
> > Trying to follow the overlay method in [1], I run into errors; guess I
> > must be missing something:
> >
> > [1] https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file
> 
> I think you mixed two approaches to the same thing: the wiki a) shows
> how to create overlays manually and b) offers some convenience
> functions that do the same thing (the overlay create/remove functions;
> you define those functions once and can then call them repeatedly,
> basically giving you two commands, overlay_create and overlay_remove).

Yes, you are right.  I now realized this, too.

> And then call 'overlay_create' when you want your overlays, 
> and 'overlay_remove; overlay_create' when an experiment 
> failed and you want to reset them to their original state.
> 
> At the time you remove the overlays, all things using them 
> must also be gone, so mdadm --stop before overlay_remove. 
> (And make sure no raid is running for the disks you're 
> overlaying...)

Thanks - this was the key that got me working.

After creating the overlays, the system would automatically start the
(incorrect) RAID arrays.  After manually stopping these, I had write
access to the overlays.
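
The manual stopping was roughly the following, for each array that
/proc/mdstat showed as auto-assembled on the overlays (mdXXX is a
placeholder):

	# cat /proc/mdstat
	# mdadm --stop /dev/mdXXX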

Recovering my data was (fortunately) simple:

1) I zeroed the incorrect superblocks on all devices:

	# mdadm --zero-superblock /dev/mapper/sda
	# mdadm --zero-superblock /dev/mapper/sdb
	# mdadm --zero-superblock /dev/mapper/sdc
	# mdadm --zero-superblock /dev/mapper/sdd
	# mdadm --zero-superblock /dev/mapper/sde
	# mdadm --zero-superblock /dev/mapper/sdf

2) Then I forced an assemble of the array:

	# mdadm --assemble --force --verbose /dev/md2 --metadata=1.2 \
		$overlays
	mdadm: looking for devices for /dev/md2
	mdadm: /dev/mapper/sda is identified as a member of /dev/md2, slot 1.
	mdadm: /dev/mapper/sdb is identified as a member of /dev/md2, slot 0.
	mdadm: /dev/mapper/sdc is identified as a member of /dev/md2, slot 2.
	mdadm: /dev/mapper/sdd is identified as a member of /dev/md2, slot 3.
	mdadm: /dev/mapper/sde is identified as a member of /dev/md2, slot 5.
	mdadm: /dev/mapper/sdf is identified as a member of /dev/md2, slot 4.
	mdadm: added /dev/mapper/sda to /dev/md2 as 1
	mdadm: added /dev/mapper/sdc to /dev/md2 as 2
	mdadm: added /dev/mapper/sdd to /dev/md2 as 3
	mdadm: added /dev/mapper/sdf to /dev/md2 as 4
	mdadm: added /dev/mapper/sde to /dev/md2 as 5
	mdadm: added /dev/mapper/sdb to /dev/md2 as 0
	mdadm: /dev/md2 has been started with 6 drives.

And me was happy again.

I owe you a beer or two.  Please don't hesitate to remind me whenever
we meet...  Thanks again.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
"You can have my Unix system when you  pry  it  from  my  cold,  dead
fingers."                                                - Cal Keegan


* Re: MD RAID6 corrupted by Avago 9260-4i controller [SOLVED]
From: Andreas Klauer @ 2016-05-15 18:31 UTC
  To: Wolfgang Denk; +Cc: linux-raid

On Sun, May 15, 2016 at 08:25:24PM +0200, Wolfgang Denk wrote:
> After creating the overlays, the system would automatically start the
> (incorrect) RAID arrays.

That should be courtesy of udev, see if you have a 

    /lib/udev/rules.d/64-md-raid-assembly.rules

and if that's the case you can temporarily disable them by

    touch /etc/udev/rules.d/64-md-raid-assembly.rules

and (later) re-enable by rm'ing the /etc/ file.

> Recovering my data was (fortunately) simple:

Congrats!

Regards
Andreas Klauer


* Re: MD RAID6 corrupted by Avago 9260-4i controller [SOLVED]
From: Wolfgang Denk @ 2016-05-15 19:35 UTC
  To: Andreas Klauer; +Cc: linux-raid

Dear Andreas,

In message <20160515183128.GA12823@EIS.leimen.priv> you wrote:
> On Sun, May 15, 2016 at 08:25:24PM +0200, Wolfgang Denk wrote:
> > After creating the overlays, the system would automatically start the
> > (incorrect) RAID arrays.
> 
> That should be courtesy of udev, see if you have a 
> 
>     /lib/udev/rules.d/64-md-raid-assembly.rules
> 
> and if that's the case you can temporarily disable them by
> 
>     touch /etc/udev/rules.d/64-md-raid-assembly.rules
> 
> and (later) re-enable by rm'ing the /etc/ file.

Thanks.

> > Recovering my data was (fortunately) simple:

Unfortunately my luck did not last very long.  While copying the first
file system from the recovered array, the system crashed - I can't tell
why; when I got to the console, the screen was all black :-(

So I tried to repeat the same procedure, but it does not work any
more:  after erasing the superblocks my attempts to assemble the array
now give only:

	#  mdadm --assemble  /dev/md2 --metadata=1.2  $overlays
	mdadm: no RAID superblock on /dev/mapper/sda
	mdadm: /dev/mapper/sda has no superblock - assembly aborted

[which is what I had initially expected; I have no idea why it worked
once, but not a second time.]

OK, I can recreate the array, but LVM does not recognize it.

You mentioned I had to play around with the offsets - do you have any
idea which values would be reasonable to try out?

Thanks in advance - once more...

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
You can't depend on your eyes when your imagination is out of  focus.
- Mark Twain


* Re: MD RAID6 corrupted by Avago 9260-4i controller [SOLVED]
From: Andreas Klauer @ 2016-05-15 20:34 UTC
  To: Wolfgang Denk; +Cc: linux-raid

On Sun, May 15, 2016 at 09:35:16PM +0200, Wolfgang Denk wrote:
> So I tried to repeat the same procedure, but it does not work any
> more:  after erasing the superblocks my attempts to assemble the array
> now give only:

That's strange, if it worked once, it should work twice (if no changes 
were made outside of the overlays). If you have created the udev rules 
file, just in case there is a side effect, remove it again ...

> 	#  mdadm --assemble  /dev/md2 --metadata=1.2  $overlays
> 	mdadm: no RAID superblock on /dev/mapper/sda
> 	mdadm: /dev/mapper/sda has no superblock - assembly aborted

What does mdadm --examine /dev/mapper/sda say before and after
you --zero-superblock? (is it the same for the other disks?)
 
> OK, I can recreate the array, but LVM does not recognize it.
> 
> You mentioned I had to play around with the offsets - do you have any
> idea which values would be reasonable to try out?

If it's LVM, you could search the first few hundred megs of your devices for
the LVM header, that should be the data offset you're looking for.

    hexdump -C -n $((1024*1024*1024)) /dev/mapper/sda | less

and then search for LABELONE

    /LABELONE

| 00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
| *
| 00000200  4c 41 42 45 4c 4f 4e 45  01 00 00 00 00 00 00 00  |LABELONE........|
| 00000210  c1 11 05 3d 20 00 00 00  4c 56 4d 32 20 30 30 31  |...= ...LVM2 001|
| 00000220  36 46 41 75 32 49 70 65  38 62 75 50 31 46 32 75  |6FAu2Ipe8buP1F2u|
| 00000230  38 4a 41 34 41 45 64 51  54 49 49 6b 4f 76 45 35  |8JA4AEdQTIIkOvE5|

That's LABELONE at offset 0x200 and you have to subtract 512 bytes from it, 
so this is actually what it would look like for offset 0.

Alternatively this can also be done using

    dd bs=1M count=1024 if=... | strings -t d -n 8 | grep LABELONE
    512 LABELONE

if your version of strings supports this option.

Regards
Andreas Klauer


* Re: MD RAID6 corrupted by Avago 9260-4i controller [SOLVED]
From: Wolfgang Denk @ 2016-05-15 23:10 UTC
  To: Andreas Klauer; +Cc: linux-raid

Dear Andreas,

In message <20160515203446.GA13218@EIS.leimen.priv> you wrote:
>
> That's strange, if it worked once, it should work twice (if no changes 
> were made outside of the overlays). If you have created the udev rules 
> file, just in case there is a side effect, remove it again ...

No, I think I am the one to blame.  If I read the bash history correctly,
I made a mistake (confusing $DEVICES and $devices and $overlays)
and ran the --zero-superblock on the real disk devices, not the overlays.

On the other hand, that should not make a real difference if it worked
for the overlays.  Well, it does...

> What does mdadm --examine /dev/mapper/sda say before and after
> you --zero-superblock? (is it the same for the other disks?)

--zero-superblock now properly reports that it cannot find a
superblock...

> If it's LVM, you could search the first few hundred megs of your devices for
> the LVM header, that should be the data offset you're looking for.

Ok, but...

> That's LABELONE at offset 0x200 and you have to substract 512 bytes from it, 
> so this is actually what it would look like for offset 0.

That would be the --data-offset= parameter?

> Alternatively this can also be done using
> 
>     dd bs=1M count=1024 if=... | strings -t d -n 8 | grep LABELONE
>     512 LABELONE
> 
> if your version of strings supports this option.

It does - but I'm confused as I get a number of different values from
the devices:

# for i in /dev/mapper/sd? ; do
> echo $i
> dd bs=1M count=1024 if=$i | strings -t d -n 8 | grep LABELONE
> done
/dev/mapper/sda
 139776 LABELONE
...
/dev/mapper/sdb
...
/dev/mapper/sdc
396826104 LABELONE
...
/dev/mapper/sdd
...
/dev/mapper/sde
387503608 LABELONE
389663524 LABELONE
...
/dev/mapper/sdf
398969636 LABELONE
...

So I have 5 numbers; minus 512 and converted to kB gives:

   139776 ->    139264 ->    136
396826104 -> 396825592 -> 387524.99
387503608 -> 387503096 -> 378420.99
389663524 -> 389663012 -> 380530.28
398969636 -> 398969124 -> 389618.28

Of these, only 136 appears to make sense.  But running

# mdadm --create --verbose /dev/md2 --metadata=1.2 --level=6   --raid-devices=6 --chunk=16 --assume-clean --data-offset=136 /dev/mapper/sd?
mdadm: layout defaults to left-symmetric
mdadm: size set to 976762448K
mdadm: automatically enabling write-intent bitmap on large array
mdadm: array /dev/md2 started.

...creates an array, but LVM does not recognise it.

Searching the resulting /dev/md2 for the LABELONE does not find any.
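
The check was roughly the same pipeline as before, now on the array:

# dd if=/dev/md2 bs=1M count=1024 2>/dev/null | strings -t d -n 8 | grep LABELONE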

Am I misinterpreting your information?

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
The sixth sick sheik’s sixth sheep’s sick.

* Re: MD RAID6 corrupted by Avago 9260-4i controller [SOLVED]
From: Andreas Klauer @ 2016-05-16  8:39 UTC
  To: Wolfgang Denk; +Cc: linux-raid

On Mon, May 16, 2016 at 01:10:46AM +0200, Wolfgang Denk wrote:
> NO, I think it's me to blame.  If I read the bash history correctly,
> I made a mistake (confusing $DEVICES and $devices and $overlays)
> and ran the --zero-superblock on the real disk devices, not the overlays.

Ooh. :(
 
> On the other hand, that should not make a real difference if it worked
> for the overlays.  Well, it does...

About that, I don't know either.

> That would be the --data-offset= parameter?

Yes.

> It does - but I'm confused as I get a number of different values from
> the devices:

You can ignore those that don't even align to 512 bytes.

> So I have 5 numbers; minus 512 and converted to kB gives:
> 
> Of these, only 136 appears to make sense.  But running
> 
> # mdadm --create --verbose /dev/md2 --metadata=1.2 --level=6 --raid-devices=6 --chunk=16 --assume-clean --data-offset=136 /dev/mapper/sd?
> 
> ...creates an array, but LVM does not recognise it.

So, now here's a puzzle.

First, you can use hexdump after all to have a look at the first chunk 
(assuming the 136KiB you found is actually the data offset).

dd bs=136K skip=1 if=/dev/mapper/sda | hexdump -C | less

(same for sd?)

LVM metadata is in plaintext, example:

| 00001200  53 53 44 20 7b 0a 69 64  20 3d 20 22 74 58 4a 43  |SSD {.id = "tXJC|
| 00001210  77 31 2d 71 51 69 6e 2d  4b 78 31 6b 2d 30 65 78  |w1-qQin-Kx1k-0ex|
| 00001220  79 2d 32 6e 4d 76 2d 6a  63 57 78 2d 4f 48 70 76  |y-2nMv-jcWx-OHpv|
| 00001230  76 69 22 0a 73 65 71 6e  6f 20 3d 20 36 38 0a 66  |vi".seqno = 68.f|
| 00001240  6f 72 6d 61 74 20 3d 20  22 6c 76 6d 32 22 0a 73  |ormat = "lvm2".s|
| 00001250  74 61 74 75 73 20 3d 20  5b 22 52 45 53 49 5a 45  |tatus = ["RESIZE|
| 00001260  41 42 4c 45 22 2c 20 22  52 45 41 44 22 2c 20 22  |ABLE", "READ", "|
| 00001270  57 52 49 54 45 22 5d 0a  66 6c 61 67 73 20 3d 20  |WRITE"].flags = |

For me this starts at offset 0x1200 (roughly 4K), which should be well within 
your 16K chunk. It should look similar for you on one of your disks if 
the offset is correct.

You are using your disks in alphabetical order, are you sure this is 
the same order your RAID originally used? Maybe the drive letters 
changed?
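
One quick way to cross-check the physical order against your records,
assuming the controller passes the serial numbers through (otherwise
/dev/disk/by-id may still help):

    lsblk -d -o NAME,SIZE,SERIAL /dev/sd[a-f]
    ls -l /dev/disk/by-id/ | grep -E 'sd[a-f]$'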

You found LABELONE on sda, which is your first drive (Device Role 0) 
in your RAID (see mdadm --examine after you create it), but when I 
create a new RAID based on loop devices, pvcreate and vgcreate it, 
the LABELONE actually appears on the 2nd drive (Device Role 1).

| # cat /proc/mdstat 
| Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] 
| md42 : active raid6 loop5[5] loop4[4] loop3[3] loop2[2] loop1[1] loop0[0]
|       64960 blocks super 1.2 level 6, 16k chunk, algorithm 2 [6/6] [UUUUUU]

| # strings -t d -n 8 /dev/loop0 | grep LABELONE
| # strings -t d -n 8 /dev/loop1 | grep LABELONE
|  139776 LABELONE

| # dd bs=136K skip=1 if=/dev/loop1 | hexdump -C -n $((16*1024)) | head
| 00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
| *
| 00000200  4c 41 42 45 4c 4f 4e 45  01 00 00 00 00 00 00 00  |LABELONE........|
| 00000210  e0 3d e7 de 20 00 00 00  4c 56 4d 32 20 30 30 31  |.=.. ...LVM2 001|
| 00000220  58 48 67 41 68 45 70 76  4c 53 36 61 41 62 4d 77  |XHgAhEpvLS6aAbMw|
| 00000230  50 6e 4c 57 4c 64 4a 46  6d 36 30 54 48 66 6d 75  |PnLWLdJFm60THfmu|

So I assume in the default left-symmetric layout in RAID6, the first chunk 
of the first disk actually ends up being a parity chunk...? I'm not too 
sure about this either right now. ;)
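
If I read drivers/md/raid5.c correctly (treat this as a sketch, I have
not verified it against your array), the left-symmetric RAID6 mapping
would be:

    disks=6
    for stripe in 0 1 2 3; do
        pd=$(( disks - 1 - stripe % disks ))   # P rotates backwards from the last disk
        qd=$(( (pd + 1) % disks ))             # Q follows P, wrapping around
        d0=$(( (pd + 2) % disks ))             # first data chunk comes right after Q
        echo "stripe $stripe: P=disk$pd Q=disk$qd D0=disk$d0"
    done

For stripe 0 that gives P on disk 5, Q on disk 0 and the first data
chunk on disk 1, which would explain LABELONE showing up on the second
drive.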

Regards
Andreas Klauer


* Re: MD RAID6 corrupted by Avago 9260-4i controller [SOLVED]
From: Wolfgang Denk @ 2016-05-16 10:06 UTC
  To: Andreas Klauer; +Cc: linux-raid

Dear Andreas,

In message <20160516083903.GA29380@EIS.leimen.priv> you wrote:
> 
> You are using your disks in alphabetical order, are you sure this is 
> the same order your RAID originally used? Maybe the drive letters 
> changed?

Yes, this is one thing I am absolutely sure about. I have the mapping
of the disk serial numbers from initial install, and I verified that
the drive order is still the same.

> You found LABELONE on sda, which is your first drive (Device Role 0) 
> in your RAID (see mdadm --examine after you create it), but when I 
> create a new RAID based on loop devices, pvcreate and vgcreate it, 
> the LABELONE actually appears on the 2nd drive (Device Role 1).

This is strange; I see you are using 6 disks and the same stripe size,
so I would also expect a layout like yours.  OK, I need to experiment
a bit...

> So I assume in the default left-symmetric layout in RAID6, the first chunk 
> of the first disk actually ends up being a parity chunk...? I'm not too 
> sure about this either right now. ;)

Hm... is --data-offset the only parameter I can play with?  (except for
variations of drive order, which appear to make no sense to me as I'm
sure I have it correct).

Then I could also start a brute-force approach and just try out all
possible values until I find a match ;-)

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
[Braddock:] Mr. Churchill, you are drunk.
[Churchill:] And you madam, are ugly.  But I shall be sober tomorrow.


* Re: MD RAID6 corrupted by Avago 9260-4i controller [SOLVED]
From: Andreas Klauer @ 2016-05-16 10:24 UTC
  To: Wolfgang Denk; +Cc: linux-raid

On Mon, May 16, 2016 at 12:06:42PM +0200, Wolfgang Denk wrote:
> Hm... is --data-offset the only parameter I can play with?

There is also --layout=

Regards
Andreas Klauer


* Re: MD RAID6 corrupted by Avago 9260-4i controller [SOLVED]
From: Wolfgang Denk @ 2016-05-16 11:05 UTC
  To: Andreas Klauer; +Cc: linux-raid

Dear Andreas,

In message <20160516102418.GA2347@metamorpher.de> you wrote:
> On Mon, May 16, 2016 at 12:06:42PM +0200, Wolfgang Denk wrote:
> > Hm... is --data-offset the only parameter I can play with?
> 
> There is also --layout=

The output of the initial create command was 

	mdadm: layout defaults to left-symmetric
	mdadm: size set to 976762448K
	mdadm: array /dev/md2 started.

so this should be known...

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
I think animal testing is a terrible idea; they get all  nervous  and
give the wrong answers.


* Re: MD RAID6 corrupted by Avago 9260-4i controller [SOLVED]
From: Wolfgang Denk @ 2016-05-16 12:06 UTC
  To: Andreas Klauer; +Cc: linux-raid

Dear Andreas,

In message <20160516083903.GA29380@EIS.leimen.priv> you wrote:
>
> First, you can use hexdump after all to have a look at the first chunk 
> (assuming the 136KiB you found is actually the data offset).
> 
> dd bs=136K skip=1 if=/dev/mapper/sda | hexdump -C | less
> 
> (same for sd?)
> 
> LVM metadata is in plaintext, example:

Well, this does not look so bad:

00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000200  4c 41 42 45 4c 4f 4e 45  01 00 00 00 00 00 00 00  |LABELONE........|
00000210  3c fe 50 23 20 00 00 00  4c 56 4d 32 20 30 30 31  |<.P# ...LVM2 001|
00000220  34 79 78 49 78 69 48 73  6a 68 79 64 48 6f 76 58  |4yxIxiHsjhydHovX|
00000230  55 4f 30 48 47 31 5a 70  51 45 50 53 33 43 49 61  |UO0HG1ZpQEPS3CIa|
00000240  00 00 65 83 a3 03 00 00  00 00 03 00 00 00 00 00  |..e.............|
00000250  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000260  00 00 00 00 00 00 00 00  00 10 00 00 00 00 00 00  |................|
00000270  00 f0 02 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000280  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000  3a fe b8 5d 20 4c 56 4d  32 20 78 5b 35 41 25 72  |:..] LVM2 x[5A%r|
00001010  30 4e 2a 3e 01 00 00 00  00 10 00 00 00 00 00 00  |0N*>............|
00001020  00 f0 02 00 00 00 00 00  00 80 00 00 00 00 00 00  |................|
00001030  46 0b 00 00 00 00 00 00  65 8d d7 42 00 00 00 00  |F.......e..B....|
00001040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001200  63 61 73 74 6f 72 30 20  7b 0a 69 64 20 3d 20 22  |castor0 {.id = "|
00001210  56 33 52 6e 30 55 2d 47  4d 41 64 2d 34 55 73 4e  |V3Rn0U-GMAd-4UsN|
00001220  2d 61 6d 35 69 2d 66 50  59 50 2d 41 43 37 70 2d  |-am5i-fPYP-AC7p-|
00001230  66 4d 50 51 32 35 22 0a  73 65 71 6e 6f 20 3d 20  |fMPQ25".seqno = |
00001240  31 0a 73 74 61 74 75 73  20 3d 20 5b 22 52 45 53  |1.status = ["RES|
00001250  49 5a 45 41 42 4c 45 22  2c 20 22 52 45 41 44 22  |IZEABLE", "READ"|
00001260  2c 20 22 57 52 49 54 45  22 5d 0a 65 78 74 65 6e  |, "WRITE"].exten|
00001270  74 5f 73 69 7a 65 20 3d  20 38 31 39 32 0a 6d 61  |t_size = 8192.ma|
00001280  78 5f 6c 76 20 3d 20 30  0a 6d 61 78 5f 70 76 20  |x_lv = 0.max_pv |
00001290  3d 20 30 0a 0a 70 68 79  73 69 63 61 6c 5f 76 6f  |= 0..physical_vo|
000012a0  6c 75 6d 65 73 20 7b 0a  0a 70 76 30 20 7b 0a 69  |lumes {..pv0 {.i|
000012b0  64 20 3d 20 22 34 79 78  49 78 69 2d 48 73 6a 68  |d = "4yxIxi-Hsjh|
000012c0  2d 79 64 48 6f 2d 76 58  55 4f 2d 30 48 47 31 2d  |-ydHo-vXUO-0HG1-|
000012d0  5a 70 51 45 2d 50 53 33  43 49 61 22 0a 64 65 76  |ZpQE-PS3CIa".dev|
000012e0  69 63 65 20 3d 20 22 2f  64 65 76 2f 6d 64 32 22  |ice = "/dev/md2"|
000012f0  0a 0a 73 74 61 74 75 73  20 3d 20 5b 22 41 4c 4c  |..status = ["ALL|
00001300  4f 43 41 54 41 42 4c 45  22 5d 0a 64 65 76 5f 73  |OCATABLE"].dev_s|
00001310  69 7a 65 20 3d 20 37 38  31 34 30 39 39 35 38 34  |ize = 7814099584|
00001320  0a 70 65 5f 73 74 61 72  74 20 3d 20 33 38 34 0a  |.pe_start = 384.|
00001330  70 65 5f 63 6f 75 6e 74  20 3d 20 39 35 33 38 36  |pe_count = 95386|
00001340  39 0a 7d 0a 7d 0a 0a 7d  0a 23 20 47 65 6e 65 72  |9.}.}..}.# Gener|
00001350  61 74 65 64 20 62 79 20  4c 56 4d 32 20 76 65 72  |ated by LVM2 ver|
00001360  73 69 6f 6e 20 32 2e 30  32 2e 33 39 20 28 32 30  |sion 2.02.39 (20|
00001370  30 38 2d 30 36 2d 32 37  29 3a 20 54 75 65 20 4a  |08-06-27): Tue J|
00001380  61 6e 20 31 38 20 31 33  3a 30 31 3a 30 31 20 32  |an 18 13:01:01 2|
00001390  30 31 31 0a 0a 63 6f 6e  74 65 6e 74 73 20 3d 20  |011..contents = |
...

> For me this starts at offset 0x1200 (roughly 4K) should be well within 
> your 16K chunk. It should look similar for you on one of your disks if 
> the offset is correct.

Confirmed.  So we can assume the offset is OK...

> You are using your disks in alphabetical order, are you sure this is 
> the same order your RAID originally used? Maybe the drive letters 
> changed?

I rechecked again...

> You found LABELONE on sda, which is your first drive (Device Role 0) 
> in your RAID (see mdadm --examine after you create it), but when I 
> create a new RAID based on loop devices, pvcreate and vgcreate it, 
> the LABELONE actually appears on the 2nd drive (Device Role 1).

Confirmed. When I create a new array and pvcreate and vgcreate it, I
also see the LABELONE on /dev/mapper/sdb, at offset

	134218240 LABELONE

= 131072 kB.

OK, so I started playing around with the disk order - even though I
checked yet another time, using the disk serial numbers, that the
drive order "a b c d e f" is what was used when initially creating
the array.  When I swap the first two disks, so that sda (where the
LABELONE is present) becomes the second disk (i.e. "b a c d e f"),
LVM will recognize the volume group and volumes, but the data is
corrupted.

So I guess I have to try the possible permutations (probably with sda
being the second disk only).

Doing that now.  But I have no idea what could cause this...
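
The loop I'm using looks roughly like this (on the overlays only; the
permutation list is abbreviated, and strictly one should reset the
overlays between attempts):

# for order in "b a c d e f" "b a c d f e"; do
>     mdadm --stop /dev/md2 2>/dev/null
>     devs=""
>     for x in $order; do devs="$devs /dev/mapper/sd$x"; done
>     mdadm --create --run /dev/md2 --metadata=1.2 --level=6 \
>           --raid-devices=6 --chunk=16 --assume-clean --data-offset=136 $devs
>     pvs /dev/md2 && echo "order $order looks promising"
> done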

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Just because your doctor has a name for your condition  doesn't  mean
he knows what it is.


* Re: MD RAID6 corrupted by Avago 9260-4i controller [SOLVED]
From: Wolfgang Denk @ 2016-05-16 12:58 UTC
  To: Andreas Klauer; +Cc: linux-raid

Dear Andreas,

In message <20160516120600.5428910035C@atlas.denx.de> I wrote:
> 
...
> OK, so I started playing around with the disk order - even though I
> checked yet another time, using the disk serial numbers, that the
> drive order "a b c d e f" is what was used when initially creating
> the array.  When I swap the first two disks, so that sda (where the
> LABELONE is present) becomes the second disk (i.e. "b a c d e f"),
> LVM will recognize the volume group and volumes, but the data is
> corrupted.
> 
> So I guess I have to try the possible permutations (probably with sda
> being the second disk only).

Seems I was lucky - already the second of the 120 possible
combinations turned out to be working: b a c d f e

But I still have not the slightest idea why the drive order might have
changed...

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Do you suppose the reason the ends of the `Intel Inside'  logo  don't
match up is that it was drawn on a Pentium?


* Re: MD RAID6 corrupted by Avago 9260-4i controller [SOLVED]
From: Andreas Klauer @ 2016-05-16 13:14 UTC
  To: Wolfgang Denk; +Cc: linux-raid

On Mon, May 16, 2016 at 02:58:01PM +0200, Wolfgang Denk wrote:
> Seems I was lucky - already the second of the 120 possible
> combinations turned out to be working: b a c d f e

Find a large enough file (disks * chunksize) and verify it.

Sometimes you can be unlucky, i.e. the LVM is detected, the filesystem 
mounts, but still data is corrupt because the wrong two disks switched 
places (just not the ones that contain filesystem metadata).
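
Something along these lines, for example (all names are placeholders):

    mount -o ro /dev/VG/some_lv /mnt/check
    find /mnt/check -xdev -type f -size +1M | head -n 20 | while read -r f; do
        md5sum "$f"     # compare against a known-good copy, e.g. from the tape backup
    done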
 
> But I still have not the slightest idea why the drive order might have
> changed...

Me neither. :)

With a GPT partition table, I set PARTLABEL to mdnumber-role, so that's
another place that has metadata in case mdadm loses its own...
Since GPT lives at the beginning and end of the disk, it should have a
good chance of surviving accidents, and you can address the members as
/dev/disk/by-partlabel/mdnumber-* in the correct order...
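
For a fresh disk that would be something like (sgdisk as one example;
device name and label are placeholders):

    sgdisk --new=1:0:0 --change-name=1:md2-0 /dev/sdX
    ls -l /dev/disk/by-partlabel/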

Anyway, glad you (hopefully, finally?) got your data back.

Regards
Andreas Klauer


* Re: MD RAID6 corrupted by Avago 9260-4i controller [SOLVED]
From: Wolfgang Denk @ 2016-05-17 18:42 UTC
  To: Andreas Klauer; +Cc: linux-raid

Dear Andreas,

In message <20160516131439.GA2850@metamorpher.de> you wrote:
> On Mon, May 16, 2016 at 02:58:01PM +0200, Wolfgang Denk wrote:
> > Seems I was lucky - already the second of the 120 possible
> > combination turned out to be working: b a c d f e
> 
> Find a large enough file (disks * chunksize) and verify it.

Running "fsck -f -n" over one of the (big, multi million files) file
systems turned out to be a quick and good enough test.

> With GPT partition table, I set PARTLABEL to mdnumber-role so that's 
> another place that has metadata in case mdadm loses its own... 
> Since GPT lives at beginning and end of the disk it should have a 
> good chance of surviving accidents, and you can address them as 
> /dev/disk/by-partlabel/mdnumber-* in the correct order...

So far I have not used any partitioning at all on such drives.  Never
needed it...

> Anyway, glad you (hopefully, finally?) got your data back.

Yes, indeed, all data recovered.  And I really owe you a beer...

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Lispers are among  the  best  grads  of  the  Sweep-It-Under-Someone-
Else's-Carpet  School of Simulated Simplicity. [Was that sufficiently
incendiary? :-)]  - Larry Wall in <1992Jan10.201804.11926@netlabs.com

