* How will mdadm handle a wrongly added drive, when the original comes back on line?
From: Wilson, Jonathan @ 2015-08-03 13:14 UTC
  To: linux-raid

Due to a bug in the driver for a Marvell-chipset 4-port SATA card, I think
I may have added an empty drive's partition to a raid6 array, and when I
get a new card the system will end up seeing not only the new drive, but
also the "missing" drive.

Events:
Upgraded jessie with latest updates (quite some time since I last did
it) and re-booted.

A 6-drive raid6 assembled, but all the drives showed up as spares. Stopped
the array and did an mdadm --assemble /dev/md6.
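
(For the record, the stop/assemble step each time was just something along
these lines:)

  # stop the wrongly assembled (all-spare) array, then let mdadm
  # reassemble it from the member superblocks / mdadm.conf entry
  mdadm --stop /dev/md6
  mdadm --assemble /dev/md6
  cat /proc/mdstat        # check how many members came back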

It assembled with 5 drives, one missing.

Tried --re-add, which failed, and then --add, which completed OK.
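
(Roughly, with sdX6 standing in for whichever member partition it was:)

  # --re-add only works while the superblock/bitmap still matches the
  # array; that apparently wasn't the case, so a full --add was needed
  mdadm /dev/md6 --re-add /dev/sdX6    # failed
  mdadm /dev/md6 --add /dev/sdX6       # accepted; kicks off a rebuild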

Some time later I re-booted and the same problem happened.

All drives spare, stopped, assembled, added missing.

It's now working, and I have a new card on order due to something going
badly wrong with the driver and/or card and/or chipset (Marvell 9230).

Some time after the second boot, I realised that one of my drives was
physically missing. I had a drive ready to go as a genuine spare, but not
yet added to mdadm, so in theory it should have been totally empty apart
from a partition.

Now my problem is that, firstly, I cannot be sure that when I looked
at /proc/mdstat and saw "all" the drives as spare, one of them wasn't in
fact missing. (On either or both occasions.)

In my mdadm.conf I don't specify the number of drives in the array,
just its name and the UUID.
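
(i.e. the entry is just something along these lines, with the name and
UUID as shown in the lsdrv output further down:)

  # /etc/mdadm/mdadm.conf -- no num-devices= or devices= list
  ARRAY /dev/md6 metadata=1.2 name=borgCube:R6Backup UUID=0e1215fc:1eab5943:c28da7cb:399353a3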

Now to my question: call the drives in the array A,B,C,D,E,F and
the empty one G.

After the first boot I may have added G, so the array would be
A,B,C,D,E,G. (F missing from system)

After the second boot I may have added F back, so the array would be
A,B,C,D,E,F (G missing from system)

If, after changing the card, the system sees A,B,C,D,E,F,G, how will mdadm
behave? Will it fail to assemble because one of the drives is "extra" to
the metadata count? (I assume that even though I don't specify a count in
the conf, the metadata on the member partitions records that there should
be "6" disks.)

Or will it see that disk "7" is out of date (wrong event count) and
automagically decide it should not be part of the array?

If mdadm refuses to assemble the array, I assume I will need to assemble
it using a full list of the drives that should be in it? So if I check
all the disks' metadata with --examine I should be able to see the current
state of the devices, and then do
mdadm --assemble /dev/md6 /dev/sda6 /dev/sdb6 etc...
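
(Something like the following, with the device names obviously only
placeholders for whatever the disks come up as:)

  # compare the event counts recorded in each member's superblock
  mdadm --examine /dev/sd?6 | egrep '/dev/|Events'
  # then assemble from the members whose event counts agree
  mdadm --assemble /dev/md6 /dev/sda6 /dev/sdb6 /dev/sdc6 /dev/sdd6 \
        /dev/sde6 /dev/sdf6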

Then clear the superblock on the drive I know should not be a part of
the array?

I guess the important bit is the events count?

The version of the array is 1.2

mdadm - v3.3.2 - 21st August 2014

Linux borgCube 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1+deb8u2
(2015-07-17) x86_64 GNU/Linux

./lsdrv output (for the drives in the array; scsi 6:x:x:x is the
missing disk):

> PCI [ahci] 03:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9230 PCIe SATA 6Gb/s Controller (rev 10)
> ├scsi 6:x:x:x [Empty]
> ├scsi 7:0:0:0 ATA      WDC WD30EFRX-68E {WD-WCC4N6XD5SV6}
> │└sdg 2.73t [8:96] Partitioned (gpt)
> │ └sdg6 2.64t [8:102] MD raid6 (0/6) (w/ sdh6,sdi6,sdk6,sdl6,sdm6) in_sync 'borgCube:R6Backup' {0e1215fc-1eab-5943-c28d-a7cb399353a3}
> │  └md6 10.54t [9:6] MD v1.2 raid6 (6) clean, 512k Chunk {0e1215fc:1eab5943:c28da7cb:399353a3}
> │   │                ext4 'R6Backup' {142323a9-02d5-4fd5-b8a9-309a2cafde2a}
> │   └Mounted as /dev/md6 @ /mnt/md6R6Backup
> ├scsi 8:0:0:0 ATA      WDC WD30EFRX-68E {WD-WCC4N7TS870E}
> │└sdh 2.73t [8:112] Partitioned (gpt)
> │ └sdh6 2.64t [8:118] MD raid6 (2/6) (w/ sdg6,sdi6,sdk6,sdl6,sdm6) in_sync 'borgCube:R6Backup' {0e1215fc-1eab-5943-c28d-a7cb399353a3}
> │  └md6 10.54t [9:6] MD v1.2 raid6 (6) clean, 512k Chunk {0e1215fc:1eab5943:c28da7cb:399353a3}
> │                    ext4 'R6Backup' {142323a9-02d5-4fd5-b8a9-309a2cafde2a}
> ├scsi 9:0:0:0 ATA      WDC WD30EFRX-68E {WD-WCC4N3UR60ZV}
> │└sdi 2.73t [8:128] Partitioned (gpt)
> │ └sdi6 2.64t [8:134] MD raid6 (3/6) (w/ sdg6,sdh6,sdk6,sdl6,sdm6) in_sync 'borgCube:R6Backup' {0e1215fc-1eab-5943-c28d-a7cb399353a3}
> │  └md6 10.54t [9:6] MD v1.2 raid6 (6) clean, 512k Chunk {0e1215fc:1eab5943:c28da7cb:399353a3}
> │                    ext4 'R6Backup' {142323a9-02d5-4fd5-b8a9-309a2cafde2a}
> ├scsi 10:x:x:x [Empty]
> ├scsi 11:x:x:x [Empty]
> └scsi 12:x:x:x [Empty]
> PCI [ahci] 05:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 01)
> ├scsi 14:0:0:0 ATA      WDC WD30EFRX-68E {WD-WCC4N3YNA3SN}
> │└sdj 2.73t [8:144] Partitioned (gpt)
> │ └sdj5 2.64t [8:149] MD raid6 (4/5) (w/ sdc5,sdd5,sde5,sdf5) in_sync 'BorgCUBE:51' {b1cdd470-a412-bff3-e62d-cac6cafd8762}
> │  └md51 7.90t [9:51] MD v1.2 raid6 (5) clean, 512k Chunk {b1cdd470:a412bff3:e62dcac6:cafd8762}
> │                     ext4 'md51mnt' {63c25cf7-d1aa-48e5-97d1-25c34819889c}
> └scsi 15:0:0:0 ATA      WDC WD30EFRX-68E {WD-WCC4N4LL7U4E}
>  └sdk 2.73t [8:160] Partitioned (gpt)
>   └sdk6 2.64t [8:166] MD raid6 (1/6) (w/ sdg6,sdh6,sdi6,sdl6,sdm6) in_sync 'borgCube:R6Backup' {0e1215fc-1eab-5943-c28d-a7cb399353a3}
>    └md6 10.54t [9:6] MD v1.2 raid6 (6) clean, 512k Chunk {0e1215fc:1eab5943:c28da7cb:399353a3}
>                      ext4 'R6Backup' {142323a9-02d5-4fd5-b8a9-309a2cafde2a}
> PCI [ahci] 08:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 01)
> ├scsi 16:0:0:0 ATA      WDC WD30EFRX-68E {WD-WCC4N6XD5ZL3}
> │└sdl 2.73t [8:176] Partitioned (gpt)
> │ └sdl6 2.64t [8:182] MD raid6 (4/6) (w/ sdg6,sdh6,sdi6,sdk6,sdm6) in_sync 'borgCube:R6Backup' {0e1215fc-1eab-5943-c28d-a7cb399353a3}
> │  └md6 10.54t [9:6] MD v1.2 raid6 (6) clean, 512k Chunk {0e1215fc:1eab5943:c28da7cb:399353a3}
> │                    ext4 'R6Backup' {142323a9-02d5-4fd5-b8a9-309a2cafde2a}
> └scsi 17:0:0:0 ATA      WDC WD30EFRX-68E {WD-WCC4N3UR625T}
>  └sdm 2.73t [8:192] Partitioned (gpt)
>   └sdm6 2.64t [8:198] MD raid6 (5/6) (w/ sdg6,sdh6,sdi6,sdk6,sdl6) in_sync 'borgCube:R6Backup' {0e1215fc-1eab-5943-c28d-a7cb399353a3}
>    └md6 10.54t [9:6] MD v1.2 raid6 (6) clean, 512k Chunk {0e1215fc:1eab5943:c28da7cb:399353a3}
>                      ext4 'R6Backup' {142323a9-02d5-4fd5-b8a9-309a2cafde2a}


> ata7.00: ATA-7: MARVELL VIRTUALL, 1.09, max UDMA/66
> [    1.765384] ata7.00: 0 sectors, multi 0: LBA 
> [    1.765749] ata14.00: failed to IDENTIFY (device reports invalid type, err_mask=0x0)
> [    1.766116] ata14.00: revalidation failed (errno=-22)
> [    1.766461] ata14: limiting SATA link speed to 1.5 Gbps
> [    1.766793] ata14.00: limiting speed to UDMA/66:PIO3
> [    1.785901] usb 1-12: new full-speed USB device number 4 using xhci_hcd
> [    1.990155] usb 1-12: New USB device found, idVendor=0416, idProduct=e008
> [    1.990507] usb 1-12: New USB device strings: Mfr=1, Product=2, SerialNumber=3
> [    1.990843] usb 1-12: Product: OLED Display Controller
> [    1.991184] usb 1-12: Manufacturer: Nuvoton
> [    1.991526] usb 1-12: SerialNumber: B02013031501
> [    2.006243] input: Nuvoton OLED Display Controller as /devices/pci0000:00/0000:00:14.0/usb1/1-12/1-12:1.0/0003:0416:E008.0002/input/input2
> [    2.007001] hid-generic 0003:0416:E008.0002: input,hidraw1: USB HID v1.10 Device [Nuvoton OLED Display Controller] on usb-0000:00:14.0-12/input0
> [    2.117789] usb 4-1: new high-speed USB device number 2 using ehci-pci
> [    2.250081] usb 4-1: New USB device found, idVendor=8087, idProduct=8008
> [    2.250511] usb 4-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
> [    2.251086] hub 4-1:1.0: USB hub found
> [    2.251580] hub 4-1:1.0: 6 ports detected
> [    2.361715] usb 6-1: new high-speed USB device number 2 using ehci-pci
> [    2.494011] usb 6-1: New USB device found, idVendor=8087, idProduct=8000
> [    2.494446] usb 6-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
> [    2.495038] hub 6-1:1.0: USB hub found
> [    2.495516] hub 6-1:1.0: 8 ports detected
> [    2.501791] Switched to clocksource tsc
> [    6.204677] ata8.00: ATA-9: WDC WD30EFRX-68EUZN0, 82.00A82, max UDMA/133
> [    6.205300] ata8.00: 5860533168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
> [    6.205911] ata9.00: ATA-9: WDC WD30EFRX-68EUZN0, 82.00A82, max UDMA/133
> [    6.206526] ata9.00: 5860533168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
> [    6.207245] ata10: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> [    6.207966] ata7.00: model number mismatch 'MARVELL VIRTUALL' != 'WDC WD30EFRX-68EUZN0'
> [    6.208615] ata7.00: revalidation failed (errno=-19)
> [    6.209256] ata7: limiting SATA link speed to 3.0 Gbps
> [    6.209896] ata7.00: limiting speed to UDMA/66:PIO3
> [    6.376797] ata8.00: configured for UDMA/133
> [    6.377453] ata9.00: configured for UDMA/133
> [    6.378094] ata10.00: ATA-9: WDC WD30EFRX-68EUZN0, 82.00A82, max UDMA/133
> [    6.378738] ata10.00: 5860533168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
> [    6.380308] ata10.00: configured for UDMA/133
> [    6.696465] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> [    6.697195] ata14.00: configured for UDMA/66
> [    6.704462] ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 320)
> [    6.705538] ata7.00: model number mismatch 'MARVELL VIRTUALL' != 'WDC WD30EFRX-68EUZN0'
> [    6.706016] ata7.00: revalidation failed (errno=-19)
> [    6.706496] ata7.00: disabled

The ata14 error is spurious; even when everything was working OK that
error would show up. (I think it's the 4-port SATA card.)

ata7 is sdg

Annoyingly the ata numbers don't correspond directly to the scsi
numbers, because ata6 is actually port 6 of the Intel on-board chipset.

scsi 14-17 are the 4 additional on-board ports, and 6-12 are the four-port
card.

The card allows for 7 drives, 4 of which can sit on a single port via a
port multiplier/external eSATA cable (only 1 port at a time can be
expanded like this), and I think this is what is causing the problem.

Hopefully I won't have to reboot the system for a couple of days, so I can
take in any replies before either a power cut or the new card arrives.

Thanks in advance.

Jon


* Re: How will mdadm handle a wrongly added drive, when the original comes back on line?
From: Adam Goryachev @ 2015-08-03 23:42 UTC
  To: Wilson, Jonathan, linux-raid

On 03/08/15 23:14, Wilson, Jonathan wrote:
> Due to a bug in the driver for a Marvell-chipset 4-port SATA card, I think
> I may have added an empty drive's partition to a raid6 array, and when I
> get a new card the system will end up seeing not only the new drive, but
> also the "missing" drive.
>
> Events:
> Upgraded jessie with latest updates (quite some time since I last did
> it) and re-booted.
>
> A 6 drive raid6 assembled, but all the drives were spare. Stopped the
> array and did a mdadm --assemble /dev/md6.
>
> It assembled with 5 drives, one missing.
>
> Tried re-add, which failed, and then -add which completed ok.
At this point the array should have done a resync to add the 6th drive.
> Some time later I re-booted and the same problem happened.
>
> All drives spare, stopped, assembled, added missing.
At this point the array should have done a resync to add the 6th drive. 
Whether this is the same "6th" drive or not doesn't matter.
> It's now working, and I have a new card on order due to something going
> badly wrong with the driver and/or card and/or chipset (Marvell 9230).
>
> After some time passed after the second boot, I realised that one of my
> drives was physically missing. I had a drive ready to go as a genuine
> spare but not yet added as a spare to mdadm, so in theory it should have
> been totally empty apart from a partition.
>
> Now my problem is that firstly I can not be sure that when I looked
> at /proc/mdstat/ and saw "all" the drives as spare there might have been
> a missing one. (On either or both occasions.)
>
> In my mdadm.conf I don't specify the number of drives in the array,
> just its name and the UUID.
>
> Now my question is: if we call the drives in the array A,B,C,D,E,F and
> the empty one G.
>
> After the first boot I may have added G, so the array would be
> A,B,C,D,E,G. (F missing from system)
>
> After the second boot I may have added F back, so the array would be
> A,B,C,D,E,F (G missing from system)
>
> If, after changing the card, the system sees A,B,C,D,E,F,G, how will mdadm
> behave? Will it fail to assemble because one of the drives is "extra" to
> the metadata count? (I assume that even though I don't specify a count in
> the conf, the metadata on the member partitions records that there should
> be "6" disks.)
It should reject the "older" 6th drive because the event count will be 
older, and should auto-assemble with all the other drives. The older 
"6th" drive will either be spare, or not added to the array at all, and 
you would need to add it to the array for it to become a spare.
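To see what actually happened after the reassembly, something along these
lines would show it (the device name is just an example):

  # check which member was left out and what state it ended up in
  cat /proc/mdstat
  mdadm --detail /dev/md6                    # active/spare/removed slots
  mdadm --examine /dev/sdX6 | grep Events    # a stale drive's count lags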
> Will it see that disk "7" is out of date/wrong count and decide it
> should not be part of the array automagically?
>
> If mdadm refuses to assemble the array, I assume I will need to assemble
> it using a full list of the drives that should be in it? So if I check
> all the disks metadata I should be able to see the current state of the
> devices with --examine and then do
> mdadm --assemble /dev/md6 /dev/sda6 /dev/sdb6 etc...
You should be able to list all 7 drives, and md should work it out 
without any issue.
> Then clear the superblock on the drive I know should not be a part of
> the array?
I don't think this should be needed, but if it is, be careful.
> [remaining quoted text, lsdrv listing and dmesg log snipped]

When referring to drives, try to use UUID or the serial number of the 
drive. The "names" sdX can change on each boot.

Regards,
Adam


-- 
Adam Goryachev Website Managers www.websitemanagers.com.au

* Re: How will mdadm handle a wrongly added drive, when the original comes back on line?
From: Wilson, Jonathan @ 2015-08-14  8:04 UTC
  To: Adam Goryachev; +Cc: linux-raid

On Tue, 2015-08-04 at 09:42 +1000, Adam Goryachev wrote:
> On 03/08/15 23:14, Wilson, Jonathan wrote:
> > [original question snipped]
> It should reject the "older" 6th drive because the event count will be 
> older, and should auto-assemble with all the other drives. The older 
> "6th" drive will either be spare, or not added to the array at all, and 
> you would need to add it to the array for it to become a spare.

I wanted to thank you for your help with this post some days ago.

The drives had indeed swapped, so I had 7 disks in a 6-disk array. After
swapping the 4-port Marvell chipset card for two 2-port ASM1062s
(interestingly enough, these happened to be exactly the same chips as the
additional on-board SATA ports) the system booted with a nice clean
dmesg log. (No "red line" errors.)

The older, original drive of the array was kicked:

 
> [    3.269932] md: bind<sdb4>
> [    3.272343] md: bind<sda4>
> [    3.274762] md: raid10 personality registered for level 10
> [    3.275748] md/raid10:md4: active with 2 out of 2 devices
> [    3.276617] md4: detected capacity change from 0 to 64390955008
> [    3.277958]  md4: unknown partition table
> [    3.346450] md: bind<sdn6>
> [    3.370188] md: bind<sdg6>
> [    3.372120] md: kicking non-fresh sdh6 from array! <<<<<<<<<<<<<<<<<<<<<
> [    3.372956] md: unbind<sdh6>
> [    3.383684] md: export_rdev(sdh6)
> [    3.452610] raid6: sse2x1   13949 MB/s
> [    3.520586] raid6: sse2x2   17910 MB/s
> [    3.588568] raid6: sse2x4   20617 MB/s
> [    3.656547] raid6: avx2x1   27298 MB/s
> [    3.724527] raid6: avx2x2   32586 MB/s
> [    3.792508] raid6: avx2x4   36498 MB/s
> [    3.793243] raid6: using algorithm avx2x4 (36498 MB/s)
> [    3.793973] raid6: using avx2x2 recovery algorithm
> [    3.794829] xor: automatically using best checksumming function:
> [    3.832495]    avx       : 43866.000 MB/sec
> [    3.833351] async_tx: api initialized (async)
> [    3.834559] md: raid6 personality registered for level 6
> [    3.835255] md: raid5 personality registered for level 5
> [    3.835945] md: raid4 personality registered for level 4
> [    3.836716] md/raid:md6: device sdg6 operational as raid disk 0
> [    3.837391] md/raid:md6: device sdn6 operational as raid disk 5
> [    3.838043] md/raid:md6: device sdl6 operational as raid disk 1
> [    3.838688] md/raid:md6: device sdm6 operational as raid disk 4
> [    3.839315] md/raid:md6: device sdj6 operational as raid disk 2
> [    3.839916] md/raid:md6: device sdi6 operational as raid disk 3
> [    3.840715] md/raid:md6: allocated 0kB
> [    3.841341] md/raid:md6: raid level 6 active with 6 out of 6 devices, algorithm 2
> [    3.841963] RAID conf printout:
> [    3.841963]  --- level:6 rd:6 wd:6
> [    3.841964]  disk 0, o:1, dev:sdg6
> [    3.841965]  disk 1, o:1, dev:sdl6
> [    3.841965]  disk 2, o:1, dev:sdj6
> [    3.841966]  disk 3, o:1, dev:sdi6
> [    3.841966]  disk 4, o:1, dev:sdm6
> [    3.841967]  disk 5, o:1, dev:sdn6
> [    3.842047] created bitmap (22 pages) for device md6
> [    3.842997] md6: bitmap initialized from disk: read 2 pages, set 0 of 43172 bits
> [    3.855294] md6: detected capacity change from 0 to 11588669014016
> 

The kicked drive is now sitting as "inactive", so when I get time I will
clear it and add it as a hot spare.
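
(Presumably just something like the following, after double-checking the
serial number first since the sdX names can move around; sdh6 was the
kicked member according to the log above:)

  # wipe the stale md metadata, then hand the partition back to md
  # as a hot spare for the now 6-of-6 array
  mdadm --zero-superblock /dev/sdh6
  mdadm /dev/md6 --add /dev/sdh6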

Thanks again, and thanks to Neil and others for all their hard work in
developing mdadm.



