* mdadm reshaping stuck problem
@ 2017-12-03 12:47 rene.feistle
  2017-12-03 14:17 ` Phil Turmel
  0 siblings, 1 reply; 7+ messages in thread
From: rene.feistle @ 2017-12-03 12:47 UTC (permalink / raw)
  To: linux-raid

Hello,

After hours and hours of googling and trying out things, I gave up on 
this. This email is my last hope of getting my data back.

I have 4*4TB drives installed and want to create a raid 5 with them.

So what I did was create an array of 3 disks (raid 5), copy the data from 
the 4th drive to the raid (I don't have any more space available), and 
then I wanted to add the last drive to the raid.
I made a mistake here: I accidentally grew the raid to 4 disks with

sudo mdadm --grow --raid-devices=4 /dev/md0 --backup-file=/tmp/md0.bak

BEFORE adding the last drive as a hot spare. mdadm immediately started a 
reshape and says that it failed - because the array now consists of 4 
drives but only 3 drives are available.
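
(For reference, I think the right order would have been to add the disk 
first and only then grow, roughly:

sudo mdadm --add /dev/md0 /dev/sdX1
sudo mdadm --grow --raid-devices=4 /dev/md0 --backup-file=/tmp/md0.bak

where sdX1 stands for the fourth drive's partition - but that is not 
what I did.)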

I thought okay, let it complete the reshape and everything will be 
okay. But no - the reshape is stuck at 34.3%.

What I have tried:

- Rebooting (about 100 times)
- Increasing the stripe cache size up to 32768 (command below)
- Reassembling with:

mdadm --assemble --invalid-backup --backup-file=/root/mdadm0.bak 
/dev/md0 /dev/sdc1 /dev/sde1 /dev/sdf1

And some other things.
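
(For the stripe cache mentioned above: that is set through the md sysfs 
knob, i.e. something like

echo 32768 > /sys/block/md0/md/stripe_cache_size

as root.)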

The raid is not mountable. When I try to mount it, the mount command 
just hangs and nothing happens. That means I had to edit my fstab with a 
rescue CD, because otherwise the machine would never boot again.
That also means that I have no access to my data.

When I shut down or reboot the computer, it also hangs at shutdown; I can 
only hard-reset it.

cat /proc/mdstat:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] 
[raid4] [r$
md0 : active raid5 sdc1[0] sdf1[3] sde1[1]
       7813771264 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] 
[UU__]
       [======>..............]  reshape = 34.3% (1340465664/3906885632) 
finish=3$
       bitmap: 3/30 pages [12KB], 65536KB chunk

unused devices: <none>


mdadm --detail /dev/md0


/dev/md0:
         Version : 1.2
   Creation Time : Fri Dec  1 02:10:06 2017
      Raid Level : raid5
      Array Size : 7813771264 (7451.79 GiB 8001.30 GB)
   Used Dev Size : 3906885632 (3725.90 GiB 4000.65 GB)
    Raid Devices : 4
   Total Devices : 3
     Persistence : Superblock is persistent

   Intent Bitmap : Internal

     Update Time : Sun Dec  3 13:34:43 2017
           State : active, FAILED, reshaping
  Active Devices : 2
Working Devices : 3
  Failed Devices : 0
   Spare Devices : 1

          Layout : left-symmetric
      Chunk Size : 512K

  Reshape Status : 34% complete
   Delta Devices : 1, (3->4)

            Name : nas-server:0  (local to host nas-server)
            UUID : e410e68d:76460b65:69c056c0:d2645d55
          Events : 28155

     Number   Major   Minor   RaidDevice State
        0       8       33        0      active sync   /dev/sdc1
        1       8       65        1      active sync   /dev/sde1
        3       8       81        2      spare rebuilding   /dev/sdf1
        6       0        0        6      removed



Any help is appreciated, I'm lost.

Kind Regards

René


* Re: mdadm reshaping stuck problem
  2017-12-03 12:47 mdadm reshaping stuck problem rene.feistle
@ 2017-12-03 14:17 ` Phil Turmel
  2017-12-03 14:59   ` rene.feistle
  0 siblings, 1 reply; 7+ messages in thread
From: Phil Turmel @ 2017-12-03 14:17 UTC (permalink / raw)
  To: rene.feistle, linux-raid

Hi Rene,

On 12/03/2017 07:47 AM, rene.feistle@posteo.de wrote:
> Hello,
> 
> after hours and hours of googling and trying out things, I gave up on
> this. This email is my last hope of getting my data back.

I'm worried for you -- "trying out things" can be dangerous.

> I have 4*4TB drives installed and want to create a raid 5 with them.
> 
> So what I did is create an array of 3 disks (raid 5), copy the data from
> the 4th drive (I don't have more space available) to the raid and then I
> wanted to add the last drive to the raid.

Ok.

> I made a mistake here. I accidentally grew the raid to 4 disks with
> 
> sudo mdadm --grow --raid-devices=4 /dev/md0 --backup-file=/tmp/md0.bak
> 
> BEFORE adding the last drive as a hot spare. Mdadm immediately started a
> reshape and says that it failed - because it consists of 4 drives but
> only 3 drives are available.

Adding the fourth drive at this point should have enabled the reshape to
resume.
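
(That is, something like "mdadm --add /dev/md0 /dev/sdX1", with sdX1 
standing in for your fourth drive's partition.)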

> I thought okay, let him complete the reshape and everything will be
> okay. But no - the reshape is stuck at 34.3%.
> 
> What I have tried:
> 
> - Reboot ( about a 100 times)
> - increase stripe cache size up to 32768
> 
> mdadm --assemble --invalid-backup --backup-file=/root/mdadm0.bak
> /dev/md0 /dev/sdc1 /dev/sde1 /dev/sdf1
> 
> And some other things.

We will probably need you to detail "some other things".

> The raid is not mountable. When I try to mount it, the mount command
> just hungs and nothing happens. That means that I had to edit my fstab
> with a rescue cd because it would never boot again.
> That also means that I have no access to my data.
> 
> When I shutdown or reboot the computer, it also hungs at shutdown, I can
> only hard reset it.
> 
> cat /proc/mdstat:
> 
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [r$
> md0 : active raid5 sdc1[0] sdf1[3] sde1[1]
>       7813771264 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UU__]
>       [======>..............]  reshape = 34.3% (1340465664/3906885632) finish=3$
>       bitmap: 3/30 pages [12KB], 65536KB chunk
> 
> unused devices: <none>

Note the "UU__".  That means as some point your three-drive array lost a
drive, and the reshape is showing another missing drive.  A
doubly-degraded array cannot run.

> mdadm --detail /dev/md0
> 
> 
> /dev/md0:
>         Version : 1.2
>   Creation Time : Fri Dec  1 02:10:06 2017
>      Raid Level : raid5
>      Array Size : 7813771264 (7451.79 GiB 8001.30 GB)
>   Used Dev Size : 3906885632 (3725.90 GiB 4000.65 GB)
>    Raid Devices : 4
>   Total Devices : 3
>     Persistence : Superblock is persistent
> 
>   Intent Bitmap : Internal
> 
>     Update Time : Sun Dec  3 13:34:43 2017
>           State : active, FAILED, reshaping
>  Active Devices : 2
> Working Devices : 3
>  Failed Devices : 0
>   Spare Devices : 1
> 
>          Layout : left-symmetric
>      Chunk Size : 512K
> 
>  Reshape Status : 34% complete
>   Delta Devices : 1, (3->4)
> 
>            Name : nas-server:0  (local to host nas-server)
>            UUID : e410e68d:76460b65:69c056c0:d2645d55
>          Events : 28155
> 
>     Number   Major   Minor   RaidDevice State
>        0       8       33        0      active sync   /dev/sdc1
>        1       8       65        1      active sync   /dev/sde1
>        3       8       81        2      spare rebuilding   /dev/sdf1
>        6       0        0        6      removed

Note the "spare rebuilding" on sdf1.  That means at some point sdf1 was
ejected from your array and you --added it back.  A supposition
buttressed by its slot number displayed in mdstat.  sdf1 was already a
critical device, so --add destroyed important data on it.

> Any help is appreciated, I'm lost.

With the current status of the array, doubly-degraded with a reshape
quite far along, I am not optimistic for you.  However, you have not
provided all the information that might be helpful here.  Please supply
the output (cat'd to a file, not copied from a narrow terminal, please)
of these commands:

for x in /dev/sd[cef]1 ; do echo $x ; mdadm -E $x ; done

for x in /dev/sd[cef] ; do echo $x ; smartctl -iA -l scterc $x ; done

Please make sure your mailer is in plain text mode with line wrap
disabled to ensure the content isn't corrupted when you paste it into
your reply.

Regards,

Phil


* Re: mdadm reshaping stuck problem
  2017-12-03 14:17 ` Phil Turmel
@ 2017-12-03 14:59   ` rene.feistle
  2017-12-03 17:20     ` Phil Turmel
  0 siblings, 1 reply; 7+ messages in thread
From: rene.feistle @ 2017-12-03 14:59 UTC (permalink / raw)
  To: Phil Turmel; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 5411 bytes --]

Hello Phil,

Thanks for your fast reply.

I did run your commands and the results are attached to this email and 
on pastebin here:

https://pastebin.com/EVpLfmAe
https://pastebin.com/ZMBYB5CW


The drive names have changed because I removed the one drive that was 
not in the raid. I had a copy of all the data on that drive, so I'm now 
trying to recover my data from it. The chances are good because I only 
overwrote the partition table.



On 03.12.2017 15:17, Phil Turmel wrote:
> Hi Rene,
> 
> On 12/03/2017 07:47 AM, rene.feistle@posteo.de wrote:
>> Hello,
>> 
>> after hours and hours of googling and trying out things, I gave up on
>> this. This email is my last hope of getting my data back.
> 
> I'm worried for you -- "trying out things" can be dangerous.
> 
>> I have 4*4TB drives installed and want to create a raid 5 with them.
>> 
>> So what I did is create an array of 3 disks (raid 5), copy the data 
>> from
>> the 4th drive (I don't have more space available) to the raid and then 
>> I
>> wanted to add the last drive to the raid.
> 
> Ok.
> 
>> I made a mistake here. I accidentally grew the raid to 4 disks with
>> 
>> sudo mdadm --grow --raid-devices=4 /dev/md0 --backup-file=/tmp/md0.bak
>> 
>> BEFORE adding the last drive as a hot spare. Mdadm immediately started 
>> a
>> reshape and says that it failed - because it consists of 4 drives but
>> only 3 drives are available.
> 
> Adding the fourth drive at this point should have enabled the reshape 
> to
> resume.
> 
>> I thought okay, let him complete the reshape and everything will be
>> okay. But no - the reshape is stuck at 34.3%.
>> 
>> What I have tried:
>> 
>> - Reboot ( about a 100 times)
>> - increase stripe cache size up to 32768
>> 
>> mdadm --assemble --invalid-backup --backup-file=/root/mdadm0.bak
>> /dev/md0 /dev/sdc1 /dev/sde1 /dev/sdf1
>> 
>> And some other things.
> 
> We will probably need you to detail "some other things".
> 
>> The raid is not mountable. When I try to mount it, the mount command
>> just hungs and nothing happens. That means that I had to edit my fstab
>> with a rescue cd because it would never boot again.
>> That also means that I have no access to my data.
>> 
>> When I shutdown or reboot the computer, it also hungs at shutdown, I 
>> can
>> only hard reset it.
>> 
>> cat /proc/mdstat:
>> 
>> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] 
>> [raid4] [r$
>> md0 : active raid5 sdc1[0] sdf1[3] sde1[1]
>>       7813771264 blocks super 1.2 level 5, 512k chunk, algorithm 2 
>> [4/3] [UU__]
>>       [======>..............]  reshape = 34.3% (1340465664/3906885632) 
>> finish=3$
>>       bitmap: 3/30 pages [12KB], 65536KB chunk
>> 
>> unused devices: <none>
> 
> Note the "UU__".  That means as some point your three-drive array lost 
> a
> drive, and the reshape is showing another missing drive.  A
> doubly-degraded array cannot run.
> 
>> mdadm --detail /dev/md0
>> 
>> 
>> /dev/md0:
>>         Version : 1.2
>>   Creation Time : Fri Dec  1 02:10:06 2017
>>      Raid Level : raid5
>>      Array Size : 7813771264 (7451.79 GiB 8001.30 GB)
>>   Used Dev Size : 3906885632 (3725.90 GiB 4000.65 GB)
>>    Raid Devices : 4
>>   Total Devices : 3
>>     Persistence : Superblock is persistent
>> 
>>   Intent Bitmap : Internal
>> 
>>     Update Time : Sun Dec  3 13:34:43 2017
>>           State : active, FAILED, reshaping
>>  Active Devices : 2
>> Working Devices : 3
>>  Failed Devices : 0
>>   Spare Devices : 1
>> 
>>          Layout : left-symmetric
>>      Chunk Size : 512K
>> 
>>  Reshape Status : 34% complete
>>   Delta Devices : 1, (3->4)
>> 
>>            Name : nas-server:0  (local to host nas-server)
>>            UUID : e410e68d:76460b65:69c056c0:d2645d55
>>          Events : 28155
>> 
>>     Number   Major   Minor   RaidDevice State
>>        0       8       33        0      active sync   /dev/sdc1
>>        1       8       65        1      active sync   /dev/sde1
>>        3       8       81        2      spare rebuilding   /dev/sdf1
>>        6       0        0        6      removed
> 
> Note the "spare rebuilding" on sdf1.  That means at some point sdf1 was
> ejected from your array and you --added it back.  A supposition
> buttressed by its slot number displayed in mdstat.  sdf1 was already a
> critical device, so --add destroyed important data on it.
> 
>> Any help is appreciated, I'm lost.
> 
> With the current status of the array, doubly-degraded with a reshape
> quite far along, I am not optimistic for you.  However, you have not
> provided all the information that might be helpful here.  Please supply
> the output (cat'd to a file, not copied from a narrow terminal, please)
> of these commands:
> 
> for x in /dev/sd[cef]1 ; do echo $x ; mdadm -E $x ; done
> 
> for x in /dev/sd[cef] ; do echo $x ; smartctl -iA -l scterc $x ; done
> 
> Please make sure your mailer is in plain text mode with line wrap
> disabled to ensure the content isn't corrupted when you paste it into
> your reply.
> 
> Regards,
> 
> Phil

[-- Attachment #2: mdadm.txt --]
[-- Type: text/plain, Size: 3204 bytes --]

/dev/sdb1
/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x5
     Array UUID : e410e68d:76460b65:69c056c0:d2645d55
           Name : nas-server:0  (local to host nas-server)
  Creation Time : Fri Dec  1 02:10:06 2017
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7813771264 (3725.90 GiB 4000.65 GB)
     Array Size : 11720656896 (11177.69 GiB 12001.95 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : active
    Device UUID : f4490b9f:475aca83:2ff93d65:a4fabf8c

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 4021168128 (3834.88 GiB 4117.68 GB)
  Delta Devices : 1 (3->4)

    Update Time : Sun Dec  3 13:34:43 2017
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : d1b15eed - correct
         Events : 28155

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAA. ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdd1
/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x5
     Array UUID : e410e68d:76460b65:69c056c0:d2645d55
           Name : nas-server:0  (local to host nas-server)
  Creation Time : Fri Dec  1 02:10:06 2017
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7813771264 (3725.90 GiB 4000.65 GB)
     Array Size : 11720656896 (11177.69 GiB 12001.95 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : active
    Device UUID : eb679765:5771cc1a:651f5a86:e166424b

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 4021168128 (3834.88 GiB 4117.68 GB)
  Delta Devices : 1 (3->4)

    Update Time : Sun Dec  3 13:34:43 2017
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : ede2668 - correct
         Events : 28155

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AAA. ('A' == active, '.' == missing, 'R' == replacing)
/dev/sde1
/dev/sde1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x7
     Array UUID : e410e68d:76460b65:69c056c0:d2645d55
           Name : nas-server:0  (local to host nas-server)
  Creation Time : Fri Dec  1 02:10:06 2017
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7813771264 (3725.90 GiB 4000.65 GB)
     Array Size : 11720656896 (11177.69 GiB 12001.95 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
Recovery Offset : 2680909424 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : active
    Device UUID : 2c5c510d:3ddd9cb3:85782829:cdce1b89

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 4021168128 (3834.88 GiB 4117.68 GB)
  Delta Devices : 1 (3->4)

    Update Time : Sun Dec  3 13:34:43 2017
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : cfdbb60f - correct
         Events : 28155

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAA. ('A' == active, '.' == missing, 'R' == replacing)

[-- Attachment #3: smart.txt --]
[-- Type: text/plain, Size: 9933 bytes --]

/dev/sdb
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.10.0-40-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST4000VN008-2DR166
Serial Number:    ZDH2M2PV
LU WWN Device Id: 5 000c50 0a5690986
Firmware Version: SC60
User Capacity:    4.000.787.030.016 bytes [4,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5980 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Dec  3 15:48:15 2017 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   073   064   044    Pre-fail  Always       -       19890494
  3 Spin_Up_Time            0x0003   094   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       8
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   073   060   045    Pre-fail  Always       -       21251399
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       134 (224 223 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       8
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   067   054   040    Old_age   Always       -       33 (Min/Max 33/34)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       3
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       53
194 Temperature_Celsius     0x0022   033   046   000    Old_age   Always       -       33 (0 20 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       132 (215 163 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       28126761649
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       13670857188

SCT Error Recovery Control:
           Read:     70 (7,0 seconds)
          Write:     70 (7,0 seconds)

/dev/sdd
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.10.0-40-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST4000VN008-2DR166
Serial Number:    ZDH2GRNF
LU WWN Device Id: 5 000c50 0a556eef6
Firmware Version: SC60
User Capacity:    4.000.787.030.016 bytes [4,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5980 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Dec  3 15:48:15 2017 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   079   066   044    Pre-fail  Always       -       83554824
  3 Spin_Up_Time            0x0003   094   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       8
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   072   060   045    Pre-fail  Always       -       16531420
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       134 (30 205 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       8
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   067   058   040    Old_age   Always       -       33 (Min/Max 33/33)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       3
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       79
194 Temperature_Celsius     0x0022   033   042   000    Old_age   Always       -       33 (0 18 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       132 (251 231 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       16036671217
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       13248987983

SCT Error Recovery Control:
           Read:     70 (7,0 seconds)
          Write:     70 (7,0 seconds)

/dev/sde
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.10.0-40-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Desktop HDD.15
Device Model:     ST4000DM000-1F2168
Serial Number:    Z3018XTT
LU WWN Device Id: 5 000c50 065b12345
Firmware Version: CC54
User Capacity:    4.000.787.030.016 bytes [4,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5900 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Dec  3 15:48:15 2017 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   113   099   006    Pre-fail  Always       -       56920152
  3 Spin_Up_Time            0x0003   092   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   098   098   020    Old_age   Always       -       2179
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   066   060   030    Pre-fail  Always       -       4457875
  9 Power_On_Hours          0x0032   079   079   000    Old_age   Always       -       18477
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       136
183 Runtime_Bad_Block       0x0032   099   099   000    Old_age   Always       -       1
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   066   051   045    Old_age   Always       -       34 (Min/Max 33/34)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       4
193 Load_Cycle_Count        0x0032   085   085   000    Old_age   Always       -       31842
194 Temperature_Celsius     0x0022   034   049   000    Old_age   Always       -       34 (0 10 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       5936h+59m+15.445s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       34663048643
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       14331594669

SCT Error Recovery Control command not supported



* Re: mdadm reshaping stuck problem
  2017-12-03 14:59   ` rene.feistle
@ 2017-12-03 17:20     ` Phil Turmel
  2017-12-03 18:14       ` ERC for raid [forked from "mdadm reshaping stuck problem"] Matthias Walther
  0 siblings, 1 reply; 7+ messages in thread
From: Phil Turmel @ 2017-12-03 17:20 UTC (permalink / raw)
  To: rene.feistle; +Cc: linux-raid

Hi Rene,

{ Convention on kernel.org is to trim replies & either bottom post or
interleave.  Please do. }

On 12/03/2017 09:59 AM, rene.feistle@posteo.de wrote:
> Hello Phil,
> 
> Thanks for your fast reply.
> 
> I did run your commands and the results are attached to this email and
> on pastebin here:
> 
> https://pastebin.com/EVpLfmAe
> https://pastebin.com/ZMBYB5CW

Very good.  At some point you need to replace the desktop drive -- it's
unsafe to use in a raid array -- but it doesn't look like it's blowing
up at the moment.  Use the following workaround on every boot until you
replace it:

echo 180 > /sys/block/sde/device/timeout

Search the archives for "timeout mismatch" to see many discussions on
why that drive is a time bomb.
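
If you want that applied automatically on every boot, a udev rule along 
these lines should do it (an untested sketch -- it keys on the model 
string reported above rather than the sdX name, since those can move 
around; keep it on one line, in e.g. /etc/udev/rules.d/60-disk-timeout.rules):

ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd?", ATTRS{model}=="ST4000DM000*", RUN+="/bin/sh -c 'echo 180 > /sys/block/%k/device/timeout'"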

> The drive names have changed because I deinstalled one drive that was
> not in the raid. I had a copy of all data on this drive so I'm trying to
> recover my data with that drive now. The chances are good because I did
> overwrite the partition table only.

Ok.

Based on the mdadm -E reports, I recommend stopping the array, booting
with a rescue CD that has mdadm tools (I recommend SystemRescueCD, btw)
and a recent kernel.  Your situation sounds like a kernel bug and your
best bet to get out of this is to use a more recent kernel for the
duration of the reshape and rebuild.

So, in the rescue environment, use --assemble --force with these three
drives.  Then --add your fourth drive, and wait for both rebuild and
reshape to complete.

Be sure to explicitly --stop the array before leaving the rescue
environment.
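
In concrete commands, that is roughly the following (adjust the device 
names to whatever the rescue environment assigns -- they were 
sdb1/sdd1/sde1 in your last report, and sdX1 here stands for the fourth 
drive):

mdadm --stop /dev/md0          # if the rescue environment auto-assembled it
mdadm --assemble --force /dev/md0 /dev/sdb1 /dev/sdd1 /dev/sde1
mdadm --add /dev/md0 /dev/sdX1
cat /proc/mdstat               # repeat until rebuild and reshape finish
mdadm --stop /dev/md0          # before leaving the rescue environment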

Phil


* ERC for raid [forked from "mdadm reshaping stuck problem"]
  2017-12-03 17:20     ` Phil Turmel
@ 2017-12-03 18:14       ` Matthias Walther
  2017-12-03 18:59         ` Wols Lists
  2017-12-03 21:23         ` Phil Turmel
  0 siblings, 2 replies; 7+ messages in thread
From: Matthias Walther @ 2017-12-03 18:14 UTC (permalink / raw)
  To: Phil Turmel, rene.feistle; +Cc: linux-raid

Hello,

On 03.12.2017 18:20, Phil Turmel wrote:
> Very good.  At some point you need to replace the desktop drive -- it's
> unsafe to use in a raid array -- but it doesn't look like it's blowing
> up at the moment.  Use the following workaround on every boot until you
> replace it:
>
> echo 180 > /sys/block/sde/device/timeout
>
> Search the archives for "timeout mismatch" to see many discussions on
> why that drive is a time bomb.

This is an interesting point. As far as I understand it, there's no 
difference between a) the device telling the kernel that an error 
occurred (ERC) and b) the kernel just waiting for three minutes.

From my understanding, I see no reason to avoid those disks. Just raise 
this timeout to 180 on all disks. Even those with ERC can be set to 180 
seconds, because on some mainboards the sdX order changes on every boot. 
On a home NAS it doesn't really matter if there's an access delay. This 
is of course not acceptable on enterprise systems.
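
Something along the lines of this at boot covers all of them:

for f in /sys/block/sd*/device/timeout ; do echo 180 > "$f" ; done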

By the way, the kernel doesn't just throw the device out right away. From 
my experience it hard-resets the link and completely reinitializes the 
device. Only if that fails is the raid degraded, and if that also fails, 
the device probably has a problem and should be replaced.

I run a raid-6 on six really cheap, old, second-hand 4 TB drives and have 
never had an issue with it in the past two years. I had no real failures 
and no accidentally or prematurely dropped devices. mdadm just runs. And 
this raid writes about 50 GB every single day and never goes to sleep. 
This is what distinguishes mdadm from hardware raid controllers, which 
really shouldn't be used with non-ERC drives due to exactly that timing 
problem.

Though I do run a check every month, where all data is read, just to make 
sure it doesn't rot on the discs. In my opinion a (monitored) raid-6 on 
old, cheap non-ERC drives is safer than a raid-5 on „premium 
overpriced“ drives. Never forget, it's called raid - random array of 
inexpensive disks.
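
(That monthly check is just md's normal scrub, i.e. roughly

echo check > /sys/block/md0/md/sync_action

and many distributions ship a cron job that triggers it anyway.)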

In cynical words, I see it this way: the HDD and NAS manufacturers got 
together and found a way to push prices up.

Regards,
Matthias


* Re: ERC for raid [forked from "mdadm reshaping stuck problem"]
  2017-12-03 18:14       ` ERC for raid [forked from "mdadm reshaping stuck problem"] Matthias Walther
@ 2017-12-03 18:59         ` Wols Lists
  2017-12-03 21:23         ` Phil Turmel
  1 sibling, 0 replies; 7+ messages in thread
From: Wols Lists @ 2017-12-03 18:59 UTC (permalink / raw)
  To: Matthias Walther, rene.feistle; +Cc: linux-raid

On 03/12/17 18:14, Matthias Walther wrote:
> Hello,
> 
> On 03.12.2017 18:20, Phil Turmel wrote:
>> Very good.  At some point you need to replace the desktop drive -- it's
>> unsafe to use in a raid array -- but it doesn't look like it's blowing
>> up at the moment.  Use the following workaround on every boot until you
>> replace it:
>>
>> echo 180 > /sys/block/sde/device/timeout
>>
>> Search the archives for "timeout mismatch" to see many discussions on
>> why that drive is a time bomb.
> 
> this is an interesting point. As far as I understand it, there's no
> difference between a) the device tells the kernel, that an error
> occurred (ERC) or b) the kernel just waits three minutes.
> 
> From my point of understanding, I see no reason to avoid those disks.
> Just raise this timeout to 180 on all disks. Even those with ERC can be
> set to 180 seconds, because on some mainboards the order of sdX changes
> every boot. On your home nas it doesn't really matter if there's an
> access delay. This is of course not acceptable on enterprise systems.

"Just raise the timeout". THAT is the problem. BY DEFAULT these drives
will mess up an array. If you know what you're doing, it isn't a
problem, but how many people don't know what they're doing? Most of them?

Oh - and a three-minute access delay? When you're watching a film? Who 
are you kidding! A three-minute delay will cause pretty much any normal 
user to throw their toys out of the pram, home or enterprise.
> 
> By the way, the kernel doesn't just easily throw the device out. From my
> experiences it hard resets the link and completely reinitializes the
> device. Only if that fails, the raid will be degraded and if this fails,
> the device probably has a problem and should be replaced.

And if the user doesn't know what they're doing, this is exactly what
will happen. The hard reset will work, the "reinitialise the device"
will fail (because the drive is hung trying to carry out the read
request), and the drive will get kicked.
> 
> I run a raid-6 on six really cheap old second hand 4 TB drives and never
> had an issue with that in the past two years. I had no real failures and
> no accidentally or prematurely dropped devices. Mdadm just runs. And
> this raid writes about 50 GB each and every single day and never goes to
> sleep. This is what differs mdadm from hardware raid controllers, which
> really shouldn't used with non ERC drives due to exactly that timing
> problem.
> 
> Though I run a check every month, where all data is read, just to make
> sure it doesn't rot on the discs. In my opinion a (monitored) raid-6 on
> old, cheap non ERC drives is safer, than a raid-5 on „premium
> overpriced“ drives. Never forget, it's call raid - random array of
> inexpensive disks.

Raid 6 is safer than raid 5. PROPERLY CONFIGURED non-ERC drives are 
fine. You know what you're doing, so that's fine for you. Joe Random 
Luser is going to have his array blow up in his face.
> 
> In cynical words, I see it this way: The hdd and nas manufactures came
> together and found a way to push the prices up.
> 
:-) Couldn't agree more.

Cheers,
Wol



* Re: ERC for raid [forked from "mdadm reshaping stuck problem"]
  2017-12-03 18:14       ` ERC for raid [forked from "mdadm reshaping stuck problem"] Matthias Walther
  2017-12-03 18:59         ` Wols Lists
@ 2017-12-03 21:23         ` Phil Turmel
  1 sibling, 0 replies; 7+ messages in thread
From: Phil Turmel @ 2017-12-03 21:23 UTC (permalink / raw)
  To: Matthias Walther, rene.feistle; +Cc: linux-raid

Hi Matthias,

On 12/03/2017 01:14 PM, Matthias Walther wrote:
> Hello,
> 
> On 03.12.2017 18:20, Phil Turmel wrote:
>> Very good.  At some point you need to replace the desktop drive -- it's
>> unsafe to use in a raid array -- but it doesn't look like it's blowing
>> up at the moment.  Use the following workaround on every boot until you
>> replace it:
>>
>> echo 180 > /sys/block/sde/device/timeout
>>
>> Search the archives for "timeout mismatch" to see many discussions on
>> why that drive is a time bomb.
> 
> this is an interesting point. As far as I understand it, there's no
> difference between a) the device tells the kernel, that an error
> occurred (ERC) or b) the kernel just waits three minutes.

From MD raid's perspective, as long as the link doesn't time out, no.
Many services that one might want to use with such a server will have
problems with a 3-minute filesystem freeze, which is why I highly
recommend replacing the drives with something that'll respond quicker.
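
(On drives that do support ERC, the firmware timeout can be set and 
checked with smartctl, e.g. "smartctl -l scterc,70,70 /dev/sdX" for 
7.0 seconds -- the desktop drive above reports that command as 
unsupported, which is exactly the problem.)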

> From my point of understanding, I see no reason to avoid those disks.
> Just raise this timeout to 180 on all disks. Even those with ERC can be
> set to 180 seconds, because on some mainboards the order of sdX changes
> every boot. On your home nas it doesn't really matter if there's an
> access delay. This is of course not acceptable on enterprise systems.

No, lots of protocols can't wait that long.  Lots of humans can't wait
that long either, and will start physical interventions.

> By the way, the kernel doesn't just easily throw the device out. From my
> experiences it hard resets the link and completely reinitializes the
> device. Only if that fails, the raid will be degraded and if this fails,
> the device probably has a problem and should be replaced.

MD raid tries to fix read errors.  When a read returns an error, MD
retrieves the data from a mirror (raid1, raid10) or reconstructs it from
parity and/or syndrome (raid4,5,6) and then writes it back to the
problem sector.  This is entirely appropriate as large modern hard
drives do occassionally experience transient read errors.  Transient
read errors are fixable by writing new content to that sector location.
Even if the error is not transient, modern drives use the write
operation to verify that problem and then relocate the sector.

If the link resets because the driver timed out before the device
responded, then MD gets another error message *while* the link is
resetting.  The follow-up write to correct the sector fails immediately
because the link is down.  The *write error* kicks the drive out.

A quick burst of read errors will kick out a drive (20 in one hour), or
a steady stream of read errors (10 per hour sustained), or *any* write
error.

> I run a raid-6 on six really cheap old second hand 4 TB drives and never
> had an issue with that in the past two years. I had no real failures and
> no accidentally or prematurely dropped devices. Mdadm just runs. And
> this raid writes about 50 GB each and every single day and never goes to
> sleep. This is what differs mdadm from hardware raid controllers, which
> really shouldn't used with non ERC drives due to exactly that timing
> problem.

If you are using the driver timeout workaround, of course you wouldn't 
see your array collapse.  And for household use, you probably don't care 
if your movie playback freezes for the occasional minute or two.

> Though I run a check every month, where all data is read, just to make
> sure it doesn't rot on the discs.

During scrubs, the long timeout on a URE won't impact the filesystem, so
your users are even less likely to notice.  This is very good practice.

> In my opinion a (monitored) raid-6 on
> old, cheap non ERC drives is safer, than a raid-5 on „premium
> overpriced“ drives.

No question about it.  Raid6 is *always* safer than raid5.  That doesn't
mean non-ERC drives are a good idea.

> Never forget, it's call raid - random array of
> inexpensive disks.

The original name is "Redundant Array of Inexpensive Disks".  The
current standard uses "Independent" instead of "Inexpensive" because the
standards body is made up of manufacturers.  /-:

> In cynical words, I see it this way: The hdd and nas manufactures came
> together and found a way to push the prices up.

Oh, I'm pretty cynical.  You should read my posts in 2011 when I worked
all this out -- after Seagate screwed me by taking scterc out of their
desktop drives.

But timeout mismatch is a real problem.  The NAS drives didn't exist as
an option back then, and I'm sure it was complaints like ours that
caused that niche to come into existence.  At a 10% or so price premium.
 (Vs. 2x pricing for enterprise drives.)

> Regards,
> Matthias

Phil

