* Input/Output error reading from a clean raid
@ 2017-01-22 14:08 Salatiel Filho
  2017-01-23  0:18 ` John Stoffel
       [not found] ` <20170123010334.GA7546@metamorpher.de>
  0 siblings, 2 replies; 11+ messages in thread
From: Salatiel Filho @ 2017-01-22 14:08 UTC (permalink / raw)
  To: linux-raid

I am trying to recover a few files from my backup. The backup is on
RAID 5 + ext4.
There are several files where I get an I/O error. The RAID appears to be
clean and fsck shows no errors. Any ideas what it could be?

md1 : active raid5 sdd1[0] sdg1[4] sdf1[2] sde1[1]
      3220829184 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 1/8 pages [4KB], 65536KB chunk



[]'s
Salatiel


* Re: Input/Output error reading from a clean raid
  2017-01-22 14:08 Input/Output error reading from a clean raid Salatiel Filho
@ 2017-01-23  0:18 ` John Stoffel
  2017-01-23 14:42   ` Salatiel Filho
       [not found] ` <20170123010334.GA7546@metamorpher.de>
  1 sibling, 1 reply; 11+ messages in thread
From: John Stoffel @ 2017-01-23  0:18 UTC (permalink / raw)
  To: Salatiel Filho; +Cc: linux-raid


Salatiel> I am trying to recover a few files from my backup. The
Salatiel> backup is on a raid 5 + ext4.  There are several files where
Salatiel> i get I/O error. The raid appears to be clean and fsck shows
Salatiel> no errors. Any ideas what could it be ?

Salatiel> md1 : active raid5 sdd1[0] sdg1[4] sdf1[2] sde1[1]
Salatiel>       3220829184 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
Salatiel>       bitmap: 1/8 pages [4KB], 65536KB chunk

It would help if you could post the error(s) you're getting, along
with any output from dmesg during that time.  Have you done a full
scan of the disk looking for errors?  You might just have silent
read errors in your array.  So as root do:

   # echo check >>/sys/block/md??/md/sync_action

where md?? is the name of your md array you want to check.  You can
get the name from:

   cat /proc/mdstat

and of course it would help to post that info as well if you want more
help.

John
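
A minimal sketch of the check described above, for anyone following along.
It assumes the array's sysfs directory is passed in (for example
/sys/block/md1/md) and that it runs as root. The helper names
start_md_check and md_mismatch_count are made up for illustration;
sync_action and mismatch_cnt are the md sysfs attributes.

```shell
# Sketch only: helper names are hypothetical. Pass the array's md sysfs
# directory, e.g. /sys/block/md1/md (root is needed on a real system).

# Ask md to scrub the array: it reads every stripe and counts mismatches.
start_md_check() {
    md_dir=$1
    if [ ! -w "$md_dir/sync_action" ]; then
        echo "cannot write $md_dir/sync_action" >&2
        return 1
    fi
    echo check > "$md_dir/sync_action"
}

# Print the mismatch counter that md maintains for the array.
md_mismatch_count() {
    cat "$1/mismatch_cnt"
}
```

For example: start_md_check /sys/block/md1/md, then watch /proc/mdstat
for progress and call md_mismatch_count /sys/block/md1/md once it is done.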


* Re: Input/Output error reading from a clean raid
       [not found] ` <20170123010334.GA7546@metamorpher.de>
@ 2017-01-23 14:02   ` Salatiel Filho
  2017-01-23 17:07     ` John Stoffel
  2017-01-23 17:34     ` Andreas Klauer
  0 siblings, 2 replies; 11+ messages in thread
From: Salatiel Filho @ 2017-01-23 14:02 UTC (permalink / raw)
  To: linux-raid

OK, I have run echo check >>/sys/block/md1/md/sync_action, and here is the output of
mdadm --examine-badblocks /dev/sdd1 /dev/sdg1 /dev/sdf1 /dev/sde1:

Bad-blocks on /dev/sdd1:
          1515723072 for 512 sectors
          1515723584 for 512 sectors
          1515724096 for 512 sectors
          1515724608 for 512 sectors
          1515725120 for 512 sectors
          1515725632 for 512 sectors
          1515726144 for 512 sectors
          1515726656 for 512 sectors
          1515727168 for 512 sectors
          1515727680 for 512 sectors
          1515728192 for 512 sectors
          1515728704 for 512 sectors
          1515729216 for 512 sectors
          1515729728 for 512 sectors
          1515730240 for 512 sectors
          1515730752 for 512 sectors
          1515731264 for 512 sectors
          1515731776 for 512 sectors
          1515732288 for 512 sectors
          1515732800 for 512 sectors
          1515733312 for 512 sectors
          1515733824 for 512 sectors
          1515734336 for 512 sectors
          1515734848 for 512 sectors
          1515735360 for 512 sectors
          1515735872 for 512 sectors
          1515736384 for 512 sectors
          1515736896 for 512 sectors
          1515737408 for 512 sectors
          1515737920 for 512 sectors
          1515738432 for 512 sectors
          1515738944 for 512 sectors
          1515739456 for 512 sectors
          1515739968 for 512 sectors
          1515740480 for 512 sectors
          1515740992 for 512 sectors
          1515741504 for 512 sectors
          1515742016 for 192 sectors
          1515743712 for 512 sectors
          1515744224 for 512 sectors
          1515744736 for 512 sectors
          1515745248 for 512 sectors
          1515745760 for 512 sectors
          1515746272 for 512 sectors
          1515746784 for 512 sectors
          1515747296 for 512 sectors
          1515747808 for 512 sectors
          1515748320 for 512 sectors
          1515749072 for 304 sectors
          1515750400 for 512 sectors
          1515750912 for 512 sectors
          1515751424 for 512 sectors
          1515751936 for 512 sectors
          1515752448 for 512 sectors
          1515752960 for 512 sectors
          1515753472 for 512 sectors
          1515753984 for 512 sectors
          1515754496 for 232 sectors
Bad-blocks list is empty in /dev/sdg1
Bad-blocks list is empty in /dev/sdf1
Bad-blocks on /dev/sde1:
          1515723072 for 512 sectors
          1515723584 for 512 sectors
          1515724096 for 512 sectors
          1515724608 for 512 sectors
          1515725120 for 512 sectors
          1515725632 for 512 sectors
          1515726144 for 512 sectors
          1515726656 for 512 sectors
          1515727168 for 512 sectors
          1515727680 for 512 sectors
          1515728192 for 512 sectors
          1515728704 for 512 sectors
          1515729216 for 512 sectors
          1515729728 for 512 sectors
          1515730240 for 512 sectors
          1515730752 for 512 sectors
          1515731264 for 512 sectors
          1515731776 for 512 sectors
          1515732288 for 512 sectors
          1515732800 for 512 sectors
          1515733312 for 512 sectors
          1515733824 for 512 sectors
          1515734336 for 512 sectors
          1515734848 for 512 sectors
          1515735360 for 512 sectors
          1515735872 for 512 sectors
          1515736384 for 512 sectors
          1515736896 for 512 sectors
          1515737408 for 512 sectors
          1515737920 for 512 sectors
          1515738432 for 512 sectors
          1515738944 for 512 sectors
          1515739456 for 512 sectors
          1515739968 for 512 sectors
          1515740480 for 512 sectors
          1515740992 for 512 sectors
          1515741504 for 512 sectors
          1515742016 for 192 sectors
          1515743712 for 512 sectors
          1515744224 for 512 sectors
          1515744736 for 512 sectors
          1515745248 for 512 sectors
          1515745760 for 512 sectors
          1515746272 for 512 sectors
          1515746784 for 512 sectors
          1515747296 for 512 sectors
          1515747808 for 512 sectors
          1515748320 for 512 sectors
          1515749072 for 304 sectors
          1515750400 for 512 sectors
          1515750912 for 512 sectors
          1515751424 for 512 sectors
          1515751936 for 512 sectors
          1515752448 for 512 sectors
          1515752960 for 512 sectors
          1515753472 for 512 sectors
          1515753984 for 512 sectors
          1515754496 for 232 sectors
[]'s
Salatiel
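
With lists this long it is hard to see by eye whether two members record
the same ranges. A rough sketch of a filter for that, assuming the text
format printed above; shared_bad_blocks is a hypothetical helper name,
not part of mdadm.

```shell
# Sketch only: shared_bad_blocks is a hypothetical helper. It reads the text
# printed by `mdadm --examine-badblocks DEV...` on stdin and prints every
# "sector length" pair that is recorded on more than one member device,
# followed by the devices that list it.
shared_bad_blocks() {
    awk '
        /^Bad-blocks on / { dev = $3; sub(/:$/, "", dev); next }
        $2 == "for" && $4 == "sectors" {
            key = $1 " " $3
            if (!((key, dev) in seen)) {   # count each device once per entry
                seen[key, dev] = 1
                count[key]++
                devs[key] = devs[key] " " dev
            }
        }
        END {
            for (key in count)
                if (count[key] > 1)
                    print key devs[key]
        }
    ' | sort -n
}
```

Usage would be something like
mdadm --examine-badblocks /dev/sdd1 /dev/sdg1 /dev/sdf1 /dev/sde1 | shared_bad_blocks
which prints one line per range that appears on two or more members.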


On Sun, Jan 22, 2017 at 10:03 PM, Andreas Klauer
<Andreas.Klauer@metamorpher.de> wrote:
> On Sun, Jan 22, 2017 at 11:08:40AM -0300, Salatiel Filho wrote:
>> Any ideas what could it be ?
>
> mdadm --examine-badblocks
>
> Regards
> Andreas Klauer


* Re: Input/Output error reading from a clean raid
  2017-01-23  0:18 ` John Stoffel
@ 2017-01-23 14:42   ` Salatiel Filho
  2017-01-23 16:12     ` John Stoffel
  0 siblings, 1 reply; 11+ messages in thread
From: Salatiel Filho @ 2017-01-23 14:42 UTC (permalink / raw)
  To: John Stoffel; +Cc: linux-raid

The output of the command is:

# dd if=Fotos.zip of=/dev/null
dd: error reading ‘Fotos.zip’: Input/output error
328704+0 records in
328704+0 records out
168296448 bytes (168 MB) copied, 0.127723 s, 1.3 GB/s

or

# cp Fotos.zip /tmp/
cp: error reading ‘Fotos.zip’: Input/output error
cp: failed to extend ‘/tmp/Fotos.zip’: Input/output error


There is nothing in dmesg after running those commands.

[]'s
Salatiel


On Sun, Jan 22, 2017 at 9:18 PM, John Stoffel <john@stoffel.org> wrote:
>
> Salatiel> I am trying to recover a few files from my backup. The
> Salatiel> backup is on a raid 5 + ext4.  There are several files where
> Salatiel> i get I/O error. The raid appears to be clean and fsck shows
> Salatiel> no errors. Any ideas what could it be ?
>
> Salatiel> md1 : active raid5 sdd1[0] sdg1[4] sdf1[2] sde1[1]
> Salatiel>       3220829184 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
> Salatiel>       bitmap: 1/8 pages [4KB], 65536KB chunk
>
> It would help if you could post the error(s) you're getting, along
> with any output from dmesg during that time.  Have you done a full
> scan of the disk looking for errors?  You might just have silent
> read errors in your array.  So as root do:
>
>    # echo check >>/sys/block/md??/md/sync_action
>
> where md?? is the name of your md array you want to check.  You can
> get the name from:
>
>    cat /proc/mdstat
>
> and of course it would help to post that info as well if you want more
> help.
>
> John


* Re: Input/Output error reading from a clean raid
  2017-01-23 14:42   ` Salatiel Filho
@ 2017-01-23 16:12     ` John Stoffel
  0 siblings, 0 replies; 11+ messages in thread
From: John Stoffel @ 2017-01-23 16:12 UTC (permalink / raw)
  To: Salatiel Filho; +Cc: John Stoffel, linux-raid


Salatiel> The output of the command is:
Salatiel> # dd if=Fotos.zip of=/dev/null
Salatiel> dd: error reading ‘Fotos.zip’: Input/output error
Salatiel> 328704+0 records in
Salatiel> 328704+0 records out
Salatiel> 168296448 bytes (168 MB) copied, 0.127723 s, 1.3 GB/s

Salatiel> or

Salatiel> # cp Fotos.zip /tmp/
Salatiel> cp: error reading ‘Fotos.zip’: Input/output error
Salatiel> cp: failed to extend ‘/tmp/Fotos.zip’: Input/output error

Can you do an 'unzip -l Fotos.zip' and get anything back?  It looks like
the first 168 MB might be ok... so you might get something back.

You might also want to try and start doing a dd from 328705 records
(or even a couple more records farther) to see if you can get anything
else from there.
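
A small sketch of that idea, assuming dd's default 512-byte records (which
matches the output above: 328704 records = 168296448 bytes). The helper
names record_offset_bytes and read_from_record are made up for
illustration.

```shell
# Sketch only: both helpers are hypothetical and assume 512-byte records.

# Convert a dd record count into a byte offset.
record_offset_bytes() {
    echo $(( $1 * 512 ))
}

# Copy SRC to DST starting at record START, carrying on after read errors.
# conv=noerror,sync replaces each unreadable record with zeros, so later
# data keeps its offset; a short final record is zero-padded to 512 bytes.
read_from_record() {
    src=$1
    dst=$2
    start=$3
    dd if="$src" of="$dst" bs=512 skip="$start" conv=noerror,sync
}
```

For example, read_from_record Fotos.zip /tmp/Fotos.tail 328705 would
resume just past the first record that failed.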

In this case, the tool 'ddrescue' might be your answer, since it is
designed to handle errors like this and continue reading past errors.
It might, or might not, let you get more of your data back.  On
Debian-based systems you should be able to just do:

   apt-get install gddrescue

or just do:

   apt-cache search ddrescue

On Red Hat or Fedora you could do:

   dnf search ddrescue

too.
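
Once it is installed, a run on a single file could look roughly like the
sketch below. This assumes GNU ddrescue; rescue_file and the ".map" suffix
are arbitrary choices for illustration, not anything ddrescue requires.

```shell
# Sketch only: rescue_file is a hypothetical wrapper around GNU ddrescue.
# Usage: rescue_file SOURCE DESTINATION
# The map file (DESTINATION.map here, an arbitrary name) records which areas
# were read, so an interrupted run can be resumed with the same command.
rescue_file() {
    src=$1
    dst=$2
    # -r 3: retry the unreadable areas up to three more times.
    ddrescue -r 3 "$src" "$dst" "$dst.map"
}
```

For example: rescue_file Fotos.zip /tmp/Fotos.zip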

Did you run the "echo check > ..." command at all?  What did it say in
the output of: cat /proc/mdstat  when you did this?  

Salatiel> There is nothing on dmesg after running those commands;

You might be out of luck.  This is one reason why I like A) mirroring
my data and B) saving multiple copies to multiple locations.  Storage
is cheap these days.

Though I admit I'm not perfect either.

Please get us more information so we can try to help more.

Also, have you unmounted the filesystem and done an 'fsck -y /dev/...'
on it as well?  You might want to do a more in-depth check of the
filesystem to see if there's any corruption somewhere.

Also, going to the end of the file, and seeking backwards and reading
off blocks might help you recover more of the zip file.



Salatiel> On Sun, Jan 22, 2017 at 9:18 PM, John Stoffel <john@stoffel.org> wrote:
>> 
Salatiel> I am trying to recover a few files from my backup. The
Salatiel> backup is on a raid 5 + ext4.  There are several files where
Salatiel> i get I/O error. The raid appears to be clean and fsck shows
Salatiel> no errors. Any ideas what could it be ?
>> 
Salatiel> md1 : active raid5 sdd1[0] sdg1[4] sdf1[2] sde1[1]
Salatiel> 3220829184 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
Salatiel> bitmap: 1/8 pages [4KB], 65536KB chunk
>> 
>> It would help if you could post the error(s) you're getting, along
>> with any output from dmesg during that time.  Have you done a full
>> scan of the disk looking for errors?  You might just have silent
>> read errors in your array.  So as root do:
>> 
>> # echo check >>/sys/block/md??/md/sync_action
>> 
>> where md?? is the name of your md array you want to check.  You can
>> get the name from:
>> 
>> cat /proc/mdstat
>> 
>> and of course it would help to post that info as well if you want more
>> help.
>> 
>> John


* Re: Input/Output error reading from a clean raid
  2017-01-23 14:02   ` Salatiel Filho
@ 2017-01-23 17:07     ` John Stoffel
  2017-01-23 17:23       ` Wols Lists
  2017-01-23 17:34     ` Andreas Klauer
  1 sibling, 1 reply; 11+ messages in thread
From: John Stoffel @ 2017-01-23 17:07 UTC (permalink / raw)
  To: Salatiel Filho; +Cc: linux-raid

>>>>> "Salatiel" == Salatiel Filho <salatiel.filho@gmail.com> writes:

Salatiel> Ok, i have run echo check >>/sys/block/md1/md/sync_action,
Salatiel> and now the output of mdadm mdadm --examine-badblocks
Salatiel> /dev/sdd1 /dev/sdg1 /dev/sdf1 /dev/sde1


Salatiel> Bad-blocks on /dev/sdd1:
Salatiel>           1515723072 for 512 sectors
Salatiel>           1515723584 for 512 sectors
Salatiel>           1515724096 for 512 sectors
Salatiel>           1515724608 for 512 sectors

You have bad disks in your array.  First off, I would go buy
replacements and then use 'ddrescue' to copy the data from the old
disks to the new disks.  Then I'd try to assemble only the NEW disks
into an array, and then I'd fsck the filesystem(s).

You're going to lose data, no doubt about it.  You're now in the mode
where you're trying to save as much as you can as quickly as possible.

Personally, I'd be setting up a RAID6 array for your new setup.  Then
I would also be setting up weekly checks of the raid array as well.

You're going to lose data no matter what.  So get new disks and start
copying what you can.

John

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Input/Output error reading from a clean raid
  2017-01-23 17:07     ` John Stoffel
@ 2017-01-23 17:23       ` Wols Lists
  0 siblings, 0 replies; 11+ messages in thread
From: Wols Lists @ 2017-01-23 17:23 UTC (permalink / raw)
  To: John Stoffel, Salatiel Filho; +Cc: linux-raid

On 23/01/17 17:07, John Stoffel wrote:
> You have bad disks in your array.  First thing off is that I would go
> buy replacements and then use 'ddrescue' to copy the data from the old
> disks to new disks.  Then I'd try to assemble the NEW disks only into
> an array, and then I'd fsck the filesystem(s).
> 
> You're going to lose data, no doubt about it.  You're now in the mode
> where you're trying to save as much as you can as quickly as possible.
> 
> Personally, I'd be setting up a RAID6 array for your new setup.  Then
> I would also be setting up weekly checks of the raid array as well.
> 
> You're going to lose data no matter what.  So get new disks and start
> copying what you can.

Go read the raid wiki. https://raid.wiki.kernel.org/index.php/Linux_Raid

Especially replacing a failed drive
https://raid.wiki.kernel.org/index.php/Replacing_a_failed_drive

And please, can you get the ddrescue error log that it mentions and email
me a copy? If you've got some Perl or Python or shell skills, maybe you
could even write the script it mentions (described in a bit more detail
under programming projects,
https://raid.wiki.kernel.org/index.php/Programming_projects). Otherwise
I'll try to write it. It might be a good way of learning Python :-) but
at the moment I think I'm learning by jumping in out of my depth, so
we'll see how far I get :-)

Cheers,
Wol




* Re: Input/Output error reading from a clean raid
  2017-01-23 14:02   ` Salatiel Filho
  2017-01-23 17:07     ` John Stoffel
@ 2017-01-23 17:34     ` Andreas Klauer
  2017-01-24 21:15       ` Salatiel Filho
  1 sibling, 1 reply; 11+ messages in thread
From: Andreas Klauer @ 2017-01-23 17:34 UTC (permalink / raw)
  To: Salatiel Filho; +Cc: linux-raid

On Mon, Jan 23, 2017 at 11:02:24AM -0300, Salatiel Filho wrote:
> mdadm mdadm --examine-badblocks /dev/sdd1 /dev/sdg1 /dev/sdf1  /dev/sde1
> 
> Bad-blocks on /dev/sdd1:
>           1515723072 for 512 sectors
> Bad-blocks on /dev/sde1:
>           1515723072 for 512 sectors

md believes you have bad blocks in identical places so it won't return 
whatever data is in these blocks. Thus you get read errors even if there 
is no bad block on the disk itself. Those bad block entries can be caused 
by cable or controller flukes, making temporary problems permanent...

Personally I disable the bad block list everywhere.

You can search this list for old messages regarding --examine-badblocks, 
this problem came up several times. Clearing the mdadm bad block list is 
worth a try. There's an undocumented option, update=force-no-bbl or such.

Regards
Andreas Klauer
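
A hedged sketch of what clearing the list could look like. The option name
is the one given in this thread, so check that your mdadm version accepts
it. clear_md_bbl and the DRY_RUN switch are made up for illustration; the
filesystem has to be unmounted first because the array is stopped and
reassembled.

```shell
# Sketch only: clear_md_bbl and run_cmd are hypothetical helpers. Set
# DRY_RUN=1 to print the commands instead of running them.
run_cmd() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: clear_md_bbl ARRAY MEMBER...
# Stops the array, then reassembles it while dropping the bad block list.
clear_md_bbl() {
    md=$1
    shift                               # remaining arguments: member devices
    run_cmd mdadm --stop "$md" &&
    run_cmd mdadm --assemble "$md" --update=force-no-bbl "$@"
}
```

For example, DRY_RUN=1 clear_md_bbl /dev/md1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1
prints the two mdadm commands so they can be reviewed before running them
for real.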


* Re: Input/Output error reading from a clean raid
  2017-01-23 17:34     ` Andreas Klauer
@ 2017-01-24 21:15       ` Salatiel Filho
  2017-01-24 21:58         ` Wols Lists
  2017-01-25 15:54         ` John Stoffel
  0 siblings, 2 replies; 11+ messages in thread
From: Salatiel Filho @ 2017-01-24 21:15 UTC (permalink / raw)
  To: Andreas Klauer; +Cc: linux-raid

On Mon, Jan 23, 2017 at 2:34 PM, Andreas Klauer
<Andreas.Klauer@metamorpher.de> wrote:
> On Mon, Jan 23, 2017 at 11:02:24AM -0300, Salatiel Filho wrote:
>> mdadm mdadm --examine-badblocks /dev/sdd1 /dev/sdg1 /dev/sdf1  /dev/sde1
>>
>> Bad-blocks on /dev/sdd1:
>>           1515723072 for 512 sectors
>> Bad-blocks on /dev/sde1:
>>           1515723072 for 512 sectors
>
> md believes you have bad blocks in identical places so it won't return
> whatever data is in these blocks. Thus you get read errors even if there
> is no bad block on the disk itself. Those bad block entries can be caused
> by cable or controller flukes, making temporary problems permanent...
>
> Personally I disable the bad block list everywhere.
>
> You can search this list for old messages regarding --examine-badblocks,
> this problem came up several times. Clearing the mdadm bad block list is
> worth a try. There's an undocumented option, update=force-no-bbl or such.
>
> Regards
> Andreas Klauer

Thank you all for the help.
Andreas, the force-no-bbl option from mdadm 3.4 did the trick. I was able to
retrieve all files and their md5 sums match, so it is great =)
I really think it is very unlikely that two different disks from two
different brands would have problems at exactly the same block.
I have a question: who populates the bad block list? Is it the check
action sent to /sys/block/md??/md/sync_action, or does each read error
update it?
I think it may have been a problem with the cable (it is a 4-disk USB3 bay).
Anyway, thank you very much!


* Re: Input/Output error reading from a clean raid
  2017-01-24 21:15       ` Salatiel Filho
@ 2017-01-24 21:58         ` Wols Lists
  2017-01-25 15:54         ` John Stoffel
  1 sibling, 0 replies; 11+ messages in thread
From: Wols Lists @ 2017-01-24 21:58 UTC (permalink / raw)
  To: Salatiel Filho, Andreas Klauer; +Cc: linux-raid

On 24/01/17 21:15, Salatiel Filho wrote:
> I really think it is very unlikely that two different disks from two
> different brands would have problems at exactly the same block.
> I have a question, who populates the badblock list ? Is the check
> action send to the /sys/block/md??/md/sync_action OR each read error
> updates it ?

I think it's a known problem. Nobody seems to know quite why it happens,
but when a block is added to the badblocks list it seems to get added to
every device. Modern hard drives are supposed to relocate bad blocks and
not need a badblock list, so most people, especially those in the know,
just tend to disable OS-level badblocks, and I think that's why the cause
has not been found.

Cheers,
Wol
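
One way to watch when entries appear is to look at the per-member list the
kernel exposes while the array is running. A rough sketch, assuming the
bad_blocks attribute under each dev-* directory of the array's md sysfs
directory, with one entry per line; md_bad_block_counts is a hypothetical
helper name.

```shell
# Sketch only: md_bad_block_counts is a hypothetical helper. For each member
# directory dev-* under the array's md sysfs directory it prints the member
# name and the number of entries in its bad_blocks file.
md_bad_block_counts() {
    md_dir=$1
    for f in "$md_dir"/dev-*/bad_blocks; do
        [ -r "$f" ] || continue
        member=${f%/bad_blocks}
        member=${member##*/dev-}
        # grep -c . counts the non-empty lines, i.e. the recorded ranges.
        printf '%s %s\n' "$member" "$(grep -c . "$f")"
    done
}
```

Running md_bad_block_counts /sys/block/md1/md before and after a check, or
after a burst of read errors, would show which of the two makes the counts
grow.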


* Re: Input/Output error reading from a clean raid
  2017-01-24 21:15       ` Salatiel Filho
  2017-01-24 21:58         ` Wols Lists
@ 2017-01-25 15:54         ` John Stoffel
  1 sibling, 0 replies; 11+ messages in thread
From: John Stoffel @ 2017-01-25 15:54 UTC (permalink / raw)
  To: Salatiel Filho; +Cc: Andreas Klauer, linux-raid

>>>>> "Salatiel" == Salatiel Filho <salatiel.filho@gmail.com> writes:

Salatiel> On Mon, Jan 23, 2017 at 2:34 PM, Andreas Klauer
Salatiel> <Andreas.Klauer@metamorpher.de> wrote:
>> On Mon, Jan 23, 2017 at 11:02:24AM -0300, Salatiel Filho wrote:
>>> mdadm mdadm --examine-badblocks /dev/sdd1 /dev/sdg1 /dev/sdf1  /dev/sde1
>>> 
>>> Bad-blocks on /dev/sdd1:
>>> 1515723072 for 512 sectors
>>> Bad-blocks on /dev/sde1:
>>> 1515723072 for 512 sectors
>> 
>> md believes you have bad blocks in identical places so it won't return
>> whatever data is in these blocks. Thus you get read errors even if there
>> is no bad block on the disk itself. Those bad block entries can be caused
>> by cable or controller flukes, making temporary problems permanent...
>> 
>> Personally I disable the bad block list everywhere.
>> 
>> You can search this list for old messages regarding --examine-badblocks,
>> this problem came up several times. Clearing the mdadm bad block list is
>> worth a try. There's an undocumented option, update=force-no-bbl or such.
>> 
>> Regards
>> Andreas Klauer

Salatiel> Thanks all of you for the help.
Salatiel> Andreas, the force-no-bbl from mdadm 3.4 did the trick. I was able to
Salatiel> retrieve all files and their md5 matches, so it is great =)

Great news.  Glad I could help; I wish I had pinpointed the root cause
better.

john


