* I/O errors without errors from underlying device
@ 2015-12-07 16:05 Arkadiusz Miskiewicz
  2015-12-07 16:37 ` John Stoffel
  0 siblings, 1 reply; 8+ messages in thread
From: Arkadiusz Miskiewicz @ 2015-12-07 16:05 UTC (permalink / raw)
  To: linux-raid


Hi.

4.3.0 kernel, raid6 array:

md7 : active raid6 sdg[10] sdad1[9] sdac1[8] sdag1[7] sdaf1[6] sdae1[5] sdaj1[4] sdai1[3] sdah1[2] sdn1[1]
      31255089152 blocks super 1.2 level 6, 512k chunk, algorithm 2 [10/10] [UUUUUUUUUU]
      bitmap: 1/30 pages [4KB], 65536KB chunk

The array had a weird failure in which many disks went into the failed state, but
removing and re-adding these disks "fixed" it (it turns out it didn't really fix it).

Unfortunately, some reads now fail:

pread(4, 0x1483a00, 4096, 16003680464896) = -1 EIO (Input/output error)

To reproduce it, I used xfs_io, which issues exactly the pread shown above:
 xfs_io -d -c "pread 16003680464896 4096" /dev/md7
pread64: Input/output error

Writes to that area also fail:
xfs_io -d -c "pwrite 16003680464896 4096" /dev/md7
pwrite64: Input/output error

Note that nothing is logged in dmesg when this happens.

I tried various pread offsets and sizes, and at one point this was logged:
[  848.988518] Buffer I/O error on dev md7, logical block 3907148544, async page read

but there were no errors from the underlying devices.

List of bad blocks:
http://sprunge.us/XSWI

What can I do now?

(losing the data in those few sectors is acceptable if the rest remains readable)

Thanks,
-- 
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: I/O errors without errors from underlying device
  2015-12-07 16:05 I/O errors without errors from underlying device Arkadiusz Miskiewicz
@ 2015-12-07 16:37 ` John Stoffel
  2015-12-07 17:06   ` Arkadiusz Miskiewicz
       [not found]   ` <201512071803.26434.arekm@maven.pl>
  0 siblings, 2 replies; 8+ messages in thread
From: John Stoffel @ 2015-12-07 16:37 UTC (permalink / raw)
  To: arekm; +Cc: linux-raid


Arkadiusz> 4.3.0 kernel, raid6 array:

I think there's a block-merge bug in 4.3.x and in 4.4-rc3 and earlier.
I ran into it over the weekend: v4.2.6 was stable, but anything newer
would lock up and crash on me.

So the first step would be to get and test v4.4-rc4.

* Re: I/O errors without errors from underlying device
  2015-12-07 16:37 ` John Stoffel
@ 2015-12-07 17:06   ` Arkadiusz Miskiewicz
       [not found]   ` <201512071803.26434.arekm@maven.pl>
  1 sibling, 0 replies; 8+ messages in thread
From: Arkadiusz Miskiewicz @ 2015-12-07 17:06 UTC (permalink / raw)
  To: John Stoffel; +Cc: linux-raid

On Monday 07 of December 2015, John Stoffel wrote:
> Arkadiusz> 4.3.0 kernel, raid6 array:
> 
> I think there's a bug in the 4.3.x and 4.4-rc3 and lower with block
> merges.  I ran into these over the weekend, where v4.2.6 was stable,
> but anything higher would lock up and crash on me.

Well, no crashes here.

> So first step would be to make sure you get and test v4.4-rc4.

Do you know which commit in there fixes it?

-- 
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )

* Re: I/O errors without errors from underlying device
       [not found]   ` <201512071803.26434.arekm@maven.pl>
@ 2015-12-07 17:23     ` John Stoffel
  2015-12-07 20:46       ` Arkadiusz Miskiewicz
  0 siblings, 1 reply; 8+ messages in thread
From: John Stoffel @ 2015-12-07 17:23 UTC (permalink / raw)
  To: Arkadiusz Miśkiewicz; +Cc: John Stoffel, linux-raid

>>>>> "Arkadiusz" == Arkadiusz Miśkiewicz <arekm@maven.pl> writes:

Arkadiusz> On Monday 07 of December 2015, John Stoffel wrote:
Arkadiusz> 4.3.0 kernel, raid6 array:
>> 
>> I think there's a bug in the 4.3.x and 4.4-rc3 and lower with block
>> merges.  I ran into these over the weekend, where v4.2.6 was stable,
>> but anything higher would lock up and crash on me.

Arkadiusz> Well, no crashes here.

That's good.  It was harder to hit when I wasn't also running KVM VMs
on the server, and I was running only RAID1 arrays, so it's hard to say
whether it affects you.

>> So first step would be to make sure you get and test v4.4-rc4.

Arkadiusz> Do you know which commit there?

Try this, from the master lkml git repository:

    2873d32ff493ecbfb7d2c7f56812ab941dda42f4

* Re: I/O errors without errors from underlying device
  2015-12-07 17:23     ` John Stoffel
@ 2015-12-07 20:46       ` Arkadiusz Miskiewicz
  2015-12-08  4:02         ` John Stoffel
  2015-12-08 11:05         ` Arkadiusz Miskiewicz
  0 siblings, 2 replies; 8+ messages in thread
From: Arkadiusz Miskiewicz @ 2015-12-07 20:46 UTC (permalink / raw)
  To: John Stoffel; +Cc: linux-raid

On Monday 07 of December 2015, John Stoffel wrote:
> >>>>> "Arkadiusz" == Arkadiusz Miśkiewicz <arekm@maven.pl> writes:
> Arkadiusz> On Monday 07 of December 2015, John Stoffel wrote:
> 
> Arkadiusz> 4.3.0 kernel, raid6 array:
> >> I think there's a bug in the 4.3.x and 4.4-rc3 and lower with block
> >> merges.  I ran into these over the weekend, where v4.2.6 was stable,
> >> but anything higher would lock up and crash on me.
> 
> Arkadiusz> Well, no crashes here.
> 
> That's good.  It was hard(er) to hit when I wasn't running KVM VMs at
> the same time on the server, and I was running strictly RAID1 disks,
> so it's hard to know.
> 
> >> So first step would be to make sure you get and test v4.4-rc4.
> 
> Arkadiusz> Do you know which commit there?
> 
> Try this, from the master lkml git repository:
> 
>     2873d32ff493ecbfb7d2c7f56812ab941dda42f4

It's a merge commit. I don't see any obvious patch in that merge that would
help in my case.


Anyway, I would expect my problem to be related to the bad block lists, whose
numbers are close to the block in this dmesg error message:
[  848.988518] Buffer I/O error on dev md7, logical block 3907148544, async page read

> >> http://sprunge.us/XSWI
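Whether the failing offset and the bad block entries point at the same place can be checked by mapping an array sector onto the members. A minimal sketch, assuming md's RAID6 left-symmetric layout ("algorithm 2" in the mdstat line) and mirroring, as far as I can tell, the ALGORITHM_LEFT_SYMMETRIC case of raid5_compute_sector() in drivers/md/raid5.c; the struct and function names are hypothetical. The slots are raid roles, not the bracketed numbers in /proc/mdstat (sdg[10] cannot be a role in a 10-device array), so check the RaidDevice column of `mdadm --detail /dev/md7`. The computed member sector is relative to the start of each member's data area.

```c
#include <assert.h>
#include <stdint.h>

/* Where one array sector lives in a RAID6 array (hypothetical helper type). */
struct raid6_loc {
	uint64_t stripe;        /* stripe number */
	int data_slot;          /* raid role holding the data chunk */
	int p_slot;             /* raid role holding P parity */
	int q_slot;             /* raid role holding the Q syndrome */
	uint64_t member_sector; /* sector relative to the member's data area */
};

/* Map an array sector for md RAID6 with the left-symmetric layout
 * (algorithm 2). chunk_sectors is the chunk size in 512-byte sectors. */
static struct raid6_loc raid6_left_symmetric(uint64_t array_sector,
					     int raid_disks,
					     uint64_t chunk_sectors)
{
	struct raid6_loc loc;
	int data_disks = raid_disks - 2;	/* two blocks per stripe are P and Q */

	assert(raid_disks >= 4 && chunk_sectors > 0);

	uint64_t chunk_number = array_sector / chunk_sectors;
	uint64_t chunk_offset = array_sector % chunk_sectors;
	int dd_idx = (int)(chunk_number % data_disks);	/* data index in stripe */

	loc.stripe = chunk_number / data_disks;
	/* P moves one role lower on each stripe, Q follows P, and the data
	 * chunks continue after Q, wrapping around. */
	loc.p_slot = raid_disks - 1 - (int)(loc.stripe % raid_disks);
	loc.q_slot = (loc.p_slot + 1) % raid_disks;
	loc.data_slot = (loc.p_slot + 2 + dd_idx) % raid_disks;
	loc.member_sector = loc.stripe * chunk_sectors + chunk_offset;
	return loc;
}
```

With this thread's numbers (10 devices, 512 KiB = 1024-sector chunks), the pread offset 16003680464896 is array sector 31257188408, which lands in stripe 3815574 at member sector 3907147832, with data on role 3, P on role 5 and Q on role 6. Assuming the dmesg logical block is counted in 4 KiB units, 3907148544 * 8 = 31257188352 is the first sector of the same chunk. With 8 data disks and 8 sectors per 4 KiB block, array block numbers and member sector numbers come out numerically similar, which may be why they looked close.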

But how can I repair these if write() also fails, when
http://www.spinics.net/lists/raid/msg49325.html suggests that a write should
"fix" them (by using replacement blocks, I guess)?

-- 
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )

* Re: I/O errors without errors from underlying device
  2015-12-07 20:46       ` Arkadiusz Miskiewicz
@ 2015-12-08  4:02         ` John Stoffel
  2015-12-08 11:05         ` Arkadiusz Miskiewicz
  1 sibling, 0 replies; 8+ messages in thread
From: John Stoffel @ 2015-12-08  4:02 UTC (permalink / raw)
  To: arekm; +Cc: John Stoffel, linux-raid

>>>>> "Arkadiusz" == Arkadiusz Miskiewicz <a.miskiewicz@gmail.com> writes:

Arkadiusz> On Monday 07 of December 2015, John Stoffel wrote:
>> >>>>> "Arkadiusz" == Arkadiusz Miśkiewicz <arekm@maven.pl> writes:
Arkadiusz> On Monday 07 of December 2015, John Stoffel wrote:
>> 
Arkadiusz> 4.3.0 kernel, raid6 array:
>> >> I think there's a bug in the 4.3.x and 4.4-rc3 and lower with block
>> >> merges.  I ran into these over the weekend, where v4.2.6 was stable,
>> >> but anything higher would lock up and crash on me.
>> 
Arkadiusz> Well, no crashes here.
>> 
>> That's good.  It was hard(er) to hit when I wasn't running KVM VMs at
>> the same time on the server, and I was running strictly RAID1 disks,
>> so it's hard to know.
>> 
>> >> So first step would be to make sure you get and test v4.4-rc4.
>> 
Arkadiusz> Do you know which commit there?
>> 
>> Try this, from the master lkml git repository:
>> 
>> 2873d32ff493ecbfb7d2c7f56812ab941dda42f4

Arkadiusz> It's merge commit. Don't see any obvious patch in that merge that would help 
Arkadiusz> my case.

It's the merge from Jens Axboe, about blk something or other.  In my
case, it led to instant lockups.  In your case... hard to know.
Sorry.

Arkadiusz> Anyway I would expect my problem to be related to badblock
Arkadiusz> lists which numbers are close to dmesg error message: [
Arkadiusz> 848.988518] Buffer I/O error on dev md7, logical block
Arkadiusz> 3907148544, async page read

>> >> http://sprunge.us/XSWI

Arkadiusz> But how to repair these if write() also fails and 
Arkadiusz> http://www.spinics.net/lists/raid/msg49325.html suggests that write should 
Arkadiusz> "fix" these (by using replacement blocks I guess) ?

* Re: I/O errors without errors from underlying device
  2015-12-07 20:46       ` Arkadiusz Miskiewicz
  2015-12-08  4:02         ` John Stoffel
@ 2015-12-08 11:05         ` Arkadiusz Miskiewicz
  2015-12-21  2:25           ` NeilBrown
  1 sibling, 1 reply; 8+ messages in thread
From: Arkadiusz Miskiewicz @ 2015-12-08 11:05 UTC (permalink / raw)
  To: linux-raid

On Monday 07 of December 2015, Arkadiusz Miskiewicz wrote:

> Anyway I would expect my problem to be related to badblock lists which
> numbers are close to dmesg error message: [  848.988518] Buffer I/O error
> on dev md7, logical block 3907148544, async page read
> 
> > >> http://sprunge.us/XSWI
> 
> But how to repair these if write() also fails and
> http://www.spinics.net/lists/raid/msg49325.html suggests that write should
> "fix" these (by using replacement blocks I guess) ?

I tried to get rid of the bad block lists (well, corruption in that area is better
than no access at all):

mdadm --assemble /dev/md7 --force --update=no-bbl
mdadm: Cannot remove active bbl from /dev/sdae1
mdadm: Cannot remove active bbl from /dev/sdag1
mdadm: Cannot remove active bbl from /dev/sdai1
mdadm: Cannot remove active bbl from /dev/sdn1
mdadm: Cannot remove active bbl from /dev/sdg
mdadm: Cannot remove active bbl from /dev/sdad1
mdadm: /dev/md7 has been started with 10 drives.

Is there a way to achieve that anyway?

-- 
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )

* Re: I/O errors without errors from underlying device
  2015-12-08 11:05         ` Arkadiusz Miskiewicz
@ 2015-12-21  2:25           ` NeilBrown
  0 siblings, 0 replies; 8+ messages in thread
From: NeilBrown @ 2015-12-21  2:25 UTC (permalink / raw)
  To: arekm, linux-raid

On Tue, Dec 08 2015, Arkadiusz Miskiewicz wrote:

> On Monday 07 of December 2015, Arkadiusz Miskiewicz wrote:
>
>> Anyway I would expect my problem to be related to badblock lists which
>> numbers are close to dmesg error message: [  848.988518] Buffer I/O error
>> on dev md7, logical block 3907148544, async page read
>> 
>> > >> http://sprunge.us/XSWI
>> 
>> But how to repair these if write() also fails and
>> http://www.spinics.net/lists/raid/msg49325.html suggests that write should
>> "fix" these (by using replacement blocks I guess) ?
>
> Tried to get rid of badblock lists (well, corruption in that area is better 
> than no access at all):
>
> mdadm --assemble /dev/md7 --force --update=no-bbl
> mdadm: Cannot remove active bbl from /dev/sdae1
> mdadm: Cannot remove active bbl from /dev/sdag1
> mdadm: Cannot remove active bbl from /dev/sdai1
> mdadm: Cannot remove active bbl from /dev/sdn1
> mdadm: Cannot remove active bbl from /dev/sdg
> mdadm: Cannot remove active bbl from /dev/sdad1
> mdadm: /dev/md7 has been started with 10 drives.
>
> Is there a way to achieve that anyway?
>

You probably have bad blocks on multiple disks in the same stripe
(look in /sys/block/md7/md/dev-*/bad_blocks to see).
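This hypothesis can be checked mechanically. A minimal sketch, assuming each member's list is the text of its md sysfs bad-block attribute (documented as `bad_blocks`, one `start length` pair per line, in 512-byte sectors) and that the queried range uses the same sector base as those entries (to my understanding, absolute sectors on the member device, so including its data offset). The function names are hypothetical. Since RAID6 can rebuild a stripe with at most two members unavailable, a count above two at the failing location would be consistent with the EIO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* True if any "start length" entry in `list` overlaps the sector range
 * [sector, sector + nr). `list` holds the text of one member's bad-block
 * list, one entry per line. */
static bool bb_list_overlaps(const char *list, unsigned long long sector,
			     unsigned long long nr)
{
	unsigned long long start, len;
	int used;

	assert(nr > 0);
	while (sscanf(list, "%llu %llu%n", &start, &len, &used) == 2) {
		/* Two half-open ranges overlap iff each starts before the other ends. */
		if (start < sector + nr && sector < start + len)
			return true;
		list += used;
	}
	return false;
}

/* Count how many members' lists cover part of the same sector range. */
static int members_with_bad_range(const char *const lists[], int nmembers,
				  unsigned long long sector,
				  unsigned long long nr)
{
	int count = 0;

	for (int i = 0; i < nmembers; i++)
		if (bb_list_overlaps(lists[i], sector, nr))
			count++;
	return count;
}
```

Each element of `lists` would be the contents of one dev-*/bad_blocks file. Comparing the same sector number on every member only makes sense if all members share the same data offset.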

To get rid of these you would need to write to every block in the
stripe.  I guess I should try to find a way to make that easier.
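For the offset in this thread, the array-level byte range covering every data chunk of the affected stripe can be computed as below. A minimal sketch, assuming that writing "every block in the stripe" means overwriting the whole stripe through /dev/md7, i.e. data_disks * chunk size (here 8 * 512 KiB = 4 MiB) aligned to that width; the helper name is hypothetical, and overwriting the range destroys whatever it held.

```c
#include <assert.h>
#include <stdint.h>

/* Array byte range [*start, *end) holding all data chunks of the stripe
 * that contains byte `offset` of the md device. */
static void stripe_data_range(uint64_t offset, int data_disks,
			      uint64_t chunk_bytes,
			      uint64_t *start, uint64_t *end)
{
	assert(data_disks > 0 && chunk_bytes > 0);

	uint64_t stripe_bytes = (uint64_t)data_disks * chunk_bytes;

	*start = offset - offset % stripe_bytes;	/* round down to stripe */
	*end = *start + stripe_bytes;
}
```

With offset 16003680464896, 8 data disks and 524288-byte chunks, this yields 16003677290496 up to (but not including) 16003681484800.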

If you like, you could hack mdadm to let you remove the bad block logs
(bbl) even though they aren't empty.
In super1.c look for:
	} else if (strcmp(update, "no-bbl") == 0) {
		if (sb->feature_map & __cpu_to_le32(MD_FEATURE_BAD_BLOCKS))
			pr_err("Cannot remove active bbl from %s\n",devname);
		else {
			sb->bblog_size = 0;
			sb->bblog_shift = 0;
			sb->bblog_offset = 0;
		}

and change it to be unconditional and also to clear
MD_FEATURE_BAD_BLOCKS.
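The suggested change can be sketched on a stand-in struct so that it compiles on its own. Assumptions: the struct below is a hypothetical subset of mdadm's struct mdp_superblock_1 holding only the fields this branch touches, MD_FEATURE_BAD_BLOCKS is taken to be 8 as in mdadm's md_p.h, and glibc's htole32() stands in for mdadm's __cpu_to_le32().

```c
#define _DEFAULT_SOURCE		/* for htole32()/le32toh() in glibc's <endian.h> */
#include <assert.h>
#include <endian.h>
#include <stdint.h>

#define MD_FEATURE_BAD_BLOCKS 8	/* assumed to match mdadm's md_p.h */

/* Hypothetical subset of the v1.x superblock; on-disk fields are little-endian. */
struct sb1_bbl_fields {
	uint32_t feature_map;
	uint8_t bblog_shift;
	uint16_t bblog_size;
	uint32_t bblog_offset;
};

/* "no-bbl" made unconditional: forget the bad-block log even if it still
 * has entries, and clear the feature bit that announces it. */
static void force_no_bbl(struct sb1_bbl_fields *sb)
{
	sb->feature_map &= ~htole32(MD_FEATURE_BAD_BLOCKS);
	sb->bblog_size = 0;
	sb->bblog_shift = 0;
	sb->bblog_offset = 0;
}
```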

No warranty expressed or implied.

NeilBrown

Thread overview: 8+ messages
2015-12-07 16:05 I/O errors without errors from underlying device Arkadiusz Miskiewicz
2015-12-07 16:37 ` John Stoffel
2015-12-07 17:06   ` Arkadiusz Miskiewicz
     [not found]   ` <201512071803.26434.arekm@maven.pl>
2015-12-07 17:23     ` John Stoffel
2015-12-07 20:46       ` Arkadiusz Miskiewicz
2015-12-08  4:02         ` John Stoffel
2015-12-08 11:05         ` Arkadiusz Miskiewicz
2015-12-21  2:25           ` NeilBrown
