* RAID6 - CPU At 100% Usage After Reassembly
@ 2016-09-04  4:04 Francisco Parada
  2016-09-04 14:38 ` Michael J. Shaver
  0 siblings, 1 reply; 16+ messages in thread
From: Francisco Parada @ 2016-09-04  4:04 UTC (permalink / raw)
  To: linux-raid

Hello everyone,

I know this gets a ton of visibility, so I'll keep it as concise as possible.

I'm running Ubuntu 16.04.1 and I have (read: had) a 7-drive RAID6
array.  I attempted to grow the array by adding 3 additional drives
for a total of 10, but it seems that one of the brand-new drives had
60+ bad blocks (according to "badblocks -vw").  I came to this
conclusion because a power outage during the grow lasted longer than
my 1500VA battery backup could withstand, and when I attempted to
continue the reshape after rebooting, the array wouldn't assemble.
All drives were marked as spares:

=================================================================================================================
# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
md127 : inactive sdi[0](S) sdh[2](S) sdj[3](S) sdf[7](S) sdg[9](S)
sdd[10](S) sde[11](S) sdb[13](S) sdc[12](S)
      26371219608 blocks super 1.2
=================================================================================================================


Notice above that there are only 9 drives instead of the 10 I should
have.  The missing drive is "sdk", but that's because running
"badblocks -vw" wiped the drive while I was trying to figure out
whether there was actually something wrong with it (you're probably
gasping, but it had a missing GPT table, and no matter what I tried to
recover it, the drive would just stop responding to reads and writes).
So I attempted to assemble the array with "/dev/sdk" listed as
missing, as shown below, but I got this:

===================================================================================================================
# mdadm -Afv /dev/md127 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
/dev/sdg /dev/sdh /dev/sdi /dev/sdj missing
mdadm: looking for devices for /dev/md127
mdadm: cannot open device missing: No such file or directory
mdadm: missing has no superblock - assembly aborted
===================================================================================================================


But I guess that doesn't matter, because almost all of the other
drives are in sync, judging by the event counts in the mdadm output
(once again, keep in mind that "/dev/sdk" is blank, hence the "no md
superblock" error):

==============================================
# mdadm -E /dev/sd[b-k] | grep Events
         Events : 280026
         Events : 280026
         Events : 280026
         Events : 280026
         Events : 280026
         Events : 280026
         Events : 280026
mdadm: No md superblock detected on /dev/sdk.
         Events : 280026
         Events : 280011
==============================================
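(For what it's worth, a minimal sketch of how one might pair each
event count with its device and slot, assuming the same /dev/sd[b-k]
names, since the bare grep above loses that mapping:)

==============================================
# Sketch (not from the original commands above): print each device's
# role and event count together so the stale member is easy to spot.
for d in /dev/sd[b-k]; do
    echo "== $d"
    mdadm -E "$d" 2>/dev/null | egrep 'Device Role|Events'
done
==============================================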


So I attempt to reassemble it, by leaving out "/dev/sdk" and it seems
to assemble it, with some warnings of course:

===========================================================================================================
# mdadm -Afv /dev/md127 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
/dev/sdg /dev/sdh /dev/sdi /dev/sdj
mdadm: looking for devices for /dev/md127
mdadm: /dev/sdb is identified as a member of /dev/md127, slot 7.
mdadm: /dev/sdc is identified as a member of /dev/md127, slot 8.
mdadm: /dev/sdd is identified as a member of /dev/md127, slot 6.
mdadm: /dev/sde is identified as a member of /dev/md127, slot 9.
mdadm: /dev/sdf is identified as a member of /dev/md127, slot 4.
mdadm: /dev/sdg is identified as a member of /dev/md127, slot 1.
mdadm: /dev/sdh is identified as a member of /dev/md127, slot 2.
mdadm: /dev/sdi is identified as a member of /dev/md127, slot 0.
mdadm: /dev/sdj is identified as a member of /dev/md127, slot 3.
mdadm: :/dev/md127 has an active reshape - checking if critical
section needs to be restored
mdadm: No backup metadata on device-7
mdadm: No backup metadata on device-8
mdadm: No backup metadata on device-9
mdadm: added /dev/sdg to /dev/md127 as 1
mdadm: added /dev/sdh to /dev/md127 as 2
mdadm: added /dev/sdj to /dev/md127 as 3 (possibly out of date)
mdadm: added /dev/sdf to /dev/md127 as 4
mdadm: no uptodate device for slot 10 of /dev/md127
mdadm: added /dev/sdd to /dev/md127 as 6
mdadm: added /dev/sdb to /dev/md127 as 7
mdadm: added /dev/sdc to /dev/md127 as 8
mdadm: added /dev/sde to /dev/md127 as 9
mdadm: added /dev/sdi to /dev/md127 as 0
mdadm: /dev/md127 has been started with 8 drives (out of 10).
===========================================================================================================


But the reshape speed drops from 80000K/sec to 1000K/sec and
eventually to 0K/sec shortly after I hit "enter" to reassemble:

===========================================================================================================
# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
md127 : active raid6 sdi[0] sde[11] sdc[12] sdb[13] sdd[10] sdf[7] sdh[2] sdg[9]
      14650675200 blocks super 1.2 level 6, 512k chunk, algorithm 2
[10/8] [UUU_U_UUUU]
      [=======>.............]  reshape = 39.1% (1146348628/2930135040)
finish=51538126.9min speed=0K/sec
      bitmap: 0/22 pages [0KB], 65536KB chunk

unused devices: <none>
===========================================================================================================


So I did a little probing, and it seems that my CPU is being pegged at
100% by "md127_raid6".  I should note that it has been this way for
over a week now; the TIME+ column doesn't reflect that because I had
to reboot.  So I'm at a loss, because even when I try to tune the
reshape speed limits, the reshape still remains at 0K/sec.
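(The kind of tuning I mean is roughly the following; a sketch with
illustrative values, not a recommendation:)

=================================================================================
# Standard md reshape/resync tuning knobs; the values here are illustrative only.
sysctl -w dev.raid.speed_limit_min=50000
sysctl -w dev.raid.speed_limit_max=500000
echo 8192 > /sys/block/md127/md/stripe_cache_size
=================================================================================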

=================================================================================
top - 22:28:53 up  1:56,  3 users,  load average: 3.05, 2.04, 0.92
Tasks: 317 total,   4 running, 313 sleeping,   0 stopped,   0 zombie
%Cpu(s):  4.4 us, 50.5 sy,  0.0 ni, 44.1 id,  1.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  1521584 total,   220812 free,   774868 used,   525904 buff/cache
KiB Swap: 25153532 total, 25000764 free,   152768 used.   477708 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+   COMMAND
  435 root      20   0       0      0      0 R  98.0  0.0   5:27.12
md127_raid6
28941 cisco     20   0  546436  34336  25080 R   2.9  2.3   0:18.32
gnome-disks
 3557 message+  20   0   44364   4632   3068 S   2.0  0.3   0:06.53
dbus-daemon
=================================================================================

Any ideas?  Your help would be greatly appreciated.

Thanks in advance

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: RAID6 - CPU At 100% Usage After Reassembly
  2016-09-04  4:04 RAID6 - CPU At 100% Usage After Reassembly Francisco Parada
@ 2016-09-04 14:38 ` Michael J. Shaver
       [not found]   ` <CAOW94uv4zSGs+6be3zhcQaGZdiAgg-s4ZHZ=mszcURo6pqJyqA@mail.gmail.com>
  0 siblings, 1 reply; 16+ messages in thread
From: Michael J. Shaver @ 2016-09-04 14:38 UTC (permalink / raw)
  To: mdraid

Hello Francisco,

You are almost certainly hitting the same issue reported several times
both here and on other forums, although your case is the first one I
have seen for raid6:

http://www.spinics.net/lists/raid/msg53056.html
http://www.spinics.net/lists/raid/msg52235.html
https://bbs.archlinux.org/viewtopic.php?id=212108
https://forums.gentoo.org/viewtopic-t-1043706.html

At this time, there have been a couple of suggestions on possible
fixes (for example, disabling transparent huge page support in the
kernel).
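(For anyone who wants to try that, disabling THP at runtime is roughly
the following; a sketch, and it reverts on reboot unless made
persistent via the kernel command line:)

=================================================================
# Runtime only; add transparent_hugepage=never to the kernel cmdline to persist.
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
=================================================================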

Another gentleman, Bart Van Assche, had suggested a set of patches to
the kernel scheduler that may help with the problem:

https://lkml.org/lkml/2016/8/3/289

I am still trying to wrap my head around the patches themselves, and
haven't tried each of the patches individually.  Disabling transparent
huge page support had no effect for me.  At this time, my array is
still locked up with the exact same symptoms you report.  I am slowly
learning about the spin lock mechanism within the kernel to try to
identify the underlying problem, but this is admittedly outside my
area of expertise.

To help correlate your problem with what others observed, would it be
possible for you to share the call stack for the following three
processes?

mdXXX_raid6
mdXXX_reshape
systemd-udevd

Or any other processes reporting deadlock while the reshape is trying to run.

Curious to see if you observe the same call stack.
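(A minimal sketch of one way to grab those kernel-side stacks,
assuming the PIDs reported by top/ps; user-space tools like strace
won't show a kernel thread's stack:)

=================================================================
# Dump the kernel stack of each process of interest (run as root).
for p in $(pgrep 'md127_raid6|md127_reshape|systemd-udevd'); do
    echo "== PID $p ($(cat /proc/$p/comm))"
    cat /proc/$p/stack
done
# Alternatively, dump all blocked tasks to dmesg:
echo w > /proc/sysrq-trigger
=================================================================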

I will definitely let you know if I have any major revelations.  Thanks, Michael

On Sun, Sep 4, 2016 at 12:04 AM, Francisco Parada
<advanceandconquer@gmail.com> wrote:
> Hello everyone,
>
> I know this gets a ton of visibility, so I'll keep it as concise as possible.
>
> I'm running Ubuntu 16.04.1 and I have (read had) a 7 drive RAID6
> array.  I attempted to grow the array by adding 3 additional drives
> for a total of 10, but it seems that one of the brand new drives had
> 60+ bad blocks (according to "badblocks -vw").  I came to this
> conclusion, because I had a power outage during the grow that lasted
> longer than my 1500VA battery backup could withstand, so when I
> attempted to continue the reshape, I noticed that the assemble
> wouldn't start upon reboot.  All drives were marked as spares:
>
> =================================================================================================================
> # cat /proc/mdstat
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
> [raid4] [raid10]
> md127 : inactive sdi[0](S) sdh[2](S) sdj[3](S) sdf[7](S) sdg[9](S)
> sdd[10](S) sde[11](S) sdb[13](S) sdc[12](S)
>       26371219608 blocks super 1.2
> =================================================================================================================
>
>
> Notice above, that there's only 9 drives instead of 10, which I was
> supposed to have.  The drive that's missing is "sdk", but that's
> because using "badblocks -vw" has wiped out the drive in an effort to
> figure out if there was actually something wrong with said drive
> (You're probably gasping, but it had a missing GPT table, and no
> matter what I tried to recover it, the drive would just stop
> responding to reads and writes).  So I attempted to assemble the array
> with "/dev/sdk" missing as shown below, but I get this:
>
> ===================================================================================================================
> # mdadm -Afv /dev/md127 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
> /dev/sdg /dev/sdh /dev/sdi /dev/sdj missing
> mdadm: looking for devices for /dev/md127
> mdadm: cannot open device missing: No such file or directory
> mdadm: missing has no superblock - assembly aborted
> ===================================================================================================================
>
>
> But I guess that doesn't matter, because almost all other drives are
> almost sync'ed as specified in the events output of mdadm (once again,
> keep in mind that "/dev/sdk" is blank, thus the "no md superblock"
> error):
>
> ==============================================
> # mdadm -E /dev/sd[b-k] | grep Events
>          Events : 280026
>          Events : 280026
>          Events : 280026
>          Events : 280026
>          Events : 280026
>          Events : 280026
>          Events : 280026
> mdadm: No md superblock detected on /dev/sdk.
>          Events : 280026
>          Events : 280011
> ==============================================
>
>
> So I attempt to reassemble it, by leaving out "/dev/sdk" and it seems
> to assemble it, with some warnings of course:
>
> ===========================================================================================================
> # mdadm -Afv /dev/md127 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
> /dev/sdg /dev/sdh /dev/sdi /dev/sdj
> mdadm: looking for devices for /dev/md127
> mdadm: /dev/sdb is identified as a member of /dev/md127, slot 7.
> mdadm: /dev/sdc is identified as a member of /dev/md127, slot 8.
> mdadm: /dev/sdd is identified as a member of /dev/md127, slot 6.
> mdadm: /dev/sde is identified as a member of /dev/md127, slot 9.
> mdadm: /dev/sdf is identified as a member of /dev/md127, slot 4.
> mdadm: /dev/sdg is identified as a member of /dev/md127, slot 1.
> mdadm: /dev/sdh is identified as a member of /dev/md127, slot 2.
> mdadm: /dev/sdi is identified as a member of /dev/md127, slot 0.
> mdadm: /dev/sdj is identified as a member of /dev/md127, slot 3.
> mdadm: :/dev/md127 has an active reshape - checking if critical
> section needs to be restored
> mdadm: No backup metadata on device-7
> mdadm: No backup metadata on device-8
> mdadm: No backup metadata on device-9
> mdadm: added /dev/sdg to /dev/md127 as 1
> mdadm: added /dev/sdh to /dev/md127 as 2
> mdadm: added /dev/sdj to /dev/md127 as 3 (possibly out of date)
> mdadm: added /dev/sdf to /dev/md127 as 4
> mdadm: no uptodate device for slot 10 of /dev/md127
> mdadm: added /dev/sdd to /dev/md127 as 6
> mdadm: added /dev/sdb to /dev/md127 as 7
> mdadm: added /dev/sdc to /dev/md127 as 8
> mdadm: added /dev/sde to /dev/md127 as 9
> mdadm: added /dev/sdi to /dev/md127 as 0
> mdadm: /dev/md127 has been started with 8 drives (out of 10).
> ===========================================================================================================
>
>
> But now the reshape goes from 80000K to 1000K and eventually 0K speed
> shortly after hitting "enter" to reassemble:
>
> ===========================================================================================================
> # cat /proc/mdstat
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
> [raid4] [raid10]
> md127 : active raid6 sdi[0] sde[11] sdc[12] sdb[13] sdd[10] sdf[7] sdh[2] sdg[9]
>       14650675200 blocks super 1.2 level 6, 512k chunk, algorithm 2
> [10/8] [UUU_U_UUUU]
>       [=======>.............]  reshape = 39.1% (1146348628/2930135040)
> finish=51538126.9min speed=0K/sec
>       bitmap: 0/22 pages [0KB], 65536KB chunk
>
> unused devices: <none>
> ===========================================================================================================
>
>
> So I did a little probing and it seems that my CPU is running at 100%
> by "md127_raid6".  I should note that it has been this way for over a
> week now, the time doesn't reflect it because I had to perform a
> reboot.  So I'm at a loss, because even if I try to optimize reshape
> speeds, the reshape still remains at 0K/sec.
>
> =================================================================================
> top - 22:28:53 up  1:56,  3 users,  load average: 3.05, 2.04, 0.92
> Tasks: 317 total,   4 running, 313 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  4.4 us, 50.5 sy,  0.0 ni, 44.1 id,  1.0 wa,  0.0 hi,  0.0 si,  0.0 st
> KiB Mem :  1521584 total,   220812 free,   774868 used,   525904 buff/cache
> KiB Swap: 25153532 total, 25000764 free,   152768 used.   477708 avail Mem
>
>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+   COMMAND
>   435 root      20   0       0      0      0 R  98.0  0.0   5:27.12
> md127_raid6
> 28941 cisco     20   0  546436  34336  25080 R   2.9  2.3   0:18.32
> gnome-disks
>  3557 message+  20   0   44364   4632   3068 S   2.0  0.3   0:06.53
> dbus-daemon
> =================================================================================
>
> Any ideas?  Your help would be greatly appreciated.
>
> Thanks in advance
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: RAID6 - CPU At 100% Usage After Reassembly
       [not found]   ` <CAOW94uv4zSGs+6be3zhcQaGZdiAgg-s4ZHZ=mszcURo6pqJyqA@mail.gmail.com>
@ 2016-09-04 22:48     ` Francisco Parada
  2016-09-13 15:22       ` Francisco Parada
  0 siblings, 1 reply; 16+ messages in thread
From: Francisco Parada @ 2016-09-04 22:48 UTC (permalink / raw)
  To: Michael J. Shaver; +Cc: mdraid

> You are almost certainly hitting the same issue reported several times
> both here and on other forums, although your case is the first one I
> have seen for raid6:


What a bummer; I was hoping I was just being dumb and missing something
(here's to hoping, though).  Luckily, my really important data is backed
up.  I'm just trying to see if I can still recover everything else.



> At this time, there have been a couple suggestions on possible fixes
> (disable transparent huge page support in the kernel)


OK, I can try giving that a shot.



> Another gentleman, Bart Van Assche, had suggested a set of patches to
> the kernel scheduler that may help with the problem:
>
> https://lkml.org/lkml/2016/8/3/289


I'll read into this, thank you!



> At this time, my array is still locked up with the exact same symptoms you
> report.


Hopefully we can all work to figure this one out.


> To help correlate your problem with what others observed, would it be
> possible for you to share the call stack for the following three
> processes?
>
> mdXXX_raid6
> mdXXX_reshape
> systemd-udevd
>
> Or any other processes reporting deadlock while the reshape is trying to
> run.
>
> Curious to see if you observe the same call stack.


Would that be using "strace", "ptrace", or both? Pardon my ignorance, I've
never used them. I'm pretty sure it's pstack, but want to make completely
sure!


> I will definitely let you know if I have any major revelations. thanks
> Michael


Thank you kindly, Michael.  I appreciate your input.

On Sun, Sep 4, 2016 at 11:41 AM, Francisco Parada
<advanceandconquer@gmail.com> wrote:
>
>> You are almost certainly hitting the same issue reported several times
>> both here and on other forums, although your case is the first one I
>> have seen for raid6:
>
>
> What a bummer, was hoping I was just being dumb and missing something
> (here's to hoping though). Luckily, my real important data is backed up.
> Just trying to see if I can still recover everything else.
>
>
>>
>> At this time, there have been a couple suggestions on possible fixes
>> (disable transparent huge page support in the kernel)
>
>
> OK, I can try giving that a shot.
>
>
>>
>> Another gentleman, Bart Van Assche, had suggested a set of patches to
>> the kernel scheduler that may help with the problem:
>>
>> https://lkml.org/lkml/2016/8/3/289
>
>
> I'll read into this, thank you!
>
>
>>
>> At this time, my array is still locked up with the exact same symptoms you
>> report.
>
>
> Hopefully we can all work to figure this one out.
>
>
>> To help correlate your problem with what others observed, would it be
>> possible for you to share the call stack for the following three
>> processes?
>>
>> mdXXX_raid6
>> mdXXX_reshape
>> systemd-udevd
>>
>> Or any other processes reporting deadlock while the reshape is trying to
>> run.
>>
>> Curious to see if you observe the same call stack.
>
>
> Would that be using "strace", "ptrace", or both? Pardon my ignorance, I've
> never used them. I'm pretty sure it's pstack, but want to make completely
> sure!
>
>>
>> I will definitely let you know if I have any major revelations. thanks
>> Michael
>
>
> Thank you kindly, Michael.  I appreciate your input.
>
>>
>>
>> On Sun, Sep 4, 2016 at 12:04 AM, Francisco Parada
>> <advanceandconquer@gmail.com> wrote:
>> > Hello everyone,
>> >
>> > I know this gets a ton of visibility, so I'll keep it as concise as
>> > possible.
>> >
>> > I'm running Ubuntu 16.04.1 and I have (read had) a 7 drive RAID6
>> > array.  I attempted to grow the array by adding 3 additional drives
>> > for a total of 10, but it seems that one of the brand new drives had
>> > 60+ bad blocks (according to "badblocks -vw").  I came to this
>> > conclusion, because I had a power outage during the grow that lasted
>> > longer than my 1500VA battery backup could withstand, so when I
>> > attempted to continue the reshape, I noticed that the assemble
>> > wouldn't start upon reboot.  All drives were marked as spares:
>> >
>> >
>> > =================================================================================================================
>> > # cat /proc/mdstat
>> > Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
>> > [raid4] [raid10]
>> > md127 : inactive sdi[0](S) sdh[2](S) sdj[3](S) sdf[7](S) sdg[9](S)
>> > sdd[10](S) sde[11](S) sdb[13](S) sdc[12](S)
>> >       26371219608 blocks super 1.2
>> >
>> > =================================================================================================================
>> >
>> >
>> > Notice above, that there's only 9 drives instead of 10, which I was
>> > supposed to have.  The drive that's missing is "sdk", but that's
>> > because using "badblocks -vw" has wiped out the drive in an effort to
>> > figure out if there was actually something wrong with said drive
>> > (You're probably gasping, but it had a missing GPT table, and no
>> > matter what I tried to recover it, the drive would just stop
>> > responding to reads and writes).  So I attempted to assemble the array
>> > with "/dev/sdk" missing as shown below, but I get this:
>> >
>> >
>> > ===================================================================================================================
>> > # mdadm -Afv /dev/md127 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
>> > /dev/sdg /dev/sdh /dev/sdi /dev/sdj missing
>> > mdadm: looking for devices for /dev/md127
>> > mdadm: cannot open device missing: No such file or directory
>> > mdadm: missing has no superblock - assembly aborted
>> >
>> > ===================================================================================================================
>> >
>> >
>> > But I guess that doesn't matter, because almost all other drives are
>> > almost sync'ed as specified in the events output of mdadm (once again,
>> > keep in mind that "/dev/sdk" is blank, thus the "no md superblock"
>> > error):
>> >
>> > ==============================================
>> > # mdadm -E /dev/sd[b-k] | grep Events
>> >          Events : 280026
>> >          Events : 280026
>> >          Events : 280026
>> >          Events : 280026
>> >          Events : 280026
>> >          Events : 280026
>> >          Events : 280026
>> > mdadm: No md superblock detected on /dev/sdk.
>> >          Events : 280026
>> >          Events : 280011
>> > ==============================================
>> >
>> >
>> > So I attempt to reassemble it, by leaving out "/dev/sdk" and it seems
>> > to assemble it, with some warnings of course:
>> >
>> >
>> > ===========================================================================================================
>> > # mdadm -Afv /dev/md127 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
>> > /dev/sdg /dev/sdh /dev/sdi /dev/sdj
>> > mdadm: looking for devices for /dev/md127
>> > mdadm: /dev/sdb is identified as a member of /dev/md127, slot 7.
>> > mdadm: /dev/sdc is identified as a member of /dev/md127, slot 8.
>> > mdadm: /dev/sdd is identified as a member of /dev/md127, slot 6.
>> > mdadm: /dev/sde is identified as a member of /dev/md127, slot 9.
>> > mdadm: /dev/sdf is identified as a member of /dev/md127, slot 4.
>> > mdadm: /dev/sdg is identified as a member of /dev/md127, slot 1.
>> > mdadm: /dev/sdh is identified as a member of /dev/md127, slot 2.
>> > mdadm: /dev/sdi is identified as a member of /dev/md127, slot 0.
>> > mdadm: /dev/sdj is identified as a member of /dev/md127, slot 3.
>> > mdadm: :/dev/md127 has an active reshape - checking if critical
>> > section needs to be restored
>> > mdadm: No backup metadata on device-7
>> > mdadm: No backup metadata on device-8
>> > mdadm: No backup metadata on device-9
>> > mdadm: added /dev/sdg to /dev/md127 as 1
>> > mdadm: added /dev/sdh to /dev/md127 as 2
>> > mdadm: added /dev/sdj to /dev/md127 as 3 (possibly out of date)
>> > mdadm: added /dev/sdf to /dev/md127 as 4
>> > mdadm: no uptodate device for slot 10 of /dev/md127
>> > mdadm: added /dev/sdd to /dev/md127 as 6
>> > mdadm: added /dev/sdb to /dev/md127 as 7
>> > mdadm: added /dev/sdc to /dev/md127 as 8
>> > mdadm: added /dev/sde to /dev/md127 as 9
>> > mdadm: added /dev/sdi to /dev/md127 as 0
>> > mdadm: /dev/md127 has been started with 8 drives (out of 10).
>> >
>> > ===========================================================================================================
>> >
>> >
>> > But now the reshape goes from 80000K to 1000K and eventually 0K speed
>> > shortly after hitting "enter" to reassemble:
>> >
>> >
>> > ===========================================================================================================
>> > # cat /proc/mdstat
>> > Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
>> > [raid4] [raid10]
>> > md127 : active raid6 sdi[0] sde[11] sdc[12] sdb[13] sdd[10] sdf[7]
>> > sdh[2] sdg[9]
>> >       14650675200 blocks super 1.2 level 6, 512k chunk, algorithm 2
>> > [10/8] [UUU_U_UUUU]
>> >       [=======>.............]  reshape = 39.1% (1146348628/2930135040)
>> > finish=51538126.9min speed=0K/sec
>> >       bitmap: 0/22 pages [0KB], 65536KB chunk
>> >
>> > unused devices: <none>
>> >
>> > ===========================================================================================================
>> >
>> >
>> > So I did a little probing and it seems that my CPU is running at 100%
>> > by "md127_raid6".  I should note that it has been this way for over a
>> > week now, the time doesn't reflect it because I had to perform a
>> > reboot.  So I'm at a loss, because even if I try to optimize reshape
>> > speeds, the reshape still remains at 0K/sec.
>> >
>> >
>> > =================================================================================
>> > top - 22:28:53 up  1:56,  3 users,  load average: 3.05, 2.04, 0.92
>> > Tasks: 317 total,   4 running, 313 sleeping,   0 stopped,   0 zombie
>> > %Cpu(s):  4.4 us, 50.5 sy,  0.0 ni, 44.1 id,  1.0 wa,  0.0 hi,  0.0 si,
>> > 0.0 st
>> > KiB Mem :  1521584 total,   220812 free,   774868 used,   525904
>> > buff/cache
>> > KiB Swap: 25153532 total, 25000764 free,   152768 used.   477708 avail
>> > Mem
>> >
>> >   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+
>> > COMMAND
>> >   435 root      20   0       0      0      0 R  98.0  0.0   5:27.12
>> > md127_raid6
>> > 28941 cisco     20   0  546436  34336  25080 R   2.9  2.3   0:18.32
>> > gnome-disks
>> >  3557 message+  20   0   44364   4632   3068 S   2.0  0.3   0:06.53
>> > dbus-daemon
>> >
>> > =================================================================================
>> >
>> > Any ideas?  Your help would be greatly appreciated.
>> >
>> > Thanks in advance
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> > the body of a message to majordomo@vger.kernel.org
>> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: RAID6 - CPU At 100% Usage After Reassembly
  2016-09-04 22:48     ` Francisco Parada
@ 2016-09-13 15:22       ` Francisco Parada
  2016-09-13 17:43         ` Shaohua Li
  0 siblings, 1 reply; 16+ messages in thread
From: Francisco Parada @ 2016-09-13 15:22 UTC (permalink / raw)
  To: Michael J. Shaver; +Cc: mdraid

Hi all,

I've tried the suggestions mentioned and, per a private suggestion,
also tried booting into an older version of Ubuntu (14.04.5 instead of
my current 16.04.1), which has an older kernel, but that too was
unsuccessful.  Just a quick update that all of the suggested
workarounds were unfruitful.

Does anyone else have any suggestions or workarounds?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: RAID6 - CPU At 100% Usage After Reassembly
  2016-09-13 15:22       ` Francisco Parada
@ 2016-09-13 17:43         ` Shaohua Li
  2016-09-13 18:15           ` Francisco Parada
  0 siblings, 1 reply; 16+ messages in thread
From: Shaohua Li @ 2016-09-13 17:43 UTC (permalink / raw)
  To: Francisco Parada; +Cc: Michael J. Shaver, mdraid

On Tue, Sep 13, 2016 at 11:22:17AM -0400, Francisco Parada wrote:
> Hi all,
> 
> I've tried the suggestions mentioned and also tried booting into an
> older version of Ubuntu (14.04.5 instead of my current 16.04.1), which
> has an older kernel per a private suggestion, but that too was
> unsuccessful.  Just wanted to give a quick update that all of the
> mentioned workarounds were unfruitful.

If you could rebuild the kernel and apply this debug patch, I'd like to check
what happens.  When the CPU is at 100% usage and the reshape stalls, please
capture the /sys/kernel/debug/tracing/trace output and send it to me.

Thanks,
Shaohua

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 5883ef0..db484ca 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -62,6 +62,9 @@
 #include "raid0.h"
 #include "bitmap.h"
 
+#undef pr_debug
+#define pr_debug trace_printk
+
 #define cpu_to_group(cpu) cpu_to_node(cpu)
 #define ANY_GROUP NUMA_NO_NODE
 

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: RAID6 - CPU At 100% Usage After Reassembly
  2016-09-13 17:43         ` Shaohua Li
@ 2016-09-13 18:15           ` Francisco Parada
       [not found]             ` <CAOW94utVBcLz191ifzKkjn+nsSthPWDAQF8R-PabcS2uPareag@mail.gmail.com>
  0 siblings, 1 reply; 16+ messages in thread
From: Francisco Parada @ 2016-09-13 18:15 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Michael J. Shaver, mdraid

Hi Shaohua,

>If you could rebuild kernel and apply this debug patch

It would be my pleasure to do this.  Thank you for your reply!
Do you have a decent reference site I could read through and follow?
I've been to a ton of pages in the past, but none concise enough to
follow for rebuilding the kernel and applying a patch as you ask.  If
not, no worries; I'll just do a bit of digging and reading.  I am
working from home today, though, so if I could bang this one out, it
would be amazing!
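(For reference, a rough outline of one way to do it, assuming a kernel
source tree and the patch saved as raid5-debug.patch; a sketch, not a
definitive guide:)

======================================
# Rough outline only; a packaged build (e.g. "make bindeb-pkg") works too.
cd linux-4.8-rc7                       # kernel source tree (assumed path)
patch -p1 < ../raid5-debug.patch       # Shaohua's debug patch from above
cp /boot/config-"$(uname -r)" .config  # start from the running kernel's config
make olddefconfig
make -j"$(nproc)"
sudo make modules_install install      # on Ubuntu this also updates grub
======================================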

>please capture the /sys/kernel/debug/tracing/trace output and send to me.

No problem, once I do this I will send that right over.
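(Roughly what I expect the capture to look like; a sketch, assuming
debugfs is mounted in the usual place:)

======================================
# trace_printk() output lands in the ftrace ring buffer; no tracer to enable.
echo > /sys/kernel/debug/tracing/trace   # clear the buffer first
# ... reassemble the array and wait for the 100% CPU / 0K/sec state ...
gzip -c /sys/kernel/debug/tracing/trace > md127-trace.txt.gz
======================================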


On Tue, Sep 13, 2016 at 1:43 PM, Shaohua Li <shli@kernel.org> wrote:
> On Tue, Sep 13, 2016 at 11:22:17AM -0400, Francisco Parada wrote:
>> Hi all,
>>
>> I've tried the suggestions mentioned and also tried booting into an
>> older version of Ubuntu (14.04.5 instead of my current 16.04.1), which
>> has an older kernel per a private suggestion, but that too was
>> unsuccessful.  Just wanted to give a quick update that all of the
>> mentioned workarounds were unfruitful.
>
> If you could rebuild kernel and apply this debug patch, I'd like to check what
> happens. When the cpu is in 100% usage and reshape stalls, please capture the
> /sys/kernel/debug/tracing/trace output and send to me.
>
> Thanks,
> Shaohua
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 5883ef0..db484ca 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -62,6 +62,9 @@
>  #include "raid0.h"
>  #include "bitmap.h"
>
> +#undef pr_debug
> +#define pr_debug trace_printk
> +
>  #define cpu_to_group(cpu) cpu_to_node(cpu)
>  #define ANY_GROUP NUMA_NO_NODE
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: RAID6 - CPU At 100% Usage After Reassembly
       [not found]             ` <CAOW94utVBcLz191ifzKkjn+nsSthPWDAQF8R-PabcS2uPareag@mail.gmail.com>
@ 2016-09-26 14:29               ` Francisco Parada
  2016-09-30 13:06                 ` Francisco Parada
  0 siblings, 1 reply; 16+ messages in thread
From: Francisco Parada @ 2016-09-26 14:29 UTC (permalink / raw)
  To: Shaohua Li, mdraid, Michael J. Shaver

Hi all,

It doesn't seem like my response from last night made it to the list:


Hi Shaohua and all,

I was finally able to upgrade my Ubuntu server to a newer version of
the kernel and mdadm:
==========================
$ uname -r; mdadm -V

4.8.0-rc7-custom

mdadm - v3.4 - 28th January 2016
==========================


I rebuilt the kernel with the debug patch that Shaohua asked me to apply:
======================================
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 5883ef0..db484ca 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -62,6 +62,9 @@
 #include "raid0.h"
 #include "bitmap.h"

+#undef pr_debug
+#define pr_debug trace_printk
+
 #define cpu_to_group(cpu) cpu_to_node(cpu)
 #define ANY_GROUP NUMA_NO_NODE
======================================


Here's how things look so far, nothing different yet:
======================================
$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
md127 : inactive sdd[10](S) sdk[0](S) sdj[2](S) sdh[3](S) sde[11](S)
sdg[9](S) sdf[7](S) sdb[13](S) sdc[12](S)
      26371219608 blocks super 1.2

unused devices: <none>
======================================


Here's an event-count snapshot of my array; just keep in mind that
"/dev/sdi" is my failed drive, so I omitted it from the examination:
======================================
# mdadm -E /dev/sd[b-h,j,k] |grep Events
         Events : 280033
         Events : 280033
         Events : 280033
         Events : 280033
         Events : 280033
         Events : 280033
         Events : 280011
         Events : 280033
         Events : 280033
======================================


It's important to note that, since I haven't done anything yet, my CPU is idle:
======================================
top - 20:22:00 up  5:56,  2 users,  load average: 0.04, 0.03, 0.00

Tasks: 221 total,   1 running, 220 sleeping,   0 stopped,   0 zombie

%Cpu(s):  1.0 us,  1.0 sy,  0.0 ni, 97.5 id,  0.5 wa,  0.0 hi,  0.0 si,  0.0 st

KiB Mem :  1525400 total,   103836 free,   696208 used,   725356 buff/cache

KiB Swap: 25153532 total, 25117380 free,    36152 used.   454808 avail Mem
 PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2093 cisco     20   0 1761112 153108  55640 S   1.0 10.0   0:12.61 gnome-shell
 4322 root      20   0   40520   3684   3100 R   1.0  0.2   0:00.22 top
    1 root      20   0  119692   5540   3992 S   0.0  0.4   0:02.44 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kthreadd
    3 root      20   0       0      0      0 S   0.0  0.0   0:00.09 ksoftirqd/0
    5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:
======================================


Now onto the fun part.  I stopped "/dev/md127":
======================================
# mdadm --stop /dev/md127
mdadm: stopped /dev/md127
# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
unused devices: <none>
======================================


For completeness, here's the trace output after stopping the array and
before reassembling:
======================================
# tracer: nop
#
# entries-in-buffer/entries-written: 0/0   #P:2
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
/sys/kernel/debug/tracing/trace (END)
======================================


Then I reassembled the array:
======================================
# mdadm -Afv /dev/md127 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
/dev/sdg /dev/sdh /dev/sdj /dev/sdk
mdadm: looking for devices for /dev/md127
mdadm: /dev/sdb is busy - skipping
mdadm: Merging with already-assembled /dev/md/en1
mdadm: /dev/sdb is identified as a member of /dev/md/en1, slot 7.
mdadm: /dev/sdc is identified as a member of /dev/md/en1, slot 8.
mdadm: /dev/sdd is identified as a member of /dev/md/en1, slot 6.
mdadm: /dev/sde is identified as a member of /dev/md/en1, slot 9.
mdadm: /dev/sdf is identified as a member of /dev/md/en1, slot 4.
mdadm: /dev/sdg is identified as a member of /dev/md/en1, slot 1.
mdadm: /dev/sdh is identified as a member of /dev/md/en1, slot 3.
mdadm: /dev/sdj is identified as a member of /dev/md/en1, slot 2.
mdadm: /dev/sdk is identified as a member of /dev/md/en1, slot 0.
mdadm: Marking array /dev/md/en1 as 'clean'
mdadm: /dev/md/en1 has an active reshape - checking if critical
section needs to be restored
mdadm: No backup metadata on device-7
mdadm: No backup metadata on device-8
mdadm: No backup metadata on device-9
mdadm: added /dev/sdg to /dev/md/en1 as 1
mdadm: added /dev/sdj to /dev/md/en1 as 2
mdadm: added /dev/sdh to /dev/md/en1 as 3 (possibly out of date)
mdadm: added /dev/sdf to /dev/md/en1 as 4
mdadm: no uptodate device for slot 5 of /dev/md/en1
mdadm: added /dev/sdd to /dev/md/en1 as 6
mdadm: /dev/sdb is already in /dev/md/en1 as 7
mdadm: added /dev/sdc to /dev/md/en1 as 8
mdadm: added /dev/sde to /dev/md/en1 as 9
mdadm: added /dev/sdk to /dev/md/en1 as 0
======================================


And of course, CPU shoots to 100%:
======================================
top - 20:38:44 up  6:13,  3 users,  load average: 5.05, 3.25, 1.41
Tasks: 239 total,   3 running, 236 sleeping,   0 stopped,   0 zombie
%Cpu(s):  5.9 us, 52.7 sy,  0.0 ni,  0.0 id, 41.4 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  1525400 total,    73124 free,   739576 used,   712700 buff/cache
KiB Swap: 25153532 total, 25111140 free,    42392 used.   415840 avail Mem

 PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 6423 root      20   0       0      0      0 R  99.0  0.0   4:43.51 md127_raid6
 1166 root      20   0  280588   8780   6192 S   3.0  0.6   0:06.56 polkitd
 4022 cisco     20   0  394756  32884  26064 S   3.0  2.2   0:08.77 gnome-disks
 1903 cisco     20   0  256660  34060  26280 S   2.0  2.2   0:29.56 Xorg
 2093 cisco     20   0 1760364 153572  55572 S   2.0 10.1   0:17.96 gnome-shell
======================================


Then, sure enough, the array reshape speed goes back down to nothing:
======================================
# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
md127 : active raid6 sdk[0] sde[11] sdc[12] sdd[10] sdf[7] sdj[2] sdg[9] sdb[13]
      14650675200 blocks super 1.2 level 6, 512k chunk, algorithm 2
[10/8] [UUU_U_UUUU]
      [=======>.............]  reshape = 39.1% (1146348512/2930135040)
finish=22057575.1min speed=1K/sec
      bitmap: 0/22 pages [0KB], 65536KB chunk

unused devices: <none>
======================================


The trace file is gigantic, so hopefully it doesn't get trimmed in the
email.  Any help would be appreciated; thanks in advance:
================
# tracer: nop
#
# entries-in-buffer/entries-written: 44739/81554230   #P:2
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
     md127_raid6-6423  [001] .... 22228.159918: analyse_stripe: check
7: state 0x13 read           (null) write           (null) written
      (null)
     md127_raid6-6423  [001] .... 22228.159919: analyse_stripe: check
6: state 0xa01 read           (null) write           (null) written
       (null)
     md127_raid6-6423  [001] .... 22228.159919: analyse_stripe: check
5: state 0x801 read           (null) write           (null) written
       (null)
     md127_raid6-6423  [001] .... 22228.159920: analyse_stripe: check
4: state 0x811 read           (null) write           (null) written
       (null)
     md127_raid6-6423  [001] .... 22228.159921: analyse_stripe: check
3: state 0x801 read           (null) write           (null) written
       (null)
     md127_raid6-6423  [001] .... 22228.159921: analyse_stripe: check
2: state 0x811 read           (null) write           (null) written
       (null)
     md127_raid6-6423  [001] .... 22228.159922: analyse_stripe: check
1: state 0xa01 read           (null) write           (null) written
       (null)
     md127_raid6-6423  [001] .... 22228.159923: analyse_stripe: check
0: state 0x811 read           (null) write           (null) written
       (null)
     md127_raid6-6423  [001] .... 22228.159924: handle_stripe:
locked=2 uptodate=10 to_read=0 to_write=0 failed=4 failed_num=6,5
     md127_raid6-6423  [001] .... 22228.159925:
schedule_reconstruction: schedule_reconstruction: stripe 2292697672
locked: 4 ops_request: 10
     md127_raid6-6423  [001] .... 22228.159925: raid_run_ops:
ops_run_reconstruct6: stripe 2292697672
     md127_raid6-6423  [001] .... 22228.159943:
ops_complete_reconstruct: ops_complete_reconstruct: stripe 2292697672
     md127_raid6-6423  [001] .... 22228.159944: handle_stripe:
handling stripe 2292697680, state=0x1401 cnt=1, pd_idx=7, qd_idx=8
, check:0, reconstruct:6
===========================================

I trimmed it because of the failure-to-send issue mentioned above.
However, if someone needs a longer snippet of the trace, let me know.
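(If a longer capture would help, a sketch of how I could pull out just
the raid6 thread's lines and compress them for the list, assuming the
same thread name:)

================
grep 'md127_raid6-' /sys/kernel/debug/tracing/trace | gzip -9 > md127_raid6-trace.txt.gz
================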

Thanks,

'Cisco

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: RAID6 - CPU At 100% Usage After Reassembly
  2016-09-26 14:29               ` Francisco Parada
@ 2016-09-30 13:06                 ` Francisco Parada
  2016-10-05 11:12                   ` Francisco Parada
  0 siblings, 1 reply; 16+ messages in thread
From: Francisco Parada @ 2016-09-30 13:06 UTC (permalink / raw)
  To: Shaohua Li, mdraid, Michael J. Shaver

Hello Shaohua and all,

Was anyone able to take a look at the trace I provided?  Curious if I
should just give up hope and start from scratch.  Any feedback is
appreciated.

Thank you,
Cisco

On Mon, Sep 26, 2016 at 10:29 AM, Francisco Parada
<advanceandconquer@gmail.com> wrote:
> Hi all,
>
> It doesn't seem like my response from last night, made it to the list:
>
>
> Hi Shaohua and all,
>
> I was finally able to upgrade my Ubuntu server to a newer version of
> the kernel and mdadm:
> ==========================
> $ uname -r; mdadm -V
>
> 4.8.0-rc7-custom
>
> mdadm - v3.4 - 28th January 2016
> ==========================
>
>
> I rebuilt the kernel with the options that Shaohua asked me to build it with:
> ======================================
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 5883ef0..db484ca 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -62,6 +62,9 @@
>  #include "raid0.h"
>  #include "bitmap.h"
>
> +#undef pr_debug
> +#define pr_debug trace_printk
> +
>  #define cpu_to_group(cpu) cpu_to_node(cpu)
>  #define ANY_GROUP NUMA_NO_NODE
> ======================================
>
>
> Here's how things look so far, nothing different yet:
> ======================================
> $ cat /proc/mdstat
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
> [raid4] [raid10]
> md127 : inactive sdd[10](S) sdk[0](S) sdj[2](S) sdh[3](S) sde[11](S)
> sdg[9](S) sdf[7](S) sdb[13](S) sdc[12](S)
>       26371219608 blocks super 1.2
>
> unused devices: <none>
> ======================================
>
>
> Here's an event snapshot of my array, just keep in mind that
> "/dev/sdi" is my failed drive, so I omitted it from the examination:
> ======================================
> # mdadm -E /dev/sd[b-h,j,k] |grep Events
>          Events : 280033
>          Events : 280033
>          Events : 280033
>          Events : 280033
>          Events : 280033
>          Events : 280033
>          Events : 280011
>          Events : 280033
>          Events : 280033
> ======================================
>
>
> It's important to note, that since I haven't done anything yet, my CPU is idle:
> ======================================
> top - 20:22:00 up  5:56,  2 users,  load average: 0.04, 0.03, 0.00
>
> Tasks: 221 total,   1 running, 220 sleeping,   0 stopped,   0 zombie
>
> %Cpu(s):  1.0 us,  1.0 sy,  0.0 ni, 97.5 id,  0.5 wa,  0.0 hi,  0.0 si,  0.0 st
>
> KiB Mem :  1525400 total,   103836 free,   696208 used,   725356 buff/cache
>
> KiB Swap: 25153532 total, 25117380 free,    36152 used.   454808 avail Mem
>  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
>  2093 cisco     20   0 1761112 153108  55640 S   1.0 10.0   0:12.61 gnome-shell
>  4322 root      20   0   40520   3684   3100 R   1.0  0.2   0:00.22 top
>     1 root      20   0  119692   5540   3992 S   0.0  0.4   0:02.44 systemd
>     2 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kthreadd
>     3 root      20   0       0      0      0 S   0.0  0.0   0:00.09 ksoftirqd/0
>     5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:
> ======================================
>
>
> Now onto the fun part.  I stopped "/dev/md127":
> ======================================
> # mdadm --stop /dev/md127
> mdadm: stopped /dev/md127
> # cat /proc/mdstat
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
> [raid4] [raid10]
> unused devices: <none>
> ======================================
>
>
> For completion, here's the trace output after stopping the array, and
> before reassembling:
> ======================================
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 0/0   #P:2
> #
> #                              _-----=> irqs-off
> #                             / _----=> need-resched
> #                            | / _---=> hardirq/softirq
> #                            || / _--=> preempt-depth
> #                            ||| /     delay
> #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
> #              | |       |   ||||       |         |
> /sys/kernel/debug/tracing/trace (END)
> ======================================
>
>
> Then I reassembled the array:
> ======================================
> # mdadm -Afv /dev/md127 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
> /dev/sdg /dev/sdh /dev/sdj /dev/sdk
> mdadm: looking for devices for /dev/md127
> mdadm: /dev/sdb is busy - skipping
> mdadm: Merging with already-assembled /dev/md/en1
> mdadm: /dev/sdb is identified as a member of /dev/md/en1, slot 7.
> mdadm: /dev/sdc is identified as a member of /dev/md/en1, slot 8.
> mdadm: /dev/sdd is identified as a member of /dev/md/en1, slot 6.
> mdadm: /dev/sde is identified as a member of /dev/md/en1, slot 9.
> mdadm: /dev/sdf is identified as a member of /dev/md/en1, slot 4.
> mdadm: /dev/sdg is identified as a member of /dev/md/en1, slot 1.
> mdadm: /dev/sdh is identified as a member of /dev/md/en1, slot 3.
> mdadm: /dev/sdj is identified as a member of /dev/md/en1, slot 2.
> mdadm: /dev/sdk is identified as a member of /dev/md/en1, slot 0.
> mdadm: Marking array /dev/md/en1 as 'clean'
> mdadm: /dev/md/en1 has an active reshape - checking if critical
> section needs to be restored
> mdadm: No backup metadata on device-7
> mdadm: No backup metadata on device-8
> mdadm: No backup metadata on device-9
> mdadm: added /dev/sdg to /dev/md/en1 as 1
> mdadm: added /dev/sdj to /dev/md/en1 as 2
> mdadm: added /dev/sdh to /dev/md/en1 as 3 (possibly out of date)
> mdadm: added /dev/sdf to /dev/md/en1 as 4
> mdadm: no uptodate device for slot 5 of /dev/md/en1
> mdadm: added /dev/sdd to /dev/md/en1 as 6
> mdadm: /dev/sdb is already in /dev/md/en1 as 7
> mdadm: added /dev/sdc to /dev/md/en1 as 8
> mdadm: added /dev/sde to /dev/md/en1 as 9
> mdadm: added /dev/sdk to /dev/md/en1 as 0
> ======================================
>
>
> And of course, CPU shoots to 100%:
> ======================================
> top - 20:38:44 up  6:13,  3 users,  load average: 5.05, 3.25, 1.41
> Tasks: 239 total,   3 running, 236 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  5.9 us, 52.7 sy,  0.0 ni,  0.0 id, 41.4 wa,  0.0 hi,  0.0 si,  0.0 st
> KiB Mem :  1525400 total,    73124 free,   739576 used,   712700 buff/cache
> KiB Swap: 25153532 total, 25111140 free,    42392 used.   415840 avail Mem
>
>  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
>  6423 root      20   0       0      0      0 R  99.0  0.0   4:43.51 md127_raid6
>  1166 root      20   0  280588   8780   6192 S   3.0  0.6   0:06.56 polkitd
>  4022 cisco     20   0  394756  32884  26064 S   3.0  2.2   0:08.77 gnome-disks
>  1903 cisco     20   0  256660  34060  26280 S   2.0  2.2   0:29.56 Xorg
>  2093 cisco     20   0 1760364 153572  55572 S   2.0 10.1   0:17.96 gnome-shell
> ======================================
>
>
> Then surely the array reshape speed goes back down to nothing:
> ======================================
> # cat /proc/mdstat
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
> [raid4] [raid10]
> md127 : active raid6 sdk[0] sde[11] sdc[12] sdd[10] sdf[7] sdj[2] sdg[9] sdb[13]
>       14650675200 blocks super 1.2 level 6, 512k chunk, algorithm 2
> [10/8] [UUU_U_UUUU]
>       [=======>.............]  reshape = 39.1% (1146348512/2930135040)
> finish=22057575.1min speed=1K/sec
>       bitmap: 0/22 pages [0KB], 65536KB chunk
>
> unused devices: <none>
> ======================================
>
>
> The size of the trace file is gigantic, so hopefully it doesn't get
> trimmed in the email, but any help would be appreciated, thanks in
> advance:
> ================
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 44739/81554230   #P:2
> #
> #                              _-----=> irqs-off
> #                             / _----=> need-resched
> #                            | / _---=> hardirq/softirq
> #                            || / _--=> preempt-depth
> #                            ||| /     delay
> #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
> #              | |       |   ||||       |         |
>      md127_raid6-6423  [001] .... 22228.159918: analyse_stripe: check
> 7: state 0x13 read           (null) write           (null) written
>       (null)
>      md127_raid6-6423  [001] .... 22228.159919: analyse_stripe: check
> 6: state 0xa01 read           (null) write           (null) written
>        (null)
>      md127_raid6-6423  [001] .... 22228.159919: analyse_stripe: check
> 5: state 0x801 read           (null) write           (null) written
>        (null)
>      md127_raid6-6423  [001] .... 22228.159920: analyse_stripe: check
> 4: state 0x811 read           (null) write           (null) written
>        (null)
>      md127_raid6-6423  [001] .... 22228.159921: analyse_stripe: check
> 3: state 0x801 read           (null) write           (null) written
>        (null)
>      md127_raid6-6423  [001] .... 22228.159921: analyse_stripe: check
> 2: state 0x811 read           (null) write           (null) written
>        (null)
>      md127_raid6-6423  [001] .... 22228.159922: analyse_stripe: check
> 1: state 0xa01 read           (null) write           (null) written
>        (null)
>      md127_raid6-6423  [001] .... 22228.159923: analyse_stripe: check
> 0: state 0x811 read           (null) write           (null) written
>        (null)
>      md127_raid6-6423  [001] .... 22228.159924: handle_stripe:
> locked=2 uptodate=10 to_read=0 to_write=0 failed=4 failed_num=6,5
>      md127_raid6-6423  [001] .... 22228.159925:
> schedule_reconstruction: schedule_reconstruction: stripe 2292697672
> locked: 4 ops_request: 10
>      md127_raid6-6423  [001] .... 22228.159925: raid_run_ops:
> ops_run_reconstruct6: stripe 2292697672
>      md127_raid6-6423  [001] .... 22228.159943:
> ops_complete_reconstruct: ops_complete_reconstruct: stripe 2292697672
>      md127_raid6-6423  [001] .... 22228.159944: handle_stripe:
> handling stripe 2292697680, state=0x1401 cnt=1, pd_idx=7, qd_idx=8
> , check:0, reconstruct:6
> ===========================================
>
> I trimmed it because of the failure to send issue.  However, if
> someone needs a lengthier snip of the trace, let me know.
>
> Thanks,
>
> 'Cisco

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: RAID6 - CPU At 100% Usage After Reassembly
  2016-09-30 13:06                 ` Francisco Parada
@ 2016-10-05 11:12                   ` Francisco Parada
  2016-10-06 23:55                     ` Shaohua Li
  0 siblings, 1 reply; 16+ messages in thread
From: Francisco Parada @ 2016-10-05 11:12 UTC (permalink / raw)
  To: mdraid

Hi all,

Just wanted to send one last ping on this.  I didn't hear back and
really don't know where else to turn before abandoning hope.  Should I
just start wiping the drives so I can start from scratch at this
point?

I rebuilt the kernel with the debug patch that Shaohua provided and I
sent the list the output of the trace.  Does anyone have any other
suggestions?

Thank you once again,
Cisco


On Fri, Sep 30, 2016 at 9:06 AM, Francisco Parada
<advanceandconquer@gmail.com> wrote:
> Hello Shaohua and all,
>
> Was anyone able to take a look at the trace I provided?  Curious if I
> should just give up hope and start from scratch.  Any feedback is
> appreciated.
>
> Thank you,
> Cisco
>
> On Mon, Sep 26, 2016 at 10:29 AM, Francisco Parada
> <advanceandconquer@gmail.com> wrote:
>> Hi all,
>>
>> It doesn't seem like my response from last night, made it to the list:
>>
>>
>> Hi Shaohua and all,
>>
>> I was finally able to upgrade my Ubuntu server to a newer version of
>> the kernel and mdadm:
>> ==========================
>> $ uname -r; mdadm -V
>>
>> 4.8.0-rc7-custom
>>
>> mdadm - v3.4 - 28th January 2016
>> ==========================
>>
>>
>> I rebuilt the kernel with the options that Shaohua asked me to build it with:
>> ======================================
>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> index 5883ef0..db484ca 100644
>> --- a/drivers/md/raid5.c
>> +++ b/drivers/md/raid5.c
>> @@ -62,6 +62,9 @@
>>  #include "raid0.h"
>>  #include "bitmap.h"
>>
>> +#undef pr_debug
>> +#define pr_debug trace_printk
>> +
>>  #define cpu_to_group(cpu) cpu_to_node(cpu)
>>  #define ANY_GROUP NUMA_NO_NODE
>> ======================================
>>
>>
>> Here's how things look so far, nothing different yet:
>> ======================================
>> $ cat /proc/mdstat
>> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
>> [raid4] [raid10]
>> md127 : inactive sdd[10](S) sdk[0](S) sdj[2](S) sdh[3](S) sde[11](S)
>> sdg[9](S) sdf[7](S) sdb[13](S) sdc[12](S)
>>       26371219608 blocks super 1.2
>>
>> unused devices: <none>
>> ======================================
>>
>>
>> Here's an event snapshot of my array, just keep in mind that
>> "/dev/sdi" is my failed drive, so I omitted it from the examination:
>> ======================================
>> # mdadm -E /dev/sd[b-h,j,k] |grep Events
>>          Events : 280033
>>          Events : 280033
>>          Events : 280033
>>          Events : 280033
>>          Events : 280033
>>          Events : 280033
>>          Events : 280011
>>          Events : 280033
>>          Events : 280033
>> ======================================
>>
>>
>> It's important to note, that since I haven't done anything yet, my CPU is idle:
>> ======================================
>> top - 20:22:00 up  5:56,  2 users,  load average: 0.04, 0.03, 0.00
>>
>> Tasks: 221 total,   1 running, 220 sleeping,   0 stopped,   0 zombie
>>
>> %Cpu(s):  1.0 us,  1.0 sy,  0.0 ni, 97.5 id,  0.5 wa,  0.0 hi,  0.0 si,  0.0 st
>>
>> KiB Mem :  1525400 total,   103836 free,   696208 used,   725356 buff/cache
>>
>> KiB Swap: 25153532 total, 25117380 free,    36152 used.   454808 avail Mem
>>  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
>>  2093 cisco     20   0 1761112 153108  55640 S   1.0 10.0   0:12.61 gnome-shell
>>  4322 root      20   0   40520   3684   3100 R   1.0  0.2   0:00.22 top
>>     1 root      20   0  119692   5540   3992 S   0.0  0.4   0:02.44 systemd
>>     2 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kthreadd
>>     3 root      20   0       0      0      0 S   0.0  0.0   0:00.09 ksoftirqd/0
>>     5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:
>> ======================================
>>
>>
>> Now onto the fun part.  I stopped "/dev/md127":
>> ======================================
>> # mdadm --stop /dev/md127
>> mdadm: stopped /dev/md127
>> # cat /proc/mdstat
>> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
>> [raid4] [raid10]
>> unused devices: <none>
>> ======================================
>>
>>
>> For completeness, here's the trace output after stopping the array and
>> before reassembling:
>> ======================================
>> # tracer: nop
>> #
>> # entries-in-buffer/entries-written: 0/0   #P:2
>> #
>> #                              _-----=> irqs-off
>> #                             / _----=> need-resched
>> #                            | / _---=> hardirq/softirq
>> #                            || / _--=> preempt-depth
>> #                            ||| /     delay
>> #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
>> #              | |       |   ||||       |         |
>> /sys/kernel/debug/tracing/trace (END)
>> ======================================
>>
>>
>> Then I reassembled the array:
>> ======================================
>> # mdadm -Afv /dev/md127 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
>> /dev/sdg /dev/sdh /dev/sdj /dev/sdk
>> mdadm: looking for devices for /dev/md127
>> mdadm: /dev/sdb is busy - skipping
>> mdadm: Merging with already-assembled /dev/md/en1
>> mdadm: /dev/sdb is identified as a member of /dev/md/en1, slot 7.
>> mdadm: /dev/sdc is identified as a member of /dev/md/en1, slot 8.
>> mdadm: /dev/sdd is identified as a member of /dev/md/en1, slot 6.
>> mdadm: /dev/sde is identified as a member of /dev/md/en1, slot 9.
>> mdadm: /dev/sdf is identified as a member of /dev/md/en1, slot 4.
>> mdadm: /dev/sdg is identified as a member of /dev/md/en1, slot 1.
>> mdadm: /dev/sdh is identified as a member of /dev/md/en1, slot 3.
>> mdadm: /dev/sdj is identified as a member of /dev/md/en1, slot 2.
>> mdadm: /dev/sdk is identified as a member of /dev/md/en1, slot 0.
>> mdadm: Marking array /dev/md/en1 as 'clean'
>> mdadm: /dev/md/en1 has an active reshape - checking if critical
>> section needs to be restored
>> mdadm: No backup metadata on device-7
>> mdadm: No backup metadata on device-8
>> mdadm: No backup metadata on device-9
>> mdadm: added /dev/sdg to /dev/md/en1 as 1
>> mdadm: added /dev/sdj to /dev/md/en1 as 2
>> mdadm: added /dev/sdh to /dev/md/en1 as 3 (possibly out of date)
>> mdadm: added /dev/sdf to /dev/md/en1 as 4
>> mdadm: no uptodate device for slot 5 of /dev/md/en1
>> mdadm: added /dev/sdd to /dev/md/en1 as 6
>> mdadm: /dev/sdb is already in /dev/md/en1 as 7
>> mdadm: added /dev/sdc to /dev/md/en1 as 8
>> mdadm: added /dev/sde to /dev/md/en1 as 9
>> mdadm: added /dev/sdk to /dev/md/en1 as 0
>> ======================================
>>
>>
>> And of course, the CPU shoots to 100%:
>> ======================================
>> top - 20:38:44 up  6:13,  3 users,  load average: 5.05, 3.25, 1.41
>> Tasks: 239 total,   3 running, 236 sleeping,   0 stopped,   0 zombie
>> %Cpu(s):  5.9 us, 52.7 sy,  0.0 ni,  0.0 id, 41.4 wa,  0.0 hi,  0.0 si,  0.0 st
>> KiB Mem :  1525400 total,    73124 free,   739576 used,   712700 buff/cache
>> KiB Swap: 25153532 total, 25111140 free,    42392 used.   415840 avail Mem
>>
>>  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
>>  6423 root      20   0       0      0      0 R  99.0  0.0   4:43.51 md127_raid6
>>  1166 root      20   0  280588   8780   6192 S   3.0  0.6   0:06.56 polkitd
>>  4022 cisco     20   0  394756  32884  26064 S   3.0  2.2   0:08.77 gnome-disks
>>  1903 cisco     20   0  256660  34060  26280 S   2.0  2.2   0:29.56 Xorg
>>  2093 cisco     20   0 1760364 153572  55572 S   2.0 10.1   0:17.96 gnome-shell
>> ======================================
>>
>>
>> Then, sure enough, the array reshape speed drops back down to nothing:
>> ======================================
>> # cat /proc/mdstat
>> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
>> [raid4] [raid10]
>> md127 : active raid6 sdk[0] sde[11] sdc[12] sdd[10] sdf[7] sdj[2] sdg[9] sdb[13]
>>       14650675200 blocks super 1.2 level 6, 512k chunk, algorithm 2
>> [10/8] [UUU_U_UUUU]
>>       [=======>.............]  reshape = 39.1% (1146348512/2930135040)
>> finish=22057575.1min speed=1K/sec
>>       bitmap: 0/22 pages [0KB], 65536KB chunk
>>
>> unused devices: <none>
>> ======================================
>>
>>
>> The trace file is gigantic, so hopefully it doesn't get trimmed in
>> the email.  Any help would be appreciated; thanks in advance:
>> ================
>> # tracer: nop
>> #
>> # entries-in-buffer/entries-written: 44739/81554230   #P:2
>> #
>> #                              _-----=> irqs-off
>> #                             / _----=> need-resched
>> #                            | / _---=> hardirq/softirq
>> #                            || / _--=> preempt-depth
>> #                            ||| /     delay
>> #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
>> #              | |       |   ||||       |         |
>>      md127_raid6-6423  [001] .... 22228.159918: analyse_stripe: check
>> 7: state 0x13 read           (null) write           (null) written
>>       (null)
>>      md127_raid6-6423  [001] .... 22228.159919: analyse_stripe: check
>> 6: state 0xa01 read           (null) write           (null) written
>>        (null)
>>      md127_raid6-6423  [001] .... 22228.159919: analyse_stripe: check
>> 5: state 0x801 read           (null) write           (null) written
>>        (null)
>>      md127_raid6-6423  [001] .... 22228.159920: analyse_stripe: check
>> 4: state 0x811 read           (null) write           (null) written
>>        (null)
>>      md127_raid6-6423  [001] .... 22228.159921: analyse_stripe: check
>> 3: state 0x801 read           (null) write           (null) written
>>        (null)
>>      md127_raid6-6423  [001] .... 22228.159921: analyse_stripe: check
>> 2: state 0x811 read           (null) write           (null) written
>>        (null)
>>      md127_raid6-6423  [001] .... 22228.159922: analyse_stripe: check
>> 1: state 0xa01 read           (null) write           (null) written
>>        (null)
>>      md127_raid6-6423  [001] .... 22228.159923: analyse_stripe: check
>> 0: state 0x811 read           (null) write           (null) written
>>        (null)
>>      md127_raid6-6423  [001] .... 22228.159924: handle_stripe:
>> locked=2 uptodate=10 to_read=0 to_write=0 failed=4 failed_num=6,5
>>      md127_raid6-6423  [001] .... 22228.159925:
>> schedule_reconstruction: schedule_reconstruction: stripe 2292697672
>> locked: 4 ops_request: 10
>>      md127_raid6-6423  [001] .... 22228.159925: raid_run_ops:
>> ops_run_reconstruct6: stripe 2292697672
>>      md127_raid6-6423  [001] .... 22228.159943:
>> ops_complete_reconstruct: ops_complete_reconstruct: stripe 2292697672
>>      md127_raid6-6423  [001] .... 22228.159944: handle_stripe:
>> handling stripe 2292697680, state=0x1401 cnt=1, pd_idx=7, qd_idx=8
>> , check:0, reconstruct:6
>> ===========================================
>>
>> I trimmed it because of the failure-to-send issue.  However, if
>> someone needs a longer snippet of the trace, let me know.
>>
>> Thanks,
>>
>> 'Cisco
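
The trace above is read out of the ftrace ring buffer that trace_printk()
writes into.  As a rough sketch (the buffer size and output path below are
only illustrative), the capture amounts to:
======================================
# Make sure debugfs is mounted (it usually already is)
mount -t debugfs none /sys/kernel/debug 2>/dev/null

# Give the per-CPU ring buffer some room and clear any old entries
echo 65536 > /sys/kernel/debug/tracing/buffer_size_kb
echo > /sys/kernel/debug/tracing/trace

# ...stop and reassemble the array here...

# Snapshot whatever the raid6 thread has printed so far
cat /sys/kernel/debug/tracing/trace > /root/md127-trace.txt
======================================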

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: RAID6 - CPU At 100% Usage After Reassembly
  2016-10-05 11:12                   ` Francisco Parada
@ 2016-10-06 23:55                     ` Shaohua Li
  2016-10-07 11:23                       ` Francisco Parada
  0 siblings, 1 reply; 16+ messages in thread
From: Shaohua Li @ 2016-10-06 23:55 UTC (permalink / raw)
  To: Francisco Parada; +Cc: mdraid

On Wed, Oct 05, 2016 at 07:12:59AM -0400, Francisco Parada wrote:
> Hi all,
> 
> Just wanted to send one last ping on this.  I didn't hear back and
> really don't know where else to turn before abandoning hope.
> Should I just start wiping the drives so I can start from scratch at
> this point?
> 
> I rebuilt the kernel with the debug patch that Shaohua provided and I sent
> the list the output of the trace.  Does anyone have any other
> suggestions?

Sorry for the long delay. I got a chance to look at the log today.  It looks
like there are read errors on two of the disks, and the raid6 thread keeps
looping to handle the case but doesn't succeed.  The disks are sdd and sdg.
I have no idea how to fix this so far though, sorry.

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: RAID6 - CPU At 100% Usage After Reassembly
  2016-10-06 23:55                     ` Shaohua Li
@ 2016-10-07 11:23                       ` Francisco Parada
  2016-10-10 19:52                         ` Anthony Youngman
  2016-10-11  3:53                         ` Brad Campbell
  0 siblings, 2 replies; 16+ messages in thread
From: Francisco Parada @ 2016-10-07 11:23 UTC (permalink / raw)
  To: Shaohua Li; +Cc: mdraid

> Sorry for the long delay.

No worries at all, Shaohua!!!! I really appreciate you getting back to me.

> It looks like there are read errors on two of the disks, and the raid6
> thread keeps looping to handle the case but doesn't succeed.  The disks
> are sdd and sdg.  I have no idea how to fix this so far though, sorry.

OK, let me try running diagnostics on the disks again.  I ran
"badblocks" on them with the "-sv" options when I originally
encountered the issue and only found errors on one drive.  But I'll
run them again just in case, along with a few other diagnostics to
try to detect the errors (a rough sketch follows below).  I'm
wondering whether these read errors are caused by the controller or
by the drives themselves.  I can try swapping controllers.  Unless it's an
issue with the model drives that I'm using.  I've got WD 3TB Green
EZRX drives, which I recently found out via the RAID Wiki, that they
didn't have error correction (after I spent over a thousand dollars on
the drives in 3 years) ... Had I known better, I would have opted for
different models.
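
Roughly, the extra checks would be something like this; the attribute
filter is illustrative, and the long self-tests run in the background on
the drives themselves:
======================================
# Quick health check and error counters on the two disks Shaohua named
for d in /dev/sdd /dev/sdg; do
    smartctl -H "$d"                                    # overall verdict
    smartctl -A "$d" | grep -Ei 'realloc|pending|uncorrect|crc'
    smartctl -l error "$d"                              # ATA error log
done

# Kick off long surface scans; check back later with "smartctl -l selftest"
smartctl -t long /dev/sdd
smartctl -t long /dev/sdg
======================================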

Thanks a million, Shaohua!



On Thu, Oct 6, 2016 at 7:55 PM, Shaohua Li <shli@kernel.org> wrote:
> On Wed, Oct 05, 2016 at 07:12:59AM -0400, Francisco Parada wrote:
>> Hi all,
>>
>> Just wanted to send one last ping on this.  I didn't hear back and
>> really don't know where else to turn before abandoning hope.
>> Should I just start wiping the drives so I can start from scratch at
>> this point?
>>
>> I rebuilt the kernel with the debug patch that Shaohua provided and I sent
>> the list the output of the trace.  Does anyone have any other
>> suggestions?
>
> Sorry for the long delay. I got a chance to look at the log today.  It looks
> like there are read errors on two of the disks, and the raid6 thread keeps
> looping to handle the case but doesn't succeed.  The disks are sdd and sdg.
> I have no idea how to fix this so far though, sorry.
>
> Thanks,
> Shaohua

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: RAID6 - CPU At 100% Usage After Reassembly
  2016-10-07 11:23                       ` Francisco Parada
@ 2016-10-10 19:52                         ` Anthony Youngman
  2016-10-11  3:53                         ` Brad Campbell
  1 sibling, 0 replies; 16+ messages in thread
From: Anthony Youngman @ 2016-10-10 19:52 UTC (permalink / raw)
  To: Francisco Parada, Shaohua Li; +Cc: mdraid



On 07/10/16 12:23, Francisco Parada wrote:
> I've got WD 3TB Green
> EZRX drives, which I recently found out via the RAID Wiki, that they
> didn't have error correction (after I spent over a thousand dollars on
> the drives in 3 years) ... Had I known better, I would have opted for
> different models.

There's more to Greens than that. Apart from the error-recovery issue,
they also apparently have a habit of going to sleep. Search the list for
Greens, but you need to do something extra to stop them spinning down,
as that will also interfere with the raid.

I'll probably be adding that to the wiki at some point - I want to add a
section on readily available drives, with that sort of information.

Cheers,
Wol

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: RAID6 - CPU At 100% Usage After Reassembly
  2016-10-07 11:23                       ` Francisco Parada
  2016-10-10 19:52                         ` Anthony Youngman
@ 2016-10-11  3:53                         ` Brad Campbell
  2016-10-11  9:23                           ` Wols Lists
  1 sibling, 1 reply; 16+ messages in thread
From: Brad Campbell @ 2016-10-11  3:53 UTC (permalink / raw)
  To: Francisco Parada; +Cc: mdraid

On 07/10/16 19:23, Francisco Parada wrote:

> issue with the model drives that I'm using.  I've got WD 3TB Green
> EZRX drives, which I recently found out via the RAID Wiki, that they
> didn't have error correction (after I spent over a thousand dollars on
> the drives in 3 years) ... Had I known better, I would have opted for
> different models.

That's not entirely true. EZRX drives have just as much "error 
correction" as any other drive. What they don't have is the ability to 
rapidly terminate their error correction (WD call it Time Limited Error 
Recovery or TLER). Nothing wrong with that provided your OS storage 
stack timeouts are set appropriately, as without that the drives will 
take longer to process the error than the OS will wait and things get 
ugly resulting in drives kicked from arrays.

Sure, they're not the best drives for the job, and buying new I wouldn't 
recommend them; but disable periodic head parking and spindown, set the 
timeouts appropriately and they'll do the job until they need replacing.
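
Concretely, "set the timeouts appropriately" usually comes down to
something like the following sketch; the 7-second ERC value and the
/dev/sd[b-k] glob are illustrative, and drives without SCT ERC support
will simply reject the smartctl command:
======================================
# Tell each drive to give up on an unreadable sector after 7 seconds
# (units are tenths of a second); this resets on power-up, so re-apply
# it at every boot.
for d in /dev/sd[b-k]; do
    smartctl -l scterc,70,70 "$d"
done

# Raise the kernel's per-device command timeout (seconds) well above the
# drive's worst-case internal recovery time, for drives that lack SCT ERC.
for t in /sys/block/sd[b-k]/device/timeout; do
    echo 180 > "$t"
done
======================================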

Regards,
Brad

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: RAID6 - CPU At 100% Usage After Reassembly
  2016-10-11  3:53                         ` Brad Campbell
@ 2016-10-11  9:23                           ` Wols Lists
  2016-10-11 12:46                             ` Brad Campbell
  0 siblings, 1 reply; 16+ messages in thread
From: Wols Lists @ 2016-10-11  9:23 UTC (permalink / raw)
  To: Brad Campbell, Francisco Parada; +Cc: mdraid

On 11/10/16 04:53, Brad Campbell wrote:
> On 07/10/16 19:23, Francisco Parada wrote:
> 
>> issue with the model drives that I'm using.  I've got WD 3TB Green
>> EZRX drives, which I recently found out via the RAID Wiki, that they
>> didn't have error correction (after I spent over a thousand dollars on
>> the drives in 3 years) ... Had I known better, I would have opted for
>> different models.
> 
> That's not entirely true. EZRX drives have just as much "error
> correction" as any other drive. What they don't have is the ability to
> rapidly terminate their error correction (WD call it Time Limited Error
> Recovery or TLER). Nothing wrong with that provided your OS storage
> stack timeouts are set appropriately, as without that the drives will
> take longer to process the error than the OS will wait and things get
> ugly resulting in drives kicked from arrays.
> 
> Sure, they're not the best drives for the job, and buying new I wouldn't
> recommend them; but disable periodic head parking and spindown, set the
> timeouts appropriately and they'll do the job until they need replacing.
> 
I've put your timeout fix script on the wiki :-)

It strikes me this would be a good addition too - I could probably
manage something of the sort, but do you or anyone have the code to
detect a green, and do that "disable parking and spindown"? It's a pain,
but having a script that deals with all known and common issues with
drives is probably a good thing.

I ought to go through that script and comment it heavily - I know the
experts will say "Why?" :-) but (although I may be a guru elsewhere)
this is an area I'm learning - hopefully fast - and comments are helpful
when reading code :-) They can always be deleted in production.

Cheers,
Wol


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: RAID6 - CPU At 100% Usage After Reassembly
  2016-10-11  9:23                           ` Wols Lists
@ 2016-10-11 12:46                             ` Brad Campbell
  0 siblings, 0 replies; 16+ messages in thread
From: Brad Campbell @ 2016-10-11 12:46 UTC (permalink / raw)
  To: Wols Lists, Francisco Parada; +Cc: mdraid

On 11/10/16 17:23, Wols Lists wrote:

> It strikes me this would be a good addition too - I could probably
> manage something of the sort, but do you or anyone have the code to
> detect a green, and do that "disable parking and spindown"? It's a pain,
> but having a script that deals with all known and common issues with
> drives is probably a good thing.
>

The command to disable head parking is a vendor-specific command; it is
non-volatile and requires a drive power cycle to take effect, so it's a
one-off set-and-forget when you get the drive.

Excerpt from the hdparm man page:

       -J     Get/set the Western Digital (WD) Green Drive's "idle3"
              timeout value.  This timeout controls how often the drive
              parks its heads and enters a low power consumption state.
              The factory default is eight (8) seconds, which is a very
              poor choice for use with Linux.  Leaving it at the default
              will result in hundreds of thousands of head load/unload
              cycles in a very short period of time.  The drive mechanism
              is only rated for 300,000 to 1,000,000 cycles, so leaving
              it at the default could result in premature failure, not to
              mention the performance impact of the drive often having to
              wake-up before doing routine I/O.

I just set them to -J 0.  This applies to the WD Red drives as well.

I forget how I tackled the spindown command.  I think I ran hdparm -S at
every boot, the same as setting the scterc timeout.

None of the machines I use with RAID arrays are idle long enough to spin 
down the disks.
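
As a rough sketch of the "detect a green and set it once" idea (the EZRX
model match is illustrative, -S is not persistent so it belongs in a boot
script, and some hdparm builds gate the -J setter behind an extra safety
flag, so check your man page first):
======================================
for d in /dev/sd?; do
    model=$(hdparm -I "$d" 2>/dev/null | grep 'Model Number')
    case "$model" in
        *EZRX*)                      # WD Green; adjust the pattern as needed
            hdparm -J 0 "$d"         # disable idle3 head parking (one-off,
                                     # takes effect after a power cycle)
            hdparm -S 0 "$d"         # disable the standby/spindown timer
            ;;
    esac
done
======================================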



^ permalink raw reply	[flat|nested] 16+ messages in thread

* RAID6 - CPU At 100% Usage After Reassembly
@ 2016-09-04  2:56 Francisco Parada
  0 siblings, 0 replies; 16+ messages in thread
From: Francisco Parada @ 2016-09-04  2:56 UTC (permalink / raw)
  To: linux-raid

Hello everyone,

I know this gets a ton of visibility, so I'll keep it as concise as possible.

I'm running Ubuntu 16.04.1 and I have (read had) a 7 drive RAID6
array.  I attempted to grow the array by adding 3 additional drives
for a total of 10, but it seems that one of the brand new drives had
60+ bad blocks (according to "badblocks -vw").  I came to this
conclusion, because I had a power outage during the grow that lasted
longer than my 1500VA battery backup could withstand, so when I
attempted to continue the reshape, I noticed that the assemble
wouldn't start upon reboot.  All drives were marked as spares:

=================================================================================================================
# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
md127 : inactive sdi[0](S) sdh[2](S) sdj[3](S) sdf[7](S) sdg[9](S)
sdd[10](S) sde[11](S) sdb[13](S) sdc[12](S)
      26371219608 blocks super 1.2
=================================================================================================================


Notice above, that there's only 9 drives instead of 10, which I was
supposed to have.  The drive that's missing is "sdk", but that's
because using "badblocks -vw" has wiped out the drive in an effort to
figure out if there was actually something wrong with said drive
(You're probably gasping, but it had a missing GPT table, and no
matter what I tried to recover it, the drive would just stop
responding to reads and writes).  So I attempted to assemble the array
with "/dev/sdk" missing as shown below, but I get this:

===================================================================================================================
# mdadm -Afv /dev/md127 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
/dev/sdg /dev/sdh /dev/sdi /dev/sdj missing
mdadm: looking for devices for /dev/md127
mdadm: cannot open device missing: No such file or directory
mdadm: missing has no superblock - assembly aborted
===================================================================================================================


But I guess that doesn't matter, because almost all other drives are
almost sync'ed as specified in the events output of mdadm (once again,
keep in mind that "/dev/sdk" is blank, thus the "no md superblock"
error):

==============================================
# mdadm -E /dev/sd[b-k] | grep Events
         Events : 280026
         Events : 280026
         Events : 280026
         Events : 280026
         Events : 280026
         Events : 280026
         Events : 280026
mdadm: No md superblock detected on /dev/sdk.
         Events : 280026
         Events : 280011
==============================================


So I attempt to reassemble it, by leaving out "/dev/sdk" and it seems
to assemble it, with some warnings of course:

===========================================================================================================
# mdadm -Afv /dev/md127 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
/dev/sdg /dev/sdh /dev/sdi /dev/sdj
mdadm: looking for devices for /dev/md127
mdadm: /dev/sdb is identified as a member of /dev/md127, slot 7.
mdadm: /dev/sdc is identified as a member of /dev/md127, slot 8.
mdadm: /dev/sdd is identified as a member of /dev/md127, slot 6.
mdadm: /dev/sde is identified as a member of /dev/md127, slot 9.
mdadm: /dev/sdf is identified as a member of /dev/md127, slot 4.
mdadm: /dev/sdg is identified as a member of /dev/md127, slot 1.
mdadm: /dev/sdh is identified as a member of /dev/md127, slot 2.
mdadm: /dev/sdi is identified as a member of /dev/md127, slot 0.
mdadm: /dev/sdj is identified as a member of /dev/md127, slot 3.
mdadm: :/dev/md127 has an active reshape - checking if critical
section needs to be restored
mdadm: No backup metadata on device-7
mdadm: No backup metadata on device-8
mdadm: No backup metadata on device-9
mdadm: added /dev/sdg to /dev/md127 as 1
mdadm: added /dev/sdh to /dev/md127 as 2
mdadm: added /dev/sdj to /dev/md127 as 3 (possibly out of date)
mdadm: added /dev/sdf to /dev/md127 as 4
mdadm: no uptodate device for slot 10 of /dev/md127
mdadm: added /dev/sdd to /dev/md127 as 6
mdadm: added /dev/sdb to /dev/md127 as 7
mdadm: added /dev/sdc to /dev/md127 as 8
mdadm: added /dev/sde to /dev/md127 as 9
mdadm: added /dev/sdi to /dev/md127 as 0
mdadm: /dev/md127 has been started with 8 drives (out of 10).
===========================================================================================================


But now the reshape speed drops from 80000K/sec to 1000K/sec and
eventually 0K/sec shortly after hitting "enter" to reassemble:

===========================================================================================================
# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
md127 : active raid6 sdi[0] sde[11] sdc[12] sdb[13] sdd[10] sdf[7] sdh[2] sdg[9]
      14650675200 blocks super 1.2 level 6, 512k chunk, algorithm 2
[10/8] [UUU_U_UUUU]
      [=======>.............]  reshape = 39.1% (1146348628/2930135040)
finish=51538126.9min speed=0K/sec
      bitmap: 0/22 pages [0KB], 65536KB chunk

unused devices: <none>
===========================================================================================================


So I did a little probing, and it seems that my CPU is pegged at 100%
by "md127_raid6".  I should note that it has been this way for over a
week now; the TIME+ value below doesn't reflect that because I had to
perform a reboot.  So I'm at a loss, because even if I try to optimize
the reshape speed (the usual knobs are sketched below), it still
remains at 0K/sec.

=================================================================================
top - 22:28:53 up  1:56,  3 users,  load average: 3.05, 2.04, 0.92
Tasks: 317 total,   4 running, 313 sleeping,   0 stopped,   0 zombie
%Cpu(s):  4.4 us, 50.5 sy,  0.0 ni, 44.1 id,  1.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  1521584 total,   220812 free,   774868 used,   525904 buff/cache
KiB Swap: 25153532 total, 25000764 free,   152768 used.   477708 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  435 root      20   0       0      0      0 R  98.0  0.0   5:27.12 md127_raid6
28941 cisco     20   0  546436  34336  25080 R   2.9  2.3   0:18.32 gnome-disks
 3557 message+  20   0   44364   4632   3068 S   2.0  0.3   0:06.53 dbus-daemon
=================================================================================
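
For reference, "optimize reshape speeds" means the usual md tuning knobs,
roughly as sketched below (the values are illustrative); in this case they
make no difference because the raid6 thread itself is spinning:
======================================
# Raise the md resync/reshape speed floor and ceiling (KiB/s per device)
sysctl -w dev.raid.speed_limit_min=50000
sysctl -w dev.raid.speed_limit_max=200000

# Enlarge the raid5/6 stripe cache for this array (number of entries)
echo 8192 > /sys/block/md127/md/stripe_cache_size
======================================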

Any ideas?  Your help would be greatly appreciated.

Thanks in advance

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2016-10-11 12:46 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-04  4:04 RAID6 - CPU At 100% Usage After Reassembly Francisco Parada
2016-09-04 14:38 ` Michael J. Shaver
     [not found]   ` <CAOW94uv4zSGs+6be3zhcQaGZdiAgg-s4ZHZ=mszcURo6pqJyqA@mail.gmail.com>
2016-09-04 22:48     ` Francisco Parada
2016-09-13 15:22       ` Francisco Parada
2016-09-13 17:43         ` Shaohua Li
2016-09-13 18:15           ` Francisco Parada
     [not found]             ` <CAOW94utVBcLz191ifzKkjn+nsSthPWDAQF8R-PabcS2uPareag@mail.gmail.com>
2016-09-26 14:29               ` Francisco Parada
2016-09-30 13:06                 ` Francisco Parada
2016-10-05 11:12                   ` Francisco Parada
2016-10-06 23:55                     ` Shaohua Li
2016-10-07 11:23                       ` Francisco Parada
2016-10-10 19:52                         ` Anthony Youngman
2016-10-11  3:53                         ` Brad Campbell
2016-10-11  9:23                           ` Wols Lists
2016-10-11 12:46                             ` Brad Campbell
  -- strict thread matches above, loose matches on Subject: below --
2016-09-04  2:56 Francisco Parada
