From: Manibalan P <pmanibalan@amiindia.co.in>
To: NeilBrown <neilb@suse.de>, linux-raid <linux-raid@vger.kernel.org>
Cc: "Pasi Kärkkäinen" <pasik@iki.fi>
Subject: RE: md_raid5 using 100% CPU and hang with status resync=PENDING, if a drive is removed during initialization
Date: Wed, 4 Feb 2015 05:56:32 +0000	[thread overview]
Message-ID: <CD8664C5675EDF49A5E76D7DB099D7B326133051@VENUS1.in.megatrends.com> (raw)
In-Reply-To: <20150203093040.569aa5e1@notabene.brown>

>> Dear All,
>> 	Any updates on this issue.

> Probably the same as:
>
>   http://marc.info/?l=linux-raid&m=142283560704091&w=2

Dear Neil,
	This patch does not fix this issue.

	This issue happens only if a drive is removed from a RAID5 array that is still initializing while heavy IO is being performed on the array.
	In that case, as soon as the drive is removed, the array state changes to resync=PENDING and the md0_raid5 thread uses 100% of the CPU.
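
	For reference, the hang can be observed roughly as below once the drive is pulled (a sketch, not taken from the original logs; the array name md0 matches the logs above):

	watch -n1 cat /proc/mdstat       # degraded array stays at resync=PENDING
	top -p "$(pgrep md0_raid5)"      # md0_raid5 kernel thread pinned near 100% CPU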

Thanks,
Manibalan.
> which follows on from
>   http://marc.info/?t=142221642300001&r=1&w=2
> and
>   http://marc.info/?t=142172432500001&r=1&w=2

> NeilBrown

-----Original Message-----
From: NeilBrown [mailto:neilb@suse.de] 
Sent: Tuesday, February 3, 2015 4:01 AM
To: Manibalan P
Cc: Pasi Kärkkäinen; linux-raid
Subject: Re: md_raid5 using 100% CPU and hang with status resync=PENDING, if a drive is removed during initialization

On Mon, 2 Feb 2015 07:10:14 +0000 Manibalan P <pmanibalan@amiindia.co.in>
wrote:

> Dear All,
> 	Any updates on this issue.

Probably the same as:

  http://marc.info/?l=linux-raid&m=142283560704091&w=2

which follows on from
  http://marc.info/?t=142221642300001&r=1&w=2
and
  http://marc.info/?t=142172432500001&r=1&w=2

NeilBrown


> Thanks,
> Manibalan.
> 
> -----Original Message-----
> From: Manibalan P
> Sent: Wednesday, January 14, 2015 3:55 PM
> To: 'Pasi Kärkkäinen'
> Cc: 'neilb@suse.de'; 'linux-raid'
> Subject: RE: md_raid5 using 100% CPU and hang with status 
> resync=PENDING, if a drive is removed during initialization
> 
> Dear Pasi,
> Were you able to find anything on this issue?
> 
> Thanks,
> Manibalan.
> 
> -----Original Message-----
> From: Manibalan P
> Sent: Friday, January 2, 2015 12:08 PM
> To: 'Pasi Kärkkäinen'
> Cc: neilb@suse.de; linux-raid
> Subject: RE: md_raid5 using 100% CPU and hang with status 
> resync=PENDING, if a drive is removed during initialization
> 
> Dear Pasi,
> 
> I have filed the bug at
> https://bugzilla.redhat.com/show_bug.cgi?id=1178080
> 
> Thanks,
> Manibalan.
> 
> -----Original Message-----
> From: Pasi Kärkkäinen [mailto:pasik@iki.fi]
> Sent: Wednesday, December 31, 2014 10:18 PM
> To: Manibalan P
> Cc: neilb@suse.de; linux-raid
> Subject: Re: md_raid5 using 100% CPU and hang with status 
> resync=PENDING, if a drive is removed during initialization
> 
> On Tue, Dec 30, 2014 at 11:06:47AM +0000, Manibalan P wrote:
> > Dear Neil,
> >
> 
> Hello,
>  
> > A few things for your kind attention:
> > 1. I tried the same test with FC11 (2.6.32 kernel, before the MD code change), and the issue is not there.
> > 2. But with CentOS 6.4 (2.6.32 kernel, after the MD code change) I am getting this issue, and I am also able to reproduce it even with the latest kernel.
> > 
> > Also, a bug has been raised with Red Hat regarding this issue. Please find the bug link: "https://access.redhat.com/support/cases/#/case/01320319"
> > 
> 
> That support case URL can only be accessed by you and Redhat. Do you happen to have a public bugzilla link? 
> 
> 
> Thanks,
> 
> -- Pasi
> 
> > Thanks,
> > Manibalan.
> > 
> > -----Original Message-----
> > From: Manibalan P
> > Sent: Wednesday, December 24, 2014 12:15 PM
> > To: neilb@suse.de; 'linux-raid'
> > Cc: 'NeilBrown'
> > Subject: RE: md_raid5 using 100% CPU and hang with status 
> > resync=PENDING, if a drive is removed during initialization
> > 
> > 
> > Dear Neil,
> > 
> > A few things for your kind attention:
> > 1. I tried the same test with FC11 (2.6 kernel, before the MD code change), and the issue is not there.
> > 2. But with CentOS 6.4 (2.6 kernel, after the MD code change) I am getting this issue, and I am also able to reproduce it even with the latest kernel.
> > 
> > Thanks,
> > Manibalan.
> > 
> > -----Original Message-----
> > From: Manibalan P
> > Sent: Thursday, December 18, 2014 11:38 AM
> > To: 'linux-raid'
> > Cc: 'NeilBrown'; Vijayarankan Muthirisavengopal; Dinakaran N
> > Subject: RE: md_raid5 using 100% CPU and hang with status 
> > resync=PENDING, if a drive is removed during initialization
> > 
> > Dear Neil,
> > 
> > I also compiled the latest 3.18 kernel on CentOS 6.4 with the git MD pull patches from 3.19; that also ran into the same issue after removing a drive during resync.
> > 
> > Dec 17 19:07:32 ITX002590129362 kernel: Linux version 3.18.0 (root@mycentos6) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-11) (GCC) ) #1 SMP Wed Dec 17 15:59:09 EST 2014
> > Dec 17 19:07:32 ITX002590129362 kernel: Command line: ro root=/dev/md255 rd_NO_LVM rd_NO_DM rhgb quiet md_mod.start_ro=1 nmi_watchdog=1 md_mod.start_dirty_degraded=1
> > ...
> > Dec 17 19:10:15 ITX002590129362 kernel: md: bind<sda6>
> > Dec 17 19:10:15 ITX002590129362 kernel: md: bind<sdb6>
> > Dec 17 19:10:15 ITX002590129362 kernel: md: bind<sdc6>
> > Dec 17 19:10:15 ITX002590129362 kernel: md: bind<sdh6>
> > Dec 17 19:10:15 ITX002590129362 kernel: md: bind<sdi6>
> > Dec 17 19:10:15 ITX002590129362 kernel: md: bind<sdj6>
> > Dec 17 19:10:15 ITX002590129362 kernel: async_tx: api initialized (async)
> > Dec 17 19:10:15 ITX002590129362 kernel: xor: measuring software checksum speed
> > Dec 17 19:10:15 ITX002590129362 kernel:   prefetch64-sse: 10048.000 MB/sec
> > Dec 17 19:10:15 ITX002590129362 kernel:   generic_sse:  8824.000 MB/sec
> > Dec 17 19:10:15 ITX002590129362 kernel: xor: using function: prefetch64-sse (10048.000 MB/sec)
> > Dec 17 19:10:15 ITX002590129362 kernel: raid6: sse2x1    5921 MB/s
> > Dec 17 19:10:15 ITX002590129362 kernel: raid6: sse2x2    6933 MB/s
> > Dec 17 19:10:15 ITX002590129362 kernel: raid6: sse2x4    7476 MB/s
> > Dec 17 19:10:15 ITX002590129362 kernel: raid6: using algorithm sse2x4 (7476 MB/s)
> > Dec 17 19:10:15 ITX002590129362 kernel: raid6: using ssse3x2 recovery algorithm
> > Dec 17 19:10:15 ITX002590129362 kernel: md: raid6 personality registered for level 6
> > Dec 17 19:10:15 ITX002590129362 kernel: md: raid5 personality registered for level 5
> > Dec 17 19:10:15 ITX002590129362 kernel: md: raid4 personality registered for level 4
> > Dec 17 19:10:15 ITX002590129362 kernel: md/raid:md0: not clean -- starting background reconstruction
> > Dec 17 19:10:15 ITX002590129362 kernel: md/raid:md0: device sdj6 operational as raid disk 5
> > Dec 17 19:10:15 ITX002590129362 kernel: md/raid:md0: device sdi6 operational as raid disk 4
> > Dec 17 19:10:15 ITX002590129362 kernel: md/raid:md0: device sdh6 operational as raid disk 3
> > Dec 17 19:10:15 ITX002590129362 kernel: md/raid:md0: device sdc6 operational as raid disk 2
> > Dec 17 19:10:15 ITX002590129362 kernel: md/raid:md0: device sdb6 operational as raid disk 1
> > Dec 17 19:10:15 ITX002590129362 kernel: md/raid:md0: device sda6 operational as raid disk 0
> > Dec 17 19:10:15 ITX002590129362 kernel: md/raid:md0: allocated 0kB
> > Dec 17 19:10:15 ITX002590129362 kernel: md/raid:md0: raid level 5 active with 6 out of 6 devices, algorithm 2
> > Dec 17 19:10:15 ITX002590129362 kernel: md0: detected capacity change from 0 to 2361059573760
> > Dec 17 19:10:15 ITX002590129362 kernel: md0: unknown partition table
> > Dec 17 19:10:35 ITX002590129362 kernel: md: md0 switched to read-write mode.
> > Dec 17 19:10:35 ITX002590129362 kernel: md: resync of RAID array md0
> > Dec 17 19:10:35 ITX002590129362 kernel: md: minimum _guaranteed_  speed: 10000 KB/sec/disk.
> > Dec 17 19:10:35 ITX002590129362 kernel: md: using maximum available idle IO bandwidth (but not more than 30000 KB/sec) for resync.
> > Dec 17 19:10:35 ITX002590129362 kernel: md: using 128k window, over a total of 461144448k.
> > ...
> > Started IOs using the fio tool.
> > 
> > ./fio --name=md0 --filename=/dev/md0 --thread --numjobs=10 
> > --direct=1 --group_reporting --unlink=0 --loops=1 --offset=0 
> > --randrepeat=1 --norandommap --scramble_buffers=1 --stonewall 
> > --ioengine=libaio --rw=randwrite --bs=8704 --iodepth=4000 
> > --runtime=3000
> > --blockalign=512
> > 
> > ...
> > Removed a drive from the system.
> > 
> > Dec 17 19:13:23 ITX002590129362 kernel: mpt2sas0: log_info(0x31120101): originator(PL), code(0x12), sub_code(0x0101)
> > Dec 17 19:13:23 ITX002590129362 kernel: mpt2sas0: log_info(0x31120101): originator(PL), code(0x12), sub_code(0x0101)
> > Dec 17 19:13:23 ITX002590129362 kernel: mpt2sas0: log_info(0x31120101): originator(PL), code(0x12), sub_code(0x0101)
> > Dec 17 19:13:23 ITX002590129362 kernel: mpt2sas0: log_info(0x31120101): originator(PL), code(0x12), sub_code(0x0101)
> > ..
> > Dec 17 19:13:23 ITX002590129362 kernel: sd 0:0:7:0: [sdh]
> > Dec 17 19:13:23 ITX002590129362 kernel: Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
> > Dec 17 19:13:23 ITX002590129362 kernel: sd 0:0:7:0: [sdh] CDB:
> > Dec 17 19:13:23 ITX002590129362 kernel: Read(10): 28 00 02 69 03 70 00 00 10 00
> > Dec 17 19:13:23 ITX002590129362 kernel: blk_update_request: I/O error, dev sdh, sector 40436592
> > Dec 17 19:13:23 ITX002590129362 kernel: sd 0:0:7:0: [sdh]
> > Dec 17 19:13:23 ITX002590129362 kernel: Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
> > Dec 17 19:13:23 ITX002590129362 kernel: sd 0:0:7:0: [sdh] CDB:
> > Dec 17 19:13:23 ITX002590129362 kernel: Read(10): 28 00 0c 51 b3 d0 00 00 18 00
> > Dec 17 19:13:23 ITX002590129362 kernel: blk_update_request: I/O error, dev sdh, sector 206681040
> > Dec 17 19:13:23 ITX002590129362 kernel: sd 0:0:7:0: [sdh]
> > Dec 17 19:13:23 ITX002590129362 kernel: Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
> > Dec 17 19:13:23 ITX002590129362 kernel: sd 0:0:7:0: [sdh] CDB:
> > Dec 17 19:13:23 ITX002590129362 kernel: Read(10): 28 00 0c 3a f3 40 00 00 18 00
> > Dec 17 19:13:23 ITX002590129362 kernel: blk_update_request: I/O error, dev sdh, sector 205189952
> > Dec 17 19:13:23 ITX002590129362 kernel: sd 0:0:7:0: [sdh]
> > Dec 17 19:13:23 ITX002590129362 kernel: Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
> > ...
> > Dec 17 19:13:25 ITX002590129362 kernel: sd 0:0:7:0: [sdh] CDB:
> > Dec 17 19:13:25 ITX002590129362 kernel: Read(10): 28 00 26 8d eb 00 00 00 08 00
> > Dec 17 19:13:25 ITX002590129362 kernel: sd 0:0:7:0: [sdh]
> > Dec 17 19:13:25 ITX002590129362 kernel: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> > Dec 17 19:13:25 ITX002590129362 kernel: sd 0:0:7:0: [sdh] CDB:
> > Dec 17 19:13:25 ITX002590129362 kernel: Read(10): 28 00 26 8d eb f0 00 00 10 00
> > Dec 17 19:13:25 ITX002590129362 aghswap: devpath [0:0:7:0] action [remove] devtype [scsi_disk]
> > Dec 17 19:13:25 ITX002590129362 aghswap: MHSA: Sent event 0 0 7 0 remove scsi_disk
> > Dec 17 19:13:25 ITX002590129362 kernel: mpt2sas0: removing handle(0x0011), sas_addr(0x500605ba0101e305)
> > Dec 17 19:13:25 ITX002590129362 kernel: md/raid:md0: Disk failure on sdh6, disabling device.
> > Dec 17 19:13:25 ITX002590129362 kernel: md/raid:md0: Operation continuing on 5 devices.
> > Dec 17 19:13:25 ITX002590129362 kernel: md: md0: resync interrupted.
> > Dec 17 19:13:25 ITX002590129362 kernel: md: checkpointing resync of md0.
> > ..
> > Log messages after enabling the debug output in raid5.c; the following is repeated continuously:
> > 
> > __get_priority_stripe: handle: busy hold: empty full_writes: 0 bypass_count: 0
> > __get_priority_stripe: handle: busy hold: empty full_writes: 0 bypass_count: 0
> > __get_priority_stripe: handle: busy hold: empty full_writes: 0 bypass_count: 0
> > __get_priority_stripe: handle: busy hold: empty full_writes: 0 bypass_count: 0
> > __get_priority_stripe: handle: busy hold: empty full_writes: 0 bypass_count: 0
> > handling stripe 273480328, state=0x2041 cnt=1, pd_idx=5, qd_idx=-1 , check:0, reconstruct:0
> > check 5: state 0x10 read           (null) write           (null) written           (null)
> > check 4: state 0x11 read           (null) write           (null) written           (null)
> > check 3: state 0x0 read           (null) write           (null) written           (null)
> > check 2: state 0x11 read           (null) write           (null) written           (null)
> > check 1: state 0x11 read           (null) write           (null) written           (null)
> > check 0: state 0x18 read           (null) write ffff8808029b6b00 written           (null)
> > locked=0 uptodate=3 to_read=0 to_write=1 failed=1 failed_num=3,-1
> > force RCW max_degraded=1, recovery_cp=7036944 sh->sector=273480328
> > for sector 273480328, rmw=2 rcw=1
> > handling stripe 65238568, state=0x2041 cnt=1, pd_idx=5, qd_idx=-1 , check:0, reconstruct:0
> > check 5: state 0x10 read           (null) write           (null) written           (null)
> > check 4: state 0x11 read           (null) write           (null) written           (null)
> > check 3: state 0x0 read           (null) write           (null) written           (null)
> > check 2: state 0x18 read           (null) write ffff88081a956b00 written           (null)
> > check 1: state 0x11 read           (null) write           (null) written           (null)
> > check 0: state 0x11 read           (null) write           (null) written           (null)
> > locked=0 uptodate=3 to_read=0 to_write=1 failed=1 failed_num=3,-1
> > force RCW max_degraded=1, recovery_cp=7036944 sh->sector=65238568
> > for sector 65238568, rmw=2 rcw=1
> > handling stripe 713868672, state=0x2041 cnt=1, pd_idx=4, qd_idx=-1 , check:0, reconstruct:0
> > check 5: state 0x11 read           (null) write           (null) written           (null)
> > check 4: state 0x10 read           (null) write           (null) written           (null)
> > check 3: state 0x0 read           (null) write           (null) written           (null)
> > check 2: state 0x18 read           (null) write ffff88081f020100 written           (null)
> > check 1: state 0x11 read           (null) write           (null) written           (null)
> > check 0: state 0x11 read           (null) write           (null) written           (null)
> > locked=0 uptodate=3 to_read=0 to_write=1 failed=1 failed_num=3,-1
> > force RCW max_degraded=1, recovery_cp=7036944 sh->sector=713868672
> > for sector 713868672, rmw=2 rcw=1
> > handling stripe 729622496, state=0x2041 cnt=1, pd_idx=2, qd_idx=-1 , check:0, reconstruct:0
> > check 5: state 0x11 read           (null) write           (null) written           (null)
> > check 4: state 0x11 read           (null) write           (null) written           (null)
> > check 3: state 0x0 read           (null) write           (null) written           (null)
> > check 2: state 0x10 read           (null) write           (null) written           (null)
> > check 1: state 0x18 read           (null) write ffff88081b9bae00 written           (null)
> > check 0: state 0x11 read           (null) write           (null) written           (null)
> > locked=0 uptodate=3 to_read=0 to_write=1 failed=1 failed_num=3,-1
> > force RCW max_degraded=1, recovery_cp=7036944 sh->sector=729622496
> > for sector 729622496, rmw=2 rcw=1
> > handling stripe 729622504, state=0x2041 cnt=1, pd_idx=2, qd_idx=-1 , check:0, reconstruct:0
> > check 5: state 0x11 read           (null) write           (null) written           (null)
> > check 4: state 0x11 read           (null) write           (null) written           (null)
> > check 3: state 0x0 read           (null) write           (null) written           (null)
> > check 2: state 0x10 read           (null) write           (null) written           (null)
> > check 1: state 0x18 read           (null) write ffff88081b9bae00 written           (null)
> > check 0: state 0x11 read           (null) write           (null) written           (null)
> > locked=0 uptodate=3 to_read=0 to_write=1 failed=1 failed_num=3,-1
> > force RCW max_degraded=1, recovery_cp=7036944 sh->sector=729622504
> > for sector 729622504, rmw=2 rcw=1
> > handling stripe 245773680, state=0x2041 cnt=1, pd_idx=0, qd_idx=-1 , check:0, reconstruct:0
> > check 5: state 0x11 read           (null) write           (null) written           (null)
> > check 4: state 0x11 read           (null) write           (null) written           (null)
> > check 3: state 0x0 read           (null) write           (null) written           (null)
> > check 2: state 0x11 read           (null) write           (null) written           (null)
> > check 1: state 0x18 read           (null) write ffff88081cab7a00 written           (null)
> > check 0: state 0x10 read           (null) write           (null) written           (null)
> > locked=0 uptodate=3 to_read=0 to_write=1 failed=1 failed_num=3,-1
> > force RCW max_degraded=1, recovery_cp=7036944 sh->sector=245773680
> > for sector 245773680, rmw=2 rcw=1
> > handling stripe 867965560, state=0x2041 cnt=1, pd_idx=1, qd_idx=-1 , check:0, reconstruct:0
> > check 5: state 0x11 read           (null) write           (null) written           (null)
> > check 4: state 0x11 read           (null) write           (null) written           (null)
> > check 3: state 0x0 read           (null) write           (null) written           (null)
> > check 2: state 0x18 read           (null) write ffff880802b2bf00 written           (null)
> > check 1: state 0x10 read           (null) write           (null) written           (null)
> > check 0: state 0x11 read           (null) write           (null) written           (null)
> > locked=0 uptodate=3 to_read=0 to_write=1 failed=1 failed_num=3,-1
> > force RCW max_degraded=1, recovery_cp=7036944 sh->sector=867965560
> > for sector 867965560, rmw=2 rcw=1
> > handling stripe 550162280, state=0x2041 cnt=1, pd_idx=2, qd_idx=-1 , check:0, reconstruct:0
> > check 5: state 0x11 read           (null) write           (null) written           (null)
> > check 4: state 0x18 read           (null) write ffff880802b08800 written           (null)
> > check 3: state 0x0 read           (null) write           (null) written           (null)
> > check 2: state 0x10 read           (null) write           (null) written           (null)
> > check 1: state 0x11 read           (null) write           (null) written           (null)
> > check 0: state 0x11 read           (null) write           (null) written           (null)
> > locked=0 uptodate=3 to_read=0 to_write=1 failed=1 failed_num=3,-1 
> > force RCW max_degraded=1, recovery_cp=7036944 sh->sector=550162280 
> > for sector 550162280, rmw=2 rcw=1
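> > 
> > (For reference, a sketch of one way this kind of per-stripe debug output can typically be turned on at runtime, assuming the kernel is built with CONFIG_DYNAMIC_DEBUG and debugfs is available; rebuilding raid5.c with its debug printing enabled also works:)
> > 
> > mount -t debugfs none /sys/kernel/debug 2>/dev/null   # make sure debugfs is mounted
> > echo 'file raid5.c +p' > /sys/kernel/debug/dynamic_debug/control   # enable pr_debug in raid5.c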
> > 
> > 
> > Thanks,
> > Manibalan
> > 
> > 
> > -----Original Message-----
> > From: Manibalan P
> > Sent: Wednesday, December 17, 2014 12:11 PM
> > To: 'linux-raid'
> > Cc: 'NeilBrown'; Vijayarankan Muthirisavengopal; Dinakaran N
> > Subject: RE: md_raid5 using 100% CPU and hang with status 
> > resync=PENDING, if a drive is removed during initialization
> > 
> > Dear Neil,
> > 
> > The same issue is reproducible with the latest upstream kernel as well.
> > 
> > Tested with the latest stable upstream kernel, 3.17.6, and found the same issue.
> > 
> > [root@root ~]# modinfo raid456
> > filename:       /lib/modules/3.17.6/kernel/drivers/md/raid456.ko
> > alias:          raid6
> > alias:          raid5
> > alias:          md-level-6
> > alias:          md-raid6
> > alias:          md-personality-8
> > alias:          md-level-4
> > alias:          md-level-5
> > alias:          md-raid4
> > alias:          md-raid5
> > alias:          md-personality-4
> > description:    RAID4/5/6 (striping with parity) personality for MD
> > license:        GPL
> > srcversion:     0EEF680023FDC7410F7989A
> > depends:        async_raid6_recov,async_pq,async_tx,async_memcpy,async_xor
> > intree:         Y
> > vermagic:       3.17.6 SMP mod_unload modversions
> > parm:           devices_handle_discard_safely:Set to Y if all devices in each array reliably return zeroes on reads from discarded regions (bool)
> > 
> > Thanks,
> > Manibalan.
> > 
> > -----Original Message-----
> > From: Manibalan P
> > Sent: Wednesday, December 17, 2014 12:01 PM
> > To: 'linux-raid'
> > Cc: 'NeilBrown'
> > Subject: RE: md_raid5 using 100% CPU and hang with status 
> > resync=PENDING, if a drive is removed during initialization
> > 
> > Dear Neil,
> > 
> > We are facing an IO hang issue with raid5 in the following scenario
> > (please see the attachment for the complete information). In a RAID5
> > array, if a drive is removed during initialization while IO is
> > happening to that md, the IO gets stuck, the md_raid5 thread uses
> > 100% of the CPU, and the md state shows resync=PENDING.
> > 
> > Kernel: the issue is found in the following kernels: RHEL 6.5
> > (2.6.32-431.el6.x86_64) and CentOS 7 (kernel-3.10.0-123.13.1.el7.x86_64).
> > 
> > Steps to Reproduce the issue:
> > 
> > 1. Create a RAID 5 md with 4 drives using the mdadm command below.
> > mdadm -C /dev/md0 -c 64 -l 5 -f -n 4 -e 1.2 /dev/sdb6 /dev/sdc6
> > /dev/sdd6 /dev/sde6
> > 
> > 2. Make the md writable:
> > mdadm --readwrite /dev/md0
> > 
> > 3. Now the md will start initialization.
> > 
> > 4. Run the fio tool with the configuration below:
> > /usr/bin/fio --name=md0 --filename=/dev/md0 --thread --numjobs=10 --direct=1
> > --group_reporting --unlink=0 --loops=1 --offset=0 --randrepeat=1
> > --norandommap --scramble_buffers=1 --stonewall --ioengine=libaio
> > --rw=randwrite --bs=8704 --iodepth=4000 --runtime=3000
> > --blockalign=512
> > 
> > 5. While the MD is initializing, remove a drive (either using mdadm set-faulty/remove or remove it physically).
> > 
> > 6. Now the IO will get stuck, and cat /proc/mdstat shows the state as
> > resync=PENDING (a consolidated sketch of these steps follows below).
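> > 
> > Consolidated sketch of the steps above (assuming the same partitions, and that the drive is removed via mdadm rather than pulled physically; adjust device names to the test system):
> > 
> > mdadm -C /dev/md0 -c 64 -l 5 -f -n 4 -e 1.2 /dev/sdb6 /dev/sdc6 /dev/sdd6 /dev/sde6
> > mdadm --readwrite /dev/md0                   # initialization (resync) starts
> > # start the fio command from step 4 in the background
> > mdadm --manage /dev/md0 --fail /dev/sde6     # mark a member faulty during the resync
> > mdadm --manage /dev/md0 --remove /dev/sde6   # then remove it from the array
> > cat /proc/mdstat                             # array now sits at resync=PENDING; md0_raid5 spins at 100% CPU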
> > -----------------------------------------------------------------------------
> > top output shows md0_raid5 using 100% CPU:
> > 
> > top - 17:55:06 up  1:09,  3 users,  load average: 11.98, 8.53, 3.99
> > PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> > 2690 root      20   0     0    0    0 R 100.0  0.0   6:44.41 md0_raid5
> > -----------------------------------------------------------------------------
> > dmesg shows the stack traces:
> > 
> > INFO: task fio:2715 blocked for more than 120 seconds.
> > Not tainted 2.6.32-431.el6.x86_64 #1 "echo 0 > 
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > fio           D 000000000000000a     0  2715   2654 0x00000080
> > ffff88043b623598 0000000000000082 0000000000000000 ffffffff81058d53
> > ffff88043b623548 ffff880230e49cc0 ffff8802389aa228 ffff88043b2ad1b8
> > ffff88043b40b098 ffff88043b623fd8 000000000000fbc8 ffff88043b40b098 Call Trace:
> > [<ffffffff81058d53>] ? __wake_up+0x53/0x70 [<ffffffffa0304146>]
> > get_active_stripe+0x236/0x830 [raid456] [<ffffffff81065df0>] ? 
> > default_wake_function+0x0/0x20 [<ffffffff8109b5ce>] ? 
> > prepare_to_wait+0x4e/0x80 [<ffffffffa0308e15>] 
> > make_request+0x1b5/0xc6c [raid456] [<ffffffff8109b2a0>] ?
> > autoremove_wake_function+0x0/0x40 [<ffffffff8140fa39>] ? 
> > md_wakeup_thread+0x39/0x70 [<ffffffff81415b41>]
> > md_make_request+0xe1/0x230 [<ffffffffa0308f66>] ? 
> > make_request+0x306/0xc6c [raid456] [<ffffffff81266c50>]
> > generic_make_request+0x240/0x5a0 [<ffffffff811220e5>] ? 
> > mempool_alloc_slab+0x15/0x20 [<ffffffff81122283>] ? 
> > mempool_alloc+0x63/0x140 [<ffffffff81267020>] submit_bio+0x70/0x120 
> > [<ffffffff811c767a>] do_direct_IO+0x7ca/0xfa0 [<ffffffff811c8196>]
> > __blockdev_direct_IO_newtrunc+0x346/0x1270
> > [<ffffffff811c4330>] ? blkdev_get_block+0x0/0x20 
> > [<ffffffff811c9137>]
> > __blockdev_direct_IO+0x77/0xe0 [<ffffffff811c4330>] ? 
> > blkdev_get_block+0x0/0x20 [<ffffffff811c53b7>]
> > blkdev_direct_IO+0x57/0x60 [<ffffffff811c4330>] ? 
> > blkdev_get_block+0x0/0x20 [<ffffffff81120552>]
> > generic_file_direct_write+0xc2/0x190
> > [<ffffffff81121e71>] __generic_file_aio_write+0x3a1/0x490
> > [<ffffffff811d64c0>] ? aio_read_evt+0xa0/0x170 [<ffffffff811c490c>]
> > blkdev_aio_write+0x3c/0xa0 [<ffffffff811c48d0>] ? 
> > blkdev_aio_write+0x0/0xa0 [<ffffffff811d4f64>]
> > aio_rw_vect_retry+0x84/0x200 [<ffffffff811d6924>]
> > aio_run_iocb+0x64/0x170 [<ffffffff811d7d51>] 
> > do_io_submit+0x291/0x920 [<ffffffff811d83f0>] 
> > sys_io_submit+0x10/0x20 [<ffffffff8100b072>] 
> > system_call_fastpath+0x16/0x1b
> > INFO: task fio:2717 blocked for more than 120 seconds.
> > Not tainted 2.6.32-431.el6.x86_64 #1 "echo 0 > 
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > fio           D 0000000000000004     0  2717   2654 0x00000080
> > ffff880439e97698 0000000000000082 ffff880439e97628 ffffffff81058d53
> > ffff880439e97648 ffff880230e49cc0 ffff8802389aa228 ffff88043b2ad1b8
> > ffff88043b0adab8 ffff880439e97fd8 000000000000fbc8 ffff88043b0adab8 Call Trace:
> > [<ffffffff81058d53>] ? __wake_up+0x53/0x70 [<ffffffffa030334b>] ? 
> > md_raid5_unplug_device+0x7b/0x100 [raid456] [<ffffffffa0304146>]
> > get_active_stripe+0x236/0x830 [raid456] [<ffffffff81065df0>] ? 
> > default_wake_function+0x0/0x20 [<ffffffff8109b5ce>] ? 
> > prepare_to_wait+0x4e/0x80 [<ffffffffa0308e15>] 
> > make_request+0x1b5/0xc6c [raid456] [<ffffffff8109b2a0>] ?
> > autoremove_wake_function+0x0/0x40 [<ffffffff811220e5>] ? 
> > mempool_alloc_slab+0x15/0x20 [<ffffffff81415b41>]
> > md_make_request+0xe1/0x230 [<ffffffff811c32f0>] ? 
> > __bio_add_page+0x110/0x230 [<ffffffff81266c50>]
> > generic_make_request+0x240/0x5a0 [<ffffffff811c742c>] ? 
> > do_direct_IO+0x57c/0xfa0 [<ffffffff81267020>] submit_bio+0x70/0x120 
> > [<ffffffff811c8e50>] __blockdev_direct_IO_newtrunc+0x1000/0x1270
> > [<ffffffff811c4330>] ? blkdev_get_block+0x0/0x20 
> > [<ffffffff811c9137>]
> > __blockdev_direct_IO+0x77/0xe0 [<ffffffff811c4330>] ? 
> > blkdev_get_block+0x0/0x20 [<ffffffff811c53b7>]
> > blkdev_direct_IO+0x57/0x60 [<ffffffff811c4330>] ? 
> > blkdev_get_block+0x0/0x20 [<ffffffff81120552>]
> > generic_file_direct_write+0xc2/0x190
> > [<ffffffff81121e71>] __generic_file_aio_write+0x3a1/0x490
> > [<ffffffff811d64c0>] ? aio_read_evt+0xa0/0x170 [<ffffffff811c490c>]
> > blkdev_aio_write+0x3c/0xa0 [<ffffffff811c48d0>] ? 
> > blkdev_aio_write+0x0/0xa0 [<ffffffff811d4f64>]
> > aio_rw_vect_retry+0x84/0x200 [<ffffffff811d6924>]
> > aio_run_iocb+0x64/0x170 [<ffffffff811d7d51>] 
> > do_io_submit+0x291/0x920 [<ffffffff811d83f0>] 
> > sys_io_submit+0x10/0x20 [<ffffffff8100b072>] 
> > system_call_fastpath+0x16/0x1b
> > INFO: task fio:2718 blocked for more than 120 seconds.
> > Not tainted 2.6.32-431.el6.x86_64 #1 "echo 0 > 
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > fio           D 0000000000000005     0  2718   2654 0x00000080
> > ffff88043bc13698 0000000000000082 ffff88043bc13628 ffffffff81058d53
> > ffff88043bc13648 ffff880230e49cc0 ffff8802389aa228 ffff88043b2ad1b8
> > ffff88043b0ad058 ffff88043bc13fd8 000000000000fbc8 ffff88043b0ad058 Call Trace:
> > [<ffffffff81058d53>] ? __wake_up+0x53/0x70 [<ffffffffa030334b>] ? 
> > md_raid5_unplug_device+0x7b/0x100 [raid456] [<ffffffffa0304146>]
> > get_active_stripe+0x236/0x830 [raid456] [<ffffffff81065df0>] ? 
> > default_wake_function+0x0/0x20 [<ffffffff8109b5ce>] ? 
> > prepare_to_wait+0x4e/0x80 [<ffffffffa0308e15>] 
> > make_request+0x1b5/0xc6c [raid456] [<ffffffff8109b2a0>] ?
> > autoremove_wake_function+0x0/0x40 [<ffffffff811220e5>] ? 
> > mempool_alloc_slab+0x15/0x20 [<ffffffff81415b41>]
> > md_make_request+0xe1/0x230 [<ffffffff811c3fd2>] ? 
> > bvec_alloc_bs+0x62/0x110 [<ffffffff811c32f0>] ? 
> > __bio_add_page+0x110/0x230 [<ffffffff81266c50>]
> > generic_make_request+0x240/0x5a0 [<ffffffff811c742c>] ? 
> > do_direct_IO+0x57c/0xfa0 [<ffffffff81267020>] submit_bio+0x70/0x120 
> > [<ffffffff811c8e50>] __blockdev_direct_IO_newtrunc+0x1000/0x1270
> > [<ffffffff811c4330>] ? blkdev_get_block+0x0/0x20 
> > [<ffffffff811c9137>]
> > __blockdev_direct_IO+0x77/0xe0 [<ffffffff811c4330>] ? 
> > blkdev_get_block+0x0/0x20 [<ffffffff811c53b7>]
> > blkdev_direct_IO+0x57/0x60 [<ffffffff811c4330>] ? 
> > blkdev_get_block+0x0/0x20 [<ffffffff81120552>]
> > generic_file_direct_write+0xc2/0x190
> > [<ffffffff81121e71>] __generic_file_aio_write+0x3a1/0x490
> > [<ffffffff811d64c0>] ? aio_read_evt+0xa0/0x170 [<ffffffff811c490c>]
> > blkdev_aio_write+0x3c/0xa0 [<ffffffff811c48d0>] ? 
> > blkdev_aio_write+0x0/0xa0 [<ffffffff811d4f64>]
> > aio_rw_vect_retry+0x84/0x200 [<ffffffff811d6924>]
> > aio_run_iocb+0x64/0x170 [<ffffffff811d7d51>] 
> > do_io_submit+0x291/0x920 [<ffffffff811d83f0>] 
> > sys_io_submit+0x10/0x20 [<ffffffff8100b072>] 
> > system_call_fastpath+0x16/0x1b
> > INFO: task fio:2719 blocked for more than 120 seconds.
> > Not tainted 2.6.32-431.el6.x86_64 #1 "echo 0 > 
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > fio           D 0000000000000001     0  2719   2654 0x00000080
> > ffff880439ebb698 0000000000000082 ffff880439ebb628 ffffffff81058d53
> > ffff880439ebb648 ffff880230e49cc0 ffff8802389aa228 ffff88043b2ad1b8
> > ffff88043b0ac5f8 ffff880439ebbfd8 000000000000fbc8 ffff88043b0ac5f8 Call Trace:
> > [<ffffffff81058d53>] ? __wake_up+0x53/0x70 [<ffffffffa030334b>] ? 
> > md_raid5_unplug_device+0x7b/0x100 [raid456] [<ffffffffa0304146>]
> > get_active_stripe+0x236/0x830 [raid456] [<ffffffff81065df0>] ? 
> > default_wake_function+0x0/0x20 [<ffffffff8109b5ce>] ? 
> > prepare_to_wait+0x4e/0x80 [<ffffffffa0308e15>] 
> > make_request+0x1b5/0xc6c [raid456] [<ffffffff8109b2a0>] ?
> > autoremove_wake_function+0x0/0x40 [<ffffffff811220e5>] ? 
> > mempool_alloc_slab+0x15/0x20 [<ffffffff81415b41>]
> > md_make_request+0xe1/0x230 [<ffffffff811c3fd2>] ? 
> > bvec_alloc_bs+0x62/0x110 [<ffffffff811c32f0>] ? 
> > __bio_add_page+0x110/0x230 [<ffffffff81266c50>]
> > generic_make_request+0x240/0x5a0 [<ffffffff811c742c>] ? 
> > do_direct_IO+0x57c/0xfa0 [<ffffffff81267020>] submit_bio+0x70/0x120 
> > [<ffffffff811c8acd>] __blockdev_direct_IO_newtrunc+0xc7d/0x1270
> > [<ffffffff811c4330>] ? blkdev_get_block+0x0/0x20 
> > [<ffffffff811c9137>]
> > __blockdev_direct_IO+0x77/0xe0 [<ffffffff811c4330>] ? 
> > blkdev_get_block+0x0/0x20 [<ffffffff811c53b7>]
> > blkdev_direct_IO+0x57/0x60 [<ffffffff811c4330>] ? 
> > blkdev_get_block+0x0/0x20 [<ffffffff81120552>]
> > generic_file_direct_write+0xc2/0x190
> > [<ffffffff81121e71>] __generic_file_aio_write+0x3a1/0x490
> > [<ffffffff811d64c0>] ? aio_read_evt+0xa0/0x170 [<ffffffff811c490c>]
> > blkdev_aio_write+0x3c/0xa0 [<ffffffff811c48d0>] ? 
> > blkdev_aio_write+0x0/0xa0 [<ffffffff811d4f64>]
> > aio_rw_vect_retry+0x84/0x200 [<ffffffff811d6924>]
> > aio_run_iocb+0x64/0x170 [<ffffffff811d7d51>] 
> > do_io_submit+0x291/0x920 [<ffffffff811d83f0>] 
> > sys_io_submit+0x10/0x20 [<ffffffff8100b072>] 
> > system_call_fastpath+0x16/0x1b
> > INFO: task fio:2720 blocked for more than 120 seconds.
> > Not tainted 2.6.32-431.el6.x86_64 #1 "echo 0 > 
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > fio           D 0000000000000008     0  2720   2654 0x00000080
> > ffff88043b8cf698 0000000000000082 ffff88043b8cf628 ffffffff81058d53
> > ffff88043b8cf648 ffff880230e49cc0 ffff8802389aa228 ffff88043b2ad1b8
> > ffff880439e89af8 ffff88043b8cffd8 000000000000fbc8 ffff880439e89af8 Call Trace:
> > [<ffffffff81058d53>] ? __wake_up+0x53/0x70 [<ffffffffa030334b>] ? 
> > md_raid5_unplug_device+0x7b/0x100 [raid456] [<ffffffffa0304146>]
> > get_active_stripe+0x236/0x830 [raid456] [<ffffffff81065df0>] ? 
> > default_wake_function+0x0/0x20 [<ffffffff8109b5ce>] ? 
> > prepare_to_wait+0x4e/0x80 [<ffffffffa0308e15>] 
> > make_request+0x1b5/0xc6c [raid456] [<ffffffff8109b2a0>] ?
> > autoremove_wake_function+0x0/0x40 [<ffffffff811220e5>] ? 
> > mempool_alloc_slab+0x15/0x20 [<ffffffff81415b41>]
> > md_make_request+0xe1/0x230 [<ffffffff811c3fd2>] ? 
> > bvec_alloc_bs+0x62/0x110 [<ffffffff811c32f0>] ? 
> > __bio_add_page+0x110/0x230 [<ffffffff81266c50>]
> > generic_make_request+0x240/0x5a0 [<ffffffff811c742c>] ? 
> > do_direct_IO+0x57c/0xfa0 [<ffffffff81267020>] submit_bio+0x70/0x120 
> > [<ffffffff811c8acd>] __blockdev_direct_IO_newtrunc+0xc7d/0x1270
> > [<ffffffff811c4330>] ? blkdev_get_block+0x0/0x20 
> > [<ffffffff811c9137>]
> > __blockdev_direct_IO+0x77/0xe0 [<ffffffff811c4330>] ? 
> > blkdev_get_block+0x0/0x20 [<ffffffff811c53b7>]
> > blkdev_direct_IO+0x57/0x60 [<ffffffff811c4330>] ? 
> > blkdev_get_block+0x0/0x20 [<ffffffff81120552>]
> > generic_file_direct_write+0xc2/0x190
> > [<ffffffff81121e71>] __generic_file_aio_write+0x3a1/0x490
> > [<ffffffff811d64c0>] ? aio_read_evt+0xa0/0x170 [<ffffffff811c490c>]
> > blkdev_aio_write+0x3c/0xa0 [<ffffffff811c48d0>] ? 
> > blkdev_aio_write+0x0/0xa0 [<ffffffff811d4f64>]
> > aio_rw_vect_retry+0x84/0x200 [<ffffffff811d6924>]
> > aio_run_iocb+0x64/0x170 [<ffffffff811d7d51>] 
> > do_io_submit+0x291/0x920 [<ffffffff811d83f0>] 
> > sys_io_submit+0x10/0x20 [<ffffffff8100b072>] 
> > system_call_fastpath+0x16/0x1b
> > INFO: task fio:2721 blocked for more than 120 seconds.
> > Not tainted 2.6.32-431.el6.x86_64 #1 "echo 0 > 
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > fio           D 0000000000000000     0  2721   2654 0x00000080
> > ffff88043b047698 0000000000000082 ffff88043b047628 ffffffff81058d53
> > ffff88043b047648 ffff880230e49cc0 ffff8802389aa228 ffff88043b2ad1b8
> > ffff880439e89098 ffff88043b047fd8 000000000000fbc8 ffff880439e89098 Call Trace:
> > [<ffffffff81058d53>] ? __wake_up+0x53/0x70 [<ffffffffa030334b>] ? 
> > md_raid5_unplug_device+0x7b/0x100 [raid456] [<ffffffffa0304146>]
> > get_active_stripe+0x236/0x830 [raid456] [<ffffffff81065df0>] ? 
> > default_wake_function+0x0/0x20 [<ffffffff8109b5ce>] ? 
> > prepare_to_wait+0x4e/0x80 [<ffffffffa0308e15>] 
> > make_request+0x1b5/0xc6c [raid456] [<ffffffff8109b2a0>] ?
> > autoremove_wake_function+0x0/0x40 [<ffffffff811220e5>] ? 
> > mempool_alloc_slab+0x15/0x20 [<ffffffff81415b41>]
> > md_make_request+0xe1/0x230 [<ffffffff811c3fd2>] ? 
> > bvec_alloc_bs+0x62/0x110 [<ffffffff811c32f0>] ? 
> > __bio_add_page+0x110/0x230 [<ffffffff81266c50>]
> > generic_make_request+0x240/0x5a0 [<ffffffff811c742c>] ? 
> > do_direct_IO+0x57c/0xfa0 [<ffffffff81267020>] submit_bio+0x70/0x120 
> > [<ffffffff811c8acd>] __blockdev_direct_IO_newtrunc+0xc7d/0x1270
> > [<ffffffff811c4330>] ? blkdev_get_block+0x0/0x20 
> > [<ffffffff811c9137>]
> > __blockdev_direct_IO+0x77/0xe0 [<ffffffff811c4330>] ? 
> > blkdev_get_block+0x0/0x20 [<ffffffff811c53b7>]
> > blkdev_direct_IO+0x57/0x60 [<ffffffff811c4330>] ? 
> > blkdev_get_block+0x0/0x20 [<ffffffff81120552>]
> > generic_file_direct_write+0xc2/0x190
> > [<ffffffff81121e71>] __generic_file_aio_write+0x3a1/0x490
> > [<ffffffff811d64c0>] ? aio_read_evt+0xa0/0x170 [<ffffffff811c490c>]
> > blkdev_aio_write+0x3c/0xa0 [<ffffffff811c48d0>] ? 
> > blkdev_aio_write+0x0/0xa0 [<ffffffff811d4f64>]
> > aio_rw_vect_retry+0x84/0x200 [<ffffffff811d6924>]
> > aio_run_iocb+0x64/0x170 [<ffffffff811d7d51>] 
> > do_io_submit+0x291/0x920 [<ffffffff811d83f0>] 
> > sys_io_submit+0x10/0x20 [<ffffffff8100b072>] 
> > system_call_fastpath+0x16/0x1b
> > INFO: task fio:2722 blocked for more than 120 seconds.
> > Not tainted 2.6.32-431.el6.x86_64 #1 "echo 0 > 
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > fio           D 0000000000000000     0  2722   2654 0x00000080
> > ffff880439ea3698 0000000000000082 ffff880439ea3628 ffffffff81058d53
> > ffff880439ea3648 ffff880230e49cc0 ffff8802389aa228 ffff88043b2ad1b8
> > ffff880439e88638 ffff880439ea3fd8 000000000000fbc8 ffff880439e88638 Call Trace:
> > [<ffffffff81058d53>] ? __wake_up+0x53/0x70 [<ffffffffa030334b>] ? 
> > md_raid5_unplug_device+0x7b/0x100 [raid456] [<ffffffffa0304146>]
> > get_active_stripe+0x236/0x830 [raid456] [<ffffffff81065df0>] ? 
> > default_wake_function+0x0/0x20 [<ffffffff8109b5ce>] ? 
> > prepare_to_wait+0x4e/0x80 [<ffffffffa0308e15>] 
> > make_request+0x1b5/0xc6c [raid456] [<ffffffff8109b2a0>] ?
> > autoremove_wake_function+0x0/0x40 [<ffffffff811220e5>] ? 
> > mempool_alloc_slab+0x15/0x20 [<ffffffff81415b41>]
> > md_make_request+0xe1/0x230 [<ffffffff811c3fd2>] ? 
> > bvec_alloc_bs+0x62/0x110 [<ffffffff811c32f0>] ? 
> > __bio_add_page+0x110/0x230 [<ffffffff81266c50>]
> > generic_make_request+0x240/0x5a0 [<ffffffff811c742c>] ? 
> > do_direct_IO+0x57c/0xfa0 [<ffffffff81267020>] submit_bio+0x70/0x120 
> > [<ffffffff811c8acd>] __blockdev_direct_IO_newtrunc+0xc7d/0x1270
> > [<ffffffff811c4330>] ? blkdev_get_block+0x0/0x20 
> > [<ffffffff811c9137>]
> > __blockdev_direct_IO+0x77/0xe0 [<ffffffff811c4330>] ? 
> > blkdev_get_block+0x0/0x20 [<ffffffff811c53b7>]
> > blkdev_direct_IO+0x57/0x60 [<ffffffff811c4330>] ? 
> > blkdev_get_block+0x0/0x20 [<ffffffff81120552>]
> > generic_file_direct_write+0xc2/0x190
> > [<ffffffff81121e71>] __generic_file_aio_write+0x3a1/0x490
> > [<ffffffff811d64c0>] ? aio_read_evt+0xa0/0x170 [<ffffffff811c490c>]
> > blkdev_aio_write+0x3c/0xa0 [<ffffffff811c48d0>] ? 
> > blkdev_aio_write+0x0/0xa0 [<ffffffff811d4f64>]
> > aio_rw_vect_retry+0x84/0x200 [<ffffffff811d6924>]
> > aio_run_iocb+0x64/0x170 [<ffffffff811d7d51>] 
> > do_io_submit+0x291/0x920 [<ffffffff811d83f0>] 
> > sys_io_submit+0x10/0x20 [<ffffffff8100b072>] 
> > system_call_fastpath+0x16/0x1b
> > INFO: task fio:2723 blocked for more than 120 seconds.
> > Not tainted 2.6.32-431.el6.x86_64 #1 "echo 0 > 
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > fio           D 0000000000000006     0  2723   2654 0x00000080
> > ffff88043bf5f698 0000000000000082 ffff88043bf5f628 ffffffff81058d53
> > ffff88043bf5f648 ffff880230e49cc0 ffff8802389aa228 ffff88043b2ad1b8
> > ffff88043a183ab8 ffff88043bf5ffd8 000000000000fbc8 ffff88043a183ab8 Call Trace:
> > [<ffffffff81058d53>] ? __wake_up+0x53/0x70 [<ffffffffa030334b>] ? 
> > md_raid5_unplug_device+0x7b/0x100 [raid456] [<ffffffffa0304146>]
> > get_active_stripe+0x236/0x830 [raid456] [<ffffffff81065df0>] ? 
> > default_wake_function+0x0/0x20 [<ffffffff8109b5ce>] ? 
> > prepare_to_wait+0x4e/0x80 [<ffffffffa0308e15>] 
> > make_request+0x1b5/0xc6c [raid456] [<ffffffff8109b2a0>] ?
> > autoremove_wake_function+0x0/0x40 [<ffffffff811220e5>] ? 
> > mempool_alloc_slab+0x15/0x20 [<ffffffff81415b41>]
> > md_make_request+0xe1/0x230 [<ffffffff811c3fd2>] ? 
> > bvec_alloc_bs+0x62/0x110 [<ffffffff811c32f0>] ? 
> > __bio_add_page+0x110/0x230 [<ffffffff81266c50>]
> > generic_make_request+0x240/0x5a0 [<ffffffff811c742c>] ? 
> > do_direct_IO+0x57c/0xfa0 [<ffffffff81267020>] submit_bio+0x70/0x120 
> > [<ffffffff811c8acd>] __blockdev_direct_IO_newtrunc+0xc7d/0x1270
> > [<ffffffff811c4330>] ? blkdev_get_block+0x0/0x20 
> > [<ffffffff811c9137>]
> > __blockdev_direct_IO+0x77/0xe0 [<ffffffff811c4330>] ? 
> > blkdev_get_block+0x0/0x20 [<ffffffff811c53b7>]
> > blkdev_direct_IO+0x57/0x60 [<ffffffff811c4330>] ? 
> > blkdev_get_block+0x0/0x20 [<ffffffff81120552>]
> > generic_file_direct_write+0xc2/0x190
> > [<ffffffff81121e71>] __generic_file_aio_write+0x3a1/0x490
> > [<ffffffff811d64c0>] ? aio_read_evt+0xa0/0x170 [<ffffffff811c490c>]
> > blkdev_aio_write+0x3c/0xa0 [<ffffffff811c48d0>] ? 
> > blkdev_aio_write+0x0/0xa0 [<ffffffff811d4f64>]
> > aio_rw_vect_retry+0x84/0x200 [<ffffffff811d6924>]
> > aio_run_iocb+0x64/0x170 [<ffffffff811d7d51>] 
> > do_io_submit+0x291/0x920 [<ffffffff811d83f0>] 
> > sys_io_submit+0x10/0x20 [<ffffffff8100b072>] 
> > system_call_fastpath+0x16/0x1b
> > INFO: task fio:2724 blocked for more than 120 seconds.
> > Not tainted 2.6.32-431.el6.x86_64 #1 "echo 0 > 
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > fio           D 000000000000000b     0  2724   2654 0x00000080
> > ffff88043be05698 0000000000000082 ffff88043be05628 ffffffff81058d53
> > ffff88043be05648 ffff880230e49cc0 ffff8802389aa228 ffff88043b2ad1b8
> > ffff88043a183058 ffff88043be05fd8 000000000000fbc8 ffff88043a183058 Call Trace:
> > [<ffffffff81058d53>] ? __wake_up+0x53/0x70 [<ffffffffa030334b>] ? 
> > md_raid5_unplug_device+0x7b/0x100 [raid456] [<ffffffffa0304146>]
> > get_active_stripe+0x236/0x830 [raid456] [<ffffffff81065df0>] ? 
> > default_wake_function+0x0/0x20 [<ffffffff8109b5ce>] ? 
> > prepare_to_wait+0x4e/0x80 [<ffffffffa0308e15>] 
> > make_request+0x1b5/0xc6c [raid456] [<ffffffff8109b2a0>] ?
> > autoremove_wake_function+0x0/0x40 [<ffffffff811220e5>] ? 
> > mempool_alloc_slab+0x15/0x20 [<ffffffff81415b41>]
> > md_make_request+0xe1/0x230 [<ffffffff811c3fd2>] ? 
> > bvec_alloc_bs+0x62/0x110 [<ffffffff811c32f0>] ? 
> > __bio_add_page+0x110/0x230 [<ffffffff81266c50>]
> > generic_make_request+0x240/0x5a0 [<ffffffff811c742c>] ? 
> > do_direct_IO+0x57c/0xfa0 [<ffffffff81267020>] submit_bio+0x70/0x120 
> > [<ffffffff811c8acd>] __blockdev_direct_IO_newtrunc+0xc7d/0x1270
> > [<ffffffff811c4330>] ? blkdev_get_block+0x0/0x20 
> > [<ffffffff811c9137>]
> > __blockdev_direct_IO+0x77/0xe0 [<ffffffff811c4330>] ? 
> > blkdev_get_block+0x0/0x20 [<ffffffff811c53b7>]
> > blkdev_direct_IO+0x57/0x60 [<ffffffff811c4330>] ? 
> > blkdev_get_block+0x0/0x20 [<ffffffff81120552>]
> > generic_file_direct_write+0xc2/0x190
> > [<ffffffff81121e71>] __generic_file_aio_write+0x3a1/0x490
> > [<ffffffff811d64c0>] ? aio_read_evt+0xa0/0x170 [<ffffffff811c490c>]
> > blkdev_aio_write+0x3c/0xa0 [<ffffffff811c48d0>] ? 
> > blkdev_aio_write+0x0/0xa0 [<ffffffff811d4f64>]
> > aio_rw_vect_retry+0x84/0x200 [<ffffffff811d6924>]
> > aio_run_iocb+0x64/0x170 [<ffffffff811d7d51>] 
> > do_io_submit+0x291/0x920 [<ffffffff811d83f0>] 
> > sys_io_submit+0x10/0x20 [<ffffffff8100b072>] 
> > system_call_fastpath+0x16/0x1b
> > INFO: task fio:2725 blocked for more than 120 seconds.
> > Not tainted 2.6.32-431.el6.x86_64 #1 "echo 0 > 
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > fio           D 0000000000000003     0  2725   2654 0x00000080
> > ffff88043be07698 0000000000000082 ffff88043be07628 ffffffff81058d53
> > ffff88043be07648 ffff880230e49cc0 ffff8802389aa228 ffff88043b2ad1b8
> > ffff88043a1825f8 ffff88043be07fd8 000000000000fbc8 ffff88043a1825f8 Call Trace:
> > [<ffffffff81058d53>] ? __wake_up+0x53/0x70 [<ffffffffa030334b>] ? 
> > md_raid5_unplug_device+0x7b/0x100 [raid456] [<ffffffffa0304146>]
> > get_active_stripe+0x236/0x830 [raid456] [<ffffffff81065df0>] ? 
> > default_wake_function+0x0/0x20 [<ffffffff8109b5ce>] ? 
> > prepare_to_wait+0x4e/0x80 [<ffffffffa0308e15>] 
> > make_request+0x1b5/0xc6c [raid456] [<ffffffff8109b2a0>] ?
> > autoremove_wake_function+0x0/0x40 [<ffffffff811220e5>] ? 
> > mempool_alloc_slab+0x15/0x20 [<ffffffff81415b41>]
> > md_make_request+0xe1/0x230 [<ffffffff811c3fd2>] ? 
> > bvec_alloc_bs+0x62/0x110 [<ffffffff811c32f0>] ? 
> > __bio_add_page+0x110/0x230 [<ffffffff81266c50>]
> > generic_make_request+0x240/0x5a0 [<ffffffff811c742c>] ? 
> > do_direct_IO+0x57c/0xfa0 [<ffffffff81267020>] submit_bio+0x70/0x120 
> > [<ffffffff811c8acd>] __blockdev_direct_IO_newtrunc+0xc7d/0x1270
> > [<ffffffff811c4330>] ? blkdev_get_block+0x0/0x20 
> > [<ffffffff811c9137>]
> > __blockdev_direct_IO+0x77/0xe0 [<ffffffff811c4330>] ? 
> > blkdev_get_block+0x0/0x20 [<ffffffff811c53b7>]
> > blkdev_direct_IO+0x57/0x60 [<ffffffff811c4330>] ? 
> > blkdev_get_block+0x0/0x20 [<ffffffff81120552>]
> > generic_file_direct_write+0xc2/0x190
> > [<ffffffff81121e71>] __generic_file_aio_write+0x3a1/0x490
> > [<ffffffff811d64c0>] ? aio_read_evt+0xa0/0x170 [<ffffffff811c490c>]
> > blkdev_aio_write+0x3c/0xa0 [<ffffffff811c48d0>] ? 
> > blkdev_aio_write+0x0/0xa0 [<ffffffff811d4f64>]
> > aio_rw_vect_retry+0x84/0x200 [<ffffffff811d6924>]
> > aio_run_iocb+0x64/0x170 [<ffffffff811d7d51>] 
> > do_io_submit+0x291/0x920 [<ffffffff811d83f0>] 
> > sys_io_submit+0x10/0x20 [<ffffffff8100b072>] 
> > system_call_fastpath+0x16/0x1b
> > 
> > [root@root ~]# cat /proc/2690/stack
> > [<ffffffff810686da>] __cond_resched+0x2a/0x40 [<ffffffffa030361c>]
> > ops_run_io+0x2c/0x920 [raid456] [<ffffffffa03052cc>]
> > handle_stripe+0x9cc/0x2980 [raid456] [<ffffffffa03078a4>]
> > raid5d+0x624/0x850 [raid456] [<ffffffff81416f05>]
> > md_thread+0x115/0x150 [<ffffffff8109aef6>] kthread+0x96/0xa0 
> > [<ffffffff8100c20a>] child_rip+0xa/0x20 [<ffffffffffffffff>] 
> > 0xffffffffffffffff
> > 
> > [root@root ~]# cat /proc/2690/stat
> > 2690 (md0_raid5) R 2 0 0 0 -1 2149613632 0 0 0 0 0 68495 0 0 20 0 1 0 350990 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483391 256 0 0 0 17 2 0 0 6855 0 0
> > [root@root ~]# cat /proc/2690/statm
> > 0 0 0 0 0 0 0
> > [root@root ~]# cat /proc/2690/stat
> > stat    statm   status
> > [root@root ~]# cat /proc/2690/status
> > Name:   md0_raid5
> > State:  R (running)
> > Tgid:   2690
> > Pid:    2690
> > PPid:   2
> > TracerPid:      0
> > Uid:    0       0       0       0
> > Gid:    0       0       0       0
> > Utrace: 0
> > FDSize: 64
> > Groups:
> > Threads:        1
> > SigQ:   2/128402
> > SigPnd: 0000000000000000
> > ShdPnd: 0000000000000000
> > SigBlk: 0000000000000000
> > SigIgn: fffffffffffffeff
> > SigCgt: 0000000000000100
> > CapInh: 0000000000000000
> > CapPrm: ffffffffffffffff
> > CapEff: fffffffffffffeff
> > CapBnd: ffffffffffffffff
> > Cpus_allowed:   ffffff
> > Cpus_allowed_list:      0-23
> > Mems_allowed:   00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000003
> > Mems_allowed_list:      0-1
> > voluntary_ctxt_switches:        5411612
> > nonvoluntary_ctxt_switches:     257032
> > 
> > 
> > Thanks,
> > Manibalan.

