Raid10 device hangs during resync and heavy I/O.

From: Justin Bronder <jsbronder@gentoo.org>
To: linux-raid@vger.kernel.org
Subject: Raid10 device hangs during resync and heavy I/O.
Date: Fri, 16 Jul 2010 14:46:18 -0400
Message-ID: <20100716184618.GA25890@gmail.com>

I've been able to reproduce this across a number of machines with the same
hardware configuration.  During a raid10 resync, it's possible to hang the
device so that any further I/O operations will also block.  This is fairly
simple to trigger using dd.

Interestingly, this is not reproducible when using a non-partitioned device.
That is, creating the device with --auto=yes and then directly using it
functions as expected.  However, using --auto=yes or --auto=mdp and then
creating a partition across the device will cause the hang.

From all appearances, this is not just slow I/O; days later the same tasks
are still blocked.  The rest of the system continues to function normally,
including other raid devices.
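
For anyone trying to confirm the same symptom, here is a minimal sketch for
spotting tasks stuck in uninterruptible sleep (state D), which is the state
the hung-task detector reports on.  It assumes procps-style
"ps -eo pid,stat,comm" output; the helper name filter_blocked is hypothetical
and not part of the original report.

```shell
#!/bin/bash
# Hypothetical helper: read "ps -eo pid,stat,comm" style output on stdin
# and print "PID COMMAND" for every task whose STAT column starts with D.
filter_blocked() {
    # NR > 1 skips the header row.  Only the first word of the command
    # name is printed, which is enough for names like "dd".
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $3 }'
}

# Usage on a live system:
#   ps -eo pid,stat,comm | filter_blocked
```

If the same PIDs keep showing up in this list over hours or days, that would
match a hang rather than slow I/O.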

Below I've included the script I'm using to reproduce the hang, the relevant
kernel tracebacks, and /proc/mdstat.  Thanks in advance for any help
resolving this.

=== md10-hang.sh ===
#!/bin/bash

MDP=false
# Pick two unused drives here.
MD_DRIVES="sdc sdd"

if ${MDP}; then
    MD_DEV="md_d99"
else
    MD_DEV="md99"
fi

M="/mnt/mdmount"
SIZE=8192

die () {
    echo
    echo "ERROR:  $*"
    echo
    exit 1
}

mkraid() {
    local d
    local drives
    local mdargs="--auto=yes"

    ${MDP} && mdargs="--auto=mdp"

    mkdir -p ${M}
    umount -f ${M} &>/dev/null
    mdadm --stop /dev/md_d99 &>/dev/null
    mdadm --stop /dev/md99 &>/dev/null

    for d in ${MD_DRIVES}; do
        sfdisk -uM /dev/${d} <<-EOF
,${SIZE},83
,,83
EOF
        mdadm --zero-superblock /dev/${d}1 &>/dev/null
        drives="${drives} /dev/${d}1"
    done
    mdadm --create /dev/${MD_DEV} \
        --run \
        --force \
        --level=10 \
        --layout=f2 \
        --raid-devices=2 \
        ${mdargs} ${drives} || die "mdadm --create failed"

    # Both modes partition the array; the hang only shows up when the
    # filesystem lives on a partition of the md device.
    printf ",,83\n" | sfdisk -uM /dev/${MD_DEV}
    mkfs.ext2 -q /dev/${MD_DEV}p1
    mount /dev/${MD_DEV}p1 ${M} || die "Mount failed"

    echo "Creating tmp file"
    dd if=/dev/zero of=${M}/tmpfile bs=1M count=4000
}

mkraid
i=1
while [ "$(</sys/block/${MD_DEV}/md/sync_action)"  != "idle" ]; do
    echo "Attempt ${i} to cause crash"
    cat /proc/mdstat
    dd if=${M}/tmpfile of=${M}/cpfile bs=1M
    i=$((i + 1))
done

=== kernel trace ===
[ 9002.405247] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9002.433361]  ffff88025436fc30 0000000000000046 ffff88025436fc10 ffff880254616800
[ 9002.460415]  ffff88025d40dd70 ffff88025d40a3f0 0000000354616800 00000000000de600
[ 9002.487497]  ffff88025436fc10 ffff8801570343c0 ffff880157034420 ffff880157034448
[ 9002.514575] Call Trace:
[ 9002.526609]  [<ffffffff81320efb>] raise_barrier+0x167/0x1a3
[ 9002.548139]  [<ffffffff810383b6>] ? default_wake_function+0x0/0xf
[ 9002.571218]  [<ffffffff813238e1>] sync_request+0x57d/0x8a8
[ 9002.592430]  [<ffffffff81320ca5>] ? raid10_unplug+0x24/0x28
[ 9002.613833]  [<ffffffff8132ad63>] ? md_thread+0x0/0xe8
[ 9002.633938]  [<ffffffff8132dab2>] md_do_sync+0x685/0xa9d
[ 9002.654556]  [<ffffffff8132ad63>] ? md_thread+0x0/0xe8
[ 9002.674650]  [<ffffffff8132ae31>] md_thread+0xce/0xe8
[ 9002.694435]  [<ffffffff81034aa6>] ? spin_unlock_irqrestore+0x9/0xb
[ 9002.717583]  [<ffffffff81056cc0>] kthread+0x69/0x71
[ 9002.736753]  [<ffffffff810037e4>] kernel_thread_helper+0x4/0x10
[ 9002.759062]  [<ffffffff81056c57>] ? kthread+0x0/0x71
[ 9002.778478]  [<ffffffff810037e0>] ? kernel_thread_helper+0x0/0x10
[ 9002.801286] INFO: task flush-9:99:5896 blocked for more than 120 seconds.
[ 9002.826287] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9002.854475]  ffff88012fa8b870 0000000000000046 ffff88012fa8b850 ffff880254616800
[ 9002.881589]  ffff88025d40ebd0 ffff88025d40a3f0 000000036fb73300 0000000000000001
[ 9002.908691]  ffff88012fa8b850 ffff8801570343c0 ffff880157034420 ffff880157034448
[ 9002.935778] Call Trace:
[ 9002.947781]  [<ffffffff81320d5b>] wait_barrier+0xa7/0xe0
[ 9002.968438]  [<ffffffff810383b6>] ? default_wake_function+0x0/0xf
[ 9002.991452]  [<ffffffff8132163e>] make_request+0x121/0x507
[ 9003.012697]  [<ffffffff8132d2aa>] md_make_request+0xc7/0x101
[ 9003.034515]  [<ffffffff811dc817>] generic_make_request+0x1af/0x276
[ 9003.057953]  [<ffffffff811dda3b>] submit_bio+0x9e/0xa7
[ 9003.078197]  [<ffffffff810e950d>] submit_bh+0x11b/0x13f
[ 9003.098648]  [<ffffffff810ebba9>] __block_write_full_page+0x20b/0x310
[ 9003.122755]  [<ffffffff810ec383>] ? end_buffer_async_write+0x0/0x13a
[ 9003.146576]  [<ffffffff810ef5b2>] ? blkdev_get_block+0x0/0x50
[ 9003.168612]  [<ffffffff810ec383>] ? end_buffer_async_write+0x0/0x13a
[ 9003.192521]  [<ffffffff810ef5b2>] ? blkdev_get_block+0x0/0x50
[ 9003.214684]  [<ffffffff810ebd30>] block_write_full_page_endio+0x82/0x8e
[ 9003.239426]  [<ffffffff810ebd4c>] block_write_full_page+0x10/0x12
[ 9003.262539]  [<ffffffff810eea92>] blkdev_writepage+0x13/0x15
[ 9003.284297]  [<ffffffff8109e005>] __writepage+0x12/0x2b
[ 9003.304718]  [<ffffffff8109e46c>] write_cache_pages+0x1fa/0x306
[ 9003.327254]  [<ffffffff8109dff3>] ? __writepage+0x0/0x2b
[ 9003.347976]  [<ffffffff810e9f65>] ? mark_buffer_dirty+0x85/0x89
[ 9003.370488]  [<ffffffff8109e597>] generic_writepages+0x1f/0x25
[ 9003.392661]  [<ffffffff8109e5b9>] do_writepages+0x1c/0x25
[ 9003.413477]  [<ffffffff810e43e0>] writeback_single_inode+0xb0/0x1c7
[ 9003.436962]  [<ffffffff810e4b5a>] writeback_inodes_wb+0x2bf/0x35a
[ 9003.459949]  [<ffffffff810e4d1a>] wb_writeback+0x125/0x1a1
[ 9003.481145]  [<ffffffff810e4f66>] wb_do_writeback+0x138/0x14f
[ 9003.503124]  [<ffffffff810ab3e7>] ? bdi_start_fn+0x0/0xca
[ 9003.524024]  [<ffffffff810e4fa4>] bdi_writeback_task+0x27/0x92
[ 9003.546241]  [<ffffffff810ab44c>] bdi_start_fn+0x65/0xca
[ 9003.566875]  [<ffffffff81056cc0>] kthread+0x69/0x71
[ 9003.586179]  [<ffffffff810037e4>] kernel_thread_helper+0x4/0x10
[ 9003.608532]  [<ffffffff81056c57>] ? kthread+0x0/0x71
[ 9003.627949]  [<ffffffff810037e0>] ? kernel_thread_helper+0x0/0x10
[ 9003.650791] INFO: task dd:5912 blocked for more than 120 seconds.
[ 9003.673621] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9003.701721]  ffff88024c84d7b8 0000000000000082 ffff88024c84d798 ffff880254616800
[ 9003.728793]  ffff88025d6840b0 ffff88025f065640 0000000296364968 0000000000000000
[ 9003.755893]  000000014c84d798 ffff8801570343c0 ffff880157034420 ffff880157034448
[ 9003.773323] Call Trace:
[ 9003.773326]  [<ffffffff81320d5b>] wait_barrier+0xa7/0xe0
[ 9003.773328]  [<ffffffff810383b6>] ? default_wake_function+0x0/0xf
[ 9003.773330]  [<ffffffff8132163e>] make_request+0x121/0x507
[ 9003.773332]  [<ffffffff810edbd7>] ? bio_split+0xca/0x183
[ 9003.773334]  [<ffffffff813215d5>] make_request+0xb8/0x507
[ 9003.773337]  [<ffffffff811d780d>] ? __elv_add_request+0xa1/0xaa
[ 9003.773339]  [<ffffffff8132d2aa>] md_make_request+0xc7/0x101
[ 9003.773341]  [<ffffffff811dc817>] generic_make_request+0x1af/0x276
[ 9003.773343]  [<ffffffff810ed885>] ? bio_alloc_bioset+0x70/0xc0
[ 9003.773345]  [<ffffffff811dda3b>] submit_bio+0x9e/0xa7
[ 9003.773347]  [<ffffffff810f0d0b>] mpage_bio_submit+0x22/0x26
[ 9003.773349]  [<ffffffff810f17df>] do_mpage_readpage+0x462/0x54e
[ 9003.773352]  [<ffffffff8109fb21>] ? get_page+0x9/0xf
[ 9003.773354]  [<ffffffff810a004d>] ? __lru_cache_add+0x40/0x58
[ 9003.773357]  [<ffffffff8112c194>] ? ext2_get_block+0x0/0x78a
[ 9003.773359]  [<ffffffff810f1a66>] mpage_readpages+0xc9/0x10f
[ 9003.773361]  [<ffffffff8112c194>] ? ext2_get_block+0x0/0x78a
[ 9003.773363]  [<ffffffff81001d89>] ? __switch_to+0x10e/0x1e1
[ 9003.773366]  [<ffffffff8112b40c>] ext2_readpages+0x1a/0x1c
[ 9003.773368]  [<ffffffff8109f4d0>] __do_page_cache_readahead+0xf6/0x191
[ 9003.773370]  [<ffffffff8109f587>] ra_submit+0x1c/0x20
[ 9003.773372]  [<ffffffff8109f7e3>] ondemand_readahead+0x17b/0x18e
[ 9003.773374]  [<ffffffff8109f870>] page_cache_async_readahead+0x7a/0xa2
[ 9003.773379]  [<ffffffff81098a59>] generic_file_aio_read+0x26e/0x55d
[ 9003.773382]  [<ffffffff810cb32e>] do_sync_read+0xc2/0x106
[ 9003.773384]  [<ffffffff810a009d>] ? lru_cache_add_lru+0x38/0x3d
[ 9003.773387]  [<ffffffff8100338e>] ? apic_timer_interrupt+0xe/0x20
[ 9003.773389]  [<ffffffff810cb980>] vfs_read+0xa4/0xde
[ 9003.773391]  [<ffffffff810cbc02>] sys_read+0x47/0x6d
[ 9003.773393]  [<ffffffff81002a42>] system_call_fastpath+0x16/0x1b

=== /proc/mdstat ===
Personalities : [raid1] [raid10] 
md99 : active raid10 sdd1[1] sdc1[0]
      8393856 blocks 64K chunks 2 far-copies [2/2] [UU]
      [=>...................]  resync =  5.4% (455360/8393856) finish=3938.0min speed=33K/sec

md1 : active raid10 sda2[0] sdb2[1]
      976703488 blocks 512K chunks 2 far-copies [2/2] [UU]

md0 : active raid1 sda1[0] sdb1[1]
      56128 blocks [2/2] [UU]

unused devices: <none>
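
To check whether the resync is still making progress, the percentage can be
pulled out of /proc/mdstat and compared over time.  The following is only a
sketch, assuming the mdstat layout shown above; the function name
resync_percent is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read /proc/mdstat style text on stdin and print the
# resync percentage for the named array.  Prints nothing if that array is
# not currently resyncing.
resync_percent() {
    local dev="$1"
    awk -v dev="$dev" '
        # Start of the stanza for the requested array, e.g. "md99 : active ..."
        $1 == dev && $2 == ":" { in_dev = 1; next }
        # A blank line ends the stanza.
        in_dev && NF == 0      { in_dev = 0 }
        # Inside the stanza, print the first field ending in "%" on the resync line.
        in_dev && /resync/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) { sub(/%$/, "", $i); print $i; exit }
        }
    '
}

# Usage on a live system:
#   resync_percent md99 < /proc/mdstat
```

Sampling this a few minutes apart and seeing the same value, together with
the blocked tasks in the trace, would suggest the resync itself has stalled.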

-- 
Justin Bronder
