From: Bart Van Assche <bvanassche@acm.org>
To: Salman Qazi <sqazi@google.com>, Jens Axboe <axboe@kernel.dk>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-block@vger.kernel.org
Cc: Jesse Barnes <jsbarnes@google.com>, Gwendal Grignou <gwendal@google.com>
Subject: Re: Hung tasks with multiple partitions
Date: Thu, 30 Jan 2020 12:49:38 -0800	[thread overview]
Message-ID: <55c0fe61-a091-b351-11b4-fa7f668e49d7@acm.org> (raw)
In-Reply-To: <CAKUOC8WM3XU5y9QKHrO8VBdC4Dghexqy+o9OGM1qUs4kGQxZdQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2815 bytes --]

On 1/30/20 11:34 AM, Salman Qazi wrote:
> I am writing on behalf of the Chromium OS team at Google.  We found
> the root cause for some hung tasks we were experiencing and we would
> like to get your opinion on potential solutions.  The bugs were
> encountered on a 4.19 kernel; however, my reading of the code suggests
> that the relevant portions of the code have not changed since then.
> 
> We have an eMMC flash drive that has been carved into partitions on an
> 8 CPU system.  The repro case that we came up with is to run an
> 8-threaded, write-mostly fio workload against one partition, let the
> system use the other partition as the read-write filesystem (i.e., just
> background activity), and then run the following loop:
> 
> while true; do sync; sleep 1 ; done
> 
> The hung task stack traces look like the following:
> 
> [  128.994891] jbd2/dm-1-8     D    0   367      2 0x00000028
> last_sleep: 96340206998.  last_runnable: 96340140151
> [  128.994898] Call trace:
> [  128.994903]  __switch_to+0x120/0x13c
> [  128.994909]  __schedule+0x60c/0x7dc
> [  128.994914]  schedule+0x74/0x94
> [  128.994919]  io_schedule+0x1c/0x40
> [  128.994925]  bit_wait_io+0x18/0x58
> [  128.994930]  __wait_on_bit+0x78/0xdc
> [  128.994935]  out_of_line_wait_on_bit+0xa0/0xcc
> [  128.994943]  __wait_on_buffer+0x48/0x54
> [  128.994948]  jbd2_journal_commit_transaction+0x1198/0x1a4c
> [  128.994956]  kjournald2+0x19c/0x268
> [  128.994961]  kthread+0x120/0x130
> [  128.994967]  ret_from_fork+0x10/0x18
> 
> I added some more information to trace points to understand what was
> going on.  It turns out that blk_mq_sched_dispatch_requests had
> checked hctx->dispatch, found it empty, and then began consuming
> requests from the I/O scheduler (in blk_mq_do_dispatch_sched).
> Unfortunately, the deluge from the I/O scheduler (BFQ in our case)
> doesn't stop for 30 seconds and there is no mechanism present in
> blk_mq_do_dispatch_sched to terminate early or reconsider
> hctx->dispatch contents.  In the meantime, a flush command arrives in
> hctx->dispatch (via insertion in blk_mq_sched_bypass_insert) and
> languishes there.  Eventually the thread waiting on the flush triggers
> the hung task watchdog.
> 
> The solution that comes to mind is to periodically check
> hctx->dispatch in blk_mq_do_dispatch_sched and exit early if it is
> non-empty.  However, not being an expert in this subsystem, I am not
> sure if there would be other consequences.
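
A rough sketch of where such a check could sit, modeled on the 4.19-era
blk_mq_do_dispatch_sched() loop; this is abridged (budget handling omitted)
and untested, and the list_empty_careful() test is the hypothetical addition
rather than existing code:

	do {
		struct request *rq;

		/*
		 * Hypothetical early exit: if requests such as a flush have
		 * been queued on hctx->dispatch while the scheduler was being
		 * drained, stop and let the caller service them first.
		 */
		if (!list_empty_careful(&hctx->dispatch))
			break;

		if (e->type->ops.mq.has_work && !e->type->ops.mq.has_work(hctx))
			break;

		rq = e->type->ops.mq.dispatch_request(hctx);
		if (!rq)
			break;

		list_add(&rq->queuelist, &rq_list);
	} while (blk_mq_dispatch_rq_list(q, &rq_list, true));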

The call stack shown in your e-mail usually means that an I/O request 
got stuck. How about determining first whether this is caused by the BFQ 
scheduler or by the eMMC driver? I think the developers of these 
software components need that information anyway before they can step in.

The attached script may help to identify which requests got stuck.

Bart.

[-- Attachment #2: list-pending-block-requests --]
[-- Type: text/plain, Size: 1501 bytes --]

#!/bin/bash
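# List pending block layer requests for every block device, using the sysfs
# attributes under /sys/class/block and, when available, the blk-mq debugfs
# attributes under /sys/kernel/debug/block. Intended to help identify which
# requests got stuck when tasks hang in I/O.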

show_state() {
    local a dev=$1

    for a in device/state queue/scheduler; do
	[ -e "$dev/$a" ] && grep -aH . "$dev/$a"
    done
}

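# Enumerate block devices, preferring the debugfs listing when debugfs is mounted.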
if [ -e /sys/kernel/debug/block ]; then
    devs=($(cd /sys/kernel/debug/block && echo ./*))
else
    devs=($(cd /sys/class/block && echo ./*))
fi

cd /sys/class/block || exit $?
for dev in "${devs[@]}"; do
    dev="${dev#./}"
    echo "$dev"
    pending=0
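    # Count per-hctx request lists exposed under sysfs (only present on some
    # kernels) that contain entries.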
    if [ -e "$dev/mq" ]; then
	for f in "$dev"/mq/*/{pending,*/rq_list}; do
	    [ -e "$f" ] || continue
	    if { read -r line1 && read -r line2; } <"$f"; then
		echo "$f"
		echo "$line1 $line2" >/dev/null
		head -n 9 "$f"
		((pending++))
	    fi
	done
    fi
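    # Inspect the blk-mq debugfs attributes: dump the requeue list and any
    # hctx whose tags or sched_tags report busy != 0; the subshell exits
    # with the number of busy hctxs.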
    (
	busy=0
	cd /sys/kernel/debug/block >&/dev/null &&
	    { grep -aH . $dev/requeue_list; true; } &&
	    for d in "$dev"/mq/hctx* "$dev"/hctx*; do
		[ ! -d "$d" ] && continue
		{ [ ! -e "$d/tags" ] ||
		      grep -q '^busy=0$' "$d/tags"; } &&
		    { [ ! -e "$d/sched_tags" ] ||
			  [ "$(<"$d/sched_tags")" = "" ] ||
			  grep -q '^busy=0$' "$d/sched_tags"; } && continue
		((busy++))
	        for f in "$d"/{active,busy,dispatch,flags,requeue_list,sched_tags,state,tags*,cpu*/rq_list,sched/*rqs}; do
		    [ -e "$f" ] && grep -aH . "$f"
		done
	    done
	exit $busy
    )
    pending=$((pending+$?))
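    # If anything was pending or busy, also show the queue state, the device
    # state and the configured I/O scheduler.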
    if [ "$pending" -gt 0 ]; then
	(
	    cd /sys/kernel/debug/block >&/dev/null &&
		if [ -e "$dev/mq/state" ]; then
		    grep -aH . "$dev/mq/state"
		else
		    grep -aH . "$dev/state"
		fi
	)
	show_state "$dev"
    fi
done
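
The script takes no arguments. Running it as root (the debugfs attributes are
normally only readable by root) while the hang is in progress should show
which device and hardware queue the stuck requests are sitting on.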

Thread overview: 22+ messages
2020-01-30 19:34 Hung tasks with multiple partitions Salman Qazi
2020-01-30 20:49 ` Bart Van Assche [this message]
2020-01-30 21:02   ` Salman Qazi
     [not found]     ` <20200203204554.119849-1-sqazi@google.com>
2020-02-03 20:59       ` [PATCH] block: Limit number of items taken from the I/O scheduler in one go Salman Qazi
2020-02-04  3:47         ` Bart Van Assche
2020-02-04  9:20         ` Ming Lei
2020-02-04 18:26           ` Salman Qazi
2020-02-04 19:37             ` Salman Qazi
2020-02-05  4:55               ` Ming Lei
2020-02-05 19:57                 ` Salman Qazi
2020-02-06 10:18                   ` Ming Lei
2020-02-06 21:12                     ` Salman Qazi
2020-02-07  2:07                       ` Ming Lei
2020-02-07 15:26                       ` Bart Van Assche
2020-02-07 18:45                         ` Salman Qazi
2020-02-07 19:04                           ` Salman Qazi
2020-02-07 20:19                           ` Bart Van Assche
2020-02-07 20:37                             ` Salman Qazi
2020-04-20 16:42                               ` Doug Anderson
2020-04-23 20:13                                 ` Jesse Barnes
2020-04-23 20:34                                   ` Jens Axboe
2020-04-23 20:40                                     ` Salman Qazi
