linux-kernel.vger.kernel.org archive mirror
* [PATCHSET RFC] sched, jbd2: mark sleeps on journal->j_checkpoint_mutex as iowait
@ 2016-10-28 16:58 Tejun Heo
From: Tejun Heo @ 2016-10-28 16:58 UTC
  To: torvalds, akpm, mingo, peterz, axboe, tytso, jack, adilger.kernel
  Cc: linux-ext4, linux-fsdevel, linux-kernel, kernel-team, mingbo

Hello,

When there's heavy metadata operation traffic on ext4, the journal
fills up quickly and the majority of filesystem users end up blocking
on journal->j_checkpoint_mutex with a stacktrace similar to the
following.

 [<ffffffff8c32e758>] __jbd2_log_wait_for_space+0xb8/0x1d0
 [<ffffffff8c3285f6>] add_transaction_credits+0x286/0x2a0
 [<ffffffff8c32876c>] start_this_handle+0x10c/0x400
 [<ffffffff8c328c5b>] jbd2__journal_start+0xdb/0x1e0
 [<ffffffff8c30ee5d>] __ext4_journal_start_sb+0x6d/0x120
 [<ffffffff8c2d713e>] __ext4_new_inode+0x64e/0x1330
 [<ffffffff8c2e9bf0>] ext4_create+0xc0/0x1c0
 [<ffffffff8c2570fd>] path_openat+0x124d/0x1380
 [<ffffffff8c258501>] do_filp_open+0x91/0x100
 [<ffffffff8c2462d0>] do_sys_open+0x130/0x220
 [<ffffffff8c2463de>] SyS_open+0x1e/0x20
 [<ffffffff8c7ec5b2>] entry_SYSCALL_64_fastpath+0x1a/0xa4
 [<ffffffffffffffff>] 0xffffffffffffffff

Because the sleeps on the mutex aren't accounted as iowait, the system
doesn't show the usual signs of being bogged down by IOs - both iowait
and /proc/stat:procs_blocked stay misleadingly low.  While propagation
of iowait through locking constructs is far from being strict, heavy
contention on j_checkpoint_mutex is easy to trigger and is obviously
iowait, so getting it right can help users quite a bit in tracking
down the issue.
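
For readers who want to watch the counter mentioned above, here is a
minimal userspace sketch, assuming a Linux /proc/stat that contains a
"procs_blocked N" line.  The helper names parse_procs_blocked_line()
and read_procs_blocked() are hypothetical and are not part of this
patchset.

```c
#include <stdio.h>

/*
 * Hypothetical helper: parse one line of /proc/stat-style text.
 * Returns the procs_blocked value, or -1 if this is a different line.
 */
static long parse_procs_blocked_line(const char *line)
{
	long value;

	if (sscanf(line, "procs_blocked %ld", &value) == 1)
		return value;
	return -1;
}

/*
 * Hypothetical helper: scan a /proc/stat-style file for procs_blocked.
 * Returns -1 if the file cannot be opened or the field is missing.
 */
static long read_procs_blocked(const char *path)
{
	char line[256];
	long value = -1;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	/*
	 * Lines longer than the buffer arrive in several pieces, but
	 * fgets() stops at each newline, so the procs_blocked line
	 * always starts at the beginning of a piece.
	 */
	while (fgets(line, sizeof(line), f)) {
		value = parse_procs_blocked_line(line);
		if (value >= 0)
			break;
	}
	fclose(f);
	return value;
}
```

Calling read_procs_blocked("/proc/stat") periodically while the
workload runs should show whether tasks are being counted as blocked
on IO.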

Due to the way io_schedule() is implemented, it currently is hairy to
add an io variant to an existing interface - the schedule() call
itself, which is usually buried deep, should be replaced with
io_schedule().  As we already have current->in_iowait to mark the task
as sleeping for iowait, this can be made easy by breaking up
io_schedule() into multiple steps so that the preparation and marking
can be done before calling an existing interface and the actual iowait
accounting can be done from inside the scheduler.
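
The split described above can be sketched as a userspace model.  This
is not kernel code: struct task_model and every *_model function are
hypothetical stand-ins, and returning the old flag as a token from the
prepare step is an assumption of this sketch, chosen so that nested
callers restore the previous state.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-task state behind `current`. */
struct task_model {
	bool in_iowait;			/* models current->in_iowait */
	unsigned long iowait_sleeps;	/* sleeps charged to iowait */
	unsigned long other_sleeps;	/* all other sleeps */
};

/* Step 1: mark the task, return the previous marking as a token. */
static int io_schedule_prepare_model(struct task_model *t)
{
	int old = t->in_iowait;

	t->in_iowait = true;
	return old;
}

/*
 * Step 2: models schedule().  The accounting lives here, inside the
 * scheduler, and only looks at the flag set by the caller.
 */
static void schedule_model(struct task_model *t)
{
	if (t->in_iowait)
		t->iowait_sleeps++;
	else
		t->other_sleeps++;
}

/* Step 3: restore whatever marking was in place before step 1. */
static void io_schedule_finish_model(struct task_model *t, int token)
{
	t->in_iowait = token;
}

/* An existing interface with the schedule() call buried inside it. */
static void existing_blocking_interface(struct task_model *t)
{
	schedule_model(t);
}

/* The io variant wraps the interface without touching its internals. */
static void blocking_interface_io(struct task_model *t)
{
	int token = io_schedule_prepare_model(t);

	existing_blocking_interface(t);
	io_schedule_finish_model(t, token);
}
```

The point of the model is that blocking_interface_io() never has to
reach the buried schedule() call; it only brackets the existing
interface.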

What do you think?

This patchset contains the following four patches.

 0001-sched-move-IO-scheduling-accounting-from-io_schedule.patch
 0002-sched-separate-out-io_schedule_prepare-and-io_schedu.patch
 0003-mutex-add-mutex_lock_io.patch
 0004-jbd2-use-mutex_lock_io-for-journal-j_checkpoint_mute.patch

0001-0002 implement io_schedule_prepare/finish().
0003 implements mutex_lock_io() using io_schedule_prepare/finish().
0004 uses mutex_lock_io() on journal->j_checkpoint_mutex.
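
As a rough illustration of how 0003 and 0004 fit together, the sketch
below models a mutex whose contended path sleeps.  It is a
single-threaded userspace model under stated assumptions: struct
mutex_model, struct task_model and the *_model functions are
hypothetical, and the model pretends the holder drops the lock while
the waiter sleeps.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-task state behind `current`. */
struct task_model {
	bool in_iowait;			/* models current->in_iowait */
	unsigned long iowait_sleeps;	/* sleeps charged to iowait */
	unsigned long other_sleeps;	/* all other sleeps */
};

/* Hypothetical stand-in for a mutex such as j_checkpoint_mutex. */
struct mutex_model {
	bool held;
};

/* Models mutex_lock(): sleeps once if the lock is already held. */
static void mutex_lock_model(struct task_model *t, struct mutex_model *m)
{
	if (m->held) {
		/* Contended path: the scheduler classifies the sleep. */
		if (t->in_iowait)
			t->iowait_sleeps++;
		else
			t->other_sleeps++;
		/* Model assumption: the holder released it meanwhile. */
	}
	m->held = true;
}

static void mutex_unlock_model(struct mutex_model *m)
{
	m->held = false;
}

/* Models mutex_lock_io(): same lock, sleeps are marked as iowait. */
static void mutex_lock_io_model(struct task_model *t, struct mutex_model *m)
{
	bool old = t->in_iowait;	/* prepare: save and set the flag */

	t->in_iowait = true;
	mutex_lock_model(t, m);
	t->in_iowait = old;		/* finish: restore the old flag */
}
```

In this model an uncontended mutex_lock_io_model() call changes no
counters, so only real waits on the lock show up as iowait.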

This patchset is also available in the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git review-mutex_lock_io

Thanks, diffstat follows.

 fs/jbd2/commit.c       |    2 -
 fs/jbd2/journal.c      |   14 ++++++-------
 include/linux/mutex.h  |    4 +++
 include/linux/sched.h  |    8 ++-----
 kernel/locking/mutex.c |   24 ++++++++++++++++++++++
 kernel/sched/core.c    |   52 +++++++++++++++++++++++++++++++++++++------------
 6 files changed, 79 insertions(+), 25 deletions(-)

--
tejun

Thread overview: 22+ messages
2016-10-28 16:58 [PATCHSET RFC] sched, jbd2: mark sleeps on journal->j_checkpoint_mutex as iowait Tejun Heo
2016-10-28 16:58 ` [PATCH 1/4] sched: move IO scheduling accounting from io_schedule_timeout() to __schedule() Tejun Heo
2016-10-28 18:27   ` Peter Zijlstra
2016-10-28 19:07     ` Peter Zijlstra
2016-10-28 19:12       ` Tejun Heo
2016-10-29  3:21         ` Peter Zijlstra
2016-10-31 16:45           ` Tejun Heo
2016-12-06 21:30             ` Tejun Heo
2016-11-03 15:33   ` Pavan Kondeti
2016-11-08 22:51     ` Tejun Heo
2016-12-06 21:29   ` [PATCH v2 " Tejun Heo
2016-12-07  9:35     ` Peter Zijlstra
2016-12-07 20:48       ` [PATCH v3 1/4] sched: move IO scheduling accounting from io_schedule_timeout() into scheduler Tejun Heo
2017-01-14 12:49         ` [tip:sched/core] sched/core: " tip-bot for Tejun Heo
2016-10-28 16:58 ` [PATCH 2/4] sched: separate out io_schedule_prepare() and io_schedule_finish() Tejun Heo
2017-01-14 12:49   ` [tip:sched/core] sched/core: Separate " tip-bot for Tejun Heo
2016-10-28 16:58 ` [PATCH 3/4] mutex: add mutex_lock_io() Tejun Heo
2017-01-14 12:50   ` [tip:sched/core] locking/mutex, sched/wait: Add mutex_lock_io() tip-bot for Tejun Heo
2017-01-14 14:13     ` Mike Galbraith
2017-01-14 16:10       ` Ingo Molnar
2016-10-28 16:58 ` [PATCH 4/4] jbd2: use mutex_lock_io() for journal->j_checkpoint_mutex Tejun Heo
2017-01-14 12:51   ` [tip:sched/core] fs/jbd2, locking/mutex, sched/wait: Use " tip-bot for Tejun Heo