All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Theodore Ts'o" <tytso@mit.edu>
To: Mel Gorman <mgorman@suse.de>
Cc: linux-ext4@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>, Jiri Slaby <jslaby@suse.cz>
Subject: Re: Excessive stall times on ext4 in 3.9-rc2
Date: Wed, 3 Apr 2013 08:05:30 -0400	[thread overview]
Message-ID: <20130403120529.GA7741@thunk.org> (raw)
In-Reply-To: <20130403101925.GA7341@suse.de>

On Wed, Apr 03, 2013 at 11:19:25AM +0100, Mel Gorman wrote:
> 
> I'm running with -rc5 now. I have not noticed much interactivity problems
> as such but the stall detection script reported that mutt stalled for
> 20 seconds opening an inbox and imapd blocked for 59 seconds doing path
> lookups, imaps blocked again for 12 seconds doing an atime update, an RSS
> reader blocked for 3.5 seconds writing a file. etc.

If imaps blocked for 12 seconds during an atime update, combined with
everything else, at a guess it got caught by something holding up a
journal commit.  Could you try enabling the jbd2_run_stats tracepoint
and grabbing the trace log?  This will give you statistics on how long
(in milliseconds) each of the various phases of a jbd2 commit is
taking, i.e.:

    jbd2/sdb1-8-327   [002] .... 39681.874661: jbd2_run_stats: dev 8,17 tid 7163786 wait 0 request_delay 0 running 3530 locked 0 flushing 0 logging 0 handle_count 75 blocks 8 blocks_logged 9
     jbd2/sdb1-8-327   [003] .... 39682.514153: jbd2_run_stats: dev 8,17 tid 7163787 wait 0 request_delay 0 running 640 locked 0 flushing 0 logging 0 handle_count 39 blocks 12 blocks_logged 13
     jbd2/sdb1-8-327   [000] .... 39687.665609: jbd2_run_stats: dev 8,17 tid 7163788 wait 0 request_delay 0 running 5150 locked 0 flushing 0 logging 0 handle_count 60 blocks 13 blocks_logged 14
     jbd2/sdb1-8-327   [000] .... 39693.200453: jbd2_run_stats: dev 8,17 tid 7163789 wait 0 request_delay 0 running 4840 locked 0 flushing 0 logging 0 handle_count 53 blocks 10 blocks_logged 11
     jbd2/sdb1-8-327   [001] .... 39695.061657: jbd2_run_stats: dev 8,17 tid 7163790 wait 0 request_delay 0 running 1860 locked 0 flushing 0 logging 0 handle_count 124 blocks 19 blocks_logged 20

In the above sample each journal commit is running for no more than 5
seconds or so (since that's the default jbd2 commit timeout; if a
transaction is running for less than 5 seconds, then either we ran out
of room in the journal, and the blocks_logged number will be high, or
a commit was forced by something such as an fsync call).  

If an atime update is getting blocked by 12 seconds, then it would be
interesting to see if a journal commit is running for significantly
longer than 5 seconds, or if one of the other commit phases is taking
significant amounts of time.  (On the example above they are all
taking no time, since I ran this on a relatively uncontended system;
only a single git operation taking place.)

Something else that might be worth trying is grabbing a lock_stat
report and see if something is sitting on an ext4 or jbd2 mutex for a
long time.

Finally, as I mentioned I tried some rather simplistic tests and I
didn't notice any difference between a 3.2 kernel and a 3.8/3.9-rc5
kernel.  Assuming you can get a version of systemtap that
simultaneously works on 3.2 and 3.9-rc5 :-P, and chance you could do a
quick experiment and see if you're seeing a difference on your setup?

Thanks!!

					 - Ted

WARNING: multiple messages have this Message-ID (diff)
From: Theodore Ts'o <tytso@mit.edu>
To: Mel Gorman <mgorman@suse.de>
Cc: linux-ext4@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>, Jiri Slaby <jslaby@suse.cz>
Subject: Re: Excessive stall times on ext4 in 3.9-rc2
Date: Wed, 3 Apr 2013 08:05:30 -0400	[thread overview]
Message-ID: <20130403120529.GA7741@thunk.org> (raw)
In-Reply-To: <20130403101925.GA7341@suse.de>

On Wed, Apr 03, 2013 at 11:19:25AM +0100, Mel Gorman wrote:
> 
> I'm running with -rc5 now. I have not noticed much interactivity problems
> as such but the stall detection script reported that mutt stalled for
> 20 seconds opening an inbox and imapd blocked for 59 seconds doing path
> lookups, imaps blocked again for 12 seconds doing an atime update, an RSS
> reader blocked for 3.5 seconds writing a file. etc.

If imaps blocked for 12 seconds during an atime update, combined with
everything else, at a guess it got caught by something holding up a
journal commit.  Could you try enabling the jbd2_run_stats tracepoint
and grabbing the trace log?  This will give you statistics on how long
(in milliseconds) each of the various phases of a jbd2 commit is
taking, i.e.:

    jbd2/sdb1-8-327   [002] .... 39681.874661: jbd2_run_stats: dev 8,17 tid 7163786 wait 0 request_delay 0 running 3530 locked 0 flushing 0 logging 0 handle_count 75 blocks 8 blocks_logged 9
     jbd2/sdb1-8-327   [003] .... 39682.514153: jbd2_run_stats: dev 8,17 tid 7163787 wait 0 request_delay 0 running 640 locked 0 flushing 0 logging 0 handle_count 39 blocks 12 blocks_logged 13
     jbd2/sdb1-8-327   [000] .... 39687.665609: jbd2_run_stats: dev 8,17 tid 7163788 wait 0 request_delay 0 running 5150 locked 0 flushing 0 logging 0 handle_count 60 blocks 13 blocks_logged 14
     jbd2/sdb1-8-327   [000] .... 39693.200453: jbd2_run_stats: dev 8,17 tid 7163789 wait 0 request_delay 0 running 4840 locked 0 flushing 0 logging 0 handle_count 53 blocks 10 blocks_logged 11
     jbd2/sdb1-8-327   [001] .... 39695.061657: jbd2_run_stats: dev 8,17 tid 7163790 wait 0 request_delay 0 running 1860 locked 0 flushing 0 logging 0 handle_count 124 blocks 19 blocks_logged 20

In the above sample each journal commit is running for no more than 5
seconds or so (since that's the default jbd2 commit timeout; if a
transaction is running for less than 5 seconds, then either we ran out
of room in the journal, and the blocks_logged number will be high, or
a commit was forced by something such as an fsync call).  

If an atime update is getting blocked by 12 seconds, then it would be
interesting to see if a journal commit is running for significantly
longer than 5 seconds, or if one of the other commit phases is taking
significant amounts of time.  (On the example above they are all
taking no time, since I ran this on a relatively uncontended system;
only a single git operation taking place.)

Something else that might be worth trying is grabbing a lock_stat
report and see if something is sitting on an ext4 or jbd2 mutex for a
long time.

Finally, as I mentioned I tried some rather simplistic tests and I
didn't notice any difference between a 3.2 kernel and a 3.8/3.9-rc5
kernel.  Assuming you can get a version of systemtap that
simultaneously works on 3.2 and 3.9-rc5 :-P, and chance you could do a
quick experiment and see if you're seeing a difference on your setup?

Thanks!!

					 - Ted

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2013-04-03 12:06 UTC|newest]

Thread overview: 105+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-04-02 14:27 Excessive stall times on ext4 in 3.9-rc2 Mel Gorman
2013-04-02 14:27 ` Mel Gorman
2013-04-02 15:00 ` Jiri Slaby
2013-04-02 15:00   ` Jiri Slaby
2013-04-02 15:03 ` Zheng Liu
2013-04-02 15:03   ` Zheng Liu
2013-04-02 15:15   ` Mel Gorman
2013-04-02 15:15     ` Mel Gorman
2013-04-02 15:06 ` Theodore Ts'o
2013-04-02 15:06   ` Theodore Ts'o
2013-04-02 15:14   ` Theodore Ts'o
2013-04-02 15:14     ` Theodore Ts'o
2013-04-02 18:19     ` Theodore Ts'o
2013-04-02 18:19       ` Theodore Ts'o
2013-04-07 21:59       ` Frank Ch. Eigler
2013-04-07 21:59         ` Frank Ch. Eigler
2013-04-08  8:36         ` Mel Gorman
2013-04-08  8:36           ` Mel Gorman
2013-04-08 10:52           ` Frank Ch. Eigler
2013-04-08 10:52             ` Frank Ch. Eigler
2013-04-08 11:01         ` Theodore Ts'o
2013-04-08 11:01           ` Theodore Ts'o
2013-04-03 10:19     ` Mel Gorman
2013-04-03 10:19       ` Mel Gorman
2013-04-03 12:05       ` Theodore Ts'o [this message]
2013-04-03 12:05         ` Theodore Ts'o
2013-04-03 15:15         ` Mel Gorman
2013-04-05 22:18       ` Jiri Slaby
2013-04-05 22:18         ` Jiri Slaby
2013-04-05 23:16         ` Theodore Ts'o
2013-04-05 23:16           ` Theodore Ts'o
2013-04-06  7:29           ` Jiri Slaby
2013-04-06  7:29             ` Jiri Slaby
2013-04-06  7:37             ` Jiri Slaby
2013-04-06  7:37               ` Jiri Slaby
2013-04-06  8:19               ` Jiri Slaby
2013-04-06 13:15             ` Theodore Ts'o
2013-04-06 13:15               ` Theodore Ts'o
2013-04-10 10:56   ` Mel Gorman
2013-04-10 10:56     ` Mel Gorman
2013-04-10 13:12     ` Theodore Ts'o
2013-04-10 13:12       ` Theodore Ts'o
2013-04-11 17:04       ` Mel Gorman
2013-04-11 17:04         ` Mel Gorman
2013-04-11 18:35         ` Theodore Ts'o
2013-04-11 18:35           ` Theodore Ts'o
2013-04-11 21:33           ` Jan Kara
2013-04-11 21:33             ` Jan Kara
2013-04-12  2:57             ` Theodore Ts'o
2013-04-12  2:57               ` Theodore Ts'o
2013-04-12  4:50               ` Dave Chinner
2013-04-12  4:50                 ` Dave Chinner
2013-04-12 15:19                 ` Theodore Ts'o
2013-04-12 15:19                   ` Theodore Ts'o
2013-04-13  1:23                   ` Dave Chinner
2013-04-13  1:23                     ` Dave Chinner
2013-04-22 14:38                   ` Mel Gorman
2013-04-22 14:38                     ` Mel Gorman
2013-04-22 22:42                     ` Jeff Moyer
2013-04-22 22:42                       ` Jeff Moyer
2013-04-23  0:02                       ` Theodore Ts'o
2013-04-23  0:02                         ` Theodore Ts'o
2013-04-23  9:31                       ` Jan Kara
2013-04-23  9:31                         ` Jan Kara
2013-04-23 14:01                       ` Mel Gorman
2013-04-23 14:01                         ` Mel Gorman
2013-04-24 19:09                         ` Jeff Moyer
2013-04-24 19:09                           ` Jeff Moyer
2013-04-25 12:21                           ` Mel Gorman
2013-04-25 12:21                             ` Mel Gorman
2013-04-12  9:47               ` Mel Gorman
2013-04-12  9:47                 ` Mel Gorman
2013-04-21  0:05                 ` Theodore Ts'o
2013-04-21  0:05                   ` Theodore Ts'o
2013-04-21  0:07                   ` [PATCH 1/3] ext4: mark all metadata I/O with REQ_META Theodore Ts'o
2013-04-21  0:07                     ` Theodore Ts'o
2013-04-21  0:07                     ` [PATCH 2/3] buffer: add BH_Prio and BH_Meta flags Theodore Ts'o
2013-04-21  0:07                       ` Theodore Ts'o
2013-04-21  0:07                     ` [PATCH 3/3] ext4: mark metadata blocks using bh flags Theodore Ts'o
2013-04-21  0:07                       ` Theodore Ts'o
2013-04-21  6:09                       ` Jiri Slaby
2013-04-21  6:09                         ` Jiri Slaby
2013-04-21  6:09                         ` Jiri Slaby
2013-04-21 19:55                         ` Theodore Ts'o
2013-04-21 19:55                           ` Theodore Ts'o
2013-04-21 19:55                           ` Theodore Ts'o
2013-04-21 20:48                           ` [PATCH 3/3 -v2] " Theodore Ts'o
2013-04-21 20:48                             ` Theodore Ts'o
2013-04-21 20:48                             ` Theodore Ts'o
2013-04-22 12:06                     ` [PATCH 1/3] ext4: mark all metadata I/O with REQ_META Zheng Liu
2013-04-22 12:06                       ` Zheng Liu
2013-04-23 15:33                   ` Excessive stall times on ext4 in 3.9-rc2 Mel Gorman
2013-04-23 15:33                     ` Mel Gorman
2013-04-23 15:50                     ` Theodore Ts'o
2013-04-23 15:50                       ` Theodore Ts'o
2013-04-23 16:13                       ` Mel Gorman
2013-04-23 16:13                         ` Mel Gorman
2013-04-12 10:18               ` Tvrtko Ursulin
2013-04-12 10:18                 ` Tvrtko Ursulin
2013-04-12  9:45           ` Mel Gorman
2013-04-12  9:45             ` Mel Gorman
2013-04-02 23:16 ` Theodore Ts'o
2013-04-02 23:16   ` Theodore Ts'o
2013-04-03 15:22   ` Mel Gorman
2013-04-03 15:22     ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130403120529.GA7741@thunk.org \
    --to=tytso@mit.edu \
    --cc=jslaby@suse.cz \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.