All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>,
	"HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com>,
	"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
	"Li, Michael" <huayil@qti.qualcomm.com>
Subject: Re: ext4 out of order when use cfq scheduler
Date: Mon, 14 Mar 2016 08:39:28 +0100	[thread overview]
Message-ID: <20160314073928.GD5213@quack.suse.cz> (raw)
In-Reply-To: <20160313042723.GC29218@thunk.org>

On Sat 12-03-16 23:27:23, Ted Tso wrote:
> On Thu, Jan 07, 2016 at 12:47:36PM +0100, Jan Kara wrote:
> > The problem is in all kernels starting with 3.8. Attached is a patch which
> > should fix the issue. Can you test whether it fixes the problem for you?
> 
> Sorry, I missed this patch because it was attached to an discussion
> thread.

I have actually sent this patch in a standalone thread on January 11
(http://lists.openwall.net/linux-ext4/2016/01/11/3) together with one more
cleanup.

> > The problem is that although for delayed allocated blocks we write their
> > contents immediately after allocating them, there is no guarantee that
> > the IO scheduler or device doesn't reorder things
> 
> I don't think that's the problem.  In the commit thread when we call
> blkdev_issue_flush() that acts as a barrier so the I/O scheduler won't
> reorder writes after that point, which is before we write the commit
> block.  Instead, I believe the problem is in ext4_writepages:
> 
> 		ext4_journal_stop(handle);
> 		/* Submit prepared bio */
> 		ext4_io_submit(&mpd.io_submit);
> 
> Once we release the handle, the commit can start --- *before* we have
> a chance to submit the I/O.   Oops.
> 
> I believe if we swap these two calls, it should fix the problem Huang
> was seeing.

No, that won't be enough. blkdev_issue_flush() is not guaranteed to do
anything to IOs which have not reported completion before
blkdev_issue_flush() was called. Specifically, CFQ will queue submitted bio
in its internal RB tree, following flush request completely bypasses this
tree and goes directly to the disk where it flushes caches. And only later
CFQ decides to schedule async writeback from the flusher thread which is
queued in the RB tree...

Note that the behavior has changed to be like this with the flushing
machinery rewrite. Before that, IO scheduler had to drain all the
outstanding IO requests (IO cache flush behaved like IO barrier). So your
patch would be enough with the old flushing machinery but is not enough
since 3.0 or so...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  parent reply	other threads:[~2016-03-14  7:39 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-12-22  6:24 ext4 out of order when use cfq scheduler HUANG Weller (CM/EPF1-CN)
2015-12-22 15:00 ` Jan Kara
     [not found]   ` <c67f356b63d94d35ad010a6e987b68f0@SGPMBX1004.APAC.bosch.com>
2016-01-05 15:30     ` Jan Kara
2016-01-06  2:39       ` HUANG Weller (CM/ESW12-CN)
2016-01-06 19:17         ` Andreas Dilger
2016-01-07  6:51           ` HUANG Weller (CM/ESW12-CN)
     [not found]         ` <20160106100621.GA24046@quack.suse.cz>
     [not found]           ` <3ab48fa47e434455b101251730e69bd2@SGPMBX1004.APAC.bosch.com>
2016-01-07 10:24             ` Jan Kara
2016-01-07 11:02               ` HUANG Weller (CM/ESW12-CN)
2016-01-07 11:47                 ` Jan Kara
2016-01-07 12:19                   ` Jan Kara
2016-01-08  2:18                     ` HUANG Weller (CM/ESW12-CN)
2016-01-08  0:46                   ` HUANG Weller (CM/ESW12-CN)
2016-01-11  9:05                   ` HUANG Weller (CM/ESW12-CN)
2016-01-11 10:21                     ` Jan Kara
2016-03-13  4:27                   ` Theodore Ts'o
2016-03-14  2:43                     ` HUANG Weller (CM/ESW12-CN)
2016-03-14  7:39                     ` Jan Kara [this message]
2016-03-14 14:36                       ` Theodore Ts'o
2016-03-15 10:46                         ` Jan Kara
2016-03-15 14:46                           ` Jan Kara
2016-03-15 20:09                             ` Jan Kara
2016-03-16  2:30                               ` HUANG Weller (CM/ESW12-CN)
2016-03-18  9:20                                 ` Jan Kara
2016-06-22 11:55                               ` FW: " HUANG Weller (CM/ESW12-CN)
2016-06-22 13:09                                 ` Jan Kara
2016-03-16  0:41                             ` HUANG Weller (CM/ESW12-CN)
2016-03-24 10:16                             ` HUANG Weller (CM/ESW12-CN)
2016-03-24 12:17                               ` Jan Kara
2016-01-28  8:02 ` Xiong Zhou
2016-02-03  6:08   ` HUANG Weller (CM/ESW12-CN)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160314073928.GD5213@quack.suse.cz \
    --to=jack@suse.cz \
    --cc=Weller.Huang@cn.bosch.com \
    --cc=huayil@qti.qualcomm.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.