From: Theodore Tso <tytso@mit.edu>
To: Ric Wheeler <rwheeler@redhat.com>
Cc: Ext4 Developers List <linux-ext4@vger.kernel.org>
Subject: Re: [PATCH 2/2] ext4: Automatically enable journal_async_commit on ext4 file systems
Date: Thu, 10 Sep 2009 22:45:05 -0400
Message-ID: <20090911024505.GA9363@mit.edu>
In-Reply-To: <4AA6450B.9040001@redhat.com>
On Tue, Sep 08, 2009 at 07:50:35AM -0400, Ric Wheeler wrote:
>>
>> So here's what we do on a non-async commit:
>>
>> This is what we do with an async commit:
>>
>> That's the only difference at this point. The fatal flaw with async
>> commit from before was that we weren't writing the commit block
>> in step (2) with a barrier --- and that *was* disastrous, since it
>> meant the equivalent of mounting with barrier=0.
>
> I think that the difference is basically that in the original mode,
> waiting for stage (2) to finish means that our commit block will never
> hit the storage before the dependent data is committed. Remember that
> barriers are actually 2 CACHE_FLUSH_EXT commands - one before the
> flagged barrier IO is issued and one afterwards.
I didn't realize that doing an ordered write meant that we had a
barrier *before* and *after* the commit block; I didn't think it
was quite that strong. I thought an ordered write only put a barrier
*after* the commit block. Looking more closely, you're right, and
that actually explains why I wasn't seeing much of a difference with
and without journal_async_commit.
The fact that an ordered write puts barriers before and after the
commit means that right now the two scenarios above are in fact
*identical*.
So here's a respin of the fix-async-journal patch that changes what we
do from:

1) Write the journal data, revoke, and descriptor blocks
2) Wait for the block I/O layer to signal that all of these blocks
   have been written out --- *without* a barrier
3) Write the commit block in ordered mode
4) Wait for the I/O to the commit block to be done

To this (in journal_async_commit):

1) Write the journal data, revoke, and descriptor blocks
2) Write the commit block (with a checksum) without setting ordered mode
3) Send an empty barrier bio (so we only send a *single* CACHE_FLUSH_EXT)
4) Wait for the I/O in steps (1) and (2) to be done
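
As a rough illustration of why the second sequence is cheaper, here is a
small user-space sketch that only counts cache flushes and wait points for
each sequence. It is not jbd2 code: struct commit_cost, ordered_commit(),
and async_commit() are hypothetical names, and the figure of two flushes
per ordered write is taken from Ric's description quoted above.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping for one journal commit; not a jbd2 structure. */
struct commit_cost {
	int cache_flushes;	/* CACHE_FLUSH_EXT commands sent to the drive */
	int wait_points;	/* steps where the commit thread waits for I/O */
};

/* Old sequence: wait for the journal blocks, then an ordered commit write. */
static struct commit_cost ordered_commit(void)
{
	struct commit_cost c = { 0, 0 };

	/* (1) submit journal data, revoke, and descriptor blocks */
	c.wait_points++;	/* (2) wait for those blocks, no barrier */
	c.cache_flushes += 2;	/* (3) ordered write: flush before and after */
	c.wait_points++;	/* (4) wait for the commit block */
	return c;
}

/* New sequence: unordered, checksummed commit block plus one empty barrier. */
static struct commit_cost async_commit(bool barriers_enabled)
{
	struct commit_cost c = { 0, 0 };

	/* (1) journal blocks and (2) commit block are submitted back to back */
	if (barriers_enabled)
		c.cache_flushes += 1;	/* (3) a single empty barrier bio */
	c.wait_points++;	/* (4) one wait for the I/O from (1) and (2) */
	return c;
}
```

Under these assumptions ordered_commit() reports two flushes and two wait
points, while async_commit(true) reports one of each.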
This *does* show significant improvements:

Using ./fs_mark -d /mnt -s 10240 -n 1000

W/o journal_async_commit:

FSUse%        Count         Size    Files/sec     App Overhead
     8         1000        10240         30.5            28242

w/ journal_async_commit:

     8         1000        10240         45.8            28620

w/ barrier=0

     8         1000        10240        320.0            27699

Since this patch is a bit more complicated, I'll hold off on making it
the default for now, but if the testing goes well, I plan to make
it the default in the next kernel release, since a 50% increase in
fs_mark files/sec is something I think we all would agree counts as a
"clear performance advantage". :-)
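
For reference, the 50% figure follows directly from the Files/sec column.
The helper below is a minimal sketch of that arithmetic; percent_gain() is
a hypothetical name, not part of fs_mark or the kernel.

```c
#include <assert.h>

/* Percentage change in files/sec relative to a baseline run. */
static double percent_gain(double baseline, double measured)
{
	assert(baseline > 0.0);	/* a zero baseline has no meaningful ratio */
	return (measured - baseline) / baseline * 100.0;
}
```

With the numbers above, percent_gain(30.5, 45.8) comes out at roughly 50%,
and the barrier=0 run (320.0 files/sec) is still about seven times faster
than the async commit run.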
- Ted
commit fd67d1cfd73f554bae6c37745222eac2723983c8
Author: Theodore Ts'o <tytso@mit.edu>
Date: Thu Sep 10 22:34:27 2009 -0400
ext4: Fix async commit mode to be safe by using a barrier
Previously the journal_async_commit mount option was equivalent to
using barrier=0 (and just as unsafe). This patch fixes it so that we
eliminate the barrier before the commit block (by not using ordered
mode) and explicitly issue an empty barrier bio after writing the
commit block. Because of the journal checksum, it is safe to do this;
if the journal blocks are not all written before a power failure, the
checksum in the commit block will prevent the last transaction from
being replayed.
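
The replay-time check described above can be sketched in user space as
follows. This is only an illustration under stated assumptions: it uses a
generic bitwise CRC-32 (which may not be the exact variant or seed jbd2
uses), and struct commit_block, transaction_checksum(), and
should_replay() are hypothetical names rather than kernel interfaces.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Generic reflected CRC-32 (polynomial 0xEDB88320), chainable across calls. */
static uint32_t crc32_update(uint32_t crc, const uint8_t *buf, size_t len)
{
	crc = ~crc;
	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Hypothetical commit block: only the field relevant to this sketch. */
struct commit_block {
	uint32_t checksum;	/* checksum over the transaction's blocks */
};

/* Checksum every journal block of the transaction, in order. */
static uint32_t transaction_checksum(const uint8_t *blocks, size_t nblocks,
				     size_t block_size)
{
	uint32_t crc = 0;

	for (size_t i = 0; i < nblocks; i++)
		crc = crc32_update(crc, blocks + i * block_size, block_size);
	return crc;
}

/*
 * Replay the transaction only if the blocks found in the journal match the
 * checksum stored in the commit block. If a power failure left some of the
 * blocks unwritten, the checksums differ and the transaction is skipped.
 */
static bool should_replay(const uint8_t *blocks, size_t nblocks,
			  size_t block_size, const struct commit_block *cb)
{
	return transaction_checksum(blocks, nblocks, block_size) == cb->checksum;
}
```

The point of the design is that the commit block no longer has to be
ordered after the journal blocks: a commit block that reaches the disk
first is harmless, because its checksum will not match until every
journal block is also on disk.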
With the fs_mark benchmark, journal_async_commit shows a 50%
improvement in files/sec:
FSUse% Count Size Files/sec App Overhead
8 1000 10240 30.5 28242
vs.
FSUse% Count Size Files/sec App Overhead
8 1000 10240 45.8 28620
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 7b4088b..d6f4763 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -25,6 +25,7 @@
#include <linux/writeback.h>
#include <linux/backing-dev.h>
#include <linux/bio.h>
+#include <linux/blkdev.h>
#include <trace/events/jbd2.h>
/*
@@ -83,6 +84,34 @@ nope:
__brelse(bh);
}
+static void end_empty_barrier(struct bio *bio, int err)
+{
+	if (err) {
+		if (err == -EOPNOTSUPP)
+			set_bit(BIO_EOPNOTSUPP, &bio->bi_flags);
+		clear_bit(BIO_UPTODATE, &bio->bi_flags);
+	}
+	complete(bio->bi_private);
+}
+
+struct bio *issue_flush(struct block_device *bdev, struct completion *wait)
+{
+
+	struct bio *bio;
+
+	if (!bdev->bd_disk || !bdev->bd_disk->queue)
+		return NULL;
+
+	bio = bio_alloc(GFP_KERNEL, 0);
+	if (!bio)
+		return NULL;
+	bio->bi_end_io = end_empty_barrier;
+	bio->bi_private = wait;
+	bio->bi_bdev = bdev;
+	submit_bio(WRITE_BARRIER, bio);
+	return bio;
+}
+
/*
* Done it all: now submit the commit record. We should have
* cleaned up our previous buffers by now, so if we are in abort
@@ -133,8 +162,8 @@ static int journal_submit_commit_record(journal_t *journal,
bh->b_end_io = journal_end_buffer_io_sync;
if (journal->j_flags & JBD2_BARRIER &&
- !JBD2_HAS_INCOMPAT_FEATURE(journal,
- JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT)) {
+ !JBD2_HAS_INCOMPAT_FEATURE(journal,
+ JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT)) {
set_buffer_ordered(bh);
barrier_done = 1;
}
@@ -352,6 +381,8 @@ void jbd2_journal_commit_transaction(journal_t *journal)
transaction_t *commit_transaction;
struct journal_head *jh, *new_jh, *descriptor;
struct buffer_head **wbuf = journal->j_wbuf;
+ struct bio *bio_flush = NULL;
+ DECLARE_COMPLETION_ONSTACK(wait_flush);
int bufs;
int flags;
int err;
@@ -707,11 +738,13 @@ start_journal_io:
/* Done it all: now write the commit record asynchronously. */
if (JBD2_HAS_INCOMPAT_FEATURE(journal,
- JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT)) {
+ JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT)) {
err = journal_submit_commit_record(journal, commit_transaction,
&cbh, crc32_sum);
if (err)
__jbd2_journal_abort_hard(journal);
+ if (journal->j_flags & JBD2_BARRIER)
+ bio_flush = issue_flush(journal->j_dev, &wait_flush);
}
/*
@@ -833,8 +866,13 @@ wait_for_iobuf:
jbd_debug(3, "JBD: commit phase 5\n");
- if (!JBD2_HAS_INCOMPAT_FEATURE(journal,
- JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT)) {
+ if (JBD2_HAS_INCOMPAT_FEATURE(journal,
+ JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT)) {
+ if (bio_flush) {
+ wait_for_completion(&wait_flush);
+ bio_put(bio_flush);
+ }
+ } else {
err = journal_submit_commit_record(journal, commit_transaction,
&cbh, crc32_sum);
if (err)