linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Theodore Tso <tytso@mit.edu>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jens Axboe <jens.axboe@oracle.com>,
	Linux Kernel Developers List <linux-kernel@vger.kernel.org>,
	Ext4 Developers List <linux-ext4@vger.kernel.org>
Subject: Re: [GIT PULL] Ext3 latency fixes
Date: Fri, 3 Apr 2009 15:54:53 -0400	[thread overview]
Message-ID: <20090403195453.GC11661@mit.edu> (raw)
In-Reply-To: <alpine.LFD.2.00.0904031110420.19690@localhost.localdomain>

On Fri, Apr 03, 2009 at 11:24:50AM -0700, Linus Torvalds wrote:
> But at the same time, I now suspect that we could actually have solved 
> this problem more easily by just doing things the other way around: make 
> the default "WRITE" be the high-priority one (to match "READ"), and then 
> just explicitly marking the data writes with "WRITE_ASYNC".
> 
> Why? Because I think that with all the writes sprinkled around in random 
> places, it's probably _easier_ to find the bulk writes that cause the 
> biggest issues, and just fix _those_ to be WRITE_ASYNC. They may be bulk, 
> they may be the common case, but they also tend to be the case where we 
> write with generic routines (eg the whole "do_writepages()" thing).
> 
> So the VFS layer tends to already do much of the bulk writeout, and maybe 
> we would have been better off just changing those to ASYNC and leaving any 
> more specialized cases as the SYNC case? That would have avoided a lot of 
> this effort at the filesystem level. We'd just assume that the default 
> filesystem-specific writes tend to all be SYNC.

Well, most filesystem-specific writes tend all to be ASYNC; it's only
those related to commits and fsync() which are SYNC.  Ext3 is unusual
in that data=ordered and the physical-block journalling design of the
jbd layer means that we actually have a much larger number of blocks
that need to be written out synchronously than most other filesystems.
But even so, the number of callsites that I needed to change weren't
that large; in fact, over half of them weren't in the filesystem at
all, but in the page writeback code, since fsync() and data=ordered
both need to wait for the inodes's pages to be flushed out to disk,
and that's all done in common code.  The other 40% was in the jbd's
commit code, while we are writing out the journal buffers.

I suspect the more important thing to address is the fact that
WRITE_SYNC unplugs the block device queue, and we would be better off
separating marking a particular I/O as "a user is waiting for this"
from "unplug the device queue now".  That will hopefully improve
things even more.

					- Ted

  parent reply	other threads:[~2009-04-03 19:56 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-04-03  7:01 [GIT PULL] Ext3 latency fixes Theodore Ts'o
2009-04-03  7:01 ` [PATCH 1/4] block_write_full_page: Use synchronous writes for WBC_SYNC_ALL writebacks Theodore Ts'o
2009-04-03  7:01   ` [PATCH 2/4] ext3: Use WRITE_SYNC for commits which are caused by fsync() Theodore Ts'o
2009-04-03  7:01     ` [PATCH 3/4] ext3: Add replace-on-truncate hueristics for data=writeback mode Theodore Ts'o
2009-04-03  7:01       ` [PATCH 4/4] ext3: Add replace-on-rename " Theodore Ts'o
2009-04-03 18:24 ` [GIT PULL] Ext3 latency fixes Linus Torvalds
2009-04-03 18:47   ` Jens Axboe
2009-04-03 19:13     ` Theodore Tso
2009-04-03 21:01     ` Chris Mason
2009-04-03 19:02   ` Linus Torvalds
2009-04-03 20:41     ` Linus Torvalds
2009-04-04 13:57       ` Theodore Tso
2009-04-04 15:16         ` Jens Axboe
2009-04-04 15:57           ` Linus Torvalds
2009-04-04 16:06             ` Linus Torvalds
2009-04-04 17:36               ` Jens Axboe
2009-04-04 17:34             ` Jens Axboe
2009-04-04 17:44               ` Linus Torvalds
2009-04-04 18:00                 ` Trenton D. Adams
2009-04-04 18:01                 ` Jens Axboe
2009-04-04 18:10                   ` Linus Torvalds
2009-04-04 23:22                   ` Theodore Tso
2009-04-04 23:33                     ` Arjan van de Ven
2009-04-05  0:10                       ` Theodore Tso
2009-04-05 15:05                         ` Arjan van de Ven
2009-04-05 17:01                         ` Linus Torvalds
2009-04-05 17:15                           ` Mark Lord
2009-04-05 20:57                             ` Jeff Garzik
2009-04-05 23:48                               ` Arjan van de Ven
2009-04-06  2:32                                 ` Mark Lord
2009-04-06  5:47                                 ` Jeff Garzik
2009-04-07 18:18                                   ` Linus Torvalds
2009-04-07 18:22                                     ` Linus Torvalds
2009-04-07 19:40                                     ` [PATCH libata: add SSD detection hueristic; move SSD setup to ata_dev_configure (was Re: [GIT PULL] Ext3 latency fixes) Jeff Garzik
2009-04-09 18:21                                       ` Tejun Heo
2009-04-18  3:02                                         ` George Spelvin
2009-04-06  8:13                             ` [GIT PULL] Ext3 latency fixes Jens Axboe
2009-04-05 18:56                           ` Arjan van de Ven
2009-04-05 19:34                             ` Linus Torvalds
2009-04-05 20:06                               ` Arjan van de Ven
2009-04-06  6:25                               ` Jens Axboe
2009-04-06  6:05                           ` Theodore Tso
2009-04-06  6:23                           ` Jens Axboe
2009-04-06  8:16                       ` Jens Axboe
2009-04-06 14:48                         ` Linus Torvalds
2009-04-06 15:09                           ` Jens Axboe
2009-04-06  6:15                     ` Jens Axboe
2009-04-04 20:18               ` Ingo Molnar
2009-04-06 21:50                 ` Lennart Sorensen
2009-04-07 13:31                   ` Mark Lord
2009-04-07 14:48                     ` Lennart Sorensen
2009-04-07 19:21                       ` Mark Lord
2009-04-07 19:57                         ` Lennart Sorensen
2009-04-04 20:56               ` Arjan van de Ven
2009-04-06  7:06                 ` Jens Axboe
2009-04-07 15:39             ` Indan Zupancic
2009-04-04 19:18           ` Theodore Tso
2009-04-06  8:12             ` Jens Axboe
2009-04-04 22:13         ` Linus Torvalds
2009-04-04 22:19           ` Linus Torvalds
2009-04-05  0:20           ` Theodore Tso
2009-04-03 19:54   ` Theodore Tso [this message]
2009-04-08 23:40 Theodore Ts'o
2009-04-09 15:49 ` Linus Torvalds
2009-04-09 16:23   ` Chris Mason
2009-04-09 17:49     ` Jan Kara
2009-04-09 18:10       ` Chris Mason
2009-04-09 19:04         ` Jan Kara
2009-04-09 17:36   ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090403195453.GC11661@mit.edu \
    --to=tytso@mit.edu \
    --cc=jens.axboe@oracle.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).