From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner+w=401wt.eu-S1752327AbZC3LpI@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752327AbZC3LpI (ORCPT <rfc822;w@1wt.eu>);
	Mon, 30 Mar 2009 07:45:08 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1750922AbZC3Lox
	(ORCPT <rfc822;linux-kernel-outgoing>);
	Mon, 30 Mar 2009 07:44:53 -0400
Received: from acsinet12.oracle.com ([141.146.126.234]:49248 "EHLO
	acsinet12.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750754AbZC3Low (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Mon, 30 Mar 2009 07:44:52 -0400
Subject: Re: [PATCH 0/3] Ext3 latency improvement patches
From: Chris Mason <chris.mason@oracle.com>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Theodore Tso <tytso@mit.edu>, Ric Wheeler <rwheeler@redhat.com>,
       Linux Kernel Developers List <linux-kernel@vger.kernel.org>,
       Ext4 Developers List <linux-ext4@vger.kernel.org>, jack@suse.cz
In-Reply-To: <20090330112330.GA11357@skywalker>
References: <1238185471-31152-1-git-send-email-tytso@mit.edu>
	 <1238187031.27455.212.camel@think.oraclecorp.com>
	 <1238187818.27455.217.camel@think.oraclecorp.com>
	 <20090327213052.GC5176@mit.edu>  <20090330112330.GA11357@skywalker>
Content-Type: text/plain
Date: Mon, 30 Mar 2009 07:44:22 -0400
Message-Id: <1238413462.30488.0.camel@think.oraclecorp.com>
Mime-Version: 1.0
X-Mailer: Evolution 2.24.1 
Content-Transfer-Encoding: 7bit
X-Source-IP: acsmt704.oracle.com [141.146.40.82]
X-Auth-Type: Internal IP
X-CT-RefId: str=0001.0A01020A.49D0B09A.0038:SCFMA4539814,ss=1,fgs=0
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 2009-03-30 at 16:53 +0530, Aneesh Kumar K.V wrote:
> On Fri, Mar 27, 2009 at 05:30:52PM -0400, Theodore Tso wrote:
> > On Fri, Mar 27, 2009 at 05:03:38PM -0400, Chris Mason wrote:
> > > > Ric had asked me about a test program that would show the worst case
> > > > ext3 behavior.  So I've modified your ext3 program a little.  It now
> > > > creates an 8G file and forks off another proc to do random IO to that
> > > > file.
> > > > 
> > > 
> > > My understanding of ext4 delalloc is that once blocks are allocated to
> > > a file, we go back to data=ordered.
> > 
> > Yes, that's correct.
> > 
> > > Ext4 is going pretty slowly for this fsync test (slower than ext3). It
> > > looks like we're spending a very long time in
> > > jbd2_journal_commit_transaction -> write_cache_pages.
> > 
> > One of the things that we can do to optimize this case for ext4 (and
> > ext3) is that if a block has already been written out to disk once, we
> > don't have to flush it to disk a second time.  So if we add a new
> > buffer_head flag which can distinguish between blocks that have been
> > newly allocated (and not yet been flushed to disk) versus blocks that
> > have already been flushed to disk at least once, we wouldn't need to
> > force I/O for blocks in the latter case.
> 
> write_cache_pages should only look at pages which are marked dirty,
> right? So why are we writing these pages again and again?

The test program is constantly creating new dirty pages at random
offsets in the file ;)
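
For anyone who wants to poke at this kind of workload, here is a rough
sketch of the pattern. It is not the test program discussed in this
thread, only a minimal illustration under stated assumptions: the
function name run_fsync_test, the 4096-byte page size, the small default
file size (instead of 8G) and the choice to time fsync on the same file
the child is writing to are all made up for the example.

```python
import os
import random
import tempfile
import time

PAGE_SIZE = 4096  # assumed page size; the real value is platform dependent


def run_fsync_test(path, file_size=4 * 1024 * 1024, child_writes=2000,
                   fsync_rounds=5, seed=0):
    """Return a list of fsync latencies (seconds) measured by the parent."""
    pages = file_size // PAGE_SIZE
    page = b"x" * PAGE_SIZE

    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        # Allocate every block up front so later writes only redirty
        # existing blocks instead of allocating new ones.
        for i in range(pages):
            os.pwrite(fd, page, i * PAGE_SIZE)
        os.fsync(fd)

        pid = os.fork()
        if pid == 0:
            # Child: keep dirtying pages at random page-aligned offsets.
            status = 1
            try:
                rng = random.Random(seed)
                for _ in range(child_writes):
                    os.pwrite(fd, page, rng.randrange(pages) * PAGE_SIZE)
                status = 0
            finally:
                os._exit(status)

        # Parent: time fsync while the child may still be writing.
        latencies = []
        for _ in range(fsync_rounds):
            os.pwrite(fd, page, 0)  # make sure there is something to flush
            start = time.monotonic()
            os.fsync(fd)
            latencies.append(time.monotonic() - start)

        _, wait_status = os.waitpid(pid, 0)
        if wait_status != 0:
            raise RuntimeError("child writer failed")
        return latencies
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for t in run_fsync_test(os.path.join(d, "testfile")):
            print(f"fsync took {t:.4f}s")
```

With the small defaults the child will often finish before the parent
is done, so to see the effect the file size and child_writes need to be
scaled up until the random writer outlasts the fsync loop.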

-chris