From: Andreas Dilger
Subject: Re: large fs testing
Date: Tue, 26 May 2009 15:21:32 -0600
Message-ID: <20090526212132.GE3218@webber.adilger.int>
In-reply-to: <4A1C2B40.30102@redhat.com>
References: <4A17FFD8.80401@redhat.com> <5971.1243359565@gamaville.dokosmarshall.org> <4A1C2B40.30102@redhat.com>
To: Ric Wheeler
Cc: nicholas.dokos@hp.com, linux-fsdevel@vger.kernel.org, Christoph Hellwig,
 Douglas Shakshober, Joshua Giles, Valerie Aurora, Eric Sandeen,
 Steven Whitehouse, Edward Shishkin, Josef Bacik, Jeff Moyer, Chris Mason,
 "Whitney, Eric", Theodore Tso

On May 26, 2009 13:47 -0400, Ric Wheeler wrote:
> These runs were without lazy init, so I would expect them to be a little
> more than twice as slow as your second run (not the three times I saw),
> assuming that it scales linearly.

Making lazy_itable_init the default formatting option for ext4 was
dependent upon the kernel doing the zeroing of the inode table blocks
at first mount time.  I'm not sure whether that has been implemented
yet.  (A sketch of the mke2fs side is below.)

> This run was with limited DRAM on the box (6GB) and only a single HBA,
> but I am afraid that I did not get any good insight into what the
> bottleneck was during my runs.

For a very large array (80TB) this could be 1TB or more of inode tables
being zeroed out at format time.  Beyond 64TB the default mke2fs options
cap the filesystem at 4 billion inodes.  1TB / 90 min ~= 200MB/s, so
this is probably your bottleneck.  (The arithmetic is worked out below.)

> Do you have any access to even larger storage, say the mythical 100TB :-)?
> Any insight on interesting workloads?

I would definitely be most interested in e2fsck performance at this
scale (RAM usage and elapsed time), because that will in the end be the
defining limit on how large a usable filesystem can actually be in
practice.  (A rough RAM estimate is below.)

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
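
PS: the mke2fs side of the lazy-init comparison looks roughly like
this.  A minimal sketch only: it assumes an e2fsprogs build with the
-E lazy_itable_init extended option, and /dev/sdX is a placeholder for
the real device:

    # Skip zeroing the inode tables at format time; the group
    # descriptors are marked so the tables are known to be
    # uninitialized (this relies on the uninit_bg feature).
    mke2fs -t ext4 -E lazy_itable_init=1 /dev/sdX

    # For comparison, force the inode tables to be zeroed by mke2fs:
    mke2fs -t ext4 -E lazy_itable_init=0 /dev/sdX

With lazy_itable_init=1 the zeroing is deferred, which is why making it
the default depends on the kernel cleaning up the tables after mount.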
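
The inode table arithmetic, worked out (assuming the stock mke2fs
defaults of one inode per 16KiB of space and 256-byte ext4 inodes):

    64TiB / 16KiB per inode       = 2^32 inodes (the 32-bit inode cap)
    2^32 inodes * 256 bytes each  = 2^40 bytes = 1TiB of inode tables
    1TiB / 90 minutes             = 2^40 bytes / 5400s ~= 200MB/s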
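
And a crude lower bound on the e2fsck memory footprint at this scale,
assuming 4KiB blocks and that pass 1 holds in-core bitmaps with one bit
per block and one bit per inode (the real number is higher once the
directory and link-count tracking is added):

    80TiB / 4KiB = ~21.5 billion blocks  -> ~2.5GiB per block bitmap
    2^32 inodes * 1 bit                  -> 512MiB per inode bitmap

With several such bitmaps live at once, RAM demands well past 10GB seem
plausible for a full check of an 80TB filesystem.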