From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ric Wheeler
Subject: large fs testing
Date: Sat, 23 May 2009 09:53:28 -0400
Message-ID: <4A17FFD8.80401@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Christoph Hellwig, Douglas Shakshober, Joshua Giles, Valerie Aurora, Eric Sandeen, Steven Whitehouse, Edward Shishkin, Josef Bacik, Jeff Moyer, Chris Mason, "Whitney, Eric", Theodore Tso
To: linux-fsdevel@vger.kernel.org
Return-path: 
Received: from mx2.redhat.com ([66.187.237.31]:49946 "EHLO mx2.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751963AbZEWNyq (ORCPT); Sat, 23 May 2009 09:54:46 -0400
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

Jeff Moyer & I have been working with the EMC elab over the last week or so, testing ext4, xfs and gfs2 at roughly 80TB striped across a set of 12TB LUNs (single server, 6GB of DRAM, 2 quad-core HT-enabled CPUs). The goal of the testing is (in decreasing priority) to validate Val's 64-bit patches for ext4 e2fsprogs, to do a very quick sanity check that XFS does indeed scale as well as I hear (and it has so far :-)), and to test gfs2 tools at that high capacity. There was not enough time to get it all done, and significant fumbling on my part made it go even slower. Nevertheless, I have come to a rough idea of what a useful benchmark would be. If this sounds sane to all, I would like to try to put something together that we could provide to places like the EMC people who have large storage occasionally, are not kernel hackers, but would be willing to test for us. It will need to be fairly bulletproof and, I assume, avoid producing performance numbers on the storage for normal things (to avoid leaking competitive benchmarks).
Motivation - all things being equal, users benefit from having all storage consumed by one massive file system, since that single file system manages space allocation, avoids seekiness, etc. (something that applications have to do manually when using sets of file systems, the current state of the art for ext3, for example). The challenges are:

(1) Object count - how many files can you pack into that file system with reasonable performance? (The test to date filled the single ext4 fs with 207 million 20KB files.)

(2) Files per directory - how many files per directory?

(3) FS creation time - can you create a file system in reasonable time? (mkfs.xfs took seconds; mkfs.ext4 took 90 minutes.) I think that 90 minutes is definitely on the painful side, but usable for most.

(4) FS check time at a given fill rate for a healthy device (no IO errors). Testing at empty, 25%, 50%, 75%, 95% and full would all be interesting. Can you run these checks with a reasonable amount of DRAM - and if not, what guidance do we need to give to customers on how big the servers need to be? It would seem to be a nice goal to be able to fsck a file system in one working day - say 8 hours - so that you could get a customer back on their feet, but maybe 24 hours would be an outside goal?

(5) Write rate as the fs fills (picking the same set of fill rates?).

To make this a somewhat tractable problem, I wanted to define small (20KB), medium (MP3-sized, say 4MB) and large (video-sized, 4GB?) files to do the test with. I used fs_mark (no fsyncs and 256 directories) to fill the file system (at least until my patience/time ran out!). With these options, it still hits very high file/directory counts (I am thinking about tweaking fs_mark to dynamically create a time-based directory scheme, something like day/hour/min, and giving it an option to stop at a specified fill rate).
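To make the day/hour/min directory scheme and the stop-at-fill-rate option concrete, here is a rough sketch of the idea in Python (fs_mark itself is C, so this is only illustrative; the helper names `time_based_dir`, `fill_fraction` and `fill_until` are mine, not anything in fs_mark):

```python
import os
import time

def time_based_dir(base, t=None):
    """Map a timestamp to a day/hour/min subdirectory under base,
    creating it on demand (the proposed directory scheme)."""
    tm = time.localtime(t if t is not None else time.time())
    path = os.path.join(base, "%02d" % tm.tm_mday,
                        "%02d" % tm.tm_hour, "%02d" % tm.tm_min)
    os.makedirs(path, exist_ok=True)
    return path

def fill_fraction(path):
    """Fraction of the file system containing path that is in use."""
    st = os.statvfs(path)
    return 1.0 - (st.f_bavail / st.f_blocks)

def fill_until(base, target=0.95, size=20 * 1024, batch=256):
    """Hypothetical fill loop: write small (20KB) files into
    time-based directories until the fs hits the target fill rate."""
    n = 0
    while fill_fraction(base) < target:
        d = time_based_dir(base)
        for _ in range(batch):
            with open(os.path.join(d, "file_%08d" % n), "wb") as f:
                f.write(b"\0" * size)
            n += 1
    return n
```

The point of the time-based layout is that the directory a new file lands in rolls over automatically, so per-directory counts stay bounded no matter how long the fill runs; the statvfs check is what would let the tool stop cleanly at, say, 25%, 50% or 95% full instead of running to ENOSPC.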
Sorry for the long ramble - I was curious to see whether this makes sense to the broader set of you all, and whether you have any similar experiences to share. Thanks! Ric