linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@zip.com.au>
To: Alexander Viro <viro@math.psu.edu>
Cc: Andreas Dilger <adilger@turbolabs.com>,
	ext2-devel@lists.sourceforge.net,
	lkml <linux-kernel@vger.kernel.org>
Subject: Re: [Ext2-devel] ext2/ialloc.c cleanup
Date: Thu, 08 Nov 2001 22:15:00 -0800	[thread overview]
Message-ID: <3BEB7464.245FBB23@zip.com.au> (raw)
In-Reply-To: <20011108154311.E9043@lynx.no> <Pine.GSO.4.21.0111081802250.8052-100000@weyl.math.psu.edu>

Thanks, Al.

First a couple of comments on the patch (looks nice, BTW):

/*
 * Orlov's allocator for directories.
 *
 * We always try to spread first-level directories:
 *      If there are directories with both free inodes and free blocks counts
                     ^^^^^^^^^^^
                               cylinder groups
 *      not worse than average we return one with smallest directory count.

(I agree with Andreas on this one.  Why switch terminology?)




                get_random_bytes(&group, sizeof(group));
                parent_cg = group % ngroups;
                goto fallback;

AFAICT, get_random_bytes() here can set `group' to a negative
value, and parent_cg can go negative, and that propagates to
`group' going negative, and getting passed to ext2_get_group_desc(),
and everything goes generally pear-shaped.  Really, all this arith
should be using unsigneds.



>From here:

        max_dirs = ndirs / ngroups + inodes_per_group / 16;
        min_inodes = avefreei - inodes_per_group / 4;
        min_blocks = avefreeb - EXT2_BLOCKS_PER_GROUP(sb) / 4;

things start to get a bit confusing.  A couple of 1-2 line comments
which explain what the variables actually represent would help to
clarify things.  Also, an explanation of `debt' is needed.



Offtopic, in ext2_new_inode():

        mark_buffer_dirty(bh);
        if (sb->s_flags & MS_SYNCHRONOUS) {
                ll_rw_block (WRITE, 1, &bh);
                wait_on_buffer (bh);
        }

Does the inode bitmap writeout actually need to be synchronous
here?  The file will still be recoverable by fsck if this is
not done?  If the sync _is_ needed, shouldn't we be doing it with
the group descriptors?


Finally, please, please, please take the opportunity to rename
`bh', `bh2', `i' and `j' in ext2_new_inode() to something
semantically meaningful.  What we have now is just insane.


We need to test carefully with ENOSPC, but it looks solid.


Performance-wise, the Orlov approach is almost as good as
the `if (0 &&' approach for fast growth.  This is the "manipulate
kernel trees on an empty partition" test:

		Stock	Patched	Orlov
				
untar		31	14	14
diff		184	72	82.6
tar		15	3	3
rm		30	10	10.3

So the diffing was a bit slower; not much.


For the slow growth test, same as last time (cut-n-paste
from the very excellent staroffice 6 spreadsheet):

Tree	Stock	Stock/ideal	Patched	Patched/stock	Orlov	Orlov/ideal
	(secs)			(secs)			(secs)	
0	37	2.85		74	200.00%		63	4.85
1	41	3.15		89	217.07%		68	5.23
2	41	3.15		97	236.59%		74	5.69
3	38	2.92		97	255.26%		84	6.46
4	55	4.23		102	185.45%		78	6
5	51	3.92		98	192.16%		76	5.85
6	72	5.54		94	130.56%		73	5.62
7	56	4.31		96	171.43%		71	5.46
8	64	4.92		102	159.38%		73	5.62
9	63	4.85		100	158.73%		71	5.46
10	57	4.38		95	166.67%		74	5.69
11	83	6.38		102	122.89%		78	6
12	54	4.15		101	187.04%		76	5.85
13	82	6.31		104	126.83%		78	6
14	89	6.85		103	115.73%		77	5.92
15	88	6.77		95	107.95%		77	5.92
16	106	8.15		99	93.40%		77	5.92


We see that Orlov is more prone to fragmentation than stock 2.4.14: The
time to read the first batch of 378 megs of files is 37 seconds with
2.4.14, 63 seconds with Orlov.  But it's better than the `if (0 && '
approach.


So I just don't know at this stage.  Even after a single pass of the Smith
workload, we're running at 3x to 5x worse than ideal.  If that's simply
the best we can do, then we need to compare stock 2.4.14 with Orlov
partway through the progress of the Smith workload to evaluate how much
more serious the fragmentation is.   That's easy enough - I'll do it.


The next task is to work out why a single pass of the Smith workload
fragments the filesystem so much, and whether this can be improved.


Oh.  And some preliminary testing with SCSI shows markedly different
behaviour: those 3x to 5x speedups I was demonstrating with IDE are
only 1.5x with a Quantum Atlas IV.  There is some magical characteristic
of modern IDE disks which stock 2.4.14 is failing to exploit, and I
don't yet have a handle on precisely what it is.

-

  reply	other threads:[~2001-11-09  6:20 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2001-11-05  2:13 disk throughput Andrew Morton
2001-11-05  3:20 ` Mohammad A. Haque
2001-11-05  3:31   ` Andrew Morton
2001-11-05  3:32 ` [Ext2-devel] " Mike Fedyk
2001-11-05  3:45   ` Andrew Morton
2001-11-05  4:39     ` Mike Fedyk
2001-11-05  7:06     ` Jens Axboe
2001-11-05  7:14       ` Andrew Morton
2001-11-05  7:26         ` Jens Axboe
2001-11-05  7:14       ` Mike Fedyk
2001-11-05  7:18         ` Jens Axboe
2001-11-05  7:18       ` Jens Axboe
2001-11-05  9:14         ` Mike Fedyk
2001-11-05  9:20           ` Jens Axboe
2001-11-05  5:54   ` Albert D. Cahalan
2001-11-05  8:04     ` Andrew Morton
2001-11-05 12:28       ` Matthias Andree
2001-11-05 14:23       ` Alexander Viro
2001-11-05 22:22         ` Andrew Morton
2001-11-05 22:41           ` Andreas Dilger
2001-11-05 22:53             ` Andrew Morton
2001-11-08 15:28               ` Constantin Loizides
2001-11-05 23:14             ` Dan Hollis
2001-11-06 10:52           ` Daniel Phillips
2001-11-06 16:17           ` Jeremy Fitzhardinge
2001-11-08 15:24             ` Constantin Loizides
2001-11-08 16:46             ` Jeremy Fitzhardinge
2001-11-09  6:08               ` Andrew Morton
2001-11-09  8:49               ` Jeremy Fitzhardinge
2001-11-06 21:45           ` Stephen Tweedie
2001-11-05 20:16       ` Andreas Dilger
2001-11-05 20:28         ` m
2001-11-05 21:39           ` Andrew Morton
2001-11-05 22:59             ` Linus Torvalds
2001-11-05 23:36               ` Alexander Viro
2001-11-05 23:50                 ` Linus Torvalds
2001-11-06  0:03                   ` Linus Torvalds
2001-11-06  1:33                     ` Alexander Viro
2001-11-06  2:10                       ` Linus Torvalds
2001-11-06  3:02                         ` Alexander Viro
2001-11-06  8:39                           ` Alan Cox
2001-11-06  8:37                             ` Alexander Viro
2001-11-06  8:48                               ` Andrew Morton
2001-11-06  3:49                         ` Alexander Viro
2001-11-06  4:01                           ` Linus Torvalds
2001-11-06  4:21                             ` Alexander Viro
2001-11-06  5:01                               ` Linus Torvalds
2001-11-06  5:31                                 ` Andrew Morton
2001-11-06  5:48                                   ` Linus Torvalds
2001-11-06  7:34                                     ` Mike Castle
2001-11-06  7:10                                   ` Kai Henningsen
2001-11-09 22:35                       ` Riley Williams
2001-11-06  1:28                   ` Alexander Viro
2001-11-06  9:16                     ` Wojtek Pilorz
2001-11-06  9:58                       ` Alexander Viro
2001-11-08 12:51                   ` Pavel Machek
2001-11-06 21:48           ` Stephen Tweedie
2001-11-06 23:17             ` ext2/ialloc.c cleanup Alexander Viro
2001-11-07 19:34               ` [Ext2-devel] " Andreas Dilger
2001-11-07 20:02                 ` Alexander Viro
2001-11-08  2:06                   ` Andrew Morton
2001-11-08 20:45                     ` Andrew Morton
2001-11-08 22:16                       ` Alexander Viro
2001-11-08 22:43                         ` Andreas Dilger
2001-11-08 23:08                           ` Alexander Viro
2001-11-09  6:15                             ` Andrew Morton [this message]
2001-11-09  6:56                               ` Andreas Dilger
2001-11-09  7:09                                 ` Andrew Morton
2001-11-09  7:12                                 ` Alexander Viro
2001-11-09  7:18                                   ` Andrew Morton
2001-11-05  9:45     ` [Ext2-devel] disk throughput Alex Bligh - linux-kernel
2001-11-05  9:58       ` Alex Bligh - linux-kernel
2001-11-05  8:47 ` Jan Kara
2001-11-05  8:50   ` [Ext2-devel] " Mike Fedyk
2001-11-05  9:01     ` Jan Kara
2001-11-05 12:23 ` Matthias Andree
2001-11-05 22:39   ` Andrew Morton
2001-11-05 23:41     ` Matthias Andree

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3BEB7464.245FBB23@zip.com.au \
    --to=akpm@zip.com.au \
    --cc=adilger@turbolabs.com \
    --cc=ext2-devel@lists.sourceforge.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=viro@math.psu.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).