Date: Sun, 13 Jul 2003 02:33:07 +0200
From: Andrea Arcangeli
To: Chris Mason
Cc: Jens Axboe, Marcelo Tosatti, lkml, "Stephen C. Tweedie", Alan Cox, Jeff Garzik, Andrew Morton, Alexander Viro
Subject: Re: RFC on io-stalls patch
Message-ID: <20030713003307.GH16313@dualathlon.random>
In-Reply-To: <1058034751.13318.95.camel@tiny.suse.com>

Hello,

On Sat, Jul 12, 2003 at 02:32:32PM -0400, Chris Mason wrote:
> Anyway, if you've got doubts about the current patch, I'd be happy to
> run a specific benchmark you think will show the bad read
> characteristics.

The main reason I dropped the two queues in elevator-lowlatency is that the ability to call schedule just once for the I/O completion with reads was a minor thing compared to having to wait for a potentially 32M queue to be flushed to disk before being able to read a single byte. So I didn't want to complicate the code with minor things while fixing what I considered the major issue (i.e. the overkill queue size with contiguous writes, and the too-small queue size while seeking).

I already attacked that problem years ago with max_bomb_segments (the dead ioctl ;). At that time I attacked it from the wrong side: rather than limiting the megabytes in the queue like elevator-lowlatency does, I enforced a maximum size on each single request, because I had no idea how critical it is to get 512k requests for every SG DMA operation on any real storage (the same mistake the anticipatory scheduler developers made when they thought the anticipatory scheduler could in any way obviate the need for very aggressive readahead). elevator-lowlatency is the max_bomb idea done in a way that doesn't hurt contiguous I/O throughput at all, with very similar latency benefits.

Furthermore, elevator-lowlatency allowed me to grow the number of requests a lot without bringing a box down with gigabyte-sized I/O queues, so now, in the presence of heavily seeking 512-byte requests (the opposite of 512k per request with contiguous I/O), many more requests can sit in the elevator, in turn allowing more aggressive reordering of requests during seeking. That should result in a performance improvement when seeking (besides the fairness improvement under a flood of contiguous I/O).

Having two queues could allow a reader to sleep just once, while this way it also has to wait for batch_sectors before being able to queue. So I think all it could buy is basically a CPU saving (one less schedule), and I doubt it would be noticeable.
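To make the queue-size point concrete, here's a rough userspace toy of the watermark idea (just an illustration: the names and the 2M number below are made up for the example, this isn't the actual elevator-lowlatency code):

#include <stdio.h>

/* Illustrative low watermark: cap the total I/O sitting in the queue. */
#define MAX_QUEUE_BYTES	(2 * 1024 * 1024)

struct io_queue {
	long queued_bytes;	/* payload currently sitting in the elevator */
};

/*
 * A submitter (reader or writer) may only queue a new request once the
 * total pending I/O is back under the watermark.  max_bomb_segments
 * only capped the size of a single request, so the queue itself could
 * still grow to tens of megabytes.
 */
static int can_queue(const struct io_queue *q, long req_bytes)
{
	return q->queued_bytes + req_bytes <= MAX_QUEUE_BYTES;
}

static void complete_request(struct io_queue *q, long req_bytes)
{
	q->queued_bytes -= req_bytes;
}

int main(void)
{
	struct io_queue q = { 0 };
	long req = 512 * 1024;	/* one 512k contiguous write */
	int i, queued = 0, stalled = 0;

	/* keep submitting 512k writes; past the watermark the submitter
	   would have to sleep until complete_request() frees room */
	for (i = 0; i < 16; i++) {
		if (can_queue(&q, req)) {
			q.queued_bytes += req;
			queued++;
		} else {
			stalled++;
		}
	}
	printf("queued %d, stalled %d, %ld bytes in flight\n",
	       queued, stalled, q.queued_bytes);
	complete_request(&q, req);	/* completion makes room again */
	return 0;
}

The only point of the toy is that the submitter blocks on the total amount of pending I/O, not on the size of a single request, so a reader never finds tens of megabytes already queued in front of it.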
And in general I don't much like assumptions that treat reads differently from writes: writes can be latency-critical too, with fsync (especially with journaling). In fact, if it were only the sync reads that we cared about, I think read-latency2 from Andrew would already work well, and it's much less invasive, but I consider that a dirty hack compared to elevator-lowlatency, which fixes things for sync writes too, as well as sync reads, without assuming special workloads.

read-latency2 basically hardcodes the assumption that writes aren't latency critical, and it tries to hide the effect of the overkill queue size by simply putting all reads near the head of the queue, no matter whether the queue is 1M or 10M or 100M. The whole point of elevator-lowlatency is not to add read hacks that assume only reads are latency critical.

And of course an overkill queue size isn't good for the VM either, since that memory is unfreeable and it can make it much harder for the VM to be nice to apps allocating RAM during write throttling, etc.

Overall I'm not against resurrecting the specialized read queue; after all, writes also get a special queue (so one could claim that's an optimization for sync writes, not sync reads, it's very symmetric ;), but conceptually I don't find it very worthwhile. So I would prefer to have a benchmark, as you suggested, before complicating things in mainline (and as you can see, it's now a bit more tricky to retain the two queues ;).

Andrea
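P.S. For reference, this is roughly the read-latency2 policy I'm objecting to, reduced to a toy sketch (illustrative only, not Andrew's actual patch; QUEUE_MAX and MAX_READ_BACKLOG are made-up names): reads get inserted a bounded distance from the head of the dispatch order, in front of whatever writes are queued behind that point, no matter how large the queue has grown.

#include <stdio.h>

enum req_type { READ_REQ, WRITE_REQ };

#define QUEUE_MAX		256	/* toy bound on queue length */
#define MAX_READ_BACKLOG	4	/* how far back a read may be placed */

/* reqs[0] is the head, i.e. the next request to be dispatched */
struct toy_queue {
	enum req_type reqs[QUEUE_MAX];
	int nr;
};

/*
 * Writes always go to the tail; reads are pushed in near the head
 * (after at most MAX_READ_BACKLOG pending requests), regardless of how
 * many megabytes of writes sit behind that point.
 */
static void insert_request(struct toy_queue *q, enum req_type type)
{
	int pos = q->nr;
	int i;

	if (q->nr >= QUEUE_MAX)
		return;
	if (type == READ_REQ && pos > MAX_READ_BACKLOG)
		pos = MAX_READ_BACKLOG;

	for (i = q->nr; i > pos; i--)
		q->reqs[i] = q->reqs[i - 1];
	q->reqs[pos] = type;
	q->nr++;
}

int main(void)
{
	struct toy_queue q = { 0 };
	int i;

	for (i = 0; i < 20; i++)	/* a flood of writes... */
		insert_request(&q, WRITE_REQ);
	insert_request(&q, READ_REQ);	/* ...then one sync read */

	for (i = 0; i < q.nr; i++)
		printf("%c", q.reqs[i] == READ_REQ ? 'R' : 'W');
	printf("\n");	/* the read lands near the head: WWWWRWWWW... */
	return 0;
}

That hides the latency of the one sync read, but a sync write (think fsync) queued at the tail still has to wait behind the whole backlog, which is exactly the asymmetry I don't like.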