From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <3D2530B9.8BC0C0AE@zip.com.au>
Date: Thu, 04 Jul 2002 22:38:01 -0700
From: Andrew Morton
MIME-Version: 1.0
Subject: Re: vm lock contention reduction
References:
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Linus Torvalds
Cc: Rik van Riel, Andrea Arcangeli, "linux-mm@kvack.org"
List-ID:

Linus Torvalds wrote:
>
> ...
> Think batching. It's _more_ efficient to batch stuff than it is to try to
> switch back and forth between working and waiting as quickly as you can.

Yup. I've been moaning about this for months. Trivial example: run
`vmstat 1' and then start pounding the disk. vmstat will exhibit very
long pauses when *clearly* thousands of pages are coming clean every
second. Unreasonably long pauses. Sometimes in get_request_wait(),
sometimes in shrink_cache->wait_on_page/buffer. We should be giving
some of those pages to vmstat more promptly. After all, that process
is not a heavy allocator of pages.

> So don't just nod your heads when you see something that sounds sane.
> Think critically. And the critical thinking says:
>
>  - you should wait the _maximum_ amount that
>     (a) is fair
>     (b) doesn't introduce bad latency issues
>     (c) still allows overlap of IO and processing
>
> Get away from this "minimum wait" thing, because it is WRONG.

Well yes, we do want to batch work up. And a crude way of doing that
is "each time 64 pages have come clean, wake up one waiter". Or
"as soon as the number of reclaimable pages exceeds zone->pages_min".
Some logic would also be needed to prevent new page allocators from
jumping the queue, of course.

We're still throttling on I/O, but we're throttling against *any*
I/O, and not a single randomly-chosen disk block.

This scheme is more fair - processes which are allocating more pages
get to wait more.
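
The "each time 64 pages have come clean, wake up one waiter" idea can be
sketched as a toy user-space model. This is not kernel code and not the
prototype mentioned in this mail: the class and method names are
hypothetical, the FIFO queue is only one possible way to stop new
allocators from jumping the queue, and a batch of 64 is simply the number
quoted above.

```python
from collections import deque


class BatchedReclaimQueue:
    """Toy model: wake one waiter, in FIFO order, per `batch` cleaned pages."""

    def __init__(self, batch=64):
        self.batch = batch      # cleaned pages required per wakeup
        self.clean = 0          # cleaned pages not yet spent on a wakeup
        self.waiters = deque()  # FIFO, so late arrivals cannot jump the queue

    def wait_for_pages(self, task):
        """A task that needs memory joins the tail of the queue."""
        self.waiters.append(task)

    def pages_cleaned(self, n):
        """Called as write I/O completes; returns the tasks woken by this call."""
        self.clean += n
        woken = []
        # Any completed I/O counts, not one particular disk block.
        while self.waiters and self.clean >= self.batch:
            self.clean -= self.batch
            woken.append(self.waiters.popleft())
        return woken


q = BatchedReclaimQueue(batch=64)
for name in ("vmstat", "dd", "cc1"):
    q.wait_for_pages(name)
print(q.pages_cleaned(50))   # [] : fewer than 64 pages have come clean
print(q.pages_cleaned(100))  # ['vmstat', 'dd'] : 150 clean pages pay for two wakeups
```

In this model a task that keeps allocating has to rejoin the tail of the
queue each time, which is one way to read "processes which are allocating
more pages get to wait more".
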

> Try to shoot me down, but do so with real logic and real arguments, not
> some fuzzy feeling about "we shouldn't wait unless we have to". We _do_
> have to wait.

Sure, page allocators must throttle their allocation rate to that at
which the IO system can retire writes. But by waiting on a
randomly-chosen disk block, we're at the mercy of the elevator. If you
happen to choose a page whose blocks are at the far side of the disk,
you lose. There could easily be 100 megabytes of reclaimable memory by
the time you start running again.

We can fit 256 seeks into the request queue. That's 1-2 seconds.

I started developing a dumb prototype of this a while back, but it
locks up. I'll dust it off and get it going as a "technology
demonstration".

-
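
The "256 seeks ... 1-2 seconds" figure can be checked with a line of
arithmetic. The per-seek cost is not stated in this mail; the 4 ms and
8 ms values below are assumptions, chosen because they are what the 1-2
second estimate implies for a 256-entry queue. The function name is
hypothetical.

```python
def worst_case_wait_s(queue_depth, seek_ms):
    """Seconds until the last of `queue_depth` queued requests completes,
    if every request costs one seek of `seek_ms` milliseconds."""
    return queue_depth * seek_ms / 1000.0


# Assumed per-seek costs in milliseconds; not taken from the email.
for seek_ms in (4.0, 8.0):
    print(seek_ms, worst_case_wait_s(256, seek_ms))
```

With those assumed seek times, a waiter whose block is serviced last in a
full queue sleeps for roughly one to two seconds, while other pages keep
coming clean the whole time.
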