Date: Wed, 5 Jan 2005 13:44:57 -0800
From: Andrew Morton
To: Marcelo Tosatti
Cc: riel@redhat.com, andrea@suse.de, linux-kernel@vger.kernel.org
Subject: Re: [PATCH][5/?] count writeback pages in nr_scanned
Message-Id: <20050105134457.03aca488.akpm@osdl.org>
In-Reply-To: <20050105174934.GC15739@logos.cnet>
References: <20050105020859.3192a298.akpm@osdl.org>
	<20050105180651.GD4597@dualathlon.random>
	<20050105174934.GC15739@logos.cnet>

Marcelo Tosatti wrote:
>
> On Wed, Jan 05, 2005 at 01:50:51PM -0500, Rik van Riel wrote:
> > On Wed, 5 Jan 2005, Andrea Arcangeli wrote:
> >
> > > Another unrelated problem I have in this same area, and one that
> > > can at least theoretically explain VM troubles, is that
> > > blk_congestion_wait is broken by design. First, we cannot wait on
> > > random I/O not related to writeback. Second, blk_congestion_wait
> > > gets trivially fooled by direct-io, for example. Plus, the timeout
> > > may cause it to return too early with a slow blkdev.

That's true, as we discussed a couple of months back. But the current
code is nice and simple, and it has been there for a couple of years
with no observed problems.

> > Or the I/O that just finished, finished for pages in another memory
> > zone, or for pages we won't scan again in our current go-around
> > through the VM...
>
> The thing is, at the block level there is no distinction between pages
> written out for different purposes.
>
> One can conjecture the following: a per-zone waitqueue to be awakened
> from end_page_writeback() (for PG_reclaim pages only, of course), and
> a function to wait on the per-zone waitqueue:
>
> 	wait_vm_writeback(zone, timeout);
>
> instead of the current blk_congestion_wait() in
> try_to_free_pages()/balance_pgdat().

The caller would need to wait on all the zones which can satisfy the
caller's allocation request. A bit messy, although not rocket science.
One would have to be careful to avoid additional CPU consumption due to
the delivery of multiple wakeups at each I/O completion.

We should be able to demonstrate that such a change really fixes some
problem, though. Otherwise, why bother?
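
For concreteness, a rough sketch of what that could look like. This is
not a patch: the "writeback_wait" field and the wait_vm_writeback()
helper are made-up names, and the hunks below only gesture at where the
hooks would go.

/* include/linux/mmzone.h: hypothetical new field in struct zone */
	wait_queue_head_t	writeback_wait;

/*
 * Reclaim path: instead of blk_congestion_wait(), wait for writeback
 * completion in this zone, bounded by a timeout for slow devices.
 * Returns the remaining timeout, like io_schedule_timeout().
 */
long wait_vm_writeback(struct zone *zone, long timeout)
{
	DEFINE_WAIT(wait);

	prepare_to_wait(&zone->writeback_wait, &wait, TASK_UNINTERRUPTIBLE);
	timeout = io_schedule_timeout(timeout);
	finish_wait(&zone->writeback_wait, &wait);
	return timeout;
}

/*
 * In end_page_writeback(), before the existing code clears PG_reclaim:
 * wake per-zone waiters for reclaim writeback only, so unrelated I/O
 * (direct-io, reads) doesn't satisfy the wait.
 */
	if (PageReclaim(page)) {
		struct zone *zone = page_zone(page);

		if (waitqueue_active(&zone->writeback_wait))
			wake_up(&zone->writeback_wait);
	}

The waitqueue_active() test keeps the common case (nobody in reclaim)
from paying for a wakeup at every I/O completion, which is the CPU
consumption concern above. The caller would still have to loop over all
the zones in its zonelist, as noted.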