From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Mon, 14 Jun 2010 16:21:43 -0700
From: Andrew Morton
To: Dave Chinner
Cc: Mel Gorman, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org, Chris Mason, Nick Piggin, Rik van Riel,
    Johannes Weiner, Christoph Hellwig, KAMEZAWA Hiroyuki
Subject: Re: [PATCH 11/12] vmscan: Write out dirty pages in batch
Message-Id: <20100614162143.04783749.akpm@linux-foundation.org>
In-Reply-To: <20100614231144.GG6590@dastard>
References: <1276514273-27693-1-git-send-email-mel@csn.ul.ie>
 <1276514273-27693-12-git-send-email-mel@csn.ul.ie>
 <20100614231144.GG6590@dastard>

On Tue, 15 Jun 2010 09:11:44 +1000 Dave Chinner wrote:

> On Mon, Jun 14, 2010 at 12:17:52PM +0100, Mel Gorman wrote:
> > Page reclaim cleans individual pages using a_ops->writepage() because,
> > from the VM perspective, it is known that pages in a particular zone
> > must be freed soon, it considers the target page to be the oldest, and
> > it does not want to wait while background flushers clean other pages.
> > From a filesystem perspective this is extremely inefficient, as it
> > generates a very seeky IO pattern, leading to the perverse situation
> > where it can take longer to clean all dirty pages than it would have
> > otherwise.
> >
> > This patch queues all dirty pages at once to maximise the chances that
> > the write requests get merged efficiently. It also makes the next patch,
> > which avoids writeout from direct reclaim, more straightforward.
>
> Seeing as you have a list of pages for IO, perhaps they could be sorted
> before issuing ->writepage on them.
>
> That is, while this patch issues all the IO in one hit, it doesn't
> change the order in which the IO is issued - it is still issued in
> LRU order. Given that they are issued in a short period of time now,
> rather than across a longer scan period, it is likely that it will
> not be any faster, as:
>
>	a) IO will not be started as soon, and
>	b) the IO scheduler still only has a small re-ordering
>	   window and will choke just as much on random IO patterns.
>
> However, there is a list_sort() function that could be used to sort
> the list; sorting the list of pages by mapping, and by page->index
> within the mapping, would result in all the pages on each mapping
> being sent down in ascending offset order at once - exactly how
> filesystems want IO to be sent to them. Perhaps this is a simple
> improvement that can be made to this code that will make a big
> difference to worst-case performance.
>
> FWIW, I did this for delayed metadata buffer writeback in XFS
> recently (i.e. sort the queue of (potentially tens of thousands of)
> buffers in ascending block order before dispatch), and that showed a
> 10-15% reduction in seeks on simple kernel compile workloads. This
> shows that if we optimise IO patterns at higher layers, where the
> sort window is much, much larger than in the IO scheduler, then
> overall system performance improves....

Yup. But then, this all really should be done at the block layer so that
other IO-submitting paths can benefit from it.

IOW, maybe "the sort queue is the submission queue" wasn't a good idea.