From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id ; Thu, 21 Jun 2001 20:29:31 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org id ; Thu, 21 Jun 2001 20:29:21 -0400
Received: from humbolt.nl.linux.org ([131.211.28.48]:65288 "EHLO humbolt.nl.linux.org") by vger.kernel.org with ESMTP id ; Thu, 21 Jun 2001 20:29:14 -0400
Content-Type: text/plain; charset=US-ASCII
From: Daniel Phillips
To: Marcelo Tosatti
Subject: Re: Linux 2.4.5-ac15
Date: Fri, 22 Jun 2001 02:32:00 +0200
X-Mailer: KMail [version 1.2]
Cc: Mike Galbraith, linux-kernel
In-Reply-To:
MIME-Version: 1.0
Message-Id: <01062202320001.00455@starship>
Content-Transfer-Encoding: 7BIT
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Thursday 21 June 2001 21:50, Marcelo Tosatti wrote:
> On Thu, 21 Jun 2001, Daniel Phillips wrote:
> > On Thursday 21 June 2001 07:44, Marcelo Tosatti wrote:
> > > On Thu, 21 Jun 2001, Mike Galbraith wrote:
> > > > On Thu, 21 Jun 2001, Marcelo Tosatti wrote:
> > > > > Ok, I suspect that GFP_BUFFER allocations are fucking up here
> > > > > (they can't block on IO, so they loop insanely).
> > > >
> > > > Why doesn't the VM hang the syncing of queued IO on these guys
> > > > via wait_event or such instead of trying to just let the
> > > > allocation fail?
> > >
> > > Actually the VM should limit the amount of data being queued for
> > > _all_ kind of allocations.
> > >
> > > The problem is the lack of a mechanism which allows us to account
> > > the approximated amount of queued IO by the VM. (except for swap
> > > pages)
> >
> > Coincidence - that's what I started working on two days ago, and
> > I'm moving into the second generation design today.  Look at
> > 'queued_sectors'.  I found pretty quickly it's not enough, today
> > I'm adding 'submitted_sectors' to the soup.  This will allow me to
> > distinguish between traffic generated by my own thread and other
> > traffic.
>
> Could you expand on this, please?

OK, I am doing opportunistic flushing, so I want to know that nobody
else is using the disk; as long as that's true, I'll keep flushing out
buffers.  Conversely, if anybody else queues a request, I'll bail out
of the flush loop as soon as I've flushed the absolute minimum number
of buffers, i.e., the ones that were dirtied more than
bdflush_params->age_buffer ago.

But how do I know if somebody else is submitting requests?  The surest
way is to have a submitted_sectors counter that counts every
submission, and compare that to the number of sectors I know I've
submitted myself.  (This counter wraps, so I actually track the
difference from its value on entering the flush loop.)

The first thing I found (duh) is that nobody else ever submits
anything while I'm in the flush loop, because I'm on UP and I never
(almost never) yield the CPU.  On SMP I will get other threads
submitting, but only rarely will the submission happen while I'm in
the flush loop.  No good - I'm not detecting the other disk activity
reliably - so it's back to the drawing board.

My original plan was to compute a running average of submission rates
and use that to control my opportunistic flushing.  I departed from
that because I seemed to get good results with a much simpler
strategy, the patch I already posted.  It's fundamentally flawed
though: it works fine for constant light load and constant full load,
but not for sporadic loads.  What I need is something a lot smoother,
more analog, so I'll return to my original plan.
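To make the submitted_sectors bookkeeping above concrete, here is a
toy userspace model - all the names here are invented for
illustration, this is not the actual patch:

/*
 * Detect "foreign" traffic: compare the global counter's movement
 * with what the flush thread itself has queued.  Unsigned arithmetic
 * keeps the comparison correct across counter wrap.
 */
#include <stdio.h>

/* bumped on every request submission, by whoever submits */
static unsigned long submitted_sectors;

static void submit_sectors(unsigned long n)  /* stands in for the real submit path */
{
        submitted_sectors += n;
}

int main(void)
{
        unsigned long mark = submitted_sectors;  /* snapshot on entering the loop */
        unsigned long mine = 0;                  /* sectors this thread queued */
        int i;

        for (i = 0; i < 3; i++) {                /* the flusher queues 3 requests */
                submit_sectors(8);
                mine += 8;
        }
        submit_sectors(16);                      /* somebody else sneaks one in */

        if (submitted_sectors - mark != mine)
                printf("foreign traffic seen - bail out of the flush loop\n");
        return 0;
}

In the real thing the counter would of course be bumped in the
submission path itself, not by a helper like this.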
What I want to notice is that the IO submission rate has fallen below
a certain level; then, when the IO backlog has also fallen below a few
ms worth of transfers, I can do the opportunistic flushing.  In the
flush loop I want to submit enough buffers to make sure I'm using the
full bandwidth, but not so many that I create a big backlog that gets
in the way of a surge in demand from some other source.  I'm still
working out the details of that, so I will not post an updated patch
today after all ;-)

By the way, there's a really important throughput benefit to doing
this early flushing that I didn't put in the list when I first wrote
about it.  It's this: whenever we have a bunch of buffers dirtied, if
the disk bandwidth is available we want to load up the disk right
away, not 5 seconds from now.  If we wait 5 seconds, we just wasted 5
seconds of disk bandwidth.  Again, duh.  So my goal in doing this was
initially to have it cost as little in throughput as possible - I see
now that it's actually a win for throughput.  End of discussion about
whether to put in the effort or not.

--
Daniel
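P.S.  To make those two thresholds concrete, here is a toy userspace
model - the bandwidth figure, the thresholds and all the names are
invented for illustration, none of this is code from the patch:

/*
 * Flush early only while the other submitters are quiet and the
 * backlog is shorter than a few milliseconds worth of transfers.
 */
#include <stdio.h>

#define SECTOR_SIZE     512
#define DISK_BANDWIDTH  (20 * 1024 * 1024)  /* assume roughly 20 MB/sec */
#define BACKLOG_MS      5                   /* "a few ms worth of transfers" */
#define IDLE_RATE       64                  /* sectors/sec: below this, quiet */

/* a few milliseconds of transfers, expressed in sectors */
#define BACKLOG_SECTORS (DISK_BANDWIDTH / 1000 * BACKLOG_MS / SECTOR_SIZE)

static int can_flush_early(unsigned long foreign_rate, unsigned long queued)
{
        return foreign_rate < IDLE_RATE && queued < BACKLOG_SECTORS;
}

int main(void)
{
        printf("backlog limit = %d sectors\n", BACKLOG_SECTORS);
        printf("quiet disk, short queue -> flush? %d\n",
               can_flush_early(10, 50));
        printf("busy disk -> flush? %d\n", can_flush_early(5000, 50));
        return 0;
}

How foreign_rate gets smoothed is exactly the running-average part I'm
still playing with.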