From: Mike Galbraith <mikeg@wen-online.de>
To: Daniel Phillips <phillips@bonn-fries.net>
Cc: Rik van Riel <riel@conectiva.com.br>,
	Pavel Machek <pavel@suse.cz>, John Stoffel <stoffel@casc.com>,
	Roger Larsson <roger.larsson@norran.net>, <thunder7@xs4all.nl>,
	Linux-Kernel <linux-kernel@vger.kernel.org>
Subject: Re: spindown [was Re: 2.4.6-pre2, pre3 VM Behavior]
Date: Tue, 19 Jun 2001 06:35:38 +0200 (CEST)
Message-ID: <Pine.LNX.4.33.0106190617430.483-100000@mikeg.weiden.de>
In-Reply-To: <01061816220503.11745@starship>

On Mon, 18 Jun 2001, Daniel Phillips wrote:

> On Sunday 17 June 2001 12:05, Mike Galbraith wrote:
> > It _juuust_ so happens that I was tinkering... what do you think of
> > something like the below?  (and boy do I ever wonder what a certain
> > box doing slrn stuff thinks of it.. hint hint;)
>
> It's too subtle for me ;-)  (Not shy about saying that because this part of
> the kernel is probably subtle for everyone.)

No subtlety (hammer), it just draws a line that doesn't move around
in unpredictable ways.  For example, nr_free_buffer_pages() adds free
pages into the line it draws.  You may have a large volume of dirty
data, decide it would be prudent to flush, and then someone frees a nice
chunk of memory...  (send morse code messages via malloc/free?:)
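
Something like this toy model is what I mean -- names and numbers are
made up, it's just the shape of the argument, not the patch:

#include <stdio.h>

/* Illustrative only: sizes and percentages are invented. */
#define TOTAL_PAGES   65536   /* pretend machine size           */
#define FLUSH_PCT     10      /* start flushing above 10% dirty */

/* A line that wanders: free pages are counted into it, so a burst
 * of frees raises the threshold out from under the dirty data. */
static long moving_line(long free_pages, long buffercache_pages)
{
        return (free_pages + buffercache_pages) * FLUSH_PCT / 100;
}

/* A line that stays put: a fixed fraction of total memory. */
static long fixed_line(void)
{
        return TOTAL_PAGES * FLUSH_PCT / 100;
}

int main(void)
{
        long cache = 40000;

        printf("fixed line             : %ld\n", fixed_line());
        printf("moving line,  1000 free: %ld\n", moving_line(1000, cache));
        printf("moving line, 20000 free: %ld\n", moving_line(20000, cache));
        return 0;
}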

Anyway it's crude, but it seems to have gotten results from the slrn
load.  I received logs for ac15 and ac15+patch.  ac15 took 265 seconds
to do the job whereas with the patch it took 227 seconds.  I haven't
pored over the logs yet, but there seems to be throughput to be had.

If anyone is interested in the logs, they're much smaller than expected:
-rw-r--r--   1 mikeg    users       11993 Jun 19 05:58 ac15_mike.log
-rw-r--r--   1 mikeg    users       13015 Jun 19 05:58 ac15_org.log

> The question I'm tackling right now is how the system behaves when the load
> goes away, or doesn't get heavy.  Your patch doesn't measure the load
> directly - it may attempt to predict it as a function of memory pressure, but
> that's a little more loosely coupled than what I had in mind.

It doesn't attempt to predict, it reacts to the existing situation.

> I'm now in the midst of hatching a patch. [1] The first thing I had to do is
> go explore the block driver code, yum yum.  I found that it already computes
> the statistic I'm interested in, namely queued_sectors, which is used to pace
> the IO on block devices.  It's a little crude - we really want this to be
> per-queue and have one queue per "spindle" - but even in its current form
> it's workable.
>
> The idea is that when queued_sectors drops below some threshold we have
> 'unused disk bandwidth' so it would be nice to do something useful with it:

(that's much more subtle/clever:)

>   1) Do an early 'sync_old_buffers'
>   2) Do some preemptive pageout
>
> The benefit of (1) is that it lets disks go idle a few seconds earlier, and
> (2) should improve the system's latency in response to load surges.  There
> are drawbacks too, which have been pointed out to me privately, but they tend
> to be pretty minor, for example: on a flash disk you'd do a few extra writes
> and wear it out ever-so-slightly sooner.  All the same, such special devices
> can be dealt with easily once we progress a little further in improving the
> kernel's 'per spindle' intelligence.
>
> Now how to implement this.  I considered putting a (newly minted)
> wakeup_kflush in blk_finished_io, conditional on a loaded-to-unloaded
> transition, and that's fine except it doesn't do the whole job: we also need
> to have the early flush for any write to a disk file while the disks are
> lightly loaded, i.e., there is no convenient loaded-to-unloaded transition to
> trigger it.  The missing trigger could be inserted into __mark_dirty, but
> that would penalize the loaded state (a little, but that's still too much).
> Furthermore, it's probably desirable to maintain a small delay between the
> dirty and the flush.  So what I'll try first is just running kflush's timer
> faster, and make its reschedule period vary with disk load, i.e., when there
> are fewer queued_sectors, kflush looks at the dirty buffer list more often.
>
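
If I read that right, the timer side comes out something like the
below (user-space sketch, constants invented, just to pin down the
shape of it):

#include <stdio.h>

/* Invented constants, not the kernel's. */
#define HIGH_QUEUED_SECTORS  (8 * 1024)   /* disks considered busy      */
#define KFLUSH_SLOW_INTERVAL 500          /* ticks between runs, loaded */
#define KFLUSH_FAST_INTERVAL 50           /* ticks between runs, idle   */

/* Fewer queued sectors -> shorter reschedule period, so kflush looks
 * at the dirty buffer list more often while the disks are idle. */
static long kflush_interval(long queued_sectors)
{
        if (queued_sectors >= HIGH_QUEUED_SECTORS)
                return KFLUSH_SLOW_INTERVAL;

        return KFLUSH_FAST_INTERVAL +
               (KFLUSH_SLOW_INTERVAL - KFLUSH_FAST_INTERVAL) *
               queued_sectors / HIGH_QUEUED_SECTORS;
}

int main(void)
{
        long q;

        for (q = 0; q <= 8 * 1024; q += 2048)
                printf("queued_sectors=%5ld  interval=%ld ticks\n",
                       q, kflush_interval(q));
        return 0;
}
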
> The rest of what has to happen in kflush is pretty straightforward.  It just
> uses queued_sectors to determine how far to walk the dirty buffer list, which
> is maintained in time-since-dirtied order.  If queued_sectors is below some
> threshold the entire list is flushed.  Note that we want to change the sense
> of b_flushtime to b_timedirtied.  It's more efficient to do it this way
> anyway.
>
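
And the walk itself, if I follow you, looks roughly like this -- again
only a model; the list, b_timedirtied and the numbers are stand-ins:

#include <stdio.h>

/* Stand-in buffer head: just the field that matters here. */
struct buf {
        long b_timedirtied;     /* when it was dirtied, in ticks */
        struct buf *next;
};

#define QUEUED_LOW_WATER 1024   /* 'disks nearly idle' line      */
#define MAX_AGE          3000   /* normal writeback age, ticks   */

/* The list is kept in time-dirtied order, oldest first.  Flush
 * everything past its age; if the disks are nearly idle, flush
 * the whole list. */
static void flush_dirty(struct buf *head, long now, long queued_sectors)
{
        struct buf *b;

        for (b = head; b; b = b->next) {
                if (queued_sectors >= QUEUED_LOW_WATER &&
                    now - b->b_timedirtied < MAX_AGE)
                        break;  /* ordered list: the rest are younger */
                printf("flush buffer dirtied at %ld\n", b->b_timedirtied);
        }
}

int main(void)
{
        struct buf b3 = { 1500, NULL }, b2 = { 500, &b3 }, b1 = { 100, &b2 };

        flush_dirty(&b1, 4000, 2048);   /* loaded: only the aged ones go */
        flush_dirty(&b1, 1000, 0);      /* idle: the whole list goes     */
        return 0;
}
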
> I haven't done anything about preemptive pageout yet, but similar ideas apply.

Preemptive pageout could simply be walking the dirty list looking for
swap pages and writing them out.  With the fair aging change that's
already in, there will be some.  If the fair aging change to background
aging works out, there will be more (don't want too many more though;).
The only problem I can see with that simple method is that once written,
the page lands on the inactive_clean list.  That list is short and does
get consumed.. might turn a fake pageout into a real one unintentionally.
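
The walk itself is trivial (everything below is a stand-in, not the
2.4 structures):

#include <stdio.h>

/* Stand-in page; the real walk would test the swap cache. */
struct fakepage {
        int swap_backed;
        int dirty;
        struct fakepage *next;
};

/* Write out the swap pages found on the dirty list while the disks
 * are idle.  The catch noted above: once written, each page lands on
 * the short inactive_clean list, where it may be reclaimed -- and the
 * 'fake' pageout quietly becomes a real one. */
static void preemptive_pageout(struct fakepage *head)
{
        struct fakepage *p;

        for (p = head; p; p = p->next) {
                if (!p->dirty || !p->swap_backed)
                        continue;
                printf("write out swap page %p\n", (void *)p);
                p->dirty = 0;
        }
}

int main(void)
{
        struct fakepage p3 = { 1, 1, NULL }, p2 = { 0, 1, &p3 },
                        p1 = { 1, 1, &p2 };

        preemptive_pageout(&p1);
        return 0;
}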

> [1] This is an experiment, do not worry, it will not show up in your tree any
> time soon.  IOW, constructive criticism appreciated, flames copied to
> /dev/null.

Look forward to seeing it.

	-Mike


Thread overview: 52+ messages
2001-06-13 19:31 2.4.6-pre2, pre3 VM Behavior Tom Sightler
2001-06-13 20:21 ` Rik van Riel
2001-06-14  1:49   ` Tom Sightler
2001-06-14  3:16     ` Rik van Riel
2001-06-14  7:59       ` Laramie Leavitt
2001-06-14  9:24         ` Helge Hafting
2001-06-14 17:38           ` Mark Hahn
2001-06-15  8:27             ` Helge Hafting
2001-06-14  8:47       ` Daniel Phillips
2001-06-14 20:23         ` Roger Larsson
2001-06-15  6:04           ` Mike Galbraith
2001-06-14 20:39         ` John Stoffel
2001-06-14 20:51           ` Rik van Riel
2001-06-14 21:33           ` John Stoffel
2001-06-14 22:23             ` Rik van Riel
2001-06-15 15:23           ` spindown [was Re: 2.4.6-pre2, pre3 VM Behavior] Pavel Machek
2001-06-16 20:50             ` Daniel Phillips
2001-06-16 21:06               ` Rik van Riel
2001-06-16 21:25                 ` Rik van Riel
2001-06-16 21:44                 ` Daniel Phillips
2001-06-16 21:54                   ` Rik van Riel
2001-06-17 10:28                     ` Daniel Phillips
2001-06-17 10:05                   ` Mike Galbraith
2001-06-17 12:49                     ` (lkml)Re: " thunder7
2001-06-17 16:40                       ` Mike Galbraith
2001-06-18 14:22                     ` Daniel Phillips
2001-06-19  4:35                       ` Mike Galbraith [this message]
2001-06-20  1:50                       ` [RFC] Early flush (was: spindown) Daniel Phillips
2001-06-20 20:58                         ` Tom Sightler
2001-06-20 22:09                           ` Daniel Phillips
2001-06-24  3:20                           ` Anuradha Ratnaweera
2001-06-24 11:14                             ` Daniel Phillips
2001-06-24 15:06                             ` Rik van Riel
2001-06-24 16:21                               ` Daniel Phillips
2001-06-20  4:39                       ` Richard Gooch
2001-06-20 14:29                         ` Daniel Phillips
2001-06-20 16:12                         ` Richard Gooch
2001-06-22 23:25                           ` Daniel Kobras
2001-06-23  5:10                             ` Daniel Phillips
2001-06-25 11:33                               ` Pavel Machek
2001-06-25 11:31                           ` Pavel Machek
2001-06-18 20:21             ` spindown Simon Huggins
2001-06-19 10:46               ` spindown Pavel Machek
2001-06-20 16:52                 ` spindown Daniel Phillips
2001-06-20 17:32                   ` spindown Rik van Riel
2001-06-20 18:00                     ` spindown Daniel Phillips
2001-06-21 16:07                 ` spindown Jamie Lokier
2001-06-22 22:09                   ` spindown Daniel Kobras
2001-06-28  0:27                   ` spindown Troy Benjegerdes
2001-06-14 15:10       ` 2.4.6-pre2, pre3 VM Behavior John Stoffel
2001-06-14 18:25         ` Daniel Phillips
2001-06-14  8:30   ` Mike Galbraith
