linux-kernel.vger.kernel.org archive mirror
From: Daniel Phillips <phillips@bonn-fries.net>
To: Rik van Riel <riel@conectiva.com.br>, Ben LaHaise <bcrl@redhat.com>
Cc: <torvalds@transmeta.com>, <linux-kernel@vger.kernel.org>,
	<linux-mm@kvack.org>
Subject: Re: [RFC][DATA] re "ongoing vm suckage"
Date: Sat, 4 Aug 2001 05:06:57 +0200
Message-ID: <0108040506570N.01827@starship>
In-Reply-To: <Pine.LNX.4.33L.0108032144310.11893-100000@imladris.rielhome.conectiva>

On Saturday 04 August 2001 03:29, Rik van Riel wrote:
> On Fri, 3 Aug 2001, Ben LaHaise wrote:
> > --- vm-2.4.7/drivers/block/ll_rw_blk.c.2	Fri Aug  3 19:06:46 2001
> > +++ vm-2.4.7/drivers/block/ll_rw_blk.c	Fri Aug  3 19:32:46 2001
> > @@ -1037,9 +1037,16 @@
> >  		 * water mark. instead start I/O on the queued stuff.
> >  		 */
> >  		if (atomic_read(&queued_sectors) >= high_queued_sectors) {
> > -			run_task_queue(&tq_disk);
> > -			wait_event(blk_buffers_wait,
> > -			 atomic_read(&queued_sectors) < low_queued_sectors);
>
> ... OUCH ...
>
> > bah.  Doesn't fix it.  Still waiting indefinitely in ll_rw_blk().
>
> And it's obvious why.
>
> The code above, as well as your replacement, both have a
> VERY serious "fairness issue".
>
> 	task 1			task 2
>
>  queued_sectors > high
>    ==> waits for
>    queued_sectors < low
>
>                              write stuff, submits IO
>                              queued_sectors < high  (but > low)
>                              ....
>                              queued_sectors still < high, > low
>                              happily submits more IO
>                              ...
>                              etc..
>
> It is quite obvious that the second task can easily starve
> the first task as long as it keeps submitting IO at a rate
> where queued_sectors will stay above low_queued_sectors,
> but under high_queued_sectors.

Nice shooting.  This could explain the effect I noticed, where
writing a linker file takes 8 times longer when competing with
a simultaneous grep.
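
The starvation scenario quoted above can be reproduced in a toy model.
This is only a sketch under stated assumptions, not kernel code:
struct sim, the tick loop and the fixed submit/drain rates are
hypothetical stand-ins for queued_sectors, the two watermarks and
real disk behaviour.

```c
#include <assert.h>

/* Hypothetical toy model; one loop iteration is one unit of time. */
struct sim {
    int queued;   /* stands in for queued_sectors */
    int low;      /* stands in for low_queued_sectors */
    int high;     /* stands in for high_queued_sectors */
};

/*
 * Task 1 is already asleep, waiting for queued < low.  On every tick
 * the disk completes `drain` sectors, then task 2 submits `submit`
 * sectors unless it sees queued >= high.
 *
 * Returns the tick on which task 1's wait condition becomes true,
 * or -1 if that never happens within max_ticks.
 */
int ticks_until_task1_wakes(struct sim s, int submit, int drain,
                            int max_ticks)
{
    assert(s.low <= s.high);

    for (int t = 0; t < max_ticks; t++) {
        s.queued = s.queued > drain ? s.queued - drain : 0;

        if (s.queued < s.low)
            return t;              /* task 1 may finally proceed */

        if (s.queued < s.high)     /* task 2 passes the high check */
            s.queued += submit;
    }
    return -1;                     /* task 1 starved */
}
```

Starting from queued = 100 with low = 50, high = 100 and task 2
submitting as fast as the disk drains (10 sectors per tick each), the
function returns -1: the queue bounces between 90 and 100 and never
drops below low.  With high set equal to low (both 50), the same
workload returns 5.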

> There are two possible solutions to the starvation scenario:
>
> 1) have one threshold
> 2) if one task is sleeping, let ALL tasks sleep
>    until we reach the lower threshold

Umm.... Hmm, there are lots more solutions than that, but those two
are nice and simple.  A quick test for (1), which I hope Ben will try,
is just to set high_queued_sectors = low_queued_sectors.

Currently, IO scheduling relies on the "random" algorithm for fairness
where the randomness is supplied by the processes.  This breaks down
sometimes, spectacularly, for some distinctly non-random access
patterns as you demonstrated.

Algorithm (2) above would have some potentially strange interactions
with the scheduler; it looks scary.  (E.g., change the scheduler, and
IO on some people's machines suddenly goes to hell.)
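
For reference, algorithm (2) as quoted could look roughly like the
following.  This is a sketch only: struct io_gate, its draining flag
and both helper functions are hypothetical names, and the real code
would need locking and a wait queue that are left out here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical gate state; nothing here exists in ll_rw_blk.c. */
struct io_gate {
    int  queued;     /* stands in for queued_sectors */
    int  low;        /* stands in for low_queued_sectors */
    int  high;       /* stands in for high_queued_sectors */
    bool draining;   /* set once any submitter has had to sleep */
};

/* Returns true if the caller may submit now, false if it must sleep. */
bool gate_may_submit(struct io_gate *g)
{
    if (g->queued >= g->high)
        g->draining = true;   /* first sleeper shuts the gate for all */
    return !g->draining;
}

/*
 * Account `n` completed sectors.  Returns true when the queue has
 * fallen below the low watermark and all sleepers should be woken.
 */
bool gate_complete(struct io_gate *g, int n)
{
    assert(n >= 0);
    g->queued = g->queued > n ? g->queued - n : 0;

    if (g->draining && g->queued < g->low) {
        g->draining = false;
        return true;
    }
    return false;
}
```

Once one task has gone to sleep, a second task that finds queued
between low and high is refused as well, so it can no longer keep
the queue topped up.  Which sleeper runs first after the wakeup is
still up to the scheduler.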

Come to think of it, (1) will also suffer in some cases from non-random
scheduling.

Now let me see, why do we even have the high+low thresholds?  I
suppose it is to avoid taking two context switches on every submitted
block, so it seems like a good idea.
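
A rough way to see the effect of the gap between the thresholds is to
count how often a single fast writer has to go to sleep.  The model
below is a sketch under assumptions: count_sleeps(), the tick loop and
the fixed submit/drain rates are all hypothetical, and one sleep
stands for one sleep/wake pair of context switches.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * One writer submits `submit` sectors per tick while awake; the disk
 * completes `drain` sectors per tick.  The writer sleeps when it
 * finds queued >= high and wakes once queued < low.
 *
 * Returns how many times the writer went to sleep in `ticks` ticks.
 */
int count_sleeps(int low, int high, int submit, int drain, int ticks)
{
    int  queued = 0;
    int  sleeps = 0;
    bool asleep = false;

    assert(low <= high);

    for (int t = 0; t < ticks; t++) {
        queued = queued > drain ? queued - drain : 0;

        if (asleep && queued < low)
            asleep = false;            /* woken by the completion */

        if (!asleep) {
            if (queued >= high) {
                asleep = true;         /* hit the high watermark */
                sleeps++;
            } else {
                queued += submit;
            }
        }
    }
    return sleeps;
}
```

With a writer four times faster than the disk, count_sleeps(64, 64,
4, 1, 1000) comes out far higher than count_sleeps(32, 64, 4, 1,
1000): with a single threshold the writer sleeps again almost as
soon as it wakes, while the gap lets it submit a batch per wakeup.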

For IO fairness I think we need something a little more deterministic.
I'm thinking about an IO quantum right now - when a task has used up
its quantum it yields to the next task, if any, waiting on the IO
queue.  How to preserve the effect of the high+low thresholds... it
needs more thinking, though I've already thought of several ways of
doing it badly :-)
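
To make the quantum idea concrete, here is one simple (and quite
possibly one of the bad) ways to do it: plain round-robin over the
tasks that have IO pending.  Everything in it is hypothetical,
including struct io_task, the per-sector accounting and the helper
names, and it does nothing about preserving the high+low thresholds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-task bookkeeping for the sketch. */
struct io_task {
    int pending;   /* sectors the task still wants to submit */
    int used;      /* sectors submitted in its current quantum */
};

/*
 * Decide who submits next.  The current task keeps its turn until it
 * has used `quantum` sectors; then the turn passes to the next task
 * with pending IO, if any.  If nobody else is waiting, the current
 * task simply gets a fresh quantum.
 *
 * Returns the index of the chosen task, or -1 if no task has work.
 */
int next_submitter(struct io_task *tasks, size_t n, int current,
                   int quantum)
{
    assert(n > 0 && quantum > 0);
    assert(current >= 0 && (size_t)current < n);

    if (tasks[current].pending > 0 && tasks[current].used < quantum)
        return current;

    for (size_t i = 1; i <= n; i++) {
        size_t cand = ((size_t)current + i) % n;
        if (tasks[cand].pending > 0) {
            tasks[cand].used = 0;      /* start a fresh quantum */
            return (int)cand;
        }
    }
    return -1;
}

/* Account one submitted sector to task `idx`. */
void submit_one(struct io_task *tasks, int idx)
{
    assert(tasks[idx].pending > 0);
    tasks[idx].pending--;
    tasks[idx].used++;
}
```

With a quantum of 8, a task with one sector to write waits for at
most 8 sectors from a heavy writer before it gets its turn, however
much the heavy writer still has queued up.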

--
Daniel
