From: Ingo Oeser <ioe-lkml@rameria.de>
To: Greg Stark <gsstark@mit.edu>
Cc: Helge Hafting <helgehaf@aitel.hist.no>,
	Joel Becker <Joel.Becker@oracle.com>,
	Jamie Lokier <jamie@shareable.org>,
	Trond Myklebust <trond.myklebust@fys.uio.no>,
	Ulrich Drepper <drepper@redhat.com>,
	Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: statfs() / statvfs() syscall ballsup...
Date: Thu, 16 Oct 2003 12:29:44 +0200
Message-ID: <200310161229.44861.ioe-lkml@rameria.de>
In-Reply-To: <87llrmbl1g.fsf@stark.dyndns.tv>

Hi there,

First: I think the problem is solvable by mixing blocking and
non-blocking I/O, or simply by using AIO, which will be well supported
in 2.6.0, is a POSIX standard, and is meant for doing your own I/O
scheduling.

On Wednesday 15 October 2003 17:03, Greg Stark wrote:
> Ingo Oeser <ioe-lkml@rameria.de> writes:
> > On Monday 13 October 2003 10:45, Helge Hafting wrote:
> > > This is easier than trying to tell the kernel that the job is
> > > less important, that goes wrong wether the job runs too much
> > > or too little.  Let that job  sleep a little when its services
> > > aren't needed, or when you need the disk bandwith elsewhere.
>
> Actually I think that's exactly backwards. The problem is that if the
> user-space tries to throttle the process it doesn't know how much or when.
> The kernel knows exactly when there are other higher priority writes, it
> can schedule just enough writes from vacuum to not interfere.

On dedicated servers this might be true. But on those you could also
solve it in user space by measuring disk bandwidth and issuing just
enough I/O to roughly keep up with it.
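A minimal sketch of such a user-space throttle, in C: after each unit,
compare the bytes issued so far against a target rate and sleep off any
surplus. The target rate (for example, some fraction of bandwidth
measured on an otherwise idle disk) and all function names here are
assumptions for illustration, not an existing API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Throughput actually achieved by one unit: bytes moved / seconds taken. */
double observed_bps(size_t bytes, double seconds)
{
    assert(seconds > 0.0);
    return (double)bytes / seconds;
}

/* Seconds to pause so that bytes_done issued over elapsed_s stays at or
 * below target_bps on average. Returns 0 if we are already behind. */
double throttle_delay(size_t bytes_done, double elapsed_s, double target_bps)
{
    assert(target_bps > 0.0);
    double budget_s = (double)bytes_done / target_bps; /* time these bytes "deserve" */
    return budget_s > elapsed_s ? budget_s - elapsed_s : 0.0;
}

/* Sleep for s seconds, resuming after signal interruptions. */
void sleep_seconds(double s)
{
    if (s <= 0.0)
        return;
    struct timespec ts;
    ts.tv_sec = (time_t)s;
    ts.tv_nsec = (long)((s - (double)ts.tv_sec) * 1e9);
    while (nanosleep(&ts, &ts) != 0 && errno == EINTR)
        ; /* ts now holds the remaining time */
}
```

A vacuum loop would call throttle_delay() with its running totals after
each unit and pass the result to sleep_seconds(). The target rate is
fixed by the application, which is exactly the limitation Greg raises
below.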

> So if vacuum slept a bit, say every 64k of data vacuumed. It could end up
> sleeping when the disks are actually idle. Or it could be not sleeping
> enough and still be interfering with transactions.

Vacuum I/O is submitted (via AIO or a simulation of it) in units of
size U, and vacuum ALWAYS waits for a unit to complete before
submitting the next one. Between units, vacuum checks for outstanding
transactions and stops if there is one.
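Sketched in C with POSIX AIO, the vacuum side might look like this: one
unit in flight at a time, always waited for with aio_suspend(), with the
check for outstanding transactions made only between units. The
pending-transaction hook, its flag-based implementation, and the
function names are hypothetical, and the actual vacuum processing of
each unit is left as a comment.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical hook: returns nonzero while a transaction is outstanding. */
typedef int (*txn_pending_fn)(void *ctx);

/* Simplest hook: ctx points at an atomic_int that is nonzero while busy. */
int flag_pending(void *ctx)
{
    return atomic_load((atomic_int *)ctx) != 0;
}

/* Read [start, end) of fd in units of at most `unit` bytes into buf.
 * Returns the offset reached (stops early if a transaction is pending),
 * or -1 on error. A NULL hook means "never pending". */
off_t vacuum_scan(int fd, off_t start, off_t end, char *buf, size_t unit,
                  txn_pending_fn pending, void *ctx)
{
    off_t off = start;
    while (off < end) {
        if (pending && pending(ctx))    /* checked only between units */
            break;
        size_t want = unit;
        if ((off_t)want > end - off)
            want = (size_t)(end - off);

        struct aiocb cb;
        memset(&cb, 0, sizeof cb);
        cb.aio_fildes = fd;
        cb.aio_buf = buf;
        cb.aio_nbytes = want;
        cb.aio_offset = off;
        cb.aio_sigevent.sigev_notify = SIGEV_NONE; /* no completion signal */
        if (aio_read(&cb) != 0)
            return -1;

        /* ALWAYS wait for this unit before doing anything else. */
        const struct aiocb *list[1] = { &cb };
        while (aio_error(&cb) == EINPROGRESS)
            aio_suspend(list, 1, NULL);  /* retries after EINTR */

        ssize_t got = aio_return(&cb);
        if (got <= 0)
            return got < 0 ? -1 : off;   /* error, or EOF */
        /* ... vacuum would process buf[0 .. got) here ... */
        off += got;
    }
    return off;
}
```

Because vacuum_scan() returns the offset it reached, vacuum can simply
call it again from that offset once it is told to continue.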

Now a transaction is submitted, and its mere existence stops vacuum from
submitting further units. The transaction waits for its own I/O to
complete (e.g. via aio_suspend()) and then signals vacuum to continue.
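One way to sketch the signalling is with a mutex and a condition
variable; this is a swapped-in mechanism, since the text only says the
transaction "signals" vacuum. Transactions register before submitting
I/O and deregister after waiting for it; vacuum blocks between units
while any are registered. struct io_gate and its functions are
hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical coordination object shared by transactions and vacuum. */
struct io_gate {
    pthread_mutex_t lock;
    pthread_cond_t  idle;    /* broadcast when the last transaction ends */
    int             active;  /* transactions currently doing I/O */
};

void gate_init(struct io_gate *g)
{
    pthread_mutex_init(&g->lock, NULL);
    pthread_cond_init(&g->idle, NULL);
    g->active = 0;
}

/* Transaction side: register before submitting I/O ... */
void txn_begin(struct io_gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->active++;
    pthread_mutex_unlock(&g->lock);
}

/* ... and after waiting for it (e.g. aio_suspend()), let vacuum go on. */
void txn_end(struct io_gate *g)
{
    pthread_mutex_lock(&g->lock);
    assert(g->active > 0);
    if (--g->active == 0)
        pthread_cond_broadcast(&g->idle);
    pthread_mutex_unlock(&g->lock);
}

/* Vacuum side: call between units; blocks while any transaction is active. */
void vacuum_wait_idle(struct io_gate *g)
{
    pthread_mutex_lock(&g->lock);
    while (g->active > 0)
        pthread_cond_wait(&g->idle, &g->lock);
    pthread_mutex_unlock(&g->lock);
}

/* Snapshot of the number of active transactions. */
int gate_active(struct io_gate *g)
{
    pthread_mutex_lock(&g->lock);
    int n = g->active;
    pthread_mutex_unlock(&g->lock);
    return n;
}
```

The while loop around pthread_cond_wait() guards against spurious
wakeups and against a new transaction registering between the broadcast
and vacuum reacquiring the lock.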

So the disk(s) should always be kept busy.

I don't know much about the design internals of your database, but this
sounds promising and is portable.

> > The questions are: How IO-intensive vacuum? How fast can a throttling
> > free disk bandwidth (and memory)?
>
> It's purely i/o bound on large sequential reads. Ideally it should still
> have large enough sequential reads to not lose the streaming advantage, but
> not so large that it preempts the more random-access transactions.

OK, so we can ignore the processing time, and the above should just work.
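The unit size U is where that trade-off lands. Since vacuum always waits
for its in-flight unit, a transaction arriving at the worst moment waits
for at most one unit, roughly U / B at sequential bandwidth B (assuming
seek time and drive-side queueing can be ignored). Choosing U = B * d for
an acceptable extra delay d keeps the reads as large as that budget
allows. A small sketch; the 64 KiB rounding and all names are
illustrative assumptions:

```c
#include <assert.h>
#include <stddef.h>

/* Worst-case extra wait for a transaction: one in-flight vacuum unit of
 * unit_bytes must finish first at sequential bandwidth bw_bps. */
double worst_case_delay_s(size_t unit_bytes, double bw_bps)
{
    assert(bw_bps > 0.0);
    return (double)unit_bytes / bw_bps;
}

/* Largest multiple of `align` bytes whose transfer fits in max_delay_s.
 * Never returns less than one aligned block, even if that exceeds the budget. */
size_t pick_unit_size(double bw_bps, double max_delay_s, size_t align)
{
    assert(bw_bps > 0.0 && max_delay_s > 0.0 && align > 0);
    size_t raw = (size_t)(bw_bps * max_delay_s);
    size_t unit = raw - raw % align;
    return unit > 0 ? unit : align;
}
```

For example, pick_unit_size(50e6, 0.01, 65536), i.e. 50 MB/s with a
10 ms budget, yields 458752 bytes (seven 64 KiB blocks).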


Regards

Ingo Oeser


