From: Greg Stark <gsstark@mit.edu>
To: Ingo Oeser <ioe-lkml@rameria.de>
Cc: Greg Stark <gsstark@mit.edu>,
	Helge Hafting <helgehaf@aitel.hist.no>,
	Joel Becker <Joel.Becker@oracle.com>,
	Jamie Lokier <jamie@shareable.org>,
	Trond Myklebust <trond.myklebust@fys.uio.no>,
	Ulrich Drepper <drepper@redhat.com>,
	Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: statfs() / statvfs() syscall ballsup...
Date: 16 Oct 2003 10:02:27 -0400
Message-ID: <87smlt9t70.fsf@stark.dyndns.tv>
In-Reply-To: <200310161229.44861.ioe-lkml@rameria.de>


Ingo Oeser <ioe-lkml@rameria.de> writes:

> Hi there,
> 
> first: I think the problem is solvable with mixing blocking and
> non-blocking IO or simply AIO, which will be supported nicely by 2.6.0,
> is a POSIX standard and is meant for doing your own IO scheduling.

I think aio could be very useful for databases, but not in this area. I think
it's useful as a more fine-grained tool than sync/fsync. Currently the
database has to fsync a file to commit a transaction, which means flushing
_all_ writes to the file, even ones from other transactions. If aio inserted
write barriers to the disk controller, then it would provide a way to ensure
the current transaction is synced without having to flush all other
transactions' writes at the same time.

But I don't see how it's useful for the problem I'm describing.
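
To make the fsync granularity point concrete, here is a minimal sketch in
Python. The log file name, the record contents, and the helper names
(append_record, commit, demo) are all hypothetical; the only real calls are
os.open, os.write, os.fsync and os.close. It shows two transactions appending
to one file, with a single fsync on commit covering both of them.

```python
import os
import tempfile


def append_record(fd: int, payload: bytes) -> int:
    """Append one record to the log; the data may still sit in the page cache."""
    return os.write(fd, payload)


def commit(fd: int) -> None:
    """Make the log durable.

    fsync() works per file, so this flushes every dirty page of the file,
    not only the records written by the committing transaction.
    """
    os.fsync(fd)


def demo() -> int:
    # Hypothetical write-ahead log shared by two transactions.
    path = os.path.join(tempfile.mkdtemp(), "wal.log")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        append_record(fd, b"txn-A: update row 1\n")
        # A second transaction that is not ready to commit yet.
        append_record(fd, b"txn-B: update row 2\n")
        # txn-A commits; txn-B's write is flushed as a side effect.
        commit(fd)
    finally:
        os.close(fd)
    return os.path.getsize(path)
```

There is no argument to commit() that could restrict the flush to txn-A's
record, which is the limitation described above.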

> On Wednesday 15 October 2003 17:03, Greg Stark wrote:
> > Ingo Oeser <ioe-lkml@rameria.de> writes:
> > > On Monday 13 October 2003 10:45, Helge Hafting wrote:
> > > > This is easier than trying to tell the kernel that the job is
> > > > less important, that goes wrong wether the job runs too much
> > > > or too little.  Let that job  sleep a little when its services
> > > > aren't needed, or when you need the disk bandwith elsewhere.
> >
> > Actually I think that's exactly backwards. The problem is that if the
> > user-space tries to throttle the process it doesn't know how much or when.
> > The kernel knows exactly when there are other higher priority writes, it
> > can schedule just enough writes from vacuum to not interfere.
> 
> On dedicated servers this might be true. But on these you could also
> solve it in user space by measuring disk bandwidth and issueing just
> enough IO to keep up roughly with it.

Indeed we're discussing methods for doing that now. But this seems like an
awkward way to accomplish what the kernel could do very precisely. I don't see
why non-dedicated servers would make priorities any less useful; in fact I
think that's exactly where they would shine.

> > So if vacuum slept a bit, say every 64k of data vacuumed. It could end up
> > sleeping when the disks are actually idle. Or it could be not sleeping
> > enough and still be interfering with transactions.
> 
> The vacuum io is submitted (via AIO or simulation of it) normally in a
> unit U and waiting ALWAYS for U to complete, before submitting a new one.
> Between submitting units, the vacuums checks for outstanding transactions 
> and stops, when we have one.
> 
> Now a transaction is submitted and the submitting from vacuum is stopped
> by it existing. The transaction waits for completion (e.g.  aio_suspend()) 
> and signals vacuum to continue.

User-space has no idea if disk i/o is occurring. The data the transaction
needs could be cached, or it could be on a different disk.
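
The unit-based scheme quoted above could be sketched roughly as follows. This
is only an illustration under assumptions: vacuum_units, transaction_pending
and process_unit are hypothetical names, and process_unit stands in for
"submit one unit U and wait for it to complete".

```python
from typing import Callable, Iterable


def vacuum_units(
    units: Iterable[int],
    transaction_pending: Callable[[], bool],
    process_unit: Callable[[int], None],
) -> int:
    """Process units one at a time and stop at the first unit boundary
    where a transaction is pending. Returns the number of units processed."""
    done = 0
    for unit in units:
        # The only signal available here is "a transaction exists".
        # Nothing says whether it is actually waiting on this disk.
        if transaction_pending():
            break
        process_unit(unit)  # submit the unit and wait for completion
        done += 1
    return done


# Usage: a transaction shows up while the second unit is being processed.
processed = []
state = {"pending": False}


def process(unit: int) -> None:
    processed.append(unit)
    if len(processed) == 2:
        state["pending"] = True


stopped_after = vacuum_units(range(5), lambda: state["pending"], process)
```

Note that vacuum stops as soon as the flag is set, even if the transaction's
data is cached or lives on a different disk.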

Besides, I think this is far more coarse-grained than what's needed.
Transactions sometimes run for seconds, minutes, or hours; some of that time
is spent doing disk i/o and some of it doing cpu calculations. A transaction
can't stop and signal another process every time it finishes reading a block
and needs to do a bit of calculation, then context switch again a millisecond
later so it can read the next block...

And besides, this would only be useful on dedicated servers.

-- 
greg

