From: Joel Becker <Joel.Becker@oracle.com>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>,
	Ulrich Drepper <drepper@redhat.com>,
	Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: statfs() / statvfs() syscall ballsup...
Date: Fri, 10 Oct 2003 09:26:07 -0700
Message-ID: <20031010162606.GB28773@ca-server1.us.oracle.com>
In-Reply-To: <Pine.LNX.4.44.0310100839030.20420-100000@home.osdl.org>

On Fri, Oct 10, 2003 at 09:00:23AM -0700, Linus Torvalds wrote:
> I'm hoping in-memory databases will just kill off the current crop 
> totally. That solves all the IO problems - the only thing that goes to 
> disk is the log and the backups, and both go there totally linearly unless 
> the designer was crazy.

	Memory is perpetually too small and too expensive.  Even if you
can buy a machine with 10TB of RAM, the price is going to be
prohibitive.  And when 10TB of RAM becomes affordable, the database is
going to be 100TB.
	I'm not saying that in-memory is bad.  Big databases do
everything they can to make the workload look almost like in-memory.
It's the only way to go.

> But why do you think you need O_DIRECT with very bad semantics to handle
> this?

	I don't need O_DIRECT with bad semantics.  I need the semantics
I need.  I know that other OSes have O_DIRECT to provide those
capabilities, and everyone loves portability.  That said...

> O_DIRECT throws the cache part away, but it throws out the baby with the
> bathwater, and breaks the other parts. Which is why O_DIRECT breaks things
> like disk scheduling in really subtle ways - think about writing and
> reading to the same area on the disk, and re-ordering at all different 
> levels. 

	Sure, but you don't do that.  The breakage in mixing O_DIRECT
with pagecache I/O to the same areas of the disk isn't even all that
subtle.  But you shouldn't be doing that, at least not constantly.

> And the thing is, uncaching is _trivial_. It's not like it is hard to say
> "try to get rid of these pages if they aren't mapped anywhere" and "insert
> this user page directly into the page cache". But people are so fixated
> with "direct to disk" that they don't even think about it.

	I'm not fixated.  "Use this user page for the page cache entry
for this offset into the file", "Change this user page from representing
this offset in this file to representing that offset in that file", and
"whatever you do, always read/write from backing store for this page"
are the semantics needed.  For the last one, you'd have to have a way for
the app to trigger a read or write out of the cache.  You don't want to
do it on every page modification or access; that's too often.  The
application knows the synchronization points, not the kernel.
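
	A rough sketch of what "the app triggers the write out of the
cache at its own synchronization points" could look like with calls
that exist today.  This is an illustration only, not an interface
proposed in this thread: it assumes fsync() plus
posix_fadvise(POSIX_FADV_DONTNEED) as a stand-in, and the helper names
and the 4096-byte block size are hypothetical.  The fadvise call is
advisory, so it does not guarantee that the next read really comes from
backing store.

```python
import os
import tempfile

BLOCK = 4096  # assumed block size, for illustration only


def drop_cached_range(fd, offset, length):
    """Ask the kernel to drop clean cached pages for a range (advisory)."""
    if hasattr(os, "posix_fadvise"):
        try:
            os.posix_fadvise(fd, offset, length, os.POSIX_FADV_DONTNEED)
        except OSError:
            pass  # some filesystems may refuse the hint; it is only a hint


def write_at_sync_point(fd, offset, data):
    """Write through the page cache, then flush at an app-chosen point."""
    os.pwrite(fd, data, offset)
    os.fsync(fd)  # the application decides when data must reach the disk
    drop_cached_range(fd, offset, len(data))


def read_after_drop(fd, offset, length):
    """Hint that cached pages can go, then read the range again."""
    drop_cached_range(fd, offset, length)
    return os.pread(fd, length, offset)


def demo():
    """Round-trip one block through a temporary file."""
    with tempfile.NamedTemporaryFile() as tmp:
        fd = tmp.fileno()
        block = b"x" * BLOCK
        write_at_sync_point(fd, 0, block)
        return read_after_drop(fd, 0, BLOCK) == block
```

	The flush happens once per synchronization point rather than on
every page modification, which is the distinction drawn above.  What
this sketch cannot express is the first two semantics (donating a user
page to the page cache, or re-pointing it at another offset).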

Joel

-- 

"There is a country in Europe where multiple-choice tests are
 illegal."
        - Sigfried Hulzer

Joel Becker
Senior Member of Technical Staff
Oracle Corporation
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
