From: Linus Torvalds <torvalds@osdl.org>
To: Joel Becker <Joel.Becker@oracle.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>,
    Ulrich Drepper <drepper@redhat.com>,
    Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: statfs() / statvfs() syscall ballsup...
Date: Fri, 10 Oct 2003 09:00:23 -0700 (PDT)
Message-ID: <Pine.LNX.4.44.0310100839030.20420-100000@home.osdl.org>
In-Reply-To: <20031010152710.GA28773@ca-server1.us.oracle.com>

On Fri, 10 Oct 2003, Joel Becker wrote:
> > I hope disk-based databases die off quickly.
>
> As opposed to what? Not a challenge, just interested in what
> you think they should be.

I'm hoping in-memory databases will just kill off the current crop
totally. That solves all the IO problems - the only thing that goes to
disk is the log and the backups, and both go there totally linearly
unless the designer was crazy.

Yeah, I don't follow the db market, but it's just insane to try to keep
the on-disk data in any other format if you've got enough memory.

Recovery may take a long time (reading that whole backup into memory and
redoing the log will be pretty expensive), but replication should handle
that trivially.

> Where I work doesn't change the need for O_DIRECT. If your Big
> App has its own cache, why copy the cache in the kernel?

Why indeed? But why do you think you need O_DIRECT with very bad
semantics to handle this?

The kernel page cache does multiple things:
 - staging area for letting the filesystem do blocking (i.e. this is why
   a regular "write()" or "read()" doesn't need to care about alignment
   etc)
 - a synchronization entity - making sure that a write and a read cannot
   pass each other, and that mmap contents are always _coherent_.
 - a cache

O_DIRECT throws the cache part away, but it throws out the baby with the
bathwater, and breaks the other parts.

Which is why O_DIRECT breaks things like disk scheduling in really subtle
ways - think about writing and reading to the same area on the disk, and
re-ordering at all different levels.

And the thing is, uncaching is _trivial_. It's not like it is hard to say
"try to get rid of these pages if they aren't mapped anywhere" and "insert
this user page directly into the page cache". But people are so fixated
with "direct to disk" that they don't even think about it.

		Linus