From: Linus Torvalds <torvalds@osdl.org>
To: Joel Becker <Joel.Becker@oracle.com>
Cc: Chris Friesen <cfriesen@nortelnetworks.com>,
	Jamie Lokier <jamie@shareable.org>,
	Trond Myklebust <trond.myklebust@fys.uio.no>,
	Ulrich Drepper <drepper@redhat.com>,
	Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: statfs() / statvfs() syscall ballsup...
Date: Fri, 10 Oct 2003 10:40:40 -0700 (PDT)
Message-ID: <Pine.LNX.4.44.0310101024200.20420-100000@home.osdl.org>
In-Reply-To: <20031010172001.GA29301@ca-server1.us.oracle.com>


On Fri, 10 Oct 2003, Joel Becker wrote:
> 
> 	msync() forces write(), like fsync().  It doesn't force read().

Actually, the kernel has a "readahead(fd, offset, size)" system call that
will start asynchronous read-ahead on any mapping. After that, just
touching the page will obviously map in and synchronize the result.

I don't think anybody uses it, and the interface may be broken, but it was
literally 20 lines of code, and I had a trivial test program that
populated the cache for a directory structure really quickly using it.
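
[Editor's sketch, not part of the original mail.] The read-ahead-then-touch
pattern described above might look roughly like the following. The helper
name prefetch_and_touch() and the choice to prefetch the whole file are
assumptions made for illustration; only readahead(), mmap() and the other
standard calls are real interfaces, and the read-ahead is treated as a hint
whose failure is ignored.

```c
#define _GNU_SOURCE          /* readahead() is Linux-specific */
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original mail): start read-ahead on
 * the whole file, then map it and touch one byte per page.
 * Returns the number of pages touched, or -1 on error.
 */
long prefetch_and_touch(const char *path)
{
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }

    /* Only a hint: a failure here is ignored, the mapping still works. */
    (void)readahead(fd, 0, (size_t)st.st_size);

    unsigned char *p = mmap(NULL, (size_t)st.st_size, PROT_READ,
                            MAP_SHARED, fd, 0);
    close(fd);               /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return -1;

    long page = sysconf(_SC_PAGESIZE);
    volatile unsigned char sink = 0;
    long touched = 0;
    for (off_t off = 0; off < st.st_size; off += page) {
        sink ^= p[off];      /* touching the page maps it in */
        touched++;
    }
    (void)sink;
    munmap(p, (size_t)st.st_size);
    return touched;
}
```

A caller would pass a file path and get back the number of pages that were
faulted in; whether the read-ahead actually saved any time depends on what
was already cached.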

In general, it would be really nice to have more oracle people discussing
what their particular pet horror is, and what they'd really like to do.

I know you're more used to just doing your own thing and working with
vendors, but even just people getting used to doing the unofficial "this is
what we do, and it sucks because xxx" would make people more aware of what
you want to do, and maybe it would suggest novel ways of doing things.

I suspect most of the things would get shot down as being impractical, but
there has always been a lot of discussion about more direct control of
the page cache for programs that really want it, and I'm more than willing
to discuss things (obviously 2.7.x material, but still.. A lot of it is
trivial and could be back-ported to 2.6.x if people start using it).

For example, things we can do, but don't, partly because of interface 
issues and because there is no point in doing it if people wouldn't use 
it:

 - moving a page back and forth between user space and the page cache.
   It's _trivial_ to do, with a fallback on copying if the page happens to
   be busy (ie we can often just replace the existing page cache page, but
   if somebody else has it mapped, we'd have to copy the contents instead)

   We can't do this for "regular" read and write, because the resulting 
   copy-on-write situation makes it less than desirable in most cases, but 
   if the user space specifically says "you can throw these pages away
   after moving them to the page cache", that avoids a lot of horror.

   The "remap_file_pages()" thing kind of does this on the read side (ie 
   it says "map in this page cache entry into my virtual address space"), 
   but we don't have the reverse aka "take this page in the virtual 
   address space and map it into the page cache".

   Interfaces like these would also allow things like zero-copy file
   copies with smaller page cache footprints - at the expense of 
   invalidating the cache for the source file as a result of the copy. 
   Which is why it can't be a _regular_ read - but it's one of those 
   things where if the user knows what he wants..

 - dirty mapping control (ie controlling partial page dirty state, and 
   also _delaying_ writeout if it needs to be ordered). Possibly by having 
   a separate backing store (ie a mmap that says "read from this file, but
   write back to that other file") to avoid the nasty memory management 
   problems.
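
[Editor's sketch, not part of the original mail.] The read-side behaviour
attributed to remap_file_pages() in the first item, "map in this page cache
entry into my virtual address space", might be exercised roughly as below.
The helper name peek_page_via_remap() is hypothetical. The sketch assumes
the descriptor was opened read-write and the file is at least npages pages
long. Later kernels mark this call as deprecated and emulate it, so treat
this as an illustration of the idea rather than a recommended interface.

```c
#define _GNU_SOURCE          /* remap_file_pages() is Linux-specific */
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original mail): map the first
 * `npages` pages of `fd` as a shared window, rebind the first virtual
 * page of that window to file page `pgoff`, and return the first byte
 * seen through it. Returns -1 on error.
 * Assumption: `fd` is open O_RDWR and the file covers the window.
 */
int peek_page_via_remap(int fd, size_t npages, size_t pgoff)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;

    unsigned char *win = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);
    if (win == MAP_FAILED)
        return -1;

    /* pgoff is in units of pages; the prot argument must be 0. */
    if (remap_file_pages(win, page, 0, pgoff, 0) != 0) {
        munmap(win, len);
        return -1;
    }

    int first_byte = win[0]; /* now backed by file page `pgoff` */
    munmap(win, len);
    return first_byte;
}
```

The reverse direction discussed in that item, taking a page from the
virtual address space and mapping it into the page cache, has no
counterpart in this sketch.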

A lot of these are really easy to do, but the usage and the interfaces are 
non-obvious.

		Linus

